使用dplyr和stringr替换所有值以

问题描述 投票:4回答:4

我的朋友

> df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100))
>   df
          food      sold
1 fruit banana  99.47171
2  fruit apple  99.40878
3  fruit grape  99.28727
4        bread  99.15934
5         meat 100.53438

现在我想要替换以“水果”开头的食物中的所有值,然后按食物分组并总结出售的总和。

> df %>%
+     mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>% 
+     group_by(food) %>% 
+     summarise(sold = sum(sold))
Source: local data frame [3 x 2]

    food      sold
  (fctr)     (dbl)
1  bread  99.15934
2   meat 100.53438
3     NA 298.16776

为什么这个命令不起作用?它给了我NA而不是水果?

r dplyr stringr
4个回答
8
投票

这对我有用,我认为你的数据是因素:

在制作如下数据时使用stringsAsFactors=FALSE,或者您可以在R环境中运行options(stringsAsFactors=FALSE)以避免相同的情况:

df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100),stringsAsFactors = FALSE)

df %>%
mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>% 
group_by(food) %>% 
summarise(sold = sum(sold))

输出:

 # A tibble: 3 × 2
       food      sold
      <chr>     <dbl>
    1 bread  99.67661
    2 fruit 300.28520
    3  meat  99.88566

3
投票

我们可以使用base R而不转换为character类,将levels与'fruit'分配给'fruit'并使用aggregate获取sum

levels(df$food)[grepl("fruit", levels(df$food))] <- "fruit"
aggregate(sold~food, df, sum)
#   food      sold
#1 bread  99.41637
#2 fruit 300.41033
#3  meat 100.84746

data

set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", 
                 "bread", "meat"), sold = rnorm(5, 100))

0
投票

replace不能按预期工作,因为food列是一个因子变量,而fruit是未知的水平。

一种可能的解决方案是使用正确的因子级别定义数据帧列food

df <- data.frame(food = 
  factor(c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), 
    levels =c("fruit banana", "fruit apple", "fruit grape", "bread", "meat", "fruit") ), 
    sold = rnorm(5, 100))

更容易设置stringsAsFactors = FALSE

df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
             sold = rnorm(5, 100), 
             stringsAsFactors = FALSE)

0
投票

虽然Q用dplyrstringr标记,但我想提出一个使用data.table的替代解决方案,因为data.table以方便和直接的方式处理因素:

library(data.table)
setDT(df)[food %like% "^fruit", food := "fruit"][, .(sold = sum(sold)), by = food]
#    food      sold
#1: fruit 300.41033
#2: bread  99.41637
#3:  meat 100.84746

Data

set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), 
                 sold = rnorm(5, 100))
© www.soinside.com 2019 - 2024. All rights reserved.