My data frame:
> df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100))
> df
food sold
1 fruit banana 99.47171
2 fruit apple 99.40878
3 fruit grape 99.28727
4 bread 99.15934
5 meat 100.53438
Now I want to replace every value in food that starts with "fruit" with just "fruit", then group by food and sum sold.
> df %>%
+ mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>%
+ group_by(food) %>%
+ summarise(sold = sum(sold))
Source: local data frame [3 x 2]
food sold
(fctr) (dbl)
1 bread 99.15934
2 meat 100.53438
3 NA 298.16776
Why doesn't this command work? Why does it give me NA instead of fruit?
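This works for me; I think your food column is a factor. Use stringsAsFactors = FALSE when creating the data, as below, or run options(stringsAsFactors = FALSE) in your R session to avoid the same situation: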
library(dplyr)
library(stringr)

df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
                 sold = rnorm(5, 100),
                 stringsAsFactors = FALSE)
df %>%
  mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>%
  group_by(food) %>%
  summarise(sold = sum(sold))
Output:
# A tibble: 3 × 2
food sold
<chr> <dbl>
1 bread 99.67661
2 fruit 300.28520
3 meat 99.88566
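If the data frame already exists and food is a factor, converting the column to character inside the pipeline should give the same result without rebuilding the data. The following is only a sketch under that assumption; the seed and the explicit stringsAsFactors = TRUE are added here to reproduce the factor case (from R 4.0.0 onward, data.frame() defaults to stringsAsFactors = FALSE), and the name result is just for illustration.

```r
library(dplyr)
library(stringr)

set.seed(24)  # seed chosen only so the sketch is reproducible
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
                 sold = rnorm(5, 100),
                 stringsAsFactors = TRUE)  # force a factor column, as in the question

result <- df %>%
  mutate(food = as.character(food),                                 # drop the factor levels first
         food = replace(food, str_detect(food, "^fruit"), "fruit")) %>%
  group_by(food) %>%
  summarise(sold = sum(sold))

result
```

Anchoring the pattern with ^ restricts the match to values that start with "fruit", which is what the question asks for.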
We can use base R without converting to character class: rename every level that matches 'fruit' to 'fruit', then use aggregate to get the sum.
levels(df$food)[grepl("fruit", levels(df$food))] <- "fruit"
aggregate(sold~food, df, sum)
# food sold
#1 bread 99.41637
#2 fruit 300.41033
#3 meat 100.84746
Data:
set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape",
"bread", "meat"), sold = rnorm(5, 100))
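The level-renaming step in the base R approach works because giving several levels of a factor the same label merges them into one level. A small sketch on a toy factor (the vector f is made up for illustration and is not the data above):

```r
# Toy factor; levels are sorted alphabetically: "bread", "fruit apple", "fruit banana"
f <- factor(c("fruit banana", "fruit apple", "bread"))

# Give every level starting with "fruit" the same label; R merges the duplicates
levels(f)[grepl("^fruit", levels(f))] <- "fruit"

levels(f)        # "bread" "fruit"
as.character(f)  # "fruit" "fruit" "bread"
```

Only the levels attribute is touched, so the column stays a factor and no NA is introduced.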
replace does not work as expected because the food column is a factor variable and "fruit" is not one of its levels.
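The failure can be reproduced in isolation. In this minimal sketch (the toy factor f and the name g are made up for illustration), assigning a value that is not an existing level yields NA; R normally emits an "invalid factor level, NA generated" warning, which is suppressed here only to keep the output clean.

```r
# Toy factor with two levels: "bread" and "fruit banana"
f <- factor(c("fruit banana", "bread"))

# "fruit" is not an existing level, so the replaced element becomes NA
g <- suppressWarnings(replace(f, 1, "fruit"))

is.na(g[1])  # TRUE
levels(g)    # "bread" "fruit banana" (unchanged, no "fruit" level was added)
```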
One possible solution is to define the food column of the data frame with the correct factor levels:
df <- data.frame(food =
factor(c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
levels =c("fruit banana", "fruit apple", "fruit grape", "bread", "meat", "fruit") ),
sold = rnorm(5, 100))
It is easier to set stringsAsFactors = FALSE:
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
sold = rnorm(5, 100),
stringsAsFactors = FALSE)
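Although the question is tagged with dplyr and stringr, I would like to propose an alternative solution using data.table, because data.table handles factors in a convenient and straightforward way: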
library(data.table)
setDT(df)[food %like% "^fruit", food := "fruit"][, .(sold = sum(sold)), by = food]
# food sold
#1: fruit 300.41033
#2: bread 99.41637
#3: meat 100.84746
Data:
set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
sold = rnorm(5, 100))