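This is an extension of the Stack Overflow question "Subset Data Based On Elements In List", which answered how to create a list of new data frames, each built by subsetting the original data frame on the levels of a single grouping factor.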
My challenge is that I need to create the data frames using multiple grouping variables.
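To generalize the problem, I created this toy dataset. It has the daily amount of rain as the response variable, plus two categorical variables for that day's temperature range and cloudiness: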
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)
With the following code, I can split on the temp variable to produce three new data frames and combine them into a list (df_1A):
temp_levels <- unique(as.character(df$temp))
df_1A <- lapply(temp_levels, function(x){subset(df, temp == x)})
and likewise three new data frames grouped by cloudiness (df_1B):
cloud_levels <- unique(as.character(df$clouds))
df_1B <- lapply(cloud_levels, function(x){subset(df, clouds == x)})
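As a side note not in the original question, base R's split() builds the same kind of list in a single call. A minimal sketch on the toy data (df_by_temp is a hypothetical name). Note that split() orders and names the elements by factor level (Cold, Hot, Warm) rather than by order of first appearance, as unique() does in the code above:

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# split() performs the lapply()/subset() pattern in one step and
# names each list element after its factor level
df_by_temp <- split(df, df$temp)
names(df_by_temp)  # "Cold" "Hot" "Warm"
```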
However, I haven't been able to come up with a simple, elegant way to generate 9 data frames, one for each unique combination of temp and clouds.
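One possible way to stay with lapply(), assuming every level combination is wanted (including empty ones), is to enumerate the combinations with expand.grid() and call subset() once per row. This is a sketch, not part of the original answer; df_by_both and df_split are hypothetical names. Because the toy data never contains Hot/Some or Warm/Lots, two of the nine data frames have zero rows:

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# Every pairing of temp and clouds levels: 3 x 3 = 9 rows
combos <- expand.grid(temp = levels(df$temp), clouds = levels(df$clouds),
                      stringsAsFactors = FALSE)

# One subset() per combination, mirroring the single-variable lapply() code
df_by_both <- lapply(seq_len(nrow(combos)), function(i) {
  subset(df, temp == combos$temp[i] & clouds == combos$clouds[i])
})
names(df_by_both) <- paste(combos$temp, combos$clouds, sep = "_")
length(df_by_both)  # 9

# Base R alternative: split() on a list of factors yields the same 9 groups,
# named "<temp>.<clouds>"; drop = TRUE would keep only observed pairs
df_split <- split(df, list(df$temp, df$clouds))
```

Passing drop = TRUE to split() keeps only the 7 combinations that actually occur in this data.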
Thanks!
Your question implies a preference for lapply, but if you don't mind using dplyr, there is a neat solution with group_by() and group_split():
library(dplyr)
df_list <-
df %>%
group_by(temp, clouds) %>%
group_split()
# df_list
df_list[[1]]
#> # A tibble: 3 x 3
#> rain temp clouds
#> <dbl> <fct> <fct>
#> 1 0 Cold Lots
#> 2 25 Cold Lots
#> 3 4 Cold Lots
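group_split() returns an unnamed list, and with the default settings it only creates groups for combinations that actually occur in the data, so this toy data yields 7 data frames rather than 9. A minimal sketch that labels each element using group_keys(), which returns the grouping values in the same order as group_split() (the names grouped and keys and the "temp_clouds" naming scheme are illustrative choices, not from the original answer):

```r
library(dplyr)

# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

grouped <- df %>% group_by(temp, clouds)

# One tibble per observed temp/clouds combination
df_list <- group_split(grouped)

# group_keys() has one row per group, in the same order as group_split()
keys <- group_keys(grouped)
names(df_list) <- paste(keys$temp, keys$clouds, sep = "_")
names(df_list)
```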