Subset a data frame based on two or more factor variables

Question (votes: 1, answers: 1)

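This is an extension of the StackOverflow question "Subset Data Based On Elements In List", which answered how to create a list of new data frames, each one built by subsetting the original data frame on a grouping factor variable.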

My challenge is that I need to use more than one grouping variable to create the data frames.

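To generalize the problem, I created this toy dataset. It has the daily amount of rain as the response variable, plus two categorical variables: the temperature range and the cloud cover for that day.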

rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

With the following code I can generate three new data frames, one per level of the temp variable, and combine them into a list (df_1A):

temp_levels <- unique(as.character(df$temp))
df_1A <- lapply(temp_levels, function(x){subset(df, temp == x)})

and, in the same way, three new data frames grouped by cloud cover (df_1B):

cloud_levels <- unique(as.character(df$clouds))
df_1B <- lapply(cloud_levels, function(x){subset(df, clouds == x)})

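However, I have not been able to come up with a simple, elegant way to generate the 9 data frames, each holding a unique combination of temp and clouds.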

Thanks

r dataframe subset apply
1 answer (0 votes)

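Your question implies a preference for lapply, but if you don't mind using dplyr, there is a neat solution: group by both factors and split the result into a list of tibbles.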


library(dplyr)

df_list <- 
   df %>% 
   group_by(temp, clouds) %>% 
   group_split()

# df_list

df_list[[1]]
#> # A tibble: 3 x 3
#>    rain temp  clouds
#>   <dbl> <fct> <fct> 
#> 1     0 Cold  Lots  
#> 2    25 Cold  Lots  
#> 3     4 Cold  Lots
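
If you would rather stay in base R, the following is a minimal sketch of the same idea using split(), which accepts a list of factors and splits on their interaction. The name df_split is only illustrative and does not come from the question.

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# Passing a list of factors makes split() group on every temp/clouds pair.
# drop = TRUE discards pairs that never occur in the data.
df_split <- split(df, list(df$temp, df$clouds), drop = TRUE)

# Elements are named "<temp>.<clouds>", so one group can be picked by name.
names(df_split)
df_split[["Cold.Lots"]]
```

Note that this toy dataset contains only 7 of the 9 possible combinations (there are no Warm/Lots or Hot/Some days), so the list has 7 elements. Leaving drop at its default of FALSE returns all 9 elements, two of them zero-row data frames.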

Your data

rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)
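
For something closer to the lapply pattern in the question, one possible sketch is to loop over the combinations that actually occur in the data. The names combos and df_1C are hypothetical, chosen here to follow the question's df_1A/df_1B naming.

```r
# Toy data from the question
rain <- c(2, 0, 4, 25, 3, 9, 4, 0, 4, 0, 8, 35)
temp <- as.factor(c("Warm","Cold","Hot","Cold","Warm","Cold","Cold","Warm","Warm","Hot","Cold", "Cold"))
clouds <- as.factor(c("Some","Lots","None","Lots","None","None","Lots","Some","Some","Lots","None", "Some"))
df <- data.frame(rain, temp, clouds)

# One row per temp/clouds pair that is present in df
combos <- unique(df[, c("temp", "clouds")])

# Subset df once per observed pair, comparing on the character labels
df_1C <- lapply(seq_len(nrow(combos)), function(i) {
  t_i <- as.character(combos$temp[i])
  c_i <- as.character(combos$clouds[i])
  subset(df, temp == t_i & clouds == c_i)
})

# Name each element "<temp>.<clouds>" so groups can be looked up by name
names(df_1C) <- paste(combos$temp, combos$clouds, sep = ".")
```

Because combos is built from the data, combinations with no rows are skipped rather than returned as empty data frames.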
