Consider the following data frame:
set.seed(123)
dat <- data.frame(Region = rep(c("a", "b"), each = 100),
                  State  = rep(c("NY", "MA", "FL", "GA"), each = 50),
                  Loc    = rep(letters[1:20], each = 5),
                  ID     = 1:200,
                  count1 = sample(4, 200, replace = TRUE),
                  count2 = sample(4, 200, replace = TRUE))
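Region, State, and Loc are grouping variables for the individual measurements, each of which has a unique ID number. For each grouping variable, I want to summarize the number of observations at each level of count1 and count2. Normally I would do the following for each pair: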
# example for count1 and Region:
library(tidyverse)
dat %>%
  dplyr::select(Region, count1) %>%
  group_by(count1, Region) %>%
  count()
## or
with(dat, table(Region, count1))
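How can I do this for all combinations and wrap the results into a single table (or at least a few tables grouped by equal length, since the number of rows will differ depending on the grouping variable used)?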
Try something like this:
Region1 <- dat %>%
  group_by(Region, count1) %>%
  summarise(TotalRegion1 = n())
State1 <- dat %>%
  group_by(State, count1) %>%
  summarise(TotalState1 = n())
Loc1 <- dat %>%
  group_by(Loc, count1) %>%
  summarise(TotalLoc1 = n())
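The three near-identical summaries above can also be generated in one pass over the grouping variables. This is only a minimal sketch, assuming dplyr 1.0 or later (for across() and the .groups argument); the names group_vars and count1_tables are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Same data as in the question
set.seed(123)
dat <- data.frame(Region = rep(c("a", "b"), each = 100),
                  State  = rep(c("NY", "MA", "FL", "GA"), each = 50),
                  Loc    = rep(letters[1:20], each = 5),
                  ID     = 1:200,
                  count1 = sample(4, 200, replace = TRUE),
                  count2 = sample(4, 200, replace = TRUE))

# Hypothetical helper vector: the grouping columns to loop over
group_vars <- c("Region", "State", "Loc")

# One summary table per grouping variable, stored in a named list
count1_tables <- lapply(group_vars, function(g) {
  dat %>%
    group_by(across(all_of(c(g, "count1")))) %>%
    summarise(n = n(), .groups = "drop")
})
names(count1_tables) <- group_vars

count1_tables$Region
```

Keeping the results in a list sidesteps the fact that the tables have different numbers of rows, which is the concern raised in the question.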
You could try to get "everything" at once (for count1) by pivoting the grouping columns into long format:
out <- dat %>%
  select(-ID, -count2) %>%
  pivot_longer(Region:Loc, names_to = "k", values_to = "v") %>%
  group_by(k, v, count1) %>%
  tally() %>%
  ungroup()
out %>%
  filter(k == "Region")
# # A tibble: 8 x 4
# k v count1 n
# <chr> <fct> <int> <int>
# 1 Region a 1 26
# 2 Region a 2 27
# 3 Region a 3 20
# 4 Region a 4 27
# 5 Region b 1 20
# 6 Region b 2 30
# 7 Region b 3 30
# 8 Region b 4 20
out
# # A tibble: 101 x 4
# k v count1 n
# <chr> <fct> <int> <int>
# 1 Loc a 2 5
# 2 Loc a 3 1
# 3 Loc a 4 4
# 4 Loc b 1 2
# 5 Loc b 2 2
# 6 Loc b 3 3
# 7 Loc b 4 3
# 8 Loc c 1 2
# 9 Loc c 2 2
# 10 Loc c 3 3
# # ... with 91 more rows
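The question also asks about count2, which the long-format approach above drops. One possible extension, sketched here under the assumption of dplyr 1.0+ and tidyr 1.1+, pivots both the grouping columns and the two count columns before counting. The names out_all, wide, measure, and level are illustrative only.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
set.seed(123)
dat <- data.frame(Region = rep(c("a", "b"), each = 100),
                  State  = rep(c("NY", "MA", "FL", "GA"), each = 50),
                  Loc    = rep(letters[1:20], each = 5),
                  ID     = 1:200,
                  count1 = sample(4, 200, replace = TRUE),
                  count2 = sample(4, 200, replace = TRUE))

out_all <- dat %>%
  select(-ID) %>%
  # Make the grouping columns plain character so they stack cleanly
  mutate(across(c(Region, State, Loc), as.character)) %>%
  # Stack the grouping variables: k = variable name, v = its value
  pivot_longer(c(Region, State, Loc), names_to = "k", values_to = "v") %>%
  # Stack the count columns: measure = count1/count2, level = 1..4
  pivot_longer(c(count1, count2), names_to = "measure", values_to = "level") %>%
  count(k, v, measure, level)

# Optional: put count1 and count2 side by side, one row per (k, v, level)
wide <- out_all %>%
  pivot_wider(names_from = measure, values_from = n, values_fill = 0)

wide %>%
  filter(k == "Region")
```

In wide, the count1 and count2 columns hold the number of observations at each level, with 0 filled in for combinations that never occur.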