This is my input data frame:
df <- data.frame(Col1=c("A", "B", "C", "B", "C", "A", "A", "C"),Col2=c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),Col3=c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"))
df
Col1 Col2 Col3
1 A Blue Young
2 B Red Old
3 C Blue Old
4 B Blue Young
5 C Blue Young
6 A Red Young
7 A Red Old
8 C Blue Old
I would like to get a contingency table like this:
Blue Red Young Old
A 1 2 2 1
B 1 1 1 1
C 3 0 1 2
I almost get there with the following command, but Col2 and Col3 end up combined:
as.data.frame(table(df)) %>% dcast(Col1 ~ Col2 + Col3, value.var="Freq")
Col1 Blue_Old Blue_Young Red_Old Red_Young
1 A 0 1 1 1
2 B 0 1 1 0
3 C 2 1 0 0
A base R option that works with any number of columns could be:
do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))
# Blue Red Old Young
#A 1 2 1 2
#B 1 1 1 1
#C 3 0 2 1
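Note that the columns come out as Blue, Red, Old, Young, while the question asks for Young before Old. A minimal sketch of one way to handle that, assuming the data frame from the question: the helper name `count_by` is made up for illustration, and the final column order is simply spelled out by hand.

```r
# Data frame from the question (strings kept as character)
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tabulate the id column against every other column
# and bind the resulting two-way tables side by side
count_by <- function(data, id) {
  others <- setdiff(names(data), id)
  do.call(cbind, lapply(data[others], function(col) table(data[[id]], col)))
}

res <- count_by(df, "Col1")

# Reorder the columns to match the layout requested in the question
out <- res[, c("Blue", "Red", "Young", "Old")]
out
```

Indexing by column name keeps the counts attached to the right labels, so the reordering step is safe even if the tabulation order changes.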
Using table:
cbind(table(df$Col1,df$Col2),table(df$Col1,df$Col3))
# Blue Red Old Young
#A 1 2 1 2
#B 1 1 1 1
#C 3 0 2 1
One option is to gather 'Col2' and 'Col3' into long format, get the count by 'Col1' and the 'val' column, and then spread it back to 'wide' format:
library(tidyverse)
df %>%
gather(key, val, Col2:Col3) %>%
count(Col1, val) %>%
spread(val, n, fill = 0)
# A tibble: 3 x 5
# Col1 Blue Old Red Young
# <fct> <dbl> <dbl> <dbl> <dbl>
#1 A 1 1 2 2
#2 B 1 1 1 1
#3 C 3 2 0 1
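In recent tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above might look like the following sketch. It assumes tidyr 1.1.0 or later (for the scalar `values_fill`) together with dplyr, and the names `key` and `val` are just carried over from the answer above.

```r
library(dplyr)
library(tidyr)

# Data frame from the question (strings kept as character)
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Stack Col2 and Col3 into a single 'val' column (long format)
  pivot_longer(cols = Col2:Col3, names_to = "key", values_to = "val") %>%
  # Count occurrences of each value within each Col1 group
  count(Col1, val) %>%
  # Back to wide format, filling missing combinations with 0
  pivot_wider(names_from = val, values_from = n, values_fill = 0L)

res
```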
Since the OP is using dcast, a compact option is
library(data.table)
dcast(melt(setDT(df), id.var = 'Col1'), Col1~ value)
# Col1 Blue Old Red Young
#1: A 1 1 2 2
#2: B 1 1 1 1
#3: C 3 2 0 1
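Two details of the one-liner above are worth spelling out: setDT() converts df to a data.table by reference, and dcast() falls back to length as the aggregation function when none is given. A slightly more explicit variant, sketched under the assumption that the original data frame should stay untouched, could be:

```r
library(data.table)

# Data frame from the question (strings kept as character)
df <- data.frame(
  Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
  Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
  Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
  stringsAsFactors = FALSE
)

# as.data.table() makes a copy, so df itself remains a plain data.frame
dt <- as.data.table(df)

# Long format: one row per (Col1, original column, value)
long <- melt(dt, id.vars = "Col1", measure.vars = c("Col2", "Col3"))

# Wide format: count rows per Col1 and value, stating the aggregation explicitly
wide <- dcast(long, Col1 ~ value, fun.aggregate = length)
wide
```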