Contingency table from a data frame, using the first column as the reference


This is my input data frame:

df <- data.frame(Col1=c("A", "B", "C", "B", "C", "A", "A", "C"),Col2=c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),Col3=c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"))

df
  Col1 Col2  Col3
1    A Blue Young
2    B  Red   Old
3    C Blue   Old
4    B Blue Young
5    C Blue Young
6    A  Red Young
7    A  Red   Old
8    C Blue   Old

I want to get the following contingency table:

   Blue   Red   Young   Old
A     1     2       2     1
B     1     1       1     1
C     3     0       1     2

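The following command almost gets me there, but it combines Col2 and Col3 into joint categories:

library(reshape2)  # provides dcast()
library(magrittr)  # provides %>%
as.data.frame(table(df)) %>% dcast(Col1 ~ Col2 + Col3, value.var = "Freq")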
  Col1 Blue_Old Blue_Young Red_Old Red_Young
1    A        0          1       1         1
2    B        0          1       1         0
3    C        2          1       0         0
3 Answers

A base R option that works with any number of columns:

do.call(cbind, lapply(df[-1], function(i) table(df$Col1, i)))
#  Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1
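The same idea can be wrapped in a small helper that takes the name of the reference column instead of assuming it is the first one. contingency_by_ref is a hypothetical name, not an existing function, and this sketch assumes every column other than ref should be tabulated:

```r
# Hypothetical helper: cross-tabulate a reference column against every other column
contingency_by_ref <- function(data, ref) {
  others <- setdiff(names(data), ref)
  # One two-way table per remaining column; all share the same rows (values of ref)
  tabs <- lapply(others, function(col) table(data[[ref]], data[[col]]))
  # Bind the tables side by side into a single matrix
  do.call(cbind, tabs)
}

df <- data.frame(Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
                 Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
                 Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
                 stringsAsFactors = FALSE)

res <- contingency_by_ref(df, "Col1")
res
```

Because the tables are bound column-wise, a label that appeared in two different columns would show up as two separate columns with the same name.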


Using table:

cbind(table(df$Col1,df$Col2),table(df$Col1,df$Col3))

#   Blue Red Old Young
#A    1   2   1     2
#B    1   1   1     1
#C    3   0   2     1
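Instead of binding two tables, the columns can also be stacked into one long pair of vectors and passed to a single table() call. A minimal sketch; the names ref and vals are only illustrative. The value columns then come out in sorted order, and if the same label ever appeared in both Col2 and Col3 its counts would be merged into one column:

```r
df <- data.frame(Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
                 Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
                 Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
                 stringsAsFactors = FALSE)

# Repeat the reference column once per counted column so the two vectors line up
ref <- rep(df$Col1, times = 2)
# as.character() avoids depending on how c() combines factors in older R versions
vals <- c(as.character(df$Col2), as.character(df$Col3))

tab <- table(ref, vals)
tab
```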


One option is to gather 'Col2' and 'Col3' into long format, count each combination of 'Col1' and the 'val' column, and then spread the result back to wide format:

library(tidyverse)
df %>% 
  gather(key, val, Col2:Col3) %>% 
  count(Col1, val) %>% 
  spread(val, n, fill = 0)
# A tibble: 3 x 5
#  Col1   Blue   Old   Red Young
#  <fct> <dbl> <dbl> <dbl> <dbl>
#1 A         1     1     2     2
#2 B         1     1     1     1
#3 C         3     2     0     1
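gather() and spread() still work but are superseded in current tidyr by pivot_longer() and pivot_wider(). A sketch of the same pipeline with the newer verbs, assuming a recent tidyr (1.1 or later, where values_fill accepts a single value):

```r
library(dplyr)
library(tidyr)

df <- data.frame(Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
                 Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
                 Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
                 stringsAsFactors = FALSE)

res <- df %>%
  # Long format: one row per (Col1, original column name, value)
  pivot_longer(c(Col2, Col3), names_to = "key", values_to = "val") %>%
  # Count how often each value occurs for each level of Col1
  count(Col1, val) %>%
  # Back to wide format; combinations that never occur are filled with 0
  pivot_wider(names_from = val, values_from = n, values_fill = 0)
res
```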

Since the OP is already using dcast, a compact option is:

library(data.table)
dcast(melt(setDT(df), id.var = 'Col1'), Col1~ value)
#   Col1 Blue Old Red Young
#1:    A    1   1   2     2
#2:    B    1   1   1     1
#3:    C    3   2   0     1
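The data.table call above relies on defaults: it lets dcast guess the value column and, because several rows fall into each cell, falls back to length as the aggregate (printing a message about it). A sketch of the same melt-then-dcast idea with those arguments spelled out, written with reshape2 on the assumption, inferred from the data.frame-style output in the question, that this is where the OP's dcast came from:

```r
library(reshape2)

df <- data.frame(Col1 = c("A", "B", "C", "B", "C", "A", "A", "C"),
                 Col2 = c("Blue", "Red", "Blue", "Blue", "Blue", "Red", "Red", "Blue"),
                 Col3 = c("Young", "Old", "Old", "Young", "Young", "Young", "Old", "Old"),
                 stringsAsFactors = FALSE)

# Stack Col2 and Col3 into a single 'value' column, keeping Col1 as the id
long <- melt(df, id.vars = "Col1")

# Spread the stacked values into columns; length() counts rows per Col1/value pair,
# and empty cells are filled with length() of an empty vector, i.e. 0
res <- dcast(long, Col1 ~ value, value.var = "variable", fun.aggregate = length)
res
```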
© www.soinside.com 2019 - 2024. All rights reserved.