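I have a data frame in R with many duplicate records. I would like to know how many times each distinct record occurs.
For example, I have this data frame: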
Fake Name Fake ID Fake Status Fake Program
June 0003 Green PR1
June 0003 Green PR1
Television 202 Blue PR3
Television 202 Green PR3
Television 202 Green PR3
CRT 12 Red PR0
From the above, I would like to get something like the following:
Fake Name Fake ID Fake Status Fake Program COUNT
June 0003 Green PR1 2
Television 202 Blue PR3 1
Television 202 Green PR3 2
CRT 12 Red PR0 1
Any help would be appreciated. Thanks.
Use group_by_all to group by every column, then n to count the rows in each group:
df %>% group_by_all() %>% summarise(COUNT = n())
# A tibble: 4 x 5
# Groups: Fake.Name, Fake.ID, Fake.Status [?]
# Fake.Name Fake.ID Fake.Status Fake.Program COUNT
# <fct> <int> <fct> <fct> <int>
#1 CRT 12 Red PR0 1
#2 June 3 Green PR1 2
#3 Television 202 Blue PR3 1
#4 Television 202 Green PR3 2
Or even better, from @Ryan's comment:
df %>% group_by_all %>% count
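For a self-contained run, the sketch below rebuilds the example data and counts the rows with count() and across(everything()), which dplyr 1.0.0 and later offer in place of the _all helpers. The syntactic column names (Fake.Name and so on) and the object name counts are assumptions made for this illustration, not part of the original answer.

```r
library(dplyr)

# Example data rebuilt from the question; the syntactic column names
# (Fake.Name, ...) are an assumption for this sketch.
df <- data.frame(
  Fake.Name    = c("June", "June", "Television", "Television", "Television", "CRT"),
  Fake.ID      = c(3L, 3L, 202L, 202L, 202L, 12L),
  Fake.Status  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  Fake.Program = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0")
)

# Group by every column and count the rows in each group;
# name = "COUNT" sets the name of the count column.
counts <- df %>% count(across(everything()), name = "COUNT")
counts
```

Unlike the summarise() version, count() returns an ungrouped result, so no leftover grouping carries into later steps.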
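The following uses duplicated to get the result data.frame, then rle to get the counts. Note that the rle step relies on identical rows being adjacent and on two different duplicated groups not sitting directly next to each other, which holds for the example data.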
res <- dat[!duplicated(dat), ]
d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths
res
# Fake Name Fake ID Fake Status Fake Program COUNT
#1 June 0003 Green PR1 2
#3 Television 202 Blue PR3 1
#4 Television 202 Green PR3 2
#6 CRT 12 Red PR0 1
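If the rows are not ordered so that the rle step above applies (duplicates scattered through the data, or two duplicated groups side by side), one order-independent sketch in base R is to build a key string per row and tally the keys with match and tabulate. The names key and first and the choice of separator are assumptions for illustration, and the example rows are deliberately interleaved.

```r
# Example data from the question, with the rows deliberately interleaved.
dat <- data.frame(
  `Fake Name`    = c("June", "Television", "June", "Television", "CRT", "Television"),
  `Fake ID`      = c("0003", "202", "0003", "202", "12", "202"),
  `Fake Status`  = c("Green", "Green", "Green", "Blue", "Red", "Green"),
  `Fake Program` = c("PR1", "PR3", "PR1", "PR3", "PR0", "PR3"),
  check.names = FALSE
)

# One key string per row; assumes the separator never occurs in the data.
key <- do.call(paste, c(dat, sep = "\r"))
first <- !duplicated(key)

# Keep the first occurrence of every distinct row.
res <- dat[first, ]

# match() maps each row to the position of its first occurrence among the
# kept rows; tabulate() counts how often each position occurs.
res$COUNT <- tabulate(match(key, key[first]))
res
```

The result keeps the rows in order of first appearance, the same order that dat[!duplicated(dat), ] produces.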
As for the title question, "How to count the unique rows in a data frame?", use sum and duplicated. E.g.,
df <- data.frame(
`Fake Name` = c(
"June", "June", "Television", "Television", "Television", "CRT"),
`Fake ID` = c("0003", "0003", "202", "202", "202", "12"),
`Fake Status` = c("Green", "Green", "Blue", "Green", "Green", "Red"),
`Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
check.names = FALSE)
df
#R Fake Name Fake ID Fake Status Fake Program
#R 1 June 0003 Green PR1
#R 2 June 0003 Green PR1
#R 3 Television 202 Blue PR3
#R 4 Television 202 Green PR3
#R 5 Television 202 Green PR3
#R 6 CRT 12 Red PR0
sum(!duplicated(df))
#R [1] 4
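As a quick cross-check, unique on a data frame drops the repeated rows, so nrow(unique(df)) should agree with sum(!duplicated(df)). A minimal sketch on the same example data (the name n_unique is only for illustration):

```r
# Same example data as above.
df <- data.frame(
  `Fake Name`    = c("June", "June", "Television", "Television", "Television", "CRT"),
  `Fake ID`      = c("0003", "0003", "202", "202", "202", "12"),
  `Fake Status`  = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  `Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"),
  check.names = FALSE
)

# Count the rows that are not repeats of an earlier row.
n_unique <- sum(!duplicated(df))

# unique() keeps exactly those rows, so both counts must agree.
stopifnot(n_unique == nrow(unique(df)))
n_unique
```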
For the table you asked for, you can use data.table as follows:
library(data.table)
df <- data.table(df)
df[, .(COUNT = .N), by = names(df)]
#R Fake Name Fake ID Fake Status Fake Program COUNT
#R 1: June 0003 Green PR1 2
#R 2: Television 202 Blue PR3 1
#R 3: Television 202 Green PR3 2
#R 4: CRT 12 Red PR0 1