如何计算数据框中的唯一行?

问题描述 投票:4回答:3

我在R中有一个数据框,它有很多重复的记录。我有兴趣了解每个数据框中有多少条记录。

例如,我有这个数据框:

Fake Name       Fake ID    Fake Status   Fake Program
June             0003         Green        PR1
June             0003         Green        PR1
Television       202          Blue         PR3
Television       202          Green        PR3    
Television       202          Green        PR3
CRT              12           Red          PR0

从上面我想得到类似下面的东西:

Fake Name       Fake ID    Fake Status   Fake Program     COUNT
June             0003         Green        PR1              2
Television       202          Blue         PR3              1
Television       202          Green        PR3              2
CRT              12           Red          PR0              1

任何帮助,将不胜感激。谢谢。

r dataframe dplyr aggregate ply
3个回答
10
投票

使用group_by_all然后用n计算行数:

df %>% group_by_all() %>% summarise(COUNT = n())

# A tibble: 4 x 5
# Groups:   Fake.Name, Fake.ID, Fake.Status [?]
#  Fake.Name  Fake.ID Fake.Status Fake.Program COUNT
#  <fct>        <int> <fct>       <fct>        <int>
#1 CRT             12 Red         PR0              1
#2 June             3 Green       PR1              2
#3 Television     202 Blue        PR3              1
#4 Television     202 Green       PR3              2

甚至可以从@ Ryan的评论中得到更好的评价:

df %>% group_by_all %>% count

3
投票

以下使用duplicated获取结果data.frame,然后使用rle获取计数。

res <- dat[!duplicated(dat), ]

d <- duplicated(dat) | duplicated(dat, fromLast = TRUE)
res$COUNT <- rle(d)$lengths

res
#   Fake Name Fake ID Fake Status Fake Program COUNT
#1       June    0003       Green          PR1     2
#3 Television     202        Blue          PR3     1
#4 Television     202       Green          PR3     2
#6        CRT      12         Red          PR0     1

2
投票

对于这个问题

如何计算数据框中的唯一行?

然后使用sumduplicated。例如。,

df <- data.frame(
  `Fake Name` = c(
    "June", "June", "Television", "Television", "Television", "CRT"),
  `Fake ID` = c("0003", "0003", "202", "202", "202", "12"),
  `Fake Status` = c("Green", "Green", "Blue", "Green", "Green", "Red"),
  `Fake Program` = c("PR1", "PR1", "PR3", "PR3", "PR3", "PR0"), 
  check.names = FALSE)
df
#R    Fake Name Fake ID Fake Status Fake Program
#R 1       June    0003       Green          PR1
#R 2       June    0003       Green          PR1
#R 3 Television     202        Blue          PR3
#R 4 Television     202       Green          PR3
#R 5 Television     202       Green          PR3
#R 6        CRT      12         Red          PR0
sum(!duplicated(df))
#R [1] 4

对于您要求的表格,您可以按如下方式使用data.table

library(data.table)
df <- data.table(df)
df[, .(COUNT = .N), by = names(df)]
#R     Fake Name Fake ID Fake Status Fake Program COUNT
#R 1:       June    0003       Green          PR1     2
#R 2: Television     202        Blue          PR3     1
#R 3: Television     202       Green          PR3     2
#R 4:        CRT      12         Red          PR0     1
© www.soinside.com 2019 - 2024. All rights reserved.