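I have a problem similar to this one: (Sum the duplicate rows of particular columns in dataframe), but that solution doesn't work for me, or I don't know how to adapt it.
I need to add the Number column values together whenever both Reference and NODCCODE match, even if the rows with the matching NODCCODE are not adjacent within a Reference.
I have this: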
structure(list(Reference = c("BBM101", "BBM102",
"BBM102", "BBM102", "BBM103", "BBM103",
"BBM104", "BBM105", "BBM105", "BBM105"),
NODCCODE = c("101","301", "201", "201", "201", "401", "401", "201", "102", "201"),
Number = c(2, 1, 3, 1, 3, 14, 3, 24, 2, 1)),
row.names = c(NA, 10L), class = "data.frame")
Reference NODCCODE Number
1 BBM101 101 2
2 BBM102 301 1
3 BBM102 201 3
4 BBM102 201 1
5 BBM103 201 3
6 BBM103 401 14
7 BBM104 401 3
8 BBM105 201 24
9 BBM105 102 2
10 BBM105 201 1
I want this:
structure(list(Reference = c("BBM101", "BBM102", "BBM102", "BBM103", "BBM103", "BBM104", "BBM105", "BBM105"),
NODCCODE = c("101","301", "201", "201", "401", "401", "201", "102"),
Number = c(2, 1, 4, 3, 14, 3, 25, 2)),
row.names = c(NA, 8L), class = "data.frame")
Reference NODCCODE Number
1 BBM101 101 2
2 BBM102 301 1
3 BBM102 201 4
4 BBM103 201 3
5 BBM103 401 14
6 BBM104 401 3
7 BBM105 201 25
8 BBM105 102 2
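Note that rows 3 and 4, which share the same Reference and NODCCODE, have been merged and their Number values added. Rows 8 and 10 also share the same Reference and NODCCODE, so they have been summed as well, even though a 102 value sits between the two 201 values. I don't care whether the remaining row ends up at the beginning or the end of that set of Reference values.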
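I believe the tidyverse can do this.
Is it as simple as this? A Reference with only one matching NODCCODE keeps its single value as the sum, and entries with the same Reference and NODCCODE are summed.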
library(tidyverse)

struct <- structure(list(Reference = c("BBM101", "BBM102",
"BBM102", "BBM102", "BBM103", "BBM103",
"BBM104", "BBM105", "BBM105", "BBM105"),
NODCCODE = c("101","301", "201", "201", "201", "401", "401", "201", "102", "201"),
Number = c(2, 1, 3, 1, 3, 14, 3, 24, 2, 1)),
row.names = c(NA, 10L), class = "data.frame")

result <- struct %>%
  group_by(Reference, NODCCODE) %>%
  summarise(Number = sum(Number)) %>%
  arrange(Reference) %>%
  ungroup()

result
#> # A tibble: 8 x 3
#>   Reference NODCCODE Number
#>   <chr>     <chr>     <dbl>
#> 1 BBM101    101           2
#> 2 BBM102    201           4
#> 3 BBM102    301           1
#> 4 BBM103    201           3
#> 5 BBM103    401          14
#> 6 BBM104    401           3
#> 7 BBM105    102           2
#> 8 BBM105    201          25
Created on 2020-04-24 by the reprex package (v0.3.0)
If you load the data.table package and convert the data.frame to a data.table (using setDT), you can do this:
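A minimal sketch of that data.table approach, assuming the question's data frame is stored in a variable called df (a name chosen here for illustration, not taken from the original post):

```r
library(data.table)

# Example data from the question; `df` is an illustrative name.
df <- structure(list(Reference = c("BBM101", "BBM102",
"BBM102", "BBM102", "BBM103", "BBM103",
"BBM104", "BBM105", "BBM105", "BBM105"),
NODCCODE = c("101","301", "201", "201", "201", "401", "401", "201", "102", "201"),
Number = c(2, 1, 3, 1, 3, 14, 3, 24, 2, 1)),
row.names = c(NA, 10L), class = "data.frame")

# Convert the data.frame to a data.table by reference (no copy is made).
setDT(df)

# Sum Number within each Reference/NODCCODE pair, adjacent or not.
result <- df[, .(Number = sum(Number)), by = .(Reference, NODCCODE)]
result
```

With by, data.table keeps the groups in order of first appearance, so the rows should come out in the same order as the desired output in the question. Using keyby instead of by would sort the result by Reference and NODCCODE.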