I have a raw dataset and I am trying to produce the desired output from it.
The raw dataset looks like this:
gender type neg_sentiment neu_sentiment pos_sentiment
1 M rep 7871 3454 7290
2 F rep 841 469 548
3 M rep 23 12 26
4 M rep 211 73 63
5 M rep 2587 868 1251
6 M rep 1273 606 594
7 M rep 374 150 260
8 M rep 30 23 138
9 M rep 95 30 23
10 M rep 22 22 121
From this, the output I want (with made-up example sums) looks like this:
gender neg_sentiment neu_sentiment pos_sentiment
M 10000 5000 3000
F 2000 500 7000
What I have tried is:
df %>% group_by(gender) %>% summarise_all(sum)
df %>% group_by(type) %>% summarise_all(sum)
But neither of these worked.
Could you help me produce the desired output?
The dput output is below:
structure(list(gender = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("F", "M"), class = "factor"), type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("rep", "sen"), class = "factor"), neg_sentiment = c(7871L, 841L, 23L, 211L, 2587L, 1273L, 374L, 30L, 95L, 22L), neu_sentiment = c(3454L, 469L, 12L, 73L, 868L, 606L, 150L, 23L, 30L, 22L), pos_sentiment = c(7290L,
548L, 26L, 63L, 1251L, 594L, 260L, 138L, 23L, 121L)), row.names = c(NA, 10L), class = "data.frame")
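summarise_all(sum) fails here because it also tries to sum the factor column type, and sum is not defined for factors.
We can use summarise_if to select only the numeric columns: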
library(dplyr)
# df is the data frame from the question's dput
df %>%
  group_by(gender) %>%
  summarise_if(is.numeric, sum)
#or with summarise_at
#summarise_at(vars(ends_with('sentiment')), sum)
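In dplyr 1.0.0 and later, the scoped verbs (summarise_if, summarise_at, summarise_all) are superseded by across(). A minimal sketch of the same aggregation in that style, assuming a recent dplyr is installed; the data frame is rebuilt here from the values in the question so the snippet runs on its own, and the name result is only illustrative:

```r
library(dplyr)

# Rebuild the question's data (same values as the dput above)
df <- data.frame(
  gender = factor(c("M", "F", "M", "M", "M", "M", "M", "M", "M", "M")),
  type = factor(rep("rep", 10), levels = c("rep", "sen")),
  neg_sentiment = c(7871L, 841L, 23L, 211L, 2587L, 1273L, 374L, 30L, 95L, 22L),
  neu_sentiment = c(3454L, 469L, 12L, 73L, 868L, 606L, 150L, 23L, 30L, 22L),
  pos_sentiment = c(7290L, 548L, 26L, 63L, 1251L, 594L, 260L, 138L, 23L, 121L)
)

# Sum every numeric column within each gender; the factor column `type`
# is not numeric, so it is left out of the summary
result <- df %>%
  group_by(gender) %>%
  summarise(across(where(is.numeric), sum))

print(result)
# gender neg_sentiment neu_sentiment pos_sentiment
# F                841           469           548
# M              12486          5238          9766
```

Replacing where(is.numeric) with ends_with("sentiment") selects the same three columns by name, which mirrors the summarise_at variant.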