Count the totals and percentages of values in a character column

Question

In a large dataset I have a column (treatment) that contains numbers, sometimes more than one per entry, but it is stored as character.

treatment <- c("1", "1", "2", "5", "1,2", "2,5", "1,2,5", "3") 
df <- data.frame(treatment)

Each number represents a treatment. I want to count the total number and the percentage for each treatment.

Desired output:

treatment   number     percent
   1           4          50
   2           4          50
   3           1          12.5
   5           3          37.5

I tried summarising with

df %>% group_by(treatment) %>% summarise(percent = 100 * n() / nrow(df))

but I'm having trouble with the rows that contain more than one number, and the column is of class character. Any suggestions?

Answer 1

You can use tidyr::separate_longer_delim() to split the combined values on ",", i.e.:

library(dplyr)
library(tidyr)

separate_longer_delim(df, treatment, ",")
#   treatment
#1          1
#2          1
#3          2
#4          5
#5          1
#6          2
#7          2
#8          5
#9          1
#10         2
#11         5
#12         3
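For comparison, here is a rough base R sketch of the same splitting step using strsplit() and table() (an alternative, not part of the original answer). It also shows that the split data has 12 values while df has only 8 rows, which determines what any percentage is measured against.

```r
treatment <- c("1", "1", "2", "5", "1,2", "2,5", "1,2,5", "3")
df <- data.frame(treatment)  # strings stay character by default since R 4.0

# Split every entry on "," and flatten the resulting list into one vector
split_values <- unlist(strsplit(df$treatment, ","))
length(split_values)  # 12 split values, versus nrow(df) == 8 original rows

# Counts per treatment, analogous to count() in the dplyr version:
# "1" -> 4, "2" -> 4, "3" -> 1, "5" -> 3
counts <- table(split_values)
counts
```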

Then simply use dplyr::count() to do the counting and dplyr::mutate() to compute the proportions. Putting it together (and thanks to @DarrenTsai for the improved approach):

separate_longer_delim(df, treatment, ",") %>%
  count(treatment) %>% 
  mutate(percent = n / sum(n))

Output

 treatment n    percent
1         1 4 0.33333333
2         2 4 0.33333333
3         3 1 0.08333333
4         5 3 0.25000000
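Note that n / sum(n) gives each treatment's share of all 12 split values, so the proportions sum to 1. The desired output in the question instead measures against the 8 original rows (e.g. 4 of 8 rows mention treatment 1, i.e. 50%). A minimal sketch of that variant, assuming the same df and tidyr >= 1.3.0 (where separate_longer_delim() was introduced), only changes the denominator; the column name "number" is chosen here to match the question's table:

```r
library(dplyr)
library(tidyr)  # separate_longer_delim() requires tidyr >= 1.3.0

treatment <- c("1", "1", "2", "5", "1,2", "2,5", "1,2,5", "3")
df <- data.frame(treatment)

# Percent of original rows that contain each treatment:
# divide by nrow(df) (8 rows) rather than sum(n) (12 split values)
result <- separate_longer_delim(df, treatment, ",") %>%
  count(treatment, name = "number") %>%
  mutate(percent = 100 * number / nrow(df))

# percent is 50, 50, 12.5, 37.5 for treatments 1, 2, 3, 5
result
```

Because several treatments can appear in one row, these percentages can add up to more than 100.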
