How do I calculate the percentage of each character across all strings in a variable?

Question · Votes: 0 · Answers: 2

Here is a dummy dataset:

library(tidyverse)


a <- c("CC", "CCAA", "ABB")

id <- c("a", "b", "c")

data <- data.frame(id, a)
head(data)
#  id    a
#1  a   CC
#2  b CCAA
#3  c  ABB

We can calculate the percentage of each distinct string across the rows:

library(data.table)
data1 <- setDT(data)[, .N, .(a)][, perc := N/sum(N), .()][]
head(data1)
#       a N      perc
# 1:   CC 1 0.3333333
# 2: CCAA 1 0.3333333
# 3:  ABB 1 0.3333333

But how can I calculate the percentage of each character across all the strings in the "a" variable?

[Expected output]

#              a  N   perc
# 1:           A  3   0.33
# 2:           B  2   0.22
# 3:           C  4   0.44 

Base R and tidyverse approaches are preferred.

r tidyverse
2 Answers

1 vote

A tidyverse option:

library(tidyverse)

data %>%
  mutate(a = str_split(a, "")) %>%
  unnest_longer(a) %>%
  count(a, name = "N") %>%
  mutate(perc = prop.table(N))

# A tibble: 3 × 3
#   a         N  perc
#  <chr> <int> <dbl>
#1 A         3 0.333
#2 B         2 0.222
#3 C         4 0.444

Base R:

a1 <- strsplit(data$a, "") |> unlist()
a2 <- table(a1)
a3 <- prop.table(a2)

data.frame(a = names(a2), 
           N = as.integer(a2), 
           perc = as.numeric(a3))

#  a N      perc
#1 A 3 0.3333333
#2 B 2 0.2222222
#3 C 4 0.4444444
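
The expected output in the question shows the proportions rounded to two decimals. A minimal sketch of that extra step, assuming the same dummy data as in the question; the names chars, counts and result are illustrative and not part of the original answer:

```r
# Rebuild the example data so the snippet runs on its own
data <- data.frame(id = c("a", "b", "c"),
                   a = c("CC", "CCAA", "ABB"),
                   stringsAsFactors = FALSE)

# Split every string into single characters and pool them into one vector
chars <- unlist(strsplit(data$a, ""))

# Count each character, then convert the counts to rounded proportions
counts <- table(chars)
result <- data.frame(
  a    = names(counts),
  N    = as.integer(counts),
  perc = round(as.numeric(prop.table(counts)), 2)
)

print(result)
```

Note that rounded proportions need not sum to exactly 1 (here 0.33 + 0.22 + 0.44 = 0.99), so it is probably safer to round only for display and keep the unrounded values for further calculations.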

0 votes

Split the strings with strsplit, then unlist and table the result, and finally add proportions:

> strsplit(data$a, '') |> unlist() |> table() |> as.data.frame() |> transform(prop=proportions(Freq))
  Var1 Freq      prop
1    A    3 0.3333333
2    B    2 0.2222222
3    C    4 0.4444444

Data:

> dput(data)
structure(list(id = c("a", "b", "c"), a = c("CC", "CCAA", "ABB"
)), class = "data.frame", row.names = c(NA, -3L))