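How do I get a percentage breakdown of individual ethnicity categories from a question that lets respondents select multiple answers?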


I ran this code to get the ethnicity breakdown of my sample:

dataset %>%
  group_by(ethnicity) %>%
  summarise(percent = 100 * n() / nrow(dataset))

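However, because respondents could select more than one ethnicity category in the questionnaire, each combination of answers is treated as its own group, and the result looks like this: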

 1 "[\"Aboriginal or Torres Strait Islander\",\"Caucasian\",\"Asian (inc. Indian subcontinent)\"]"  0.364 
 2 "[\"Aboriginal or Torres Strait Islander\",\"Caucasian\"]"                                       0.0910
 3 "[\"Aboriginal or Torres Strait Islander\"]"                                                     0.910 
 4 "[\"African\"]"                                                                                  0.637 
 5 "[\"Asian (inc. Indian subcontinent)\"]"                                                                     0.0910
 9 "[\"Caucasian\",\"Latino/Hispanic\"]"                                                            0.182 
10 "[\"Caucasian\",\"Middle Eastern\"]"                                                             0.273 
11 "[\"Caucasian\",\"Not listed\"]"                                                                 0.182 

and so on.

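What is the best/most efficient way to get a breakdown by individual (non-combined) category?

I basically just want the percentage breakdown for each of the following:

Caucasian -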
African -
Latino/Hispanic -
Aboriginal or Torres Strait Islander -
Middle Eastern -

and so on.

r dplyr data-cleaning
1 Answer
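library(tidyverse)

# Reproduce the question's data: each eth value is a JSON-style array stored as one string
mydf <- data.frame(
  id = c(1:5, 9:11),
  eth = c(
    "[\"Aboriginal or Torres Strait Islander\",\"Caucasian\",\"Asian (inc. Indian subcontinent)\"]",
    "[\"Aboriginal or Torres Strait Islander\",\"Caucasian\"]",
    "[\"Aboriginal or Torres Strait Islander\"]",
    "[\"African\"]",
    "[\"Asian (inc. Indian subcontinent)\"]",
    "[\"Caucasian\",\"Latino/Hispanic\"]",
    "[\"Caucasian\",\"Middle Eastern\"]",
    "[\"Caucasian\",\"Not listed\"]"
  )
)

# separate_longer_delim() requires tidyr >= 1.3.0
mydf |>
  separate_longer_delim(eth, ",") |>              # one row per selected category
  mutate(eth = str_remove_all(eth, "\\[|\\]")) |> # strip the square brackets; the double quotes stay
  count(eth) |>                                   # number of respondents per category
  mutate(pct = n / nrow(mydf)) |>                 # proportion of respondents; multiply by 100 for a percentage
  arrange(desc(pct))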

                                     eth n   pct
1                            "Caucasian" 5 0.625
2 "Aboriginal or Torres Strait Islander" 3 0.375
3     "Asian (inc. Indian subcontinent)" 2 0.250
4                              "African" 1 0.125
5                      "Latino/Hispanic" 1 0.125
6                       "Middle Eastern" 1 0.125
7                           "Not listed" 1 0.125
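
Because the values look like JSON arrays, one possible variant is to parse them rather than split on commas. This avoids leftover quote characters and would also cope with a category label that itself contains a comma. The sketch below is only an illustration under assumptions: the toy data frame is made up (it is not the asker's data), the `percent` column name is my own choice, and it assumes every cell is a valid, non-empty JSON array and that the jsonlite package is installed.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data (hypothetical): three respondents, multi-select stored as JSON-style strings
mydf <- data.frame(
  id  = 1:3,
  eth = c('["Caucasian","African"]', '["Caucasian"]', '["African","Not listed"]')
)

result <- mydf |>
  mutate(eth = map(eth, jsonlite::fromJSON)) |>  # parse each string into a character vector
  unnest_longer(eth) |>                          # one row per selected category
  count(eth) |>                                  # respondents per category
  mutate(percent = 100 * n / nrow(mydf)) |>      # share of all respondents, in percent
  arrange(desc(percent))

print(result)
```

Note that the denominator is the number of respondents, not the number of selections, so the percentages can add up to more than 100 when people tick several boxes.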