Create a df of unique combinations of columns in R where order doesn't matter


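I want to create a df of all unique combinations of three columns, where the order of the values doesn't matter. In my example, I want to list every combination of ideologies that a group of three people could have.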

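In my example, "No opinion", "Moderate", "Conservative" is the same as "Conservative", "No opinion", "Moderate", which is the same as "Moderate", "No opinion", "Conservative", and so on. All of these orderings should be represented by a single row.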
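
As a sanity check on how many rows to expect: counting groups of three where order is ignored and repeated values are allowed is the standard "combinations with repetition" count, choose(n + k - 1, k). A minimal sketch, where n and k are only illustrative names for the six ideology values and the group size of three:

```r
n <- 6  # number of distinct ideology values
k <- 3  # people per group

ordered_triples   <- n^k                   # every ordered assignment
unordered_triples <- choose(n + k - 1, k)  # order ignored, repeats allowed

ordered_triples    # 216
unordered_triples  # 56
```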

I've seen similar threads about using distinct for home and away teams, but I don't think they apply to this problem.

library(tidyverse)

# No levels are supplied, so the factor's levels default to alphabetical order
political_spectrum_values <- 
  factor(c("Far left",
           "Liberal",
           "Moderate", 
           "Conservative",
           "Far right",
           "No opinion"), 
         ordered = TRUE)


political_groups_of_3 <- 
crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values)

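I've thought about building some kind of combination variable by piping the result into the line below, but I'm not sure where to go from there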

unite(col = "group_composition", c(first_person, second_person, third_person), sep = "_")

Edit: After working on this problem some more, I've reshaped the data in a way that may be easier to work with

crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>% 
  mutate(group_n = row_number()) %>% 
  pivot_longer(cols = c(first_person, second_person, third_person), 
               values_to = "ideology", 
               names_to = "group") %>% 
  select(-group)
r combinations tidyverse
4 Answers
1
vote

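Here's a trick you can use. Instead of starting with the names of the political leanings, start with the numbers 5^(0:5). The sum of any three of these numbers is unique to that combination, regardless of order, because 3 * 5^x is less than 5^(x + 1): read in base 5, the sum simply records how many times each value appears. So if you run expand.grid (the base R counterpart of crossing) on three such vectors and take the row sums, the first occurrence of each distinct sum sits at the same row position as the first occurrence of each distinct combination of names in the crossing result.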

So you can do it in a one-liner:

political_groups_of_3[!duplicated(rowSums(expand.grid(5^(0:5), 5^(0:5), 5^(0:5)))), ]

Which gives:

#> # A tibble: 56 x 3
#>    first_person second_person third_person
#>    <ord>        <ord>         <ord>       
#>  1 Conservative Conservative  Conservative
#>  2 Conservative Conservative  Far left    
#>  3 Conservative Conservative  Far right   
#>  4 Conservative Conservative  Liberal     
#>  5 Conservative Conservative  Moderate    
#>  6 Conservative Conservative  No opinion  
#>  7 Conservative Far left      Far left    
#>  8 Conservative Far left      Far right   
#>  9 Conservative Far left      Liberal     
#> 10 Conservative Far left      Moderate    
#> # ... with 46 more rows

Whether this is "more elegant" or just an opaque hack is, of course, debatable...
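
The uniqueness claim can be checked directly in base R. The sketch below is only a verification aid, not part of the answer's method; the names weights, grid, sums and keys are made up for illustration. It compares the duplicate mask from the row sums with the mask from an explicit sorted-triple key. As far as I understand the two functions, expand.grid varies its first column fastest while crossing varies its last column fastest, so row i of the two tables should hold the same three values in reversed column order, which leaves the row sum and therefore the mask unchanged.

```r
# Stand-ins for the six ideology values: powers of 5
weights <- 5^(0:5)

# All 6^3 ordered triples (expand.grid varies the first column fastest)
grid <- expand.grid(a = weights, b = weights, c = weights)
sums <- rowSums(grid)

# Reference key: the sorted triple, identical for every reordering of a row
keys <- apply(grid, 1, function(x) paste(sort(x), collapse = "_"))

# If sums identify unordered triples, both give the same duplicate mask
same_mask <- all(duplicated(sums) == duplicated(keys))
n_unique  <- length(unique(sums))

same_mask  # TRUE
n_unique   # 56
```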


0
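votes

Here is an answer that combines the update from the question with unite. I'll leave the question open a little longer in case someone has a more elegant solution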

crossing(first_person = political_spectrum_values, 
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>% 
  mutate(group_n = row_number()) %>% 
  pivot_longer(cols = c(first_person, second_person, third_person), 
               values_to = "ideology", 
               names_to = "group") %>% 
  select(-group) %>%
  group_by(group_n) %>% 
  arrange(ideology) %>% 
  mutate(person = row_number()) %>% 
  pivot_wider(id_cols = group_n, values_from = ideology, names_from = person) %>% 
  unite(col = "group_composition", c(`1`, `2`, `3`), sep = "_") %>% 
  ungroup() %>% 
  distinct(group_composition)

0
votes

Here is a two-step solution using gtools::combinations and paste.

library(gtools)

# Get all combinations, with repeats allowed, of political_spectrum_values in groups of 3
combs <- combinations(nlevels(political_spectrum_values),
                      3,
                      as.character(political_spectrum_values),
                      repeats.allowed = TRUE)

# Collapse each row into a single string and convert the result to a data.frame
combs <- data.frame(group_composition = apply(combs,
                                              1,
                                              function(x) paste(x, collapse = "_")))
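
For readers without gtools, the following base R sketch illustrates what "combinations with repeats" enumerates: every unordered group corresponds to exactly one non-decreasing triple of positions in the sorted value vector. The names vals and idx are illustrative, and the row order may differ from what combinations() returns.

```r
# Sorted values, so that position order is well defined
vals <- sort(c("Far left", "Liberal", "Moderate",
               "Conservative", "Far right", "No opinion"))

# All index triples, then keep only the non-decreasing ones (i <= j <= k);
# each unordered group of three appears exactly once in that form
idx <- expand.grid(i = seq_along(vals), j = seq_along(vals), k = seq_along(vals))
idx <- idx[idx$i <= idx$j & idx$j <= idx$k, ]

group_composition <- paste(vals[idx$i], vals[idx$j], vals[idx$k], sep = "_")
length(group_composition)  # 56
```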

0
votes

A base R approach is to use expand.grid to create all combinations of political_spectrum_values taken 3 at a time, sort each row, and then select the unique rows.

df <- expand.grid(first_person = political_spectrum_values, 
                  second_person = political_spectrum_values, 
                  third_person = political_spectrum_values)

df[] <- t(apply(df, 1, sort))
unique(df)
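
To see the sort-then-unique idea at a glance, here is the same pattern on a deliberately tiny input; the two values "A" and "B" and the names small and result are placeholders, not part of the original data. Note that apply() converts the rows to character, so the sort is alphabetical rather than by factor level.

```r
vals <- c("A", "B")

# All ordered pairs: (A, A), (B, A), (A, B), (B, B)
small <- expand.grid(first = vals, second = vals)

# Sort within each row so (B, A) and (A, B) become the same row
small[] <- t(apply(small, 1, sort))

# Drop the now-identical rows
result <- unique(small)
nrow(result)  # 3
```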