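I want to create a df containing every unique combination of three columns, where the order of the values does not matter. In my example, I want to build a list of all the combinations of ideologies that a group of three people could hold.
In my example, "No opinion", "Moderate", "Conservative" is the same as "Conservative", "No opinion", "Moderate", which is the same as "Moderate", "No opinion", "Conservative", and so on. All of these orderings should be represented by a single row.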
I have seen similar threads about using distinct for home and away teams, but I don't think that applies to this problem.
library(tidyverse)
political_spectrum_values <-
  factor(c("Far left",
           "Liberal",
           "Moderate",
           "Conservative",
           "Far right",
           "No opinion"),
         ordered = TRUE)
political_groups_of_3 <-
  crossing(first_person = political_spectrum_values,
           second_person = political_spectrum_values,
           third_person = political_spectrum_values)
I have considered building some kind of combined variable by piping the result into the line below, but I'm not sure where to go from there:
unite(col = "group_composition", c(first_person, second_person, third_person), sep = "_")
Edit: after working on this problem some more, I have reshaped the data in a way that may be easier to work with:
crossing(first_person = political_spectrum_values,
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>%
  mutate(group_n = row_number()) %>%
  pivot_longer(cols = c(first_person, second_person, third_person),
               values_to = "ideology",
               names_to = "group") %>%
  select(-group)
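Here is a trick you can use. Instead of starting with the names of the political leanings, start with the numbers 5^(0:5). Note that the sum of any combination of length 3 is unique, because 3 times 5^x is less than 5^(x + 1): each value can appear at most three times, so the sum never "carries" into the next power of 5 and still records how many times each value occurs. So if you run expand.grid (the equivalent of crossing) on three such vectors and take the row sums, the positions of the unique sums will be the same as the positions of the unique combinations of names in the crossing result.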
So you can do it in a one-liner:
political_groups_of_3[!duplicated(rowSums(expand.grid(5^(0:5), 5^(0:5), 5^(0:5)))), ]
Which gives:
#> # A tibble: 56 x 3
#> first_person second_person third_person
#> <ord> <ord> <ord>
#> 1 Conservative Conservative Conservative
#> 2 Conservative Conservative Far left
#> 3 Conservative Conservative Far right
#> 4 Conservative Conservative Liberal
#> 5 Conservative Conservative Moderate
#> 6 Conservative Conservative No opinion
#> 7 Conservative Far left Far left
#> 8 Conservative Far left Far right
#> 9 Conservative Far left Liberal
#> 10 Conservative Far left Moderate
#> # ... with 46 more rows
Whether this is "more elegant" or just an opaque hack is, of course, open to debate...
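The uniqueness claim behind the trick can be checked directly. The following is a minimal base R sketch, not part of the original answer; the names codes, grid, sum_key and sorted_key are illustrative. It compares the row-sum key with an explicit order-free key built by sorting each row, and counts how many groups each key produces.

```r
# Numeric stand-ins for the six ideologies: 1, 5, 25, 125, 625, 3125
codes <- 5^(0:5)

# All 6^3 = 216 ordered triples of codes
grid <- expand.grid(a = codes, b = codes, c = codes)

# Key 1: the row sum used by the trick
sum_key <- rowSums(grid)

# Key 2: an explicit order-free key, built by sorting each row
sorted_key <- apply(grid, 1, function(x) paste(sort(x), collapse = "_"))

# If the two keys define the same grouping, all three counts agree
n_sum    <- length(unique(sum_key))
n_sorted <- length(unique(sorted_key))
n_joint  <- nrow(unique(data.frame(sum_key, sorted_key)))

c(n_sum, n_sorted, n_joint)
```

If the joint count were larger than either individual count, two different unordered triples would share a sum (or vice versa), and the trick would break.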
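Here is an answer that uses a combination of pivoting and unite. I will leave the question open a little longer in case someone has a more elegant solution.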
crossing(first_person = political_spectrum_values,
         second_person = political_spectrum_values,
         third_person = political_spectrum_values) %>%
  mutate(group_n = row_number()) %>%   # label each ordered triple
  pivot_longer(cols = c(first_person, second_person, third_person),
               values_to = "ideology",
               names_to = "group") %>%
  select(-group) %>%
  group_by(group_n) %>%
  arrange(ideology) %>%                # sort so the original position no longer matters
  mutate(person = row_number()) %>%    # rank within each group after sorting
  pivot_wider(id_cols = group_n, values_from = ideology, names_from = person) %>%
  unite(col = "group_composition", c(`1`, `2`, `3`), sep = "_") %>%
  ungroup() %>%
  distinct(group_composition)
Here is a two-step solution using gtools::combinations and paste.
library(gtools)
# Get all combinations with repeats for the political_spectrum_values in groups of 3
combs <- combinations(nlevels(political_spectrum_values),
                      3,
                      as.character(political_spectrum_values),
                      repeats.allowed = TRUE)
# Collapse each row into a single entry and convert the result into a data.frame
combs <- data.frame(group_composition = apply(combs,
                                              1,
                                              function(x) paste(x, collapse = "_")))
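As a sanity check on any of these approaches, the expected number of rows can be computed up front: choosing k items from n values with repetition allowed and order ignored gives choose(n + k - 1, k) combinations. The sketch below uses illustrative variable names (n_values, group_size, expected_rows) that do not appear in the answers.

```r
# Six ideologies, groups of three people
n_values   <- 6
group_size <- 3

# Combinations with repetition: choose(n + k - 1, k)
expected_rows <- choose(n_values + group_size - 1, group_size)
expected_rows
```

This should match the 56 rows reported in the tibble output earlier in the thread.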
A base R approach is to use expand.grid to create all combinations of political_spectrum_values taken 3 at a time, sort each row, and then select the unique rows.
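df <- expand.grid(first_person = political_spectrum_values,
                  second_person = political_spectrum_values,
                  third_person = political_spectrum_values)
# Sort the values within each row so that order no longer matters
# (apply() works on a character matrix, so the columns become character)
df[] <- t(apply(df, 1, sort))
# Keep one copy of each sorted row
unique(df)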
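A possible variant of the same idea, sketched here under the assumption that political_spectrum_values is defined as in the question (it is redefined below so the block runs on its own): rather than sorting every row and de-duplicating, keep only the rows that are already in non-decreasing order of their factor level codes. This is a different technique from the sort-and-unique approach above, and the names i1, i2, i3 and canonical are illustrative.

```r
# Assumed to match the definition in the question
political_spectrum_values <- factor(c("Far left", "Liberal", "Moderate",
                                      "Conservative", "Far right", "No opinion"),
                                    ordered = TRUE)

df <- expand.grid(first_person = political_spectrum_values,
                  second_person = political_spectrum_values,
                  third_person = political_spectrum_values)

# Integer level codes give each ideology a fixed rank
i1 <- as.integer(df$first_person)
i2 <- as.integer(df$second_person)
i3 <- as.integer(df$third_person)

# Each unordered trio has exactly one non-decreasing arrangement, so keep only those rows
canonical <- df[i1 <= i2 & i2 <= i3, ]
nrow(canonical)
```

One design difference worth noting: this version keeps the columns as ordered factors, whereas the apply/sort version converts them to character.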