How to combine strings in a row based on positional conditions


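It turned out to be hard to find search terms for this kind of problem. I need to write a script that creates all string combinations for each row of a data frame. It should use every string exactly once, and only pair strings that are at least two steps apart. The first and last columns are actually adjacent to each other, so they cannot be combined either (in effect it is a circle of strings). The same script needs to work on data frames with different numbers of columns; here is an example with 8.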

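So far I have only managed to build this by hand for a data frame with a fixed number of columns, not as an expression that works for any number of columns.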

This is the kind of data:

  Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7 Crop_8
1 Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
2 Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat

In this case, the desired result should be these 6 options:

                  Pair_1            Pair_2              Pair_3             Pair_4 Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7 Crop_8
1   Potato-Sugarbeet Onion-Grassclover       Cabbage-Wheat      Potato-Carrot Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
2 Potato-Grassclover  Sugarbeet-Potato      Cabbage-Carrot        Onion-Wheat Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat
3       Potato-Wheat      Onion-Carrot   Sugarbeet-Cabbage Grassclover-Potato Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
4      Potato-Carrot   Sugarbeet-Wheat Grassclover-Cabbage       Potato-Onion Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat
5     Potato-Cabbage      Onion-Potato     Sugarbeet-Wheat Grassclover-Carrot Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat Carrot
6     Potato-Cabbage   Sugarbeet-Onion  Grassclover-Carrot       Potato-Wheat Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot  Wheat

The data frame can be reproduced from this:

structure(list(Crop_1 = structure(c(1L, 1L), .Label = "Potato", class = "factor"), 
    Crop_2 = structure(1:2, .Label = c("Onion", "Sugarbeet"), class = "factor"), 
    Crop_3 = structure(2:1, .Label = c("Grassclover", "Sugarbeet"
    ), class = "factor"), Crop_4 = structure(1:2, .Label = c("Grassclover", 
    "Potato"), class = "factor"), Crop_5 = structure(c(1L, 1L
    ), .Label = "Cabbage", class = "factor"), Crop_6 = structure(2:1, .Label = c("Onion", 
    "Potato"), class = "factor"), Crop_7 = structure(2:1, .Label = c("Carrot", 
    "Wheat"), class = "factor"), Crop_8 = structure(1:2, .Label = c("Carrot", 
    "Wheat"), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))
r condition combinations
1 Answer

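Here is a function that does the trick. What you need to distinguish is even numbers that are divisible by 4 from even numbers that are not. For those divisible by 4, you can split the crops into groups of four and form two pairs in each group the way you did. We use seq.int to get the start of each pair, and then setdiff to get the ends. For those that are not divisible by 4, treat the first 6 specially (match 1-4, 2-5, 3-6), then handle the rest as groups of four.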

The rest of the complexity is just making sure the function can accept a tibble and return a tibble, since that is what nest and unnest expect.
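The index logic described above can be checked on its own before it is wrapped in tibbles. The following is a minimal base R sketch: pair_indices is a hypothetical helper name that does not appear in the original answer, and the guard for exactly six crops is an added assumption, since seq.int(7, 6, 4) would raise an error.

```r
# Hypothetical helper (not part of the original answer): returns the
# positions that get paired with each other for a row of n crops.
pair_indices <- function(n) {
  stopifnot(n >= 4, n %% 2 == 0)
  if (n %% 4 == 0) {
    # Groups of four: positions 1 and 2 of each group start a pair.
    starts <- sort(c(seq.int(1, n, 4), seq.int(2, n, 4)))
  } else {
    # First six positions pair as 1-4, 2-5, 3-6; the rest form groups of four.
    rest <- if (n > 6) c(seq.int(7, n, 4), seq.int(8, n, 4)) else integer(0)
    starts <- sort(c(1:3, rest))
  }
  # Every position that does not start a pair ends one.
  ends <- setdiff(seq_len(n), starts)
  data.frame(start = starts, end = ends)
}

pair_indices(8)   # pairs 1-3, 2-4, 5-7, 6-8
pair_indices(10)  # pairs 1-4, 2-5, 3-6, 7-9, 8-10
```

Because starts is sorted and ends is taken in increasing order, the k-th start is matched with the k-th end, which is what yields the pairs listed in the comments.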

library(tidyverse)
tbl <- structure(list(Crop_1 = c("Potato", "Potato"), Crop_2 = c("Onion", "Sugarbeet"), Crop_3 = c("Sugarbeet", "Grassclover"), Crop_4 = c("Grassclover", "Potato"), Crop_5 = c("Cabbage", "Cabbage"), Crop_6 = c("Potato", "Onion"), Crop_7 = c("Wheat", "Carrot"), Crop_8 = c("Carrot", "Wheat")), class = "data.frame", row.names = c(NA, -2L))

pair_crops <- function(crop_row) {
  crop_set <- as.character(crop_row)
  n_crops <- length(crop_set)
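  if (n_crops %% 2 == 1) {
    stop("Odd number of crops!")
  } else if (n_crops %% 4 == 0) {
    # Groups of four: pair positions 1-3 and 2-4 within each group.
    starts <- sort(c(seq.int(1, n_crops, 4), seq.int(2, n_crops, 4)))
  } else {
    # 6, 10, 14, ... crops: pair the first six as 1-4, 2-5, 3-6, then handle
    # the rest as groups of four. With exactly six crops there is no rest,
    # and seq.int(7, 6, 4) would raise an error, so guard that case.
    rest <- if (n_crops > 6) c(seq.int(7, n_crops, 4), seq.int(8, n_crops, 4)) else integer(0)
    starts <- sort(c(1:3, rest))
  }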
  ends <- setdiff(1:n_crops, starts)
  tibble(
    pair = str_c(crop_set[starts], "-", crop_set[ends]),
    name = str_c("Pair_", 1:length(starts))
  ) %>%
    spread(name, pair)
}

tbl %>%
  rowid_to_column %>%
  nest(-rowid, .key = "crop") %>%
  mutate(pairs = map(crop, pair_crops)) %>%
  unnest()
#>   rowid Crop_1    Crop_2      Crop_3      Crop_4  Crop_5 Crop_6 Crop_7
#> 1     1 Potato     Onion   Sugarbeet Grassclover Cabbage Potato  Wheat
#> 2     2 Potato Sugarbeet Grassclover      Potato Cabbage  Onion Carrot
#>   Crop_8             Pair_1            Pair_2         Pair_3        Pair_4
#> 1 Carrot   Potato-Sugarbeet Onion-Grassclover  Cabbage-Wheat Potato-Carrot
#> 2  Wheat Potato-Grassclover  Sugarbeet-Potato Cabbage-Carrot   Onion-Wheat

Created on 2019-04-19 by the reprex package (v0.2.1)
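As a sanity check against the constraints in the question (each position used once, no pair adjacent on the circle), the pairing for eight crops can be verified with a small base R sketch. The helper name circ_dist is hypothetical and not part of the original answer, and the index vectors are written out by hand for the eight-column case.

```r
# Hypothetical helper: distance between positions i and j on a ring of n crops,
# so that position 1 and position n count as neighbours.
circ_dist <- function(i, j, n) {
  d <- abs(i - j) %% n
  pmin(d, n - d)
}

# Pairing used for eight crops: 1-3, 2-4, 5-7, 6-8.
starts <- c(1, 2, 5, 6)
ends   <- c(3, 4, 7, 8)
n <- 8

# TRUE when no pair joins neighbouring positions on the circle.
all(circ_dist(starts, ends, n) >= 2)

# TRUE when every position is used exactly once.
identical(sort(c(starts, ends)), as.numeric(1:n))
```

The same check can be run for other column counts by swapping in the corresponding start and end positions.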
