How to find combinations that sum up to a certain value in each row of a data frame


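At work I have been asked to provide an R function that uses parallelization. When you pass it a data frame (like the one added below), it should look, for each row, for combinations of other rows whose values add up to that row's Importe_Pendiente. The combinations found should be placed in a new column called "Combinaciones", whose value will be the IDs of the rows that add up to that row's Importe_Pendiente. The values of Importe_Pendiente can be positive or negative. I am new to this world, so I hope you can help me! Thank you very much.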

The data frame in question looks like this:

datos <- data.frame(
  ID = c("FCR23E015-1625", "E015-23-3583", "E015-23-3584", "E015-23-3585", "FCR23NIEB-0141"),
  Proveedor = c("2192", "6772", "6772", "6772", "7403"),
  Descripcion = c("Factura FCR23E015-1625", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "Factura FCR23NIEB-0141"),
  Importe = c(-2330, 54.8, 54.8, 66, -1029),
  Importe_Pendiente = c(-2330, 45, 55, 100, -1029)
)

The output should look like this:

 ID             Proveedor Descripcion                  Importe Importe_Pendiente Combinaciones     
  <chr>          <chr>     <chr>                          <dbl>             <dbl> <chr>             
1 FCR23E015-1625 2192      Factura FCR23E015-1625       -2330             -2330 NO_COMBINACIONES   
2 E015-23-3583   6772      AMAZON BUSINESS EU, S.A.R.L.    54.8              45 NO_COMBINACIONES      
3 E015-23-3584   6772      AMAZON BUSINESS EU, S.A.R.L.    54.8              55 NO_COMBINACIONES      
4 E015-23-3585   6772      AMAZON BUSINESS EU, S.A.R.L.    66               100 [E015-23-3583+E015-23-3584]
5 FCR23NIEB-0141 7403      Factura FCR23NIEB-0141       -1029             -1029 NO_COMBINACIONES

If a row can have more than one combination, it would be best to separate the combinations with square brackets [].

I have spent a lot of time trying to modify this script, which I got from ChatGPT, but I have not been lucky enough to get it working.

library(dplyr)   # group_by(), mutate(), %>%
library(future)  # plan(), multisession

plan(multisession)

buscar_combinaciones <- function(df) {
  # Casting the column Importe_Pendiente to numeric type
  df$Importe_Pendiente <- as.numeric(df$Importe_Pendiente)
  
  resultado <- df %>%
    group_by(Proveedor) %>%
    mutate(Combinaciones = {
      if (n() < 2) {
        "NO_COMBINACIONES"
      } else {
        idx_comb <- combn(n(), 2)
        combinations_str <- vector("character", ncol(idx_comb))
        
        for (i in seq_along(combinations_str)) {
          combination <- Importe_Pendiente[idx_comb[, i]]
          if (sum(combination) == 0) {
            combinations_str[i] <- paste(ID[idx_comb[, i]], collapse = "+")
          }
        }
        
        valid_combinations <- na.omit(combinations_str)
        
        if (length(valid_combinations) == 0) {
          "NO_COMBINACIONES"
        } else {
          paste(valid_combinations, collapse = ", ")
        }
      }
    }) %>%
    ungroup()
  
  return(resultado)
}

# Adjusting the number of cores of the CPU to be used while parallelizing
# Only multiple cores will be used when there are more than 1 row for each Proveedor
num_nucleos <- ifelse(length(unique(datos$Proveedor)) > 1, 4, 1)
plan(multisession, workers = num_nucleos)

Tags: r dplyr parallel-processing combinations economics
1 Answer

0 votes

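Here is a simple, non-parallel implementation. It is unlikely to scale well, because for every row it enumerates every non-empty subset of the other rows, and the number of subsets doubles with each additional row.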

# https://stackoverflow.com/questions/76787219/how-to-find-combinations-that-sum-up-to-a-certain-value-in-each-row-of-a-datafra
library(tidyverse)  # dplyr, tibble (deframe) and purrr (map, flatten, map2_chr)
datos <- data.frame(
  ID = c("FCR23E015-1625", "E015-23-3583", "E015-23-3584", "E015-23-3585", "FCR23NIEB-0141"),
  Proveedor = c("2192", "6772", "6772", "6772", "7403"),
  Descripcion = c("Factura FCR23E015-1625", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "AMAZON BUSINESS EU, S.A.R.L.", "Factura FCR23NIEB-0141"),
  Importe = c(-2330, 54.8, 54.8, 66, -1029),
  Importe_Pendiente = c(-2330, 45, 55, 100, -1029)
)

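# Keep only the two columns needed for the search
(simpler_start <- datos |> select(ID, Importe_Pendiente))

(next_step <- rowwise(simpler_start) |>
  mutate(
    # Named numeric vector (names = ID) of every row except the current one
    others_raw = list({
      \(x){
        simpler_start |>
          filter(ID != x) |>
          deframe()
      }
    }(ID)),
    len = length(unlist(others_raw)),
    # Every subset of the other rows, from size 1 up to size len
    combinations = list(flatten(map(seq_len(len), \(x){
      combn(others_raw, m = x, simplify = FALSE)
    }))),
    # TRUE where a subset sums exactly to this row's Importe_Pendiente
    test_combs = list(map(combinations, \(x){
      sum(x) == Importe_Pendiente
    })),
    # IDs within a match are joined with ","; separate matches with ";"
    combs_solve = ({
      temp <- map2_chr(combinations, test_combs, \(x, y){
        if (y) {
          paste0(names(x), collapse = ",")
        } else {
          ""
        }
      })
      paste0(temp[nzchar(temp)], collapse = ";")
    })
  ))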
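
The question also asked for parallelization and for the `[ID+ID]` output format, which the code above does not cover. What follows is a minimal base-R sketch of one way that might look, not the answerer's method. The helper names `combos_for_row` and `find_combos`, the `cores` argument and the tolerance `tol` are all assumptions made for illustration. Like the answer, it searches all other rows (not only those with the same Proveedor) and counts a single matching row as a combination. Since each row's search is independent, the rows are handed to `parallel::mclapply()`.

```r
library(parallel)  # ships with R; mclapply() forks on Unix-like systems

datos <- data.frame(
  ID = c("FCR23E015-1625", "E015-23-3583", "E015-23-3584", "E015-23-3585", "FCR23NIEB-0141"),
  Importe_Pendiente = c(-2330, 45, 55, 100, -1029)
)

# Hypothetical helper: formatted combinations for row i of df.
# tol is an assumed tolerance so that decimal amounts compare safely.
combos_for_row <- function(i, df, tol = 1e-8) {
  others <- setdiff(seq_len(nrow(df)), i)  # positions of all other rows
  target <- df$Importe_Pendiente[i]
  hits <- character(0)
  for (m in seq_along(others)) {
    # Enumerate positions inside `others` rather than `others` itself,
    # because combn() expands a single positive integer to seq_len().
    for (pos in combn(length(others), m, simplify = FALSE)) {
      rows <- others[pos]
      if (abs(sum(df$Importe_Pendiente[rows]) - target) < tol) {
        hits <- c(hits, paste0("[", paste(df$ID[rows], collapse = "+"), "]"))
      }
    }
  }
  if (length(hits) == 0) "NO_COMBINACIONES" else paste(hits, collapse = "")
}

# Hypothetical wrapper: one task per row, spread over `cores` processes.
find_combos <- function(df, cores = 1L) {
  res <- mclapply(seq_len(nrow(df)), combos_for_row, df = df, mc.cores = cores)
  df$Combinaciones <- unlist(res)
  df
}

resultado <- find_combos(datos)
print(resultado)
```

With the sample data, only E015-23-3585 gets a match, `[E015-23-3583+E015-23-3584]`. Setting `cores` above 1 only works on Unix-like systems; on Windows a socket cluster with `parallel::parLapply()` would be the usual substitute. Parallelism divides the work but does not remove the exponential growth in the number of subsets, so this remains practical only for small groups of rows.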