Filter a data frame by strings that match row-wise

Question (votes: 0, answers: 2)

Assume the following data frame:

data.frame(Var1=c("1  2  3","1  6","2  5  9","1  5  3"),Var2 = c("1  2","1  6  0  5","3  7","1  5"),Var3=c("2  8","1  3","6  19","1  3"))

     Var1       Var2  Var3
1 1  2  3       1  2  2  8
2    1  6 1  6  0  5  1  3
3 2  5  9       3  7 6  19
4 1  5  3       1  5  1  3

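I want to keep only the rows in which no number is repeated across the columns of that row.
So if any number appears in at least two columns of a given row, that row should be dropped. In this case, the result should be: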

     Var1 Var2  Var3
3 2  5  9 3  7 6  19
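
The rule can be checked one row at a time with base R alone. The following is a minimal sketch, not code from the thread: `split_numbers` and `row_has_repeat` are hypothetical helper names, and the numbers are compared as strings, which assumes each number is always written the same way (for example, never "05" for "5").

```r
df <- data.frame(
  Var1 = c("1  2  3", "1  6", "2  5  9", "1  5  3"),
  Var2 = c("1  2", "1  6  0  5", "3  7", "1  5"),
  Var3 = c("2  8", "1  3", "6  19", "1  3")
)

# hypothetical helper: split the cells on runs of spaces and drop empty pieces
split_numbers <- function(cells) {
  parts <- unlist(strsplit(cells, " +"))
  parts[nzchar(parts)]
}

# hypothetical helper: TRUE when any number occurs more than once in one row
row_has_repeat <- function(cells) {
  anyDuplicated(split_numbers(cells)) > 0
}

# apply() hands each row to the helper as a character vector
keep <- !apply(df, 1, row_has_repeat)
result <- df[keep, , drop = FALSE]
print(result)
```

Because the check works on the whole row, it does not need to change when more columns are added.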

Since the number of columns in df will grow, I would like to use the across() function to filter these rows.

Thanks a lot.

r dplyr tidyverse stringr
2 Answers
0
votes

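Odd data structure. Here I convert it to long format, split the strings into rows to get an even longer format, drop the rows that contain duplicates, and then put it back into the starting format: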

library(dplyr)
library(tidyr)
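# df is the data frame from the question
df |>
  mutate(
    row = row_number()
  ) |>
  pivot_longer(-row) |>
  separate_longer_delim(value, delim = stringr::regex(" +")) |>
  # keep only the rows whose numbers are all distinct
  filter(!anyDuplicated(value), .by = row) |>
  # group by row as well, so that several surviving rows are not merged into one
  summarize(value = paste(value, collapse = "  "), .by = c(row, name)) |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)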
# # A tibble: 1 × 3
#   Var1    Var2  Var3 
#   <chr>   <chr> <chr>
# 1 2  5  9 3  7  6  19
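
If the original columns should come back untouched, the long format can be used only to find which rows survive, and the original data frame can then be subset by those row numbers. This is a hedged variation on the approach above rather than part of the answer; `keep_rows` is a hypothetical name, and the numbers are again compared as strings.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Var1 = c("1  2  3", "1  6", "2  5  9", "1  5  3"),
  Var2 = c("1  2", "1  6  0  5", "3  7", "1  5"),
  Var3 = c("2  8", "1  3", "6  19", "1  3")
)

# one row per (original row, number), then one flag per original row
keep_rows <- df |>
  mutate(row = row_number()) |>
  pivot_longer(-row) |>
  separate_longer_delim(value, delim = stringr::regex(" +")) |>
  summarize(all_distinct = !anyDuplicated(value), .by = row) |>
  filter(all_distinct) |>
  pull(row)

# subset the untouched original by the surviving row numbers
result <- df[keep_rows, , drop = FALSE]
print(result)
```

This avoids pasting the numbers back together, so the spacing inside each cell stays exactly as it was in the input.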

0
votes
library(tidyverse)

df <- data.frame(
  Var1 = c(
    "1  2  3", "1  6",
    "2  5  9", "1  5  3"
  ),
  Var2 = c(
    "1  2", "1  6  0  5",
    "3  7", "1  5"
  ),
  Var3 = c("2  8", "1  3", "6  19", "1  3")
)
# take a string of numbers and spaces and get a vector of numbers
myfunc <- function(x) {
  parse_number(unlist(strsplit(x, split = " ", fixed = TRUE))) |>
    na.omit() |>
    as.integer()
}
# test
myfunc("1 2 3") |> str()

# calculate
(df2 <- mutate(rowwise(df),
  fstring = paste0(c_across(everything()), collapse = " "),
  nums = list(myfunc(fstring))
))

# analyse
(keepvec <- map_lgl(df2$nums, \(x) !anyDuplicated(x)))

# final result
df |> filter(keepvec)
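
The question asks for across(), but the condition compares values between columns, so a per-column helper does not fit it directly. One possible way to keep everything in a single filter() call is sketched below, assuming dplyr 1.1.0 or later for pick() and purrr for pmap_lgl(). The helper name `has_repeat` is hypothetical, and numbers are compared as strings rather than parsed.

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Var1 = c("1  2  3", "1  6", "2  5  9", "1  5  3"),
  Var2 = c("1  2", "1  6  0  5", "3  7", "1  5"),
  Var3 = c("2  8", "1  3", "6  19", "1  3")
)

# hypothetical helper: TRUE when any number occurs more than once in the cells of one row
has_repeat <- function(cells) {
  nums <- unlist(strsplit(trimws(cells), " +"))
  anyDuplicated(nums) > 0
}

# pick(everything()) selects all columns; pmap_lgl() walks over them row by row
result <- df |>
  filter(!pmap_lgl(pick(everything()), \(...) has_repeat(c(...))))

print(result)
```

Swapping `everything()` for another tidyselect expression, such as `starts_with("Var")`, would restrict the check to a subset of columns.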