Suppose I have the following data frame:
df <- data.frame(Var1 = c("1 2 3", "1 6", "2 5 9", "1 5 3"), Var2 = c("1 2", "1 6 0 5", "3 7", "1 5"), Var3 = c("2 8", "1 3", "6 19", "1 3"))
   Var1    Var2 Var3
1 1 2 3     1 2  2 8
2   1 6 1 6 0 5  1 3
3 2 5 9     3 7 6 19
4 1 5 3     1 5  1 3
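I want to keep only the rows in which no number is repeated across columns.
In other words, if any number appears in at least two columns of a given row, that row should be removed.
So in this case, the result should be: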
   Var1 Var2 Var3
3 2 5 9  3 7 6 19
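As a minimal sketch of the rule itself in base R (the helper name `repeated_across_cells()` is hypothetical, not from the question), each cell can be split on spaces, de-duplicated within the cell, and the tokens counted; any number counted two or more times appears in at least two columns, so that row would be dropped:

```r
# Hypothetical helper: given the cells of one row, return the numbers
# that occur in two or more different cells (columns).
repeated_across_cells <- function(cells) {
  # split each cell on runs of spaces; unique() ignores repeats inside one cell
  tokens <- lapply(strsplit(trimws(cells), " +"), unique)
  # number of cells that contain each number
  counts <- table(unlist(tokens))
  names(counts)[counts >= 2]
}

repeated_across_cells(c("1 2 3", "1 2", "2 8"))  # row 1: "1" "2" -> drop
repeated_across_cells(c("2 5 9", "3 7", "6 19")) # row 3: character(0) -> keep
```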
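Because the number of columns in df will grow, I would like to filter these rows with something like across() rather than naming each column.
Many thanks.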
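One possible sketch of a column-count-independent filter, assuming dplyr >= 1.1.0 (for `pick()`) and purrr: recent dplyr versions deprecate `across()` inside `filter()` in favour of `if_any()`/`if_all()`, but those test each column on its own, whereas this condition compares columns with each other. Passing `pick(everything())` to `pmap_lgl()` hands every row's cells to a function instead. The helper `shares_number()` is hypothetical; it de-duplicates inside each cell so that only repeats across columns count:

```r
library(dplyr)
library(purrr)

df <- data.frame(
  Var1 = c("1 2 3", "1 6", "2 5 9", "1 5 3"),
  Var2 = c("1 2", "1 6 0 5", "3 7", "1 5"),
  Var3 = c("2 8", "1 3", "6 19", "1 3")
)

# Hypothetical helper: TRUE when some number occurs in two or more of the
# given cells (repeats inside a single cell are ignored via unique()).
shares_number <- function(...) {
  cells <- trimws(c(...))
  tokens <- lapply(strsplit(cells, " +"), unique)
  anyDuplicated(unlist(tokens)) > 0
}

# pick(everything()) selects all columns, however many there are;
# pmap_lgl() calls shares_number() once per row with those cells.
res <- df |>
  filter(!pmap_lgl(pick(everything()), shares_number))

res
```

Because the columns are chosen with `everything()`, adding Var4, Var5, ... needs no change to the filter; a narrower tidyselect expression such as `starts_with("Var")` would work the same way.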
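Odd data structure. Here I convert it to long format, split the strings so that each number gets its own row (making it even longer), drop every original row that contains a duplicate, and then reshape the result back to the starting format: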
library(dplyr)
library(tidyr)

df |>
  mutate(row = row_number()) |>
  pivot_longer(-row) |>
  separate_longer_delim(value, delim = stringr::regex(" +")) |>
  filter(!anyDuplicated(value), .by = row) |>
  # group by row and column so several surviving rows are not merged together
  summarize(value = paste(value, collapse = " "), .by = c(row, name)) |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)
# # A tibble: 1 × 3
#   Var1  Var2  Var3
#   <chr> <chr> <chr>
# 1 2 5 9 3 7   6 19
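To check that this reshaping approach keeps more than one surviving row intact, here is a sketch on a hypothetical extended data frame (`df_test`, with an invented fifth row that shares no number between columns). The final `summarize()` is grouped by both `row` and `name`, so each kept row is rebuilt on its own before widening; this assumes dplyr >= 1.1.0 (`.by`) and tidyr >= 1.3.0 (`separate_longer_delim()`):

```r
library(dplyr)
library(tidyr)

# Hypothetical test data: rows 3 and 5 share no number between columns
df_test <- data.frame(
  Var1 = c("1 2 3", "1 6", "2 5 9", "1 5 3", "4 10"),
  Var2 = c("1 2", "1 6 0 5", "3 7", "1 5", "11"),
  Var3 = c("2 8", "1 3", "6 19", "1 3", "12 13")
)

res <- df_test |>
  mutate(row = row_number()) |>
  pivot_longer(-row) |>
  separate_longer_delim(value, delim = stringr::regex(" +")) |>
  # keep an original row only if none of its numbers repeats
  filter(!anyDuplicated(value), .by = row) |>
  # rebuild each cell per (row, column) pair
  summarize(value = paste(value, collapse = " "), .by = c(row, name)) |>
  pivot_wider(names_from = name, values_from = value) |>
  select(-row)

res
```

Grouping the `summarize()` by `name` alone would paste the numbers of every surviving row into a single cell per column, which only looks right when exactly one row is kept.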
library(tidyverse)

df <- data.frame(
  Var1 = c(
    "1 2 3", "1 6",
    "2 5 9", "1 5 3"
  ),
  Var2 = c(
    "1 2", "1 6 0 5",
    "3 7", "1 5"
  ),
  Var3 = c("2 8", "1 3", "6 19", "1 3")
)
# take a string of numbers and spaces and get a vector of numbers
myfunc <- function(x) {
  parse_number(unlist(strsplit(x, split = " ", fixed = TRUE))) |>
    na.omit() |>
    as.integer()
}

# test
myfunc("1 2 3") |> str()

# calculate
(df2 <- mutate(rowwise(df),
  fstring = paste0(c_across(everything()), collapse = " "),
  nums = list(myfunc(fstring))
))

# analyse
(keepvec <- map_lgl(df2$nums, \(x) !anyDuplicated(x)))

# final_result
df |> filter(keepvec)
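If loading the tidyverse is not an option, the same row-wise check can be sketched in base R with `apply()`. This is a swapped-in alternative, not part of either answer above; like those answers, it flags a number repeated anywhere in the row, including twice inside one cell:

```r
df <- data.frame(
  Var1 = c("1 2 3", "1 6", "2 5 9", "1 5 3"),
  Var2 = c("1 2", "1 6 0 5", "3 7", "1 5"),
  Var3 = c("2 8", "1 3", "6 19", "1 3")
)

# apply() converts the all-character data frame to a character matrix
# and passes each row's cells to the function in turn
keep <- apply(df, 1, function(cells) {
  nums <- as.integer(unlist(strsplit(trimws(cells), " +")))
  !anyDuplicated(nums)
})

# drop = FALSE keeps a data frame even if only one column existed
res <- df[keep, , drop = FALSE]
res
```

Since `apply()` iterates over whatever columns `df` has, this also scales to more columns without changes.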