Removing duplicated elements within a row of a data frame in R


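Removing duplicate rows from a data frame is relatively easy. Removing duplicated elements within a single row of a data frame, however, is a more challenging problem.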

Let's start with this df:

df <- structure(list(V1 = c("B1182", "B1182", "B1182", "B1182", "B1182", 
"B1182", "B1182", "B1182", NA, NA, "B1182", "B1182", "B1182", 
NA, NA, NA, NA, "P2000", "P2000", NA), V2 = c("B124D", "B124D", 
"B124D", "B124D", "B124D", "B124D", "B124D", "B124D", NA, NA, 
"B124D", "B124D", "B124D", NA, NA, NA, NA, "P2000", "P2000", 
NA), V3 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, "U3003", "U3003", NA), V4 = c(NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "P2000", 
"P2000", NA), V5 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), V6 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), V7 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
    V8 = c(NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    )), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", 
"V8"), row.names = c(NA, 20L), class = "data.frame")

Printed, df looks like this:

      V1    V2    V3    V4   V5   V6   V7   V8
1  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
2  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
3  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
4  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
5  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
6  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
7  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
8  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
9   <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
10  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
11 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
12 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
13 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
14  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
15  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
16  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
17  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
18 P2000 P2000 U3003 P2000 <NA> <NA> <NA> <NA>
19 P2000 P2000 U3003 P2000 <NA> <NA> <NA> <NA>
20  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>

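As you can see, rows 18 and 19 contain a duplicated code (P2000). I want to remove these duplicated elements and keep only the first occurrence of each value in the row. Note that this is only an excerpt of my original df, so the solution has to work in every case.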
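For a single row, the core operation is R's duplicated(): it flags every element that already appeared earlier in the same vector, so replacing the flagged positions with NA keeps only the first occurrence. A minimal sketch on row 18, typed in by hand as a plain character vector (the names row18, flags and deduped are just for illustration):

```r
# Row 18 of df, written out as a plain character vector
row18 <- c("P2000", "P2000", "U3003", "P2000", NA, NA, NA, NA)

# TRUE marks a value that already occurred earlier in the vector.
# Repeated NAs are flagged too; that is harmless here because
# they are replaced by NA anyway.
flags <- duplicated(row18)

# Keep the first occurrence of each value and blank out the rest
deduped <- replace(row18, flags, NA)
deduped
# -> "P2000" NA "U3003" NA NA NA NA NA
```

Doing this for every row of a data frame is then just a matter of applying the same idea row by row.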

The desired output could look like this:

      V1    V2    V3    V4   V5   V6   V7   V8
1  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
2  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
3  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
4  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
5  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
6  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
7  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
8  B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
9   <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
10  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
11 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
12 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
13 B1182 B124D  <NA>  <NA> <NA> <NA> <NA> <NA>
14  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
15  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
16  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
17  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>
18 P2000  <NA> U3003  <NA> <NA> <NA> <NA> <NA>
19 P2000  <NA> U3003  <NA> <NA> <NA> <NA> <NA>
20  <NA>  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA>

I don't care about the variables themselves, because they will be rearranged and transformed later.

So how can I remove the duplicated elements within each row of this df?

r list dataframe duplicates
2 answers

Answer 1 (1 vote)

You can use apply over the rows and replace the duplicates with NA:

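# apply() returns one column per row of df, so t() turns the result back into a df-shaped logical matrix
df[t(apply(df, 1, duplicated))] <- NA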

> df
      V1    V2    V3   V4   V5   V6   V7   V8
1  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
2  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
3  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
4  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
5  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
6  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
7  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
8  B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
9   <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
10  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
11 B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
12 B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
13 B1182 B124D  <NA> <NA> <NA> <NA> <NA> <NA>
14  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
15  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
16  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
17  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
18 P2000  <NA> U3003 <NA> <NA> <NA> <NA> <NA>
19 P2000  <NA> U3003 <NA> <NA> <NA> <NA> <NA>
20  <NA>  <NA>  <NA> <NA> <NA> <NA> <NA> <NA>
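The t() call is easiest to understand by looking at the shapes involved. With MARGIN = 1, apply() first coerces the data frame to a character matrix and then stores each row's duplicated() result as a column, so the flags come back as variables × rows and must be transposed before they can index the data frame. The two-row data frame small below is made up for illustration and is not the question's data:

```r
# Hypothetical two-row example; all columns are character, like the question's df
small <- data.frame(
  V1 = c("A", "P2000"),
  V2 = c("B", "P2000"),
  V3 = c(NA, "U3003"),
  V4 = c(NA, "P2000"),
  stringsAsFactors = FALSE
)

# apply() over rows puts each row's duplicated() result in a COLUMN
dup <- apply(small, 1, duplicated)
dim(dup)     # 4 2: one row per variable, one column per data-frame row
dim(t(dup))  # 2 4: same shape as small, so it works as a logical index

# Blank out every value that already appeared earlier in its row
small[t(dup)] <- NA
small$V2  # "B" NA
small$V4  # NA NA
```

For the 20-row, 8-column df in the question the same reasoning applies: apply() returns an 8 × 20 matrix, and t() turns it into the 20 × 8 logical matrix used to index df.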

Answer 2 (0 votes)

It seems that all your other questions use the tidyverse, so here is an alternative that uses both pivot_longer and pivot_wider:

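library(tidyverse)

df %>%
  mutate(rn = row_number()) %>%                                        # remember each value's original row
  pivot_longer(cols = -rn, names_to = "var", values_to = "value") %>%  # one line per cell
  group_by(rn) %>%                                                     # work within each original row
  mutate(value = ifelse(duplicated(value), NA, value)) %>%             # keep only the first occurrence
  pivot_wider(id_cols = rn, names_from = "var", values_from = "value") # back to one column per variable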

Output

# A tibble: 20 x 9
# Groups:   rn [20]
      rn V1    V2    V3    V4    V5    V6    V7    V8   
   <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1     1 B1182 B124D NA    NA    NA    NA    NA    NA   
 2     2 B1182 B124D NA    NA    NA    NA    NA    NA   
 3     3 B1182 B124D NA    NA    NA    NA    NA    NA   
 4     4 B1182 B124D NA    NA    NA    NA    NA    NA   
 5     5 B1182 B124D NA    NA    NA    NA    NA    NA   
 6     6 B1182 B124D NA    NA    NA    NA    NA    NA   
 7     7 B1182 B124D NA    NA    NA    NA    NA    NA   
 8     8 B1182 B124D NA    NA    NA    NA    NA    NA   
 9     9 NA    NA    NA    NA    NA    NA    NA    NA   
10    10 NA    NA    NA    NA    NA    NA    NA    NA   
11    11 B1182 B124D NA    NA    NA    NA    NA    NA   
12    12 B1182 B124D NA    NA    NA    NA    NA    NA   
13    13 B1182 B124D NA    NA    NA    NA    NA    NA   
14    14 NA    NA    NA    NA    NA    NA    NA    NA   
15    15 NA    NA    NA    NA    NA    NA    NA    NA   
16    16 NA    NA    NA    NA    NA    NA    NA    NA   
17    17 NA    NA    NA    NA    NA    NA    NA    NA   
18    18 P2000 NA    U3003 NA    NA    NA    NA    NA   
19    19 P2000 NA    U3003 NA    NA    NA    NA    NA   
20    20 NA    NA    NA    NA    NA    NA    NA    NA 
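The result of the pipeline is still grouped by rn and keeps rn as an extra column. If you want a plain, ungrouped result containing only the original variables, one possible variant adds ungroup() and select(-rn), and uses dplyr's if_else() with NA_character_ so that the value column stays character. This is a sketch that assumes, as in the question, that every column is character (pivot_longer() has to combine all the V columns into a single value column); it loads only dplyr and tidyr and runs on a small, made-up two-row data frame rather than the question's df:

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row example; all columns are character, as in the question
small <- data.frame(
  V1 = c("B1182", "P2000"),
  V2 = c("B124D", "P2000"),
  V3 = c(NA, "U3003"),
  V4 = c(NA, "P2000"),
  stringsAsFactors = FALSE
)

result <- small %>%
  mutate(rn = row_number()) %>%                 # remember each value's original row
  pivot_longer(cols = -rn, names_to = "var", values_to = "value") %>%
  group_by(rn) %>%                              # compare values only within a row
  mutate(value = if_else(duplicated(value), NA_character_, value)) %>%
  ungroup() %>%                                 # drop the grouping before reshaping back
  pivot_wider(id_cols = rn, names_from = "var", values_from = "value") %>%
  select(-rn)                                   # drop the helper column

result
```

In the second row, the repeated P2000 values in V2 and V4 become NA while V1 and V3 are kept, matching the base R answer above.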