If data frame content is not unique: subset, merge and rename


I have a dynamic data frame with the following structure:

structure(list(Date = structure(c(19304, 19305, 19311, 
19311, 19312), class = "Date"), Category = c("4", 
"6", "1", "0", "3"), Units_Sold = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), Raised = c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_), Method = c("Trad", 
"Trad", "Unknown", "Trad", "Unknown"), Day = c(8, 9, 15, 15, 16)), row.names = c(NA, 
-5L), class = c("tbl_df", "tbl", "data.frame"))

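As you can see, two categories share the same date. I would like to create a condition: if two rows have the same date, the df is subset (call it df_copy), and in the new df one of those rows is removed, the value in the "Category" column is changed to "Check Dataframe", and the value in the "Method" column is changed to "Attention".

Once this is solved, my data frame should look like this: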

tibble [4 x 6] (S3: tbl_df/tbl/data.frame)
 $ Date : Date[1:4], format: "2022-11-08" "2022-11-09" "2022-11-15" "2022-11-16"
 $ Category: chr [1:4] "4" "6" "Check Dataframe" "3"
 $ Units_Sold: num [1:4] NA NA NA NA
 $ Raised: num [1:4] NA NA NA NA
 $ Method : chr [1:4] "Trad" "Trad" "Attention" "Unknown"
 $ Day: num [1:4] 8 9 15 16

If possible, could a boolean object also be created as a check, so that a "checker" object equals 1 whenever more than one row has the same date?

r dataframe match
2 answers
3
votes

Using dplyr, group by Date, add the warning flags to rows where n() > 1, then use distinct() to drop the duplicate rows:

library(dplyr)

df_copy <- df_orig %>%
  group_by(Date) %>%
  mutate(
    Category = ifelse(n() > 1, "Check Dataframe", Category),
    Method = ifelse(n() > 1, "Attention", Method)
  ) %>%
  ungroup() %>%
  distinct(Date, .keep_all = TRUE)

df_copy

Output:

# A tibble: 4 × 6
  Date       Category        Units_Sold Raised Method      Day
  <date>     <chr>                <dbl>  <dbl> <chr>     <dbl>
1 2022-11-08 4                       NA     NA Trad          8
2 2022-11-09 6                       NA     NA Trad          9
3 2022-11-15 Check Dataframe         NA     NA Attention    15
4 2022-11-16 3                       NA     NA Unknown      16
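
The question also asks for a boolean "checker" object, which the pipeline above does not create. A minimal base R sketch, assuming the sample data from the question is rebuilt as a plain data.frame (the names checker, checker_int and dup_dates are illustrative, not part of the original answer):

```r
# Rebuild the relevant columns of the question's sample data.
df_orig <- data.frame(
  Date = as.Date(c(19304, 19305, 19311, 19311, 19312), origin = "1970-01-01"),
  Category = c("4", "6", "1", "0", "3"),
  Method = c("Trad", "Trad", "Unknown", "Trad", "Unknown"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns the index of the first duplicated value, or 0 if none,
# so comparing against 0 gives a single TRUE/FALSE flag.
checker <- anyDuplicated(df_orig$Date) > 0

# Integer form (1/0), in case a numeric flag is preferred over a logical one.
checker_int <- as.integer(checker)

# The dates that occur more than once, useful for a message or a log entry.
dup_dates <- unique(df_orig$Date[duplicated(df_orig$Date)])
```

The flag can then guard the flag-and-deduplicate step, for example with if (checker) { ... }, so that df_copy is only built when a date actually repeats.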

2
votes

We can group by "Date", use an if/else condition to change the values in the "Category" and "Method" columns, slice the first row, and ungroup:

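library(dplyr)

df2 <- df1 %>%
    group_by(Date) %>%
    # n() is the group size, so each scalar if/else is evaluated once per Date
    mutate(Category = if (n() > 1) "Check Dataframe" else Category,
           Method = if (n() > 1) "Attention" else Method) %>%
    # keep the first row of each group; slice() takes row positions, not an n argument
    slice(1) %>%
    ungroup()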

Output:

 df2
# A tibble: 4 × 6
  Date       Category        Units_Sold Raised Method      Day
  <date>     <chr>                <dbl>  <dbl> <chr>     <dbl>
1 2022-11-08 4                       NA     NA Trad          8
2 2022-11-09 6                       NA     NA Trad          9
3 2022-11-15 Check Dataframe         NA     NA Attention    15
4 2022-11-16 3                       NA     NA Unknown      16

Or with data.table:

library(data.table)
setDT(df1)[, c("Category", "Method") := if(.N > 1)
    .("Check Dataframe", "Attention") else .(Category, Method), Date]
df2 <- unique(df1, by  = 'Date')

Output:

> df2
         Date        Category Units_Sold Raised    Method   Day
       <Date>          <char>      <num>  <num>    <char> <num>
1: 2022-11-08               4         NA     NA      Trad     8
2: 2022-11-09               6         NA     NA      Trad     9
3: 2022-11-15 Check Dataframe         NA     NA Attention    15
4: 2022-11-16               3         NA     NA   Unknown    16
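
For comparison, the same flag-then-deduplicate idea can be sketched without any packages. This is only an illustration under the assumption that the first row of each repeated date is the one to keep, as in the answers above; is_dup_date is a hypothetical helper name.

```r
# Sample data from the question, rebuilt as a plain data.frame.
df1 <- data.frame(
  Date = as.Date(c(19304, 19305, 19311, 19311, 19312), origin = "1970-01-01"),
  Category = c("4", "6", "1", "0", "3"),
  Units_Sold = NA_real_,
  Raised = NA_real_,
  Method = c("Trad", "Trad", "Unknown", "Trad", "Unknown"),
  Day = c(8, 9, 15, 15, 16),
  stringsAsFactors = FALSE
)

# TRUE for every row whose Date appears more than once (all copies, not just the later ones).
is_dup_date <- df1$Date %in% df1$Date[duplicated(df1$Date)]

# Work on a copy so the original data frame stays untouched.
df2 <- df1
df2$Category[is_dup_date] <- "Check Dataframe"
df2$Method[is_dup_date] <- "Attention"

# Keep only the first row for each Date and reset the row names.
df2 <- df2[!duplicated(df2$Date), ]
rownames(df2) <- NULL
```

Unlike setDT(), which modifies df1 by reference, this version leaves df1 as it was, which may matter if the original rows are still needed afterwards.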