I have a dynamic data frame with the following structure:
structure(list(Date = structure(c(19304, 19305, 19311,
19311, 19312), class = "Date"), Category = c("4",
"6", "1", "0", "3"), Units_Sold = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Raised = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Method = c("Trad",
"Trad", "Unknown", "Trad", "Unknown"), Day = c(8, 9, 15, 15, 16)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
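As you can see, two of the categories share the same date. What I would like to do is create a condition: if two rows have the same date, the df is subset (call it df_copy), and in the new df one of those rows is removed, the content of the "Category" column is changed to "Check Dataframe", and the "Method" column is changed to "Attention".
The result I am after would look like this: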
tibble [4 x 6] (S3: tbl_df/tbl/data.frame)
 $ Date      : Date[1:4], format: "2022-11-08" "2022-11-09" "2022-11-15" "2022-11-16"
 $ Category  : chr [1:4] "4" "6" "Check Dataframe" "3"
 $ Units_Sold: num [1:4] NA NA NA NA
 $ Raised    : num [1:4] NA NA NA NA
 $ Method    : chr [1:4] "Trad" "Trad" "Attention" "Unknown"
 $ Day       : num [1:4] 8 9 15 16
If possible, could a boolean object also be created as a check, so that if more than one row has the same date, a "checker" object will = 1?
Using dplyr, group by Date, add the warning flags to the rows where n() > 1, then use distinct() to remove the duplicate rows:
library(dplyr)
df_copy <- df_orig %>%
  group_by(Date) %>%
  mutate(
    Category = ifelse(n() > 1, "Check Dataframe", Category),
    Method = ifelse(n() > 1, "Attention", Method)
  ) %>%
  ungroup() %>%
  distinct(Date, .keep_all = TRUE)
df_copy
Output:
# A tibble: 4 × 6
  Date       Category        Units_Sold Raised Method      Day
  <date>     <chr>                <dbl>  <dbl> <chr>     <dbl>
1 2022-11-08 4                       NA     NA Trad          8
2 2022-11-09 6                       NA     NA Trad          9
3 2022-11-15 Check Dataframe         NA     NA Attention    15
4 2022-11-16 3                       NA     NA Unknown      16
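The question also asks for a "checker" object that equals 1 when any date occurs more than once. A minimal base R sketch of one way to get it is below. It rebuilds only the relevant columns of the example data as a plain data frame, and the names has_dup_dates and dup_dates are hypothetical helpers, not something from the question.

```r
# Rebuild the relevant columns of the question's example data (base R only).
df_orig <- data.frame(
  Date = as.Date(c(19304, 19305, 19311, 19311, 19312), origin = "1970-01-01"),
  Category = c("4", "6", "1", "0", "3"),
  Method = c("Trad", "Trad", "Unknown", "Trad", "Unknown"),
  stringsAsFactors = FALSE
)

# anyDuplicated() returns 0 when no value repeats, otherwise the index of the
# first repeated value, so comparing with 0 yields a single TRUE/FALSE.
has_dup_dates <- anyDuplicated(df_orig$Date) > 0

# The question wants the value 1; as.integer() maps TRUE to 1 and FALSE to 0.
checker <- as.integer(has_dup_dates)

# Hypothetical extra: the dates that repeat, e.g. for a warning message.
dup_dates <- unique(df_orig$Date[duplicated(df_orig$Date)])
```

Computing the flag on the original data, before the duplicates are dropped, matters here: after distinct() every date is unique and the check would always come out as 0.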
We can group by 'Date', use an if/else condition to change the values in the 'Category' and 'Method' columns, slice the first row of each group, and ungroup.
library(dplyr)
df2 <- df1 %>%
  group_by(Date) %>%
  mutate(Category = if (n() > 1) "Check Dataframe" else Category,
         Method = if (n() > 1) "Attention" else Method) %>%
  # keep only the first row of each Date group
  slice_head(n = 1) %>%
  ungroup()
-output
df2
# A tibble: 4 × 6
  Date       Category        Units_Sold Raised Method      Day
  <date>     <chr>                <dbl>  <dbl> <chr>     <dbl>
1 2022-11-08 4                       NA     NA Trad          8
2 2022-11-09 6                       NA     NA Trad          9
3 2022-11-15 Check Dataframe         NA     NA Attention    15
4 2022-11-16 3                       NA     NA Unknown      16
Or with data.table
library(data.table)
setDT(df1)[, c("Category", "Method") := if (.N > 1)
    .("Check Dataframe", "Attention") else .(Category, Method), by = Date]
df2 <- unique(df1, by = 'Date')
-output
> df2
         Date        Category Units_Sold Raised    Method   Day
       <Date>          <char>      <num>  <num>    <char> <num>
1: 2022-11-08               4         NA     NA      Trad     8
2: 2022-11-09               6         NA     NA      Trad     9
3: 2022-11-15 Check Dataframe         NA     NA Attention    15
4: 2022-11-16               3         NA     NA   Unknown    16
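One thing to be aware of with the data.table code above: setDT() converts df1 in place and := updates columns by reference, so the original df1 is changed as well. Since the question asks for a separate df_copy, a possible variation is to work on a copy made with as.data.table(). This is only a sketch: it rebuilds a cut-down version of the example data, and the name dt is a hypothetical placeholder.

```r
library(data.table)

# Cut-down version of the example data (only the columns that matter here).
df1 <- data.frame(
  Date = as.Date(c("2022-11-08", "2022-11-09", "2022-11-15",
                   "2022-11-15", "2022-11-16")),
  Category = c("4", "6", "1", "0", "3"),
  Method = c("Trad", "Trad", "Unknown", "Trad", "Unknown"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so the := below leaves df1 untouched.
dt <- as.data.table(df1)

# Same grouped update as in the answer: flag every Date that has more than one row.
dt[, c("Category", "Method") := if (.N > 1) {
  .("Check Dataframe", "Attention")
} else {
  .(Category, Method)
}, by = Date]

# Keep the first row for each Date.
df2 <- unique(dt, by = "Date")
```

With this version df1 stays a plain data frame holding the original values, and df2 holds the de-duplicated, flagged result.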