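My working df is shown below.
I have a mental block with this problem: I want to drop a group of rows based on a criterion. If an observation in rrp_nsw, rrp_qld, rrp_sa, rrp_tas or rrp_vic is negative, I want to drop all rows that share the same year, month and day.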
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2009 7 1 2 17.7 18.8 17.8 16.1 15.5
2009 7 1 3 17.7 18.6 18.1 15.9 15.4
2009 7 1 4 16.7 18.6 17.6 14.3 12.8
2009 7 2 1 -15.6 17.6 16.3 13.2 11.8
2009 7 2 2 13.7 15.7 12.0 -11.1 -12.9
2009 7 2 3 13.7 15.8 11.9 11.1 12.9
2009 7 2 4 -13.9 16.1 -12.1 11.2 12.9
2009 8 1 1 13.8 16.0 12.2 11.2 12.8
2009 8 1 2 13.7 16.3 11.6 10.6 12.6
2009 8 1 3 13.7 -15.8 11.9 11.0 12.7
2009 8 1 4 13.8 16.0 12.1 11.2 12.9
2009 8 2 1 17.6 17.6 17.3 16.5 17.1
2009 8 2 2 17.7 17.6 17.3 16.8 17.4
2009 8 2 3 15.8 16.0 15.1 15.0 15.5
2009 8 2 4 15.4 15.6 14.5 14.6 15.1
2009 9 1 1 14.7 15.0 13.8 14.0 14.5
2009 9 1 2 15.3 15.4 14.3 14.6 15.0
2009 9 1 3 15.3 15.6 14.4 14.5 15.0
2009 9 1 4 14.9 15.7 13.7 13.8 14.5
For example, my desired df would be:
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2009 7 1 2 17.7 18.8 17.8 16.1 15.5
2009 7 1 3 17.7 18.6 18.1 15.9 15.4
2009 7 1 4 16.7 18.6 17.6 14.3 12.8
2009 8 2 1 17.6 17.6 17.3 16.5 17.1
2009 8 2 2 17.7 17.6 17.3 16.8 17.4
2009 8 2 3 15.8 16.0 15.1 15.0 15.5
2009 8 2 4 15.4 15.6 14.5 14.6 15.1
2009 9 1 1 14.7 15.0 13.8 14.0 14.5
2009 9 1 2 15.3 15.4 14.3 14.6 15.0
2009 9 1 3 15.3 15.6 14.4 14.5 15.0
2009 9 1 4 14.9 15.7 13.7 13.8 14.5
I would be very grateful if someone could help me.
A solution using dplyr:
library(dplyr)

# Keep only the (year, month, day) groups that contain no negative price
df1 <- df %>%
  group_by(year, month, day) %>%
  filter(!any(rrp_nsw < 0 | rrp_qld < 0 | rrp_sa < 0 | rrp_tas < 0 | rrp_vic < 0))
> df1
# Groups: year, month, day [3]
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
<int> <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2 2009 7 1 2 17.7 18.8 17.8 16.1 15.5
3 2009 7 1 3 17.7 18.6 18.1 15.9 15.4
4 2009 7 1 4 16.7 18.6 17.6 14.3 12.8
5 2009 8 2 1 17.6 17.6 17.3 16.5 17.1
6 2009 8 2 2 17.7 17.6 17.3 16.8 17.4
7 2009 8 2 3 15.8 16 15.1 15 15.5
8 2009 8 2 4 15.4 15.6 14.5 14.6 15.1
9 2009 9 1 1 14.7 15 13.8 14 14.5
10 2009 9 1 2 15.3 15.4 14.3 14.6 15
11 2009 9 1 3 15.3 15.6 14.4 14.5 15
12 2009 9 1 4 14.9 15.7 13.7 13.8 14.5
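If typing out every price column becomes tedious, the same grouped filter can probably be written with a column selector. The following is a minimal sketch, not part of the original answer. It assumes dplyr 1.0.4 or later (for if_any()), assumes every price column starts with "rrp_", and uses a trimmed six-row subset of the question's data. The name df2 is only illustrative.

```r
library(dplyr)

# Trimmed subset of the question's data: 2009-07-02 contains negative prices
df <- data.frame(
  year    = rep(2009L, 6),
  month   = c(7L, 7L, 7L, 7L, 8L, 8L),
  day     = c(1L, 1L, 2L, 2L, 2L, 2L),
  fivemin = c(1L, 2L, 1L, 2L, 1L, 2L),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

df2 <- df %>%
  group_by(year, month, day) %>%
  # if_any() is TRUE for a row when any rrp_ column is negative;
  # any() then flags the whole date, and filter() drops flagged dates
  filter(!any(if_any(starts_with("rrp_"), ~ .x < 0))) %>%
  ungroup()
```

The trailing ungroup() is a design choice: it returns a plain tibble so that later operations are not silently performed per date.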
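In base R, we can use:
# Split by date; the separator keeps keys such as (1, 11) and (11, 1) distinct
splitdata <- lapply(split(df, with(df, paste(year, month, day, sep = "-"))),
                    function(x) x[all(x[, 5:9] >= 0)])  # x[FALSE] gives a zero-column data frame
new_data <- do.call(rbind, splitdata[lengths(splitdata) > 0])  # drop the zero-column pieces
row.names(new_data) <- NULL
It is actually quite neat. We split the data into groups keyed by the pasted year, month and day, keep only the groups in which every value in the price columns [, 5:9] is non-negative, and finally bind the remaining pieces back together.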
Output:
> new_data
year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
1 2009 7 1 1 16.9 17.6 16.7 15.7 15.5
2 2009 7 1 2 17.7 18.8 17.8 16.1 15.5
3 2009 7 1 3 17.7 18.6 18.1 15.9 15.4
4 2009 7 1 4 16.7 18.6 17.6 14.3 12.8
5 2009 8 2 1 17.6 17.6 17.3 16.5 17.1
6 2009 8 2 2 17.7 17.6 17.3 16.8 17.4
7 2009 8 2 3 15.8 16.0 15.1 15.0 15.5
8 2009 8 2 4 15.4 15.6 14.5 14.6 15.1
9 2009 9 1 1 14.7 15.0 13.8 14.0 14.5
10 2009 9 1 2 15.3 15.4 14.3 14.6 15.0
11 2009 9 1 3 15.3 15.6 14.4 14.5 15.0
12 2009 9 1 4 14.9 15.7 13.7 13.8 14.5
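For comparison, here is a base R sketch that avoids the split and rbind round trip by flagging rows with ave(). It is an alternative rather than the answer's own method. It uses a trimmed six-row subset of the question's data, and the names price_cols, neg_row, bad_date and new_data2 are hypothetical.

```r
# Trimmed subset of the question's data: 2009-07-02 contains negative prices
df <- data.frame(
  year    = rep(2009L, 6),
  month   = c(7L, 7L, 7L, 7L, 8L, 8L),
  day     = c(1L, 1L, 2L, 2L, 2L, 2L),
  fivemin = c(1L, 2L, 1L, 2L, 1L, 2L),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

# Locate the price columns by name instead of by position
price_cols <- grep("^rrp_", names(df))

# TRUE for each row that has at least one negative price
neg_row <- rowSums(df[, price_cols] < 0) > 0

# Spread the flag to every row of the same (year, month, day)
bad_date <- ave(neg_row, df$year, df$month, df$day, FUN = any)

# Keep the rows whose date has no negative price
new_data2 <- df[!bad_date, ]
row.names(new_data2) <- NULL
```

Because the rows are never split apart, the original row order is preserved without relying on how the group keys happen to sort.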
Data:
df <- structure(list(year = c(2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L,
2009L, 2009L, 2009L, 2009L, 2009L), month = c(7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L
), day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L), fivemin = c(1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
rrp_nsw = c(16.9, 17.7, 17.7, 16.7, -15.6, 13.7, 13.7, -13.9,
13.8, 13.7, 13.7, 13.8, 17.6, 17.7, 15.8, 15.4, 14.7, 15.3,
15.3, 14.9), rrp_qld = c(17.6, 18.8, 18.6, 18.6, 17.6, 15.7,
15.8, 16.1, 16, 16.3, -15.8, 16, 17.6, 17.6, 16, 15.6, 15,
15.4, 15.6, 15.7), rrp_sa = c(16.7, 17.8, 18.1, 17.6, 16.3,
12, 11.9, -12.1, 12.2, 11.6, 11.9, 12.1, 17.3, 17.3, 15.1,
14.5, 13.8, 14.3, 14.4, 13.7), rrp_tas = c(15.7, 16.1, 15.9,
14.3, 13.2, -11.1, 11.1, 11.2, 11.2, 10.6, 11, 11.2, 16.5,
16.8, 15, 14.6, 14, 14.6, 14.5, 13.8), rrp_vic = c(15.5,
15.5, 15.4, 12.8, 11.8, -12.9, 12.9, 12.9, 12.8, 12.6, 12.7,
12.9, 17.1, 17.4, 15.5, 15.1, 14.5, 15, 15, 14.5)), class = "data.frame", row.names = c(NA,
-20L))
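This solution is a little different from @DanielO's, but it works well. You can add a column that is set to NA wherever the condition is met, then keep only the rows containing NA and select just year, month and day. With that result, you can do an anti-join against the original df.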
library(tidyverse)
df <- tibble::tribble(
~year, ~month, ~day, ~fivemin, ~rrp_nsw, ~rrp_qld, ~rrp_sa, ~rrp_tas, ~rrp_vic,
2009, 7, 1, 1, 16.9 , 17.6 , 16.7 , 15.7 , 15.5,
2009, 7, 1, 2, 17.7 , 18.8 , 17.8 , 16.1 , 15.5,
2009, 7, 1, 3, 17.7 , 18.6 , 18.1 , 15.9 , 15.4,
2009, 7, 1, 4, 16.7 , 18.6 , 17.6 , 14.3 , 12.8,
2009, 7, 2, 1, -15.6 , 17.6, 16.3, 13.2 ,11.8,
2009, 7, 2, 2, 13.7 , 15.7 , 12.0 , -11.1 , -12.9,
2009, 7, 2, 3, 13.7 , 15.8 , 11.9 , 11.1 , 12.9,
2009, 7, 2, 4, -13.9 , 16.1, -12.1, 11.2, 12.9,
2009, 8, 1, 1, 13.8 , 16.0 , 12.2 , 11.2 , 12.8,
2009, 8, 1, 2, 13.7 , 16.3 , 11.6 , 10.6 , 12.6,
2009, 8, 1, 3, 13.7 , -15.8, 11.9, 11.0 ,12.7,
2009, 8, 1, 4, 13.8 , 16.0 , 12.1 , 11.2 , 12.9,
2009, 8, 2, 1, 17.6 , 17.6 , 17.3 , 16.5 , 17.1,
2009, 8, 2, 2, 17.7 , 17.6 , 17.3 , 16.8 , 17.4,
2009, 8, 2, 3, 15.8 , 16.0 , 15.1 , 15.0 , 15.5,
2009, 8, 2, 4, 15.4 , 15.6 , 14.5 , 14.6 , 15.1,
2009, 9, 1, 1, 14.7 , 15.0 , 13.8 , 14.0 , 14.5,
2009, 9, 1, 2, 15.3 , 15.4 , 14.3 , 14.6 , 15.0,
2009, 9, 1, 3, 15.3 , 15.6 , 14.4 , 14.5 , 15.0,
2009, 9, 1, 4, 14.9 , 15.7 , 13.7 , 13.8 , 14.5
)
df %>%
mutate(toDrop = ifelse(rrp_nsw < 0 | rrp_qld < 0 | rrp_sa < 0 |
rrp_tas < 0 | rrp_vic <0 , NA, 0)) %>%
dplyr::filter(is.na(toDrop)) %>%
select(year:day)-> dff
anti_join(df, dff)
#> Joining, by = c("year", "month", "day")
#> # A tibble: 12 x 9
#> year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2009 7 1 1 16.9 17.6 16.7 15.7 15.5
#> 2 2009 7 1 2 17.7 18.8 17.8 16.1 15.5
#> 3 2009 7 1 3 17.7 18.6 18.1 15.9 15.4
#> 4 2009 7 1 4 16.7 18.6 17.6 14.3 12.8
#> 5 2009 8 2 1 17.6 17.6 17.3 16.5 17.1
#> 6 2009 8 2 2 17.7 17.6 17.3 16.8 17.4
#> 7 2009 8 2 3 15.8 16 15.1 15 15.5
#> 8 2009 8 2 4 15.4 15.6 14.5 14.6 15.1
#> 9 2009 9 1 1 14.7 15 13.8 14 14.5
#> 10 2009 9 1 2 15.3 15.4 14.3 14.6 15
#> 11 2009 9 1 3 15.3 15.6 14.4 14.5 15
#> 12 2009 9 1 4 14.9 15.7 13.7 13.8 14.5
Created on 2020-06-10 by the reprex package (v0.3.0)
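This seems to be a simple two-step process:
1) Filter the data to find the unique year, month and day combinations in which any observation is negative.
2) Remove the data for the dates found in step 1.
Hope this helps :)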
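The two steps can also be sketched without the intermediate NA flag column. This is a hedged variant rather than the answer's exact code. It runs on a trimmed six-row subset of the question's data, the names bad_dates and clean are illustrative, and the join keys are passed explicitly so that anti_join() does not have to guess them.

```r
library(dplyr)

# Trimmed subset of the question's data: 2009-07-02 contains negative prices
df <- data.frame(
  year    = rep(2009L, 6),
  month   = c(7L, 7L, 7L, 7L, 8L, 8L),
  day     = c(1L, 1L, 2L, 2L, 2L, 2L),
  fivemin = c(1L, 2L, 1L, 2L, 1L, 2L),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

# Step 1: unique dates with at least one negative price
bad_dates <- df %>%
  filter(rrp_nsw < 0 | rrp_qld < 0 | rrp_sa < 0 | rrp_tas < 0 | rrp_vic < 0) %>%
  distinct(year, month, day)

# Step 2: drop every row that falls on one of those dates
clean <- anti_join(df, bad_dates, by = c("year", "month", "day"))
```

Keeping bad_dates as its own object also makes it easy to inspect which dates were removed.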