How can I drop rows in R using multiple conditions?

Question (votes: 0, answers: 2)

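I am working with the df shown below. I have a mental block on this problem: I want to drop a whole group of rows based on a criterion. If any observation in rrp_nsw, rrp_qld, rrp_sa, rrp_tas, or rrp_vic is negative, I want to delete all rows that share that observation's year, month, and day.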

year month   day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009     7     1       1    16.9    17.6   16.7    15.7    15.5
2009     7     1       2    17.7    18.8   17.8    16.1    15.5
2009     7     1       3    17.7    18.6   18.1    15.9    15.4
2009     7     1       4    16.7    18.6   17.6    14.3    12.8
2009     7     2       1    -15.6    17.6   16.3    13.2    11.8
2009     7     2       2    13.7    15.7   12.0    -11.1    -12.9
2009     7     2       3    13.7    15.8   11.9    11.1    12.9
2009     7     2       4    -13.9    16.1   -12.1    11.2    12.9
2009     8     1       1    13.8    16.0   12.2    11.2    12.8
2009     8     1       2    13.7    16.3   11.6    10.6    12.6
2009     8     1       3    13.7    -15.8   11.9    11.0    12.7
2009     8     1       4    13.8    16.0   12.1    11.2    12.9
2009     8     2       1    17.6    17.6   17.3    16.5    17.1
2009     8     2       2    17.7    17.6   17.3    16.8    17.4
2009     8     2       3    15.8    16.0   15.1    15.0    15.5
2009     8     2       4    15.4    15.6   14.5    14.6    15.1
2009     9     1       1    14.7    15.0   13.8    14.0    14.5
2009     9     1       2    15.3    15.4   14.3    14.6    15.0
2009     9     1       3    15.3    15.6   14.4    14.5    15.0
2009     9     1       4    14.9    15.7   13.7    13.8    14.5

For example, my desired df would be:

year month   day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
2009     7     1       1    16.9    17.6   16.7    15.7    15.5
2009     7     1       2    17.7    18.8   17.8    16.1    15.5
2009     7     1       3    17.7    18.6   18.1    15.9    15.4
2009     7     1       4    16.7    18.6   17.6    14.3    12.8
2009     8     2       1    17.6    17.6   17.3    16.5    17.1
2009     8     2       2    17.7    17.6   17.3    16.8    17.4
2009     8     2       3    15.8    16.0   15.1    15.0    15.5
2009     8     2       4    15.4    15.6   14.5    14.6    15.1
2009     9     1       1    14.7    15.0   13.8    14.0    14.5
2009     9     1       2    15.3    15.4   14.3    14.6    15.0
2009     9     1       3    15.3    15.6   14.4    14.5    15.0
2009     9     1       4    14.9    15.7   13.7    13.8    14.5

I would be very grateful if someone could help.

r dataframe row
2 answers
1
vote

A solution using dplyr:

library(dplyr)

df1 <- df %>%
  group_by(year, month, day) %>%
  filter(!any(rrp_nsw<0|rrp_qld<0|rrp_sa<0|rrp_tas<0|rrp_vic<0))

>df1
# Groups:   year, month, day [3]
    year month   day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
   <int> <int> <int>   <int>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
 1  2009     7     1       1    16.9    17.6   16.7    15.7    15.5
 2  2009     7     1       2    17.7    18.8   17.8    16.1    15.5
 3  2009     7     1       3    17.7    18.6   18.1    15.9    15.4
 4  2009     7     1       4    16.7    18.6   17.6    14.3    12.8
 5  2009     8     2       1    17.6    17.6   17.3    16.5    17.1
 6  2009     8     2       2    17.7    17.6   17.3    16.8    17.4
 7  2009     8     2       3    15.8    16     15.1    15      15.5
 8  2009     8     2       4    15.4    15.6   14.5    14.6    15.1
 9  2009     9     1       1    14.7    15     13.8    14      14.5
10  2009     9     1       2    15.3    15.4   14.3    14.6    15  
11  2009     9     1       3    15.3    15.6   14.4    14.5    15  
12  2009     9     1       4    14.9    15.7   13.7    13.8    14.5
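
If the list of price columns grows, spelling out each `rrp_*` comparison gets tedious. Here is a minimal sketch of the same idea that picks the columns by prefix instead. It assumes dplyr 1.0.4 or later (for `if_any()`), and `df_small`, `has_neg` and `clean` are made-up names for a six-row subset of the question's data, not anything from the original answer.

```r
library(dplyr)

# Hypothetical six-row subset of the question's data (three days, two rows each)
df_small <- data.frame(
  year    = c(2009, 2009, 2009, 2009, 2009, 2009),
  month   = c(7, 7, 7, 7, 8, 8),
  day     = c(1, 1, 2, 2, 2, 2),
  fivemin = c(1, 2, 1, 2, 1, 2),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

clean <- df_small %>%
  # Flag each row that has a negative value in any column starting with "rrp_"
  mutate(has_neg = if_any(starts_with("rrp_"), ~ .x < 0)) %>%
  group_by(year, month, day) %>%
  # Keep a day only if none of its rows were flagged
  filter(!any(has_neg)) %>%
  ungroup() %>%
  select(-has_neg)
```

On this subset, only the rows for 2009-7-1 and 2009-8-2 should remain, since 2009-7-2 contains negative prices.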

2
votes

In base R, we can use:

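# Split by day; x[TRUE] keeps a group whole, x[FALSE] leaves a zero-column data frame
splitdata <- lapply(split(df, with(df, paste(year, month, day, sep = "-"))),
                    function(x) x[all(x[, 5:9] > 0)])
# Keep only the groups that still have columns, then stack them back together
new_data <- do.call(rbind, splitdata[lengths(splitdata) > 0])
row.names(new_data) <- NULL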

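It's actually quite neat. We split the data into groups keyed by year, month, and day pasted together, then keep only the groups in which all values in columns [,5:9] are positive; a group that fails the test is reduced to a zero-column data frame, which the lengths() check then drops. Finally, we bind the remaining pieces back together.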

Output:

> new_data
   year month day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
1  2009     7   1       1    16.9    17.6   16.7    15.7    15.5
2  2009     7   1       2    17.7    18.8   17.8    16.1    15.5
3  2009     7   1       3    17.7    18.6   18.1    15.9    15.4
4  2009     7   1       4    16.7    18.6   17.6    14.3    12.8
5  2009     8   2       1    17.6    17.6   17.3    16.5    17.1
6  2009     8   2       2    17.7    17.6   17.3    16.8    17.4
7  2009     8   2       3    15.8    16.0   15.1    15.0    15.5
8  2009     8   2       4    15.4    15.6   14.5    14.6    15.1
9  2009     9   1       1    14.7    15.0   13.8    14.0    14.5
10 2009     9   1       2    15.3    15.4   14.3    14.6    15.0
11 2009     9   1       3    15.3    15.6   14.4    14.5    15.0
12 2009     9   1       4    14.9    15.7   13.7    13.8    14.5
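
As a possible alternative that avoids the split and rbind round trip, here is a rough sketch using `ave()` to spread a per-row flag across each day. It is not the code from the answer above; `df_small`, `rrp_cols`, `row_has_neg`, `day_has_neg` and `new_small` are made-up names, and the data is a six-row subset of the question's df.

```r
# Hypothetical six-row subset of the question's data (three days, two rows each)
df_small <- data.frame(
  year    = c(2009, 2009, 2009, 2009, 2009, 2009),
  month   = c(7, 7, 7, 7, 8, 8),
  day     = c(1, 1, 2, 2, 2, 2),
  fivemin = c(1, 2, 1, 2, 1, 2),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

# Select the price columns by name rather than by position
rrp_cols <- grep("^rrp_", names(df_small), value = TRUE)

# TRUE for every row that contains at least one negative price
row_has_neg <- rowSums(df_small[rrp_cols] < 0) > 0

# ave() applies any() within each year/month/day group and
# returns the group result for every row of that group
day_has_neg <- ave(row_has_neg, df_small$year, df_small$month, df_small$day,
                   FUN = any)

new_small <- df_small[!day_has_neg, ]
row.names(new_small) <- NULL
```

Because the rows are never split apart, the original row order is kept as-is.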

Data:

df <- structure(list(year = c(2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 2009L, 
2009L, 2009L, 2009L, 2009L, 2009L), month = c(7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L
), day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L), fivemin = c(1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), 
    rrp_nsw = c(16.9, 17.7, 17.7, 16.7, -15.6, 13.7, 13.7, -13.9, 
    13.8, 13.7, 13.7, 13.8, 17.6, 17.7, 15.8, 15.4, 14.7, 15.3, 
    15.3, 14.9), rrp_qld = c(17.6, 18.8, 18.6, 18.6, 17.6, 15.7, 
    15.8, 16.1, 16, 16.3, -15.8, 16, 17.6, 17.6, 16, 15.6, 15, 
    15.4, 15.6, 15.7), rrp_sa = c(16.7, 17.8, 18.1, 17.6, 16.3, 
    12, 11.9, -12.1, 12.2, 11.6, 11.9, 12.1, 17.3, 17.3, 15.1, 
    14.5, 13.8, 14.3, 14.4, 13.7), rrp_tas = c(15.7, 16.1, 15.9, 
    14.3, 13.2, -11.1, 11.1, 11.2, 11.2, 10.6, 11, 11.2, 16.5, 
    16.8, 15, 14.6, 14, 14.6, 14.5, 13.8), rrp_vic = c(15.5, 
    15.5, 15.4, 12.8, 11.8, -12.9, 12.9, 12.9, 12.8, 12.6, 12.7, 
    12.9, 17.1, 17.4, 15.5, 15.1, 14.5, 15, 15, 14.5)), class = "data.frame", row.names = c(NA, 
-20L))

1
vote

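This solution is a little different from @DanielO's, but it works well. You can add a column that is set to NA whenever the condition is met, then keep only the rows containing NA and select just year, month, and day. With that in hand, you can do an anti-join against the original df.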

library(tidyverse)
df <- tibble::tribble(
  ~year, ~month,   ~day, ~fivemin, ~rrp_nsw,  ~rrp_qld,  ~rrp_sa, ~rrp_tas, ~rrp_vic,
  2009, 7, 1, 1, 16.9   , 17.6  , 16.7  ,  15.7   , 15.5,
  2009, 7, 1, 2, 17.7   , 18.8  , 17.8  ,  16.1   , 15.5,
  2009, 7, 1, 3, 17.7   , 18.6  , 18.1  ,  15.9   , 15.4,
  2009, 7, 1, 4, 16.7   , 18.6  , 17.6  ,  14.3   , 12.8,
  2009, 7, 2, 1, -15.6  ,  17.6,   16.3,    13.2    ,11.8,
  2009, 7, 2, 2, 13.7   , 15.7  , 12.0  ,  -11.1  ,  -12.9,
  2009, 7, 2, 3, 13.7   , 15.8  , 11.9  ,  11.1   , 12.9,
  2009, 7, 2, 4, -13.9  ,  16.1,   -12.1,    11.2,    12.9,
  2009, 8, 1, 1, 13.8   , 16.0  , 12.2  ,  11.2   , 12.8,
  2009, 8, 1, 2, 13.7   , 16.3  , 11.6  ,  10.6   , 12.6,
  2009, 8, 1, 3, 13.7   , -15.8,   11.9,    11.0    ,12.7,
  2009, 8, 1, 4, 13.8   , 16.0  , 12.1  ,  11.2   , 12.9,
  2009, 8, 2, 1, 17.6   , 17.6  , 17.3  ,  16.5   , 17.1,
  2009, 8, 2, 2, 17.7   , 17.6  , 17.3  ,  16.8   , 17.4,
  2009, 8, 2, 3, 15.8   , 16.0  , 15.1  ,  15.0   , 15.5,
  2009, 8, 2, 4, 15.4   , 15.6  , 14.5  ,  14.6   , 15.1,
  2009, 9, 1, 1, 14.7   , 15.0  , 13.8  ,  14.0   , 14.5,
  2009, 9, 1, 2, 15.3   , 15.4  , 14.3  ,  14.6   , 15.0,
  2009, 9, 1, 3, 15.3   , 15.6  , 14.4  ,  14.5   , 15.0,
  2009, 9, 1, 4, 14.9   , 15.7  , 13.7  ,  13.8   , 14.5
)

df %>% 
  mutate(toDrop = ifelse(rrp_nsw < 0 | rrp_qld < 0 | rrp_sa < 0 |
                           rrp_tas < 0 | rrp_vic <0 , NA, 0)) %>% 
  dplyr::filter(is.na(toDrop)) %>% 
  select(year:day)-> dff
anti_join(df, dff)
#> Joining, by = c("year", "month", "day")
#> # A tibble: 12 x 9
#>     year month   day fivemin rrp_nsw rrp_qld rrp_sa rrp_tas rrp_vic
#>    <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>
#>  1  2009     7     1       1    16.9    17.6   16.7    15.7    15.5
#>  2  2009     7     1       2    17.7    18.8   17.8    16.1    15.5
#>  3  2009     7     1       3    17.7    18.6   18.1    15.9    15.4
#>  4  2009     7     1       4    16.7    18.6   17.6    14.3    12.8
#>  5  2009     8     2       1    17.6    17.6   17.3    16.5    17.1
#>  6  2009     8     2       2    17.7    17.6   17.3    16.8    17.4
#>  7  2009     8     2       3    15.8    16     15.1    15      15.5
#>  8  2009     8     2       4    15.4    15.6   14.5    14.6    15.1
#>  9  2009     9     1       1    14.7    15     13.8    14      14.5
#> 10  2009     9     1       2    15.3    15.4   14.3    14.6    15  
#> 11  2009     9     1       3    15.3    15.6   14.4    14.5    15  
#> 12  2009     9     1       4    14.9    15.7   13.7    13.8    14.5

Created on 2020-06-10 by the reprex package (v0.3.0)


-1
votes

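This seems to be a simple two-step process:

1) Filter the data to find the unique year/month/day combinations in which any observation is negative.

2) Remove the rows belonging to the dates found in step 1.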
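
The two steps could be sketched in base R roughly as follows. This is only an illustration under assumptions: `df_small`, `key`, `has_neg`, `bad_days` and `result` are made-up names, the data is a six-row subset of the question's df, and a pasted year-month-day string stands in for the date.

```r
# Hypothetical six-row subset of the question's data (three days, two rows each)
df_small <- data.frame(
  year    = c(2009, 2009, 2009, 2009, 2009, 2009),
  month   = c(7, 7, 7, 7, 8, 8),
  day     = c(1, 1, 2, 2, 2, 2),
  fivemin = c(1, 2, 1, 2, 1, 2),
  rrp_nsw = c(16.9, 17.7, -15.6, 13.7, 17.6, 17.7),
  rrp_qld = c(17.6, 18.8, 17.6, 15.7, 17.6, 17.6),
  rrp_sa  = c(16.7, 17.8, 16.3, 12.0, 17.3, 17.3),
  rrp_tas = c(15.7, 16.1, 13.2, -11.1, 16.5, 16.8),
  rrp_vic = c(15.5, 15.5, 11.8, -12.9, 17.1, 17.4)
)

rrp_cols <- grep("^rrp_", names(df_small), value = TRUE)

# One key per row; the "-" separator keeps e.g. month 1/day 11 distinct from month 11/day 1
key <- paste(df_small$year, df_small$month, df_small$day, sep = "-")

# Step 1: unique dates on which any observation is negative
has_neg  <- rowSums(df_small[rrp_cols] < 0) > 0
bad_days <- unique(key[has_neg])

# Step 2: drop every row that falls on one of those dates
result <- df_small[!key %in% bad_days, ]
row.names(result) <- NULL
```

Here `bad_days` holds the single key "2009-7-2", so `result` keeps the four rows from the other two days.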

Hope this helps :)
