R: Adding quality control flags to data


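I have a data frame of sensor data that needs quality control. I need to be able to track the changes made to the data, so I am adding an extra comment column to the data frame. I want to i) replace -7999 values with NA, and ii) add a column named 'QC_flag' that contains a comment whenever a -7999 value has been changed to NA.

Is there a way to do this in one step, without calling mutate twice? For example, as part of the na_if arguments or something similar.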

library(tidyverse)

dat <- tibble(sensor_a = c(5, 3, 5, 4, 5, -7999, 3, 5, 4, 4),
              sensor_b = c(300, 290, 370, 400, -7999, 200, 350, 480, 120, 280),
              sensor_c = c(-7999, -7999, -7999, 1500, 1600, 1700, 1800, 1700, 1600, 1200))

dat2 <- dat %>% 
  mutate(QC_flag = case_when(sensor_a == -7999 ~ '7999 error [Sensor A]',
                           sensor_b == -7999 ~ '7999 error [Sensor B]',
                           sensor_c == -7999 ~ '7999 error [Sensor C]')) %>% 
  mutate(sensor_a = na_if(sensor_a, -7999), 
         sensor_b = na_if(sensor_b, -7999), 
         sensor_c = na_if(sensor_c, -7999))     

The initial data frame looks like this:

> dat
# A tibble: 10 x 3
   sensor_a sensor_b sensor_c
      <dbl>    <dbl>    <dbl>
 1        5      300    -7999
 2        3      290    -7999
 3        5      370    -7999
 4        4      400     1500
 5        5    -7999     1600
 6    -7999      200     1700
 7        3      350     1800
 8        5      480     1700
 9        4      120     1600
10        4      280     1200

And the result looks like this:

> dat2
# A tibble: 10 x 4
   sensor_a sensor_b sensor_c QC_flag              
      <dbl>    <dbl>    <dbl> <chr>                
 1        5      300       NA 7999 error [Sensor C]
 2        3      290       NA 7999 error [Sensor C]
 3        5      370       NA 7999 error [Sensor C]
 4        4      400     1500 NA                   
 5        5       NA     1600 7999 error [Sensor B]
 6       NA      200     1700 7999 error [Sensor A]
 7        3      350     1800 NA                   
 8        5      480     1700 NA                   
 9        4      120     1600 NA                   
10        4      280     1200 NA    
r tidyverse mutate case-when
1 Answer

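This is probably easiest with the data in long format. Note that the flag text here uses the column name (e.g. sensor_c) rather than the 'Sensor C' label from the question.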

library(dplyr)
library(tidyr)

dat %>%
  #Create a row index
  mutate(row = row_number()) %>%
  #Get data in long format
  pivot_longer(cols = -row) %>%
  #Add QC_flag name if value is -7999 or else keep NA
  mutate(QC_flag = ifelse(value == -7999, paste0('7999 error [', name, ']'),NA), 
         #Replace -7999 with NA
         value = na_if(value, -7999)) %>%
  group_by(row) %>%
  #Fill NA values in QC_flag
  fill(QC_flag, .direction = "updown") %>%
  #Get data in wide format
  pivot_wider() %>%
  ungroup() %>%
  #Select relevant columns in order.
  select(starts_with('sensor'), QC_flag)

#   sensor_a sensor_b sensor_c QC_flag              
#      <dbl>    <dbl>    <dbl> <chr>                
# 1        5      300       NA 7999 error [sensor_c]
# 2        3      290       NA 7999 error [sensor_c]
# 3        5      370       NA 7999 error [sensor_c]
# 4        4      400     1500 NA                   
# 5        5       NA     1600 7999 error [sensor_b]
# 6       NA      200     1700 7999 error [sensor_a]
# 7        3      350     1800 NA                   
# 8        5      480     1700 NA                   
# 9        4      120     1600 NA                   
#10        4      280     1200 NA                   
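
If the goal is only to avoid the second mutate call, the wide format can probably be kept as well. mutate evaluates its arguments in order, so QC_flag can be built from the original values before they are overwritten in the same call. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for across); it reuses the data and flag labels from the question:

```r
library(dplyr)

# Example data copied from the question
dat <- tibble(sensor_a = c(5, 3, 5, 4, 5, -7999, 3, 5, 4, 4),
              sensor_b = c(300, 290, 370, 400, -7999, 200, 350, 480, 120, 280),
              sensor_c = c(-7999, -7999, -7999, 1500, 1600, 1700, 1800, 1700, 1600, 1200))

dat2 <- dat %>%
  mutate(
    # Evaluated first, while the sensor columns still contain -7999;
    # rows that match no condition get NA
    QC_flag = case_when(sensor_a == -7999 ~ '7999 error [Sensor A]',
                        sensor_b == -7999 ~ '7999 error [Sensor B]',
                        sensor_c == -7999 ~ '7999 error [Sensor C]'),
    # Evaluated second: replace -7999 with NA in every sensor column
    across(starts_with('sensor'), ~ na_if(.x, -7999))
  )
```

As in the question, case_when records only the first matching sensor, so a row in which two sensors report -7999 would be flagged for just one of them. The example data has no such row.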