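I have a data frame of sensor data that I need to quality-control (QC). I need to be able to track the changes made to the data, so I am adding an extra column of comments to the data frame. I want to i) replace -7999 values with NA, and ii) add a column called 'QC_flag' that contains a comment whenever a -7999 value has been changed to NA.
Is there a way to do this in one step, rather than calling mutate twice? For example, as part of the na_if arguments or something similar.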
library(tidyverse)
dat <- tibble(sensor_a = c(5, 3, 5, 4, 5, -7999, 3, 5, 4, 4),
              sensor_b = c(300, 290, 370, 400, -7999, 200, 350, 480, 120, 280),
              sensor_c = c(-7999, -7999, -7999, 1500, 1600, 1700, 1800, 1700, 1600, 1200))
dat2 <- dat %>%
  mutate(QC_flag = case_when(sensor_a == -7999 ~ '7999 error [Sensor A]',
                             sensor_b == -7999 ~ '7999 error [Sensor B]',
                             sensor_c == -7999 ~ '7999 error [Sensor C]')) %>%
  mutate(sensor_a = na_if(sensor_a, -7999),
         sensor_b = na_if(sensor_b, -7999),
         sensor_c = na_if(sensor_c, -7999))
The initial data frame looks like this:
> dat
# A tibble: 10 x 3
   sensor_a sensor_b sensor_c
      <dbl>    <dbl>    <dbl>
 1        5      300    -7999
 2        3      290    -7999
 3        5      370    -7999
 4        4      400     1500
 5        5    -7999     1600
 6    -7999      200     1700
 7        3      350     1800
 8        5      480     1700
 9        4      120     1600
10        4      280     1200
And the result looks like this:
> dat2
# A tibble: 10 x 4
   sensor_a sensor_b sensor_c QC_flag
      <dbl>    <dbl>    <dbl> <chr>
 1        5      300       NA 7999 error [Sensor C]
 2        3      290       NA 7999 error [Sensor C]
 3        5      370       NA 7999 error [Sensor C]
 4        4      400     1500 NA
 5        5       NA     1600 7999 error [Sensor B]
 6       NA      200     1700 7999 error [Sensor A]
 7        3      350     1800 NA
 8        5      480     1700 NA
 9        4      120     1600 NA
10        4      280     1200 NA
It is probably easier to work with the data in long format:
library(dplyr)
library(tidyr)
dat %>%
  # Create a row index
  mutate(row = row_number()) %>%
  # Get data in long format
  pivot_longer(cols = -row) %>%
  # Add QC_flag name if value is -7999 or else keep NA
  mutate(QC_flag = ifelse(value == -7999, paste0('7999 error [', name, ']'), NA),
         # Replace -7999 with NA
         value = na_if(value, -7999)) %>%
  group_by(row) %>%
  # Fill NA values in QC_flag
  fill(QC_flag, .direction = "updown") %>%
  # Get data in wide format
  pivot_wider() %>%
  ungroup() %>%
  # Select relevant columns in order
  select(starts_with('sensor'), QC_flag)
#   sensor_a sensor_b sensor_c QC_flag
#      <dbl>    <dbl>    <dbl> <chr>
# 1        5      300       NA 7999 error [sensor_c]
# 2        3      290       NA 7999 error [sensor_c]
# 3        5      370       NA 7999 error [sensor_c]
# 4        4      400     1500 NA
# 5        5       NA     1600 7999 error [sensor_b]
# 6       NA      200     1700 7999 error [sensor_a]
# 7        3      350     1800 NA
# 8        5      480     1700 NA
# 9        4      120     1600 NA
#10        4      280     1200 NA
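If the wide layout should be kept, a single mutate() call may also be enough. The sketch below is only one possible approach, not part of the answer above: it assumes dplyr 1.0.0 or later (for across()) and relies on mutate() evaluating its expressions in order, so the flag is computed while the -7999 values are still present and the replacement happens afterwards.

```r
library(dplyr)

# Same example data as in the question
dat <- tibble(sensor_a = c(5, 3, 5, 4, 5, -7999, 3, 5, 4, 4),
              sensor_b = c(300, 290, 370, 400, -7999, 200, 350, 480, 120, 280),
              sensor_c = c(-7999, -7999, -7999, 1500, 1600, 1700, 1800, 1700, 1600, 1200))

dat2 <- dat %>%
  mutate(
    # Step 1: build the flag first, while -7999 is still in the data.
    # Rows that match no condition get NA.
    QC_flag = case_when(sensor_a == -7999 ~ '7999 error [Sensor A]',
                        sensor_b == -7999 ~ '7999 error [Sensor B]',
                        sensor_c == -7999 ~ '7999 error [Sensor C]'),
    # Step 2: replace -7999 with NA in every sensor column at once.
    # Assumption: all sensor columns share the 'sensor' prefix.
    across(starts_with('sensor'), ~ na_if(.x, -7999))
  )

dat2
```

The order of the two arguments matters: if the across() call came first, the sensor columns would already contain NA when case_when() runs and no row would be flagged. As in the original code, case_when() reports only the first matching sensor when a row has more than one -7999 value.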