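I have an age variable that contains some very odd values, such as 1000 or 6666. Values like these are clearly bad for any analysis. I want to keep the plausible ages and replace the odd numbers with NA. For example, 0, 1, 2, 3, 4, ..., 100 should be kept, but anything > 100 should become NA. However, I want to do this using only the tidyverse. I looked into a few functions, such as na_if, but could not get the result I want.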
Here is an example of my data. Look at row 64 (the value 6222) and you will see what I mean.
age_dput <- structure(list(Age = c(63, 19, 23, 28, 40, 31, 60, 26, 35, 44,
30, 47, 26, 45, 21, 38, 40, 28, 26, 40, 60, 33, 72, 40, 32, 32,
43, 24, 25, 39, 50, 22, 37, 53, 51, 42, 52, 29, 19, 42, 58, 61,
29, 26, 45, 29, 20, 26, 28, 43, 2, 42, 40, 33, 43, 53, 55, 27,
36, 41, 30, 54, 55, 6222, 21, 26, 38, 23, 48, 29, 44, 42, 35,
27, 28, 20, 59, 80, 35, 36, 24, 29, 34, 31, 25, 37, 30, 31, 48,
28, 30, 65, 45, 27, 39, 29, 34, 29, 76, 40)), row.names = c(NA,
-100L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
row = c(2910L, 35958L), col = c("how_unwell", "how_unwell"
), expected = c("a double", "a double"), actual = c("How Unwell",
"How Unwell"), file = c("'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'",
"'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'"
)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
)))
You can use replace or if_else:
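library(dplyr)
age_dput %>%
  mutate(
    # replace(): overwrite the elements where Age > 100 with NA
    clean_age_replace = replace(Age, Age > 100, NA_real_),
    # if_else(): NA_real_ keeps the result a double, matching Age
    clean_age_if_else = if_else(Age > 100, NA_real_, Age)
  )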
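As a quick sanity check, the two approaches can be compared on a small made-up tibble. This is only a sketch: `toy` and `toy_clean` are hypothetical names, and the values are invented for illustration (two plausible ages, two implausible ones, and one value that is already missing).

```r
library(dplyr)

# Hypothetical toy data: plausible ages, implausible ages, and an existing NA.
toy <- tibble(Age = c(25, 6222, 100, 1000, NA))

toy_clean <- toy %>%
  mutate(
    clean_age_replace = replace(Age, Age > 100, NA_real_),
    clean_age_if_else = if_else(Age > 100, NA_real_, Age)
  )

# Both columns should agree: NA wherever Age > 100 or Age was already missing,
# and the original value everywhere else (100 itself is kept).
identical(toy_clean$clean_age_replace, toy_clean$clean_age_if_else)
```

Note that the comparison `Age > 100` is strict, so an age of exactly 100 is retained, which matches the requirement in the question.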
Using na_if():
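library(dplyr)
# Caveat: na_if() matches values, not conditions, and expects its second argument
# to have length 1 or the same length as Age. This works here because exactly one
# value (6222) exceeds 100; with none or several such values it is not expected to work.
age_dput %>%
  mutate(Age = na_if(Age, Age[Age > 100]))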
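Because na_if() matches specific values rather than a condition, the call above depends on how many elements exceed 100. A condition-based variant may be more robust when several values are out of range. The following is a minimal sketch, assuming the valid range is 0 to 100 inclusive (the lower bound of 0 is an assumption; the question only states the upper limit explicitly). The tibble `toy` and its values are hypothetical.

```r
library(dplyr)

# Hypothetical toy data with several out-of-range values.
toy <- tibble(Age = c(25, 6222, 100, 1000, -3, 0))

toy_clean <- toy %>%
  # between() is inclusive on both ends, so 0 and 100 are kept;
  # everything outside the assumed 0-100 range becomes NA.
  mutate(Age = if_else(between(Age, 0, 100), Age, NA_real_))

toy_clean
```

The same pattern works on age_dput by swapping `toy` for it; if negative ages cannot occur in the data, the simpler `Age > 100` condition from the other answer is sufficient.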