Replace values with NA based on a condition, using only tidyverse, R [duplicate]

Question  Votes: -1  Answers: 2

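I have an age variable with some very odd values, such as 1000 or 6666. Clearly these values are no good for any analysis. I want to keep the plausible ages but replace the odd numbers with NA. For example, I should keep 0, 1, 2, 3, 4, ..., 100, but anything greater than 100 should become NA. However, I want to do this with tidyverse only. I looked into a few functions, such as na_if, but could not get what I want.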

Here is a sample of my data. Look at row 64 (the value 6222) and you will see what I mean.

age_dput <- structure(list(Age = c(63, 19, 23, 28, 40, 31, 60, 26, 35, 44, 
    30, 47, 26, 45, 21, 38, 40, 28, 26, 40, 60, 33, 72, 40, 32, 32, 
    43, 24, 25, 39, 50, 22, 37, 53, 51, 42, 52, 29, 19, 42, 58, 61, 
    29, 26, 45, 29, 20, 26, 28, 43, 2, 42, 40, 33, 43, 53, 55, 27, 
    36, 41, 30, 54, 55, 6222, 21, 26, 38, 23, 48, 29, 44, 42, 35, 
    27, 28, 20, 59, 80, 35, 36, 24, 29, 34, 31, 25, 37, 30, 31, 48, 
    28, 30, 65, 45, 27, 39, 29, 34, 29, 76, 40)), row.names = c(NA, 
    -100L), class = c("tbl_df", "tbl", "data.frame"), problems = structure(list(
        row = c(2910L, 35958L), col = c("how_unwell", "how_unwell"
        ), expected = c("a double", "a double"), actual = c("How Unwell", 
        "How Unwell"), file = c("'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'", 
        "'/Users/gabrielburcea/Rprojects/data/data_lev_categorical_no_sev.csv'"
        )), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
    )))
r tidyverse na
2 Answers
3 votes

You can use replace or if_else:

library(dplyr)
age_dput %>%
  mutate(clean_age_replace = replace(Age, Age > 100, NA_real_), 
         clean_age_if_else = if_else(Age > 100, NA_real_, Age))
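
To see how these two approaches treat the boundary value 100 and values that are already missing, here is a minimal sketch on a small made-up tibble. The name `toy` and its values are hypothetical and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data: plausible ages, implausible ages, and one missing value
toy <- tibble(Age = c(25, 1000, 100, 6666, NA))

cleaned <- toy %>%
  mutate(
    # replace() overwrites the positions where the condition is TRUE
    clean_age_replace = replace(Age, Age > 100, NA_real_),
    # if_else() returns NA where the condition is TRUE or missing
    clean_age_if_else = if_else(Age > 100, NA_real_, Age)
  )

cleaned
```

Both new columns should keep 25 and 100, and turn 1000, 6666, and the existing NA into NA, since the condition is strictly `Age > 100`.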

1 vote

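Use na_if():

library(dplyr)
# na_if() compares Age with its second argument element-wise, so that argument
# must have length 1 or the same length as Age. This works here only because
# exactly one value (6222) exceeds 100.
age_dput %>% 
  mutate(Age = na_if(Age, Age[Age > 100]))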
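
When several values fall outside the plausible range, a range check may be more robust than matching specific values. The following is only a sketch of that idea, not the answerer's method: it uses `dplyr::between()` inside `if_else()`, and it assumes that negative ages should also be treated as invalid. The `toy` tibble is hypothetical.

```r
library(dplyr)

# Hypothetical toy data with several out-of-range values
toy <- tibble(Age = c(25, 1000, 100, 6666, -3, NA))

cleaned <- toy %>%
  # Keep ages within [0, 100]; everything else becomes NA.
  # Assumption: negative ages are invalid too.
  mutate(Age = if_else(between(Age, 0, 100), Age, NA_real_))

cleaned$Age
```

Here only 25 and 100 should survive; the other entries, including the value that was already missing, end up as NA.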