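I'm trying to tidy up my code for cleaning missing data. I have a dataset with 6 columns, and if I handle each column individually like this, the code works fine: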
mammographic_masses <- mammographic_masses %>%
  mutate(birad = replace(birad, birad == "na", NA)) %>%
  mutate(birad = replace(birad, birad == "N/A", NA))
But when I try to do the same thing in a for loop like this:
for (i in ncol(mammographic_masses)){
print(class(mammographic_masses[[i]]))
mammographic_masses <- mammographic_masses %>%
mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "na", NA)) %>%
mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "N/A", NA))
}
I get an error:
Error: unexpected '=' in:
" mammographic_masses <- mammographic_masses %>%
mutate(mammographic_masses[[,i]] ="
> mutate(mammographic_masses[[,i]] = replace(mammographic_masses[[,i]], mammographic_masses[[,i]] == "N/A", NA))
Error: unexpected '=' in " mutate(mammographic_masses[[,i]] ="
> }
Error: unexpected '}' in "}"
I have also been reading about other approaches, such as apply, but I can't figure out a way to loop over the columns.
Instead of a loop, use mutate_all, which applies the same function to every column:
library(dplyr)
mammographic_masses %>%
mutate_all(function(x) {is.na(x) <- x %in% c("na", "N/A"); x})
# V1 V2 V3 V4
#1 d b <NA> c
#2 d b <NA> <NA>
#3 <NA> <NA> d b
#4 a <NA> <NA> b
#5 a b d <NA>
#6 d c b c
#7 b b d <NA>
#8 <NA> <NA> <NA> d
#9 a d d <NA>
#10 <NA> b d <NA>
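In dplyr 1.0.0 and later, mutate_all is superseded by across(), so the same idea can be written as below. This is only a sketch: the data frame df and its columns are hypothetical stand-ins for mammographic_masses, and it assumes a dplyr version that provides across().

```r
library(dplyr)

# Hypothetical toy data standing in for mammographic_masses
df <- data.frame(
  birad = c("4", "na", "5", "N/A"),
  shape = c("N/A", "1", "na", "3"),
  stringsAsFactors = FALSE
)

# Apply the same replacement to every column:
# any value equal to "na" or "N/A" becomes a real NA
cleaned <- df %>%
  mutate(across(everything(), ~ replace(.x, .x %in% c("na", "N/A"), NA)))

print(cleaned)
```

Using %in% rather than == keeps the test safe when a column already contains NA, because %in% never returns NA itself.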
Code to create the test data:
set.seed(2020)
n <- 10
mammographic_masses <- replicate(4, sample(c(letters[1:4], "na", "N/A"), n, TRUE))
mammographic_masses <- as.data.frame(mammographic_masses)
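If you would still like an explicit loop, the original attempt fails for two reasons. The name on the left of = inside mutate() has to be a plain column name, so an expression like mammographic_masses[[,i]] is a syntax error. In addition, for (i in ncol(...)) visits only one value, the column count itself, rather than every column index. A base R sketch that avoids both problems might look like this (df is a hypothetical stand-in for the real data):

```r
# Hypothetical toy data standing in for mammographic_masses
df <- data.frame(
  birad = c("4", "na", "5", "N/A"),
  shape = c("N/A", "1", "na", "3"),
  stringsAsFactors = FALSE
)

# seq_along(df) yields 1, 2, ..., ncol(df), so every column is visited
for (i in seq_along(df)) {
  col <- df[[i]]                        # [[i]] extracts column i as a vector
  col[col %in% c("na", "N/A")] <- NA    # overwrite the placeholder strings
  df[[i]] <- col                        # write the cleaned column back
}

print(df)
```

The same thing can be done without a loop in base R with df[] <- lapply(df, function(x) replace(x, x %in% c("na", "N/A"), NA)), which is closer to the apply-style approach mentioned in the question.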