I have the following data frame named cars:
Brand year mpg reputation Luxury
Honda 2010 30 8.5 0.5
Honda 2011 28 8.5 0.6
Dodge 2010 20 6.5 0.6
Dodge 2011 23 7.0 0.7
Mercedes 2010 22 9.5 NA
Mercedes 2011 25 9.0 NA
I want to replace the NA values with randomly generated real numbers between 0.9 and 1.0.
I am trying the following, but it replaces every NA with the number 0.9:
cars[is.na(cars)] <- sample(0.9:1, sum(is.na(cars)),replace=TRUE)
The data frame should end up looking like this:
Brand year mpg reputation Luxury
Honda 2010 30 8.5 0.5
Honda 2011 28 8.5 0.6
Dodge 2010 20 6.5 0.6
Dodge 2011 23 7.0 0.7
Mercedes 2010 22 9.5 *0.91*
Mercedes 2011 25 9.0 *0.97*
Code to reproduce the data:
cars <- structure(list(Brand = c("Honda", "Honda", "Dodge", "Dodge", "Mercedes", "Mercedes"),
year = c(2010L, 2011L, 2010L, 2011L, 2010L, 2011L),
mpg = c(30L, 28L, 20L, 23L, 22L, 25L), reputation = c(8.5, 8.5, 6.5, 7, 9.5, 9), Luxury = c(5, 5.5, 6, 6.5, NA, NA)),
class = "data.frame", row.names = c(NA, -6L))
Use runif instead of sample:
cars[is.na(cars)] <- runif(sum(is.na(cars)), min = 0.9, max = 1)
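As a minimal, self-contained sketch of the same idea (the small data frame `toy` and the seed are illustrative assumptions, not part of the question), the replacement can also be restricted to the Luxury column, so that NA values in any other column would be left alone:

```r
# Hypothetical toy data that mimics the Luxury column from the question
toy <- data.frame(
  Brand  = c("Honda", "Dodge", "Mercedes", "Mercedes"),
  Luxury = c(0.5, 0.6, NA, NA)
)

set.seed(42)  # assumed seed, used only to make the draw reproducible

# Logical index of the missing entries in the one column we want to fill
miss <- is.na(toy$Luxury)

# Draw one uniform random value per missing entry from the interval [0.9, 1]
toy$Luxury[miss] <- runif(sum(miss), min = 0.9, max = 1)

toy
```

Indexing with `is.na(cars)` on the whole data frame, as in the line above, fills NA values in every column, which is fine here because only Luxury has any.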
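That is because 0.9:1 gives you only the single number 0.9: the `:` operator counts up from 0.9 in steps of 1, and the next value, 1.9, is already past 1. Try: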
0.9:1
#[1] 0.9
So sample can only ever pick 0.9, and every NA is replaced with 0.9.
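A short sketch of how the `:` operator behaves with non-integer endpoints may make this clearer. The variable names `one_value`, `three_values` and `picked` are only illustrative:

```r
# from:to starts at `from` and adds 1 repeatedly while the result stays <= `to`
one_value    <- 0.9:1   # only 0.9 fits, because 0.9 + 1 = 1.9 exceeds 1
three_values <- 0.9:3   # 0.9, 1.9 and 2.9 all fit below 3

length(one_value)
three_values

# Sampling with replacement from a one-element vector just repeats that element
picked <- sample(one_value, 2, replace = TRUE)
picked
```

Because the vector being sampled has length one, asking for more draws does not add any variety to the result.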
Assuming the values you want to draw from are the sequence
vals <- seq(0.9, 1, 0.01)
vals
#[1] 0.90 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
Now we can sample from this sequence:
df[is.na(df)] <- sample(vals, sum(is.na(df)), replace = TRUE)
df
# Brand year mpg reputation Luxury
#1 Honda 2010 30 8.5 5.00
#2 Honda 2011 28 8.5 5.50
#3 Dodge 2010 20 6.5 6.00
#4 Dodge 2011 23 7.0 6.50
#5 Mercedes 2010 22 9.5 0.91
#6 Mercedes 2011 25 9.0 0.92
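The output above depends on the random draw, so it will differ from run to run. A reproducible sketch of the same step on a plain vector (the seed and the name `luxury` are assumptions made for illustration) also shows that this approach can only ever produce values from the 0.01 grid, unlike a continuous draw:

```r
set.seed(1)  # assumed seed, used only to make the draw reproducible

# Candidate values: 0.90, 0.91, ..., 1.00
vals <- seq(0.9, 1, by = 0.01)

# Hypothetical stand-in for the Luxury column
luxury <- c(5, 5.5, 6, 6.5, NA, NA)
miss <- is.na(luxury)

# Pick one candidate per missing entry, allowing repeats
luxury[miss] <- sample(vals, sum(miss), replace = TRUE)

luxury
length(vals)  # number of distinct values a replacement can take
```

If any real number in the range is acceptable rather than two-decimal steps, a continuous generator such as runif is the more direct choice.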
Data:
df <- structure(list(Brand = structure(c(2L, 2L, 1L, 1L, 3L, 3L),
.Label = c("Dodge",
"Honda", "Mercedes"), class = "factor"), year = c(2010L, 2011L,
2010L, 2011L, 2010L, 2011L), mpg = c(30L, 28L, 20L, 23L, 22L,
25L), reputation = c(8.5, 8.5, 6.5, 7, 9.5, 9), Luxury = c(5,
5.5, 6, 6.5, NA, NA)), class = "data.frame", row.names = c(NA, -6L))