I am trying to backfill the NAs in foo$Birth.Year with the column mean and then compute an age column:
foo$Birth.Year[is.na(foo$Birth.Year)]<-round(mean(foo$Birth.Year,na.rm=TRUE), digits = 0)
foo$Year <- as.numeric(format(as.Date(foo$Start.Time), "%Y"))
which(is.na(foo)) # no values returned
foo$Age <- foo$Year - foo$Birth.Year
Error in `$<-.data.frame`(`*tmp*`, Age, value = numeric(0)): replacement has 0 rows, data has 89051
Traceback:
1. `$<-`(`*tmp*`, Age, value = numeric(0))
2. `$<-.data.frame`(`*tmp*`, Age, value = numeric(0))
3. stop(sprintf(ngettext(N, "replacement has %d row, data has %d",
. "replacement has %d rows, data has %d"), N, nrows), domain = NA)
table(foo$Age)
 15  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
  1   2   3  12  32  43  58 155 217 324 357 384 388 362 360 350 280 304 249 241
glimpse(foo)
Observations: 42
Variables: 4
$ Start.Time <dttm> 2017-06-23 15:09:32, 2017-05-25 18:19:03…
$ Birth.Year <dbl> 1992, 1992, 1981, 1986, 1975, 1990, 1983, 1981…
$ Year <dbl> 2017, 2017, 2017, 2017, 2017, 2017, 2017, 2017…
$ Age <dbl> 25, 25, 36, 31, 42, 27, 34, 36, 33, 38, 24, 53…
I do get the age data, but I also get that error.
The error halts the script, so I have to skip past it manually. Any ideas?
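Since you did not provide any input (and I could only get two complete rows out of the glimpse() call), I created my own input. You can adapt the answer below to your own example: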
# We set a seed, for the results to be reproducible
set.seed(0)
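library(tidyverse)
library(lubridate) # today(), years(), interval(); attached by library(tidyverse) only from tidyverse 2.0.0 on
df <- tibble(
  # get 100 random dates between today and 100 years ago
  dob = sample(seq.Date(today() - years(100), today(), by = "day"), 100, replace = TRUE)
)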
# add in some NAs in random places to make it more similar to OP's data
df$dob[sample(1:100, 10)] <- NA
# get the average date, excluding NAs (unsure how much of a good idea this is statistically, but I digress)
average_date <- df |>
  filter(!is.na(dob)) |>
  pull(dob) |>
  mean()
df |>
  mutate(dob = if_else(is.na(dob), average_date, dob),
         Age = interval(dob, today()) %/% years(1)) # %/% is integer division
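As for the error itself: I can't reproduce it without your data, so treat this as a guess. One common way to get "replacement has 0 rows" is for one side of the subtraction to be NULL, for example because a column name is misspelled or the column does not exist at the point where the line runs. Arithmetic on NULL gives numeric(0), and assigning that to a data frame column fails. Below is a minimal base R sketch of that scenario; the two-row data frame and the misspelled name birth_year are made up for illustration.

```r
# Toy data (hypothetical values, loosely modelled on the glimpse() output)
foo <- data.frame(
  Start.Time = as.POSIXct(c("2017-06-23 15:09:32", "2017-05-25 18:19:03"), tz = "UTC"),
  Birth.Year = c(1992, NA)
)

# Mean imputation and year extraction, as in the question
foo$Birth.Year[is.na(foo$Birth.Year)] <- round(mean(foo$Birth.Year, na.rm = TRUE))
foo$Year <- as.numeric(format(foo$Start.Time, "%Y"))

# A column that does not exist comes back as NULL (birth_year is a made-up typo),
# and numeric - NULL is a zero-length numeric vector
bad <- foo$Year - foo$birth_year
length(bad)

# Assigning a zero-length vector to a column raises the error; capture it here
res <- tryCatch({
  foo$Age <- bad
  "ok"
}, error = function(e) conditionMessage(e))

# Guard: check that both columns exist before computing the age
stopifnot(all(c("Year", "Birth.Year") %in% names(foo)))
foo$Age <- foo$Year - foo$Birth.Year
```

If names(foo) shows both columns right before the failing line, the cause is something else, and a small reproducible sample of the data would help narrow it down.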