I am trying to vectorize a function for use in dplyr::mutate. For the life of me, I cannot get it to work. This is what I have been doing:
str_to_seq <- Vectorize(function(x) {
  # This function converts text format year ranges (e.g. "1970 - 1979") to
  # numeric ranges. Handily works with single values and edge cases such as
  # "- 1920".
  res <- stringr::str_extract_all(x, "\\d+") %>%
    unlist() %>%
    {seq(dplyr::first(.), dplyr::last(.))}
  return(res)
}, vectorize.args = "x", SIMPLIFY = FALSE)
year <- c(1970, 1980, 1990, 2000, 2010, 2020)
agegroup <- "1950 - 1959"
testt <- expand.grid(agegroup = agegroup, year = year, stringsAsFactors = FALSE)
testt %>%
  as_tibble() %>%
  dplyr::mutate(
    yearminus50 = year - 50,
    statement = all(yearminus50 >= str_to_seq(agegroup)))
The statement column fails with this error message:
Error in `dplyr::mutate()`:
ℹ In argument: `statement = all(yearminus50 >= str_to_seq(agegroup))`.
Caused by error:
! 'list' object cannot be coerced to type 'double'
Run `rlang::last_trace()` to see where the error occurred.
I cannot get my function str_to_seq to produce a plain vector; the output appears to be a list. statement should be c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE), as this brute-force code shows:
all(year[1] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
all(year[2] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
all(year[3] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
all(year[4] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
all(year[5] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
all(year[6] - 50 >= unlist(str_to_seq(agegroup)[[1]]))
How can I improve my code so that the line statement = all(yearminus50 >= str_to_seq(agegroup)) works?
Many thanks.
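The problem is not with your function; it is the expectation that all(..) will work with a list-column. Because of SIMPLIFY = FALSE, str_to_seq returns a list with one sequence per element of agegroup, so the comparison inside all() fails. We need to sapply (or similar) over the return from str_to_seq.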
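A minimal sketch of that row-by-row idea, reusing str_to_seq and the test data from the question. It uses mapply, a multi-argument relative of sapply, rather than sapply itself, so that each yearminus50 value is paired with its own sequence; the name res is only illustrative, and dplyr and stringr are assumed to be installed.

```r
library(dplyr)

# str_to_seq as defined in the question: returns a list with one
# integer sequence per input string.
str_to_seq <- Vectorize(function(x) {
  res <- stringr::str_extract_all(x, "\\d+") %>%
    unlist() %>%
    {seq(dplyr::first(.), dplyr::last(.))}
  return(res)
}, vectorize.args = "x", SIMPLIFY = FALSE)

testt <- expand.grid(
  agegroup = "1950 - 1959",
  year = c(1970, 1980, 1990, 2000, 2010, 2020),
  stringsAsFactors = FALSE
)

res <- testt %>%
  as_tibble() %>%
  mutate(
    yearminus50 = year - 50,
    # mapply walks both inputs in parallel, so all() is evaluated once per
    # row against that row's own sequence, giving one logical per row.
    statement = mapply(
      function(y, s) all(y >= s),
      yearminus50,
      str_to_seq(agegroup)
    )
  )

res$statement
```

Calling all() inside the per-row function is what keeps it from collapsing the whole column into a single TRUE or FALSE.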
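However, if this is "all" you need, we can extract the max value from agegroup and compare against that, since a value is >= every year in the range exactly when it is >= the largest one: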
testt |>
  mutate(
    yearminus50 = year - 50,
    statement = yearminus50 >=
      sapply(strsplit(agegroup, "[- ]+"), function(z) max(as.integer(z)))
  )
#      agegroup year yearminus50 statement
# 1 1950 - 1959 1970        1920     FALSE
# 2 1950 - 1959 1980        1930     FALSE
# 3 1950 - 1959 1990        1940     FALSE
# 4 1950 - 1959 2000        1950     FALSE
# 5 1950 - 1959 2010        1960      TRUE
# 6 1950 - 1959 2020        1970      TRUE