I have data like this:
sample_data <- data.frame(
txtnumbers = c("text stuff +300.5","other stuff 40+ more stuff","text here -30 here too","30- text here","50+","stuff here 500+","400.5-" ),
stringsAsFactors = FALSE
)
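I want to extract the numbers that are immediately followed by a + sign and put them in a new column, ignoring the rest of the text and returning NA when no number is followed by a +: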
desired_data <- data.frame(
txtnumbers = c("text stuff +300.5","other stuff 40+ more stuff","text here -30 here too","30- text here","50+","stuff here 500+","400.5-" ),
desired_col = c(NA,40,NA,NA,50,500,NA),
stringsAsFactors = FALSE
)
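Can someone help me with a working function to do this? I can parse the numbers with parse_numeric, but returning only the numbers followed by a + is giving me trouble. Thanks!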
Here is one option using stringr::str_extract (the group argument requires stringr 1.5.0 or later):
stringr::str_extract(sample_data$txtnumbers, "(\\d+)\\+", group = 1)
#[1] NA "40" NA NA "50" "500" NA
The values are extracted as character strings. You can wrap the call in as.integer to convert them to numbers.
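To fill the new column in one step, here is a minimal sketch. It swaps in base R's regexpr and regmatches in place of stringr, so it does not depend on the group argument. The helper name extract_plus_number is hypothetical, and the optional decimal part in the pattern is an assumption that goes beyond the sample data, where every number followed by a + is a whole number.

```r
sample_data <- data.frame(
  txtnumbers = c("text stuff +300.5", "other stuff 40+ more stuff",
                 "text here -30 here too", "30- text here", "50+",
                 "stuff here 500+", "400.5-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from the original answer): returns the first
# number immediately followed by "+", or NA when there is none.
extract_plus_number <- function(x) {
  # The lookahead (?=\\+) requires a "+" right after the number
  # without making it part of the match.
  m <- regexpr("\\d+(\\.\\d+)?(?=\\+)", x, perl = TRUE)
  hit <- !is.na(m) & m != -1          # elements where a match was found
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches returns only the matches
  as.numeric(out)                     # as.numeric keeps any decimal part
}

sample_data$desired_col <- extract_plus_number(sample_data$txtnumbers)
```

as.numeric is used here in place of as.integer so that a value such as "40.5+" would not lose its decimal part. If only whole numbers can occur, as.integer works as well.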