Extract numbers from an Arabic string variable (R) [duplicate]


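I am trying to extract numbers from a character variable that contains both numeric values and Arabic text. The variable is stored in the "salary" column. I found a solution for this here that uses Python, but I only use R.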

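I tried the code below, which worked fine while the column contained only English text. What I am trying to do is create a new column, "salary_numeric", that holds the full numeric value extracted from the "salary" column.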

df <- df %>%
  mutate(salary_numeric = as.numeric(str_split_fixed(job_posts$salary, fixed(","), 3)[, 2]))

Here is a sample of the data:

dput(df[1:30,c(22,24)])

Output:

structure(list(salary = c("﷼4,000.00", "﷼4,000.00", "﷼5,000.00", 
"﷼12,000.00", "﷼5,000.00", "﷼4,500.00", "﷼100.00", " ", 
" ", " ", " ", "﷼6,000.00", " ", "﷼10,000.00", "﷼5,500.00", 
" ", "﷼25,688.33", " ", "﷼2,500.00", "﷼8,500.00", "﷼10,000.00", 
" ", "﷼4,000.00", "﷼5,000.00", " ", " ", "﷼4,500.00", "﷼10,000.00", 
" ", "﷼6,000.00"), salary_numeric = c(0, 0, 0, 0, 0, 500, NA, 
NA, NA, NA, NA, 0, NA, 0, 500, NA, 688.33, NA, 500, 500, 0, NA, 
0, 0, NA, NA, 500, 0, NA, 0)), row.names = c(NA, -30L), class = c("tbl_df", 
"tbl", "data.frame"))

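My code correctly extracts the digits after the comma and after the dot (.), but for some reason I cannot get the digits before the comma. For example, the value "﷼25,688.33" ends up in the column as "688.33", but ideally it should be: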

25688.33
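
One possible way to get that result, offered as a minimal sketch rather than a definitive answer: splitting on "," breaks "﷼25,688.33" into the pieces "﷼25" and "688.33", and indexing `[, 2]` keeps only the second piece, which is why the leading digits disappear. Removing every character that is not a digit or a decimal point before converting avoids the split altogether. The example uses base R only; the vector `salary` is a small illustrative subset of the data shown above, and it assumes "." is always the decimal mark and "," is only a thousands separator.

```r
# Illustrative subset of the question's data; " " marks a missing salary.
salary <- c("﷼4,000.00", "﷼12,000.00", "﷼100.00", " ", "﷼25,688.33")

# Why the original attempt loses the leading digits: splitting on ","
# yields two pieces, and taking the second one drops "﷼25".
pieces <- strsplit("﷼25,688.33", ",", fixed = TRUE)[[1]]
print(pieces)

# Keep only digits and the decimal point, then convert to numeric.
# Blank entries are reduced to "" and as.numeric("") returns NA.
salary_numeric <- as.numeric(gsub("[^0-9.]", "", salary))
print(salary_numeric)
```

Inside the existing pipeline the same expression could be used as `df %>% mutate(salary_numeric = as.numeric(gsub("[^0-9.]", "", salary)))`, which refers to the `salary` column of `df` directly instead of `job_posts$salary`. `readr::parse_number()` is another option built for this kind of input, though how it reports blank entries should be checked against your data.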
r string dplyr tidyr stringr