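I have a single-column data frame in which each row is a sentence. The sentences are mostly alphabetic characters, but some contain numeric characters. I am trying to find every number and replace it with the corresponding words.
Basically, I want to go from this: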
"I looked at the watermelons around 12 today"
"There is a dog on the bench"
"the year is 2017"
"I am not hungry"
"He turned 1 today"
to this (or something similar):
"I looked at the watermelons around twelve today"
"There is a dog on the bench"
"the year is two thousand seventeen"
"I am not hungry"
"He turned one today"
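I am familiar with functions that convert numbers to words, such as the numbers_to_words function in the xfun package, but I don't know how to apply this systematically across the whole data frame.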
Here is one approach using the stringr and english packages.
library(stringr)
library(english)
data <- c("I looked at the watermelons around 12 today", "There is a dog on the bench", "the year is 2017", "I am not hungry", "He turned 1 today")
# For each sentence, extract the runs of digits and spell them out in words
Replacement <- lapply(str_extract_all(data, "[0-9]+"), function(x) {
  as.character(as.english(as.numeric(x)))
})
# Substitute the words into sentences that contain digits; leave the rest unchanged
sapply(seq_along(data), function(i) {
  ifelse(grepl("[0-9]+", data[i]),
         str_replace_all(data[i], "[0-9]+", Replacement[[i]]),
         data[i])
})
[1] "I looked at the watermelons around twelve today" "There is a dog on the bench"
[3] "the year is two thousand seventeen" "I am not hungry"
[5] "He turned one today"
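The question asks about a data frame column rather than a plain vector, and a sentence may contain more than one number. As far as I can tell, the code above is only reliable with one number per sentence, because all of a sentence's replacements are passed to str_replace_all at once. Here is a minimal sketch that handles both cases, assuming the column is called text (a made-up name) and using a hypothetical helper, spell_numbers:

```r
library(stringr)
library(english)

# Hypothetical helper: spell out every run of digits in one sentence
spell_numbers <- function(sentence) {
  numbers <- str_extract_all(sentence, "[0-9]+")[[1]]
  for (n in numbers) {
    # Replace the first remaining run of digits with its word form
    word <- as.character(as.english(as.numeric(n)))
    sentence <- str_replace(sentence, "[0-9]+", word)
  }
  sentence
}

# Example data frame; the column name "text" is an assumption
df <- data.frame(
  text = c("He turned 1 and ate 12 grapes", "I am not hungry"),
  stringsAsFactors = FALSE
)

# Apply the helper to every row of the column
df$text <- vapply(df$text, spell_numbers, character(1), USE.NAMES = FALSE)
df$text
```

Because the spelled-out words contain no digits, each pass of the loop picks up the next number that has not been replaced yet.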
Actually, I don't know of a simple function for this, but here is a somewhat crude solution:
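library(xfun)
a <- "I looked at the watermelons around 12 today"
# Convert each character to a number; non-digit characters become NA
y <- suppressWarnings(as.numeric(strsplit(a, "")[[1]]))
# Positions of the digit characters
z <- which(!is.na(y))
# Join the digits into one number and spell it out
x <- n2w(as.numeric(paste(y[z], collapse = "")))
# Rebuild the sentence around the spelled-out number
paste0(substr(a, 1, z[1] - 1), x, substr(a, z[length(z)] + 1, nchar(a)))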
For now it only works when the sentence contains a single number.
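One way to lift the single-number restriction is base R's gregexpr and regmatches, which can locate and replace every run of digits in every row. This is only a sketch under assumptions: the data frame and its column name text are made up for illustration, and n2w is called once per matched number.

```r
library(xfun)

# Example data frame; the column name "text" is an assumption
df <- data.frame(
  text = c("He turned 1 and bought 12 melons", "I am not hungry"),
  stringsAsFactors = FALSE
)

text <- df$text

# Locate every run of digits in every row
m <- gregexpr("[0-9]+", text)

# Spell out each matched run; rows without digits yield character(0)
words <- lapply(regmatches(text, m), function(nums) {
  vapply(nums, function(n) n2w(as.numeric(n)), character(1), USE.NAMES = FALSE)
})

# Write the words back in place of the digits
regmatches(text, m) <- words
df$text <- text
df$text
```

Rows without any digits pass through unchanged, since there is nothing to replace in them.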