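I have a character vector containing 231 documents (231 rows by one column). Each document begins with a large block of text that I want to remove from all 231 documents. The problem is that the length of this block differs from document to document.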
Take an example in which every text begins with a block starting "Text that I wish to remove". Nothing I have tried on it has worked so far:
x <- c("Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
"Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
"Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
"Text that I wish to remove and who know whether I manage to make it work, it could be and it could not be. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.")
If the blocks to remove were all the same length, I would simply do the following, as someone suggested in a previous post:
strings <- substring(x, 60)
However, because the length of the block differs from text to text, I am now stuck.
Ideally, I would like to obtain:
[1] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[2] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[3] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[4] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
Can anyone help me?
Many thanks!
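You can use the following code. Because .+ is greedy, the pattern matches up to the last ". " in each string, so only the final sentence is kept: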
gsub("^.+\\. ", "", x)
[1] "I hope that stackoverflow will sort it out."
[2] "I hope that stackoverflow will sort it out."
[3] "I hope that stackoverflow will sort it out."
[4] "I hope that stackoverflow will sort it out."
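If the goal is the output shown in the question (drop only the opening sentence and keep the rest), one possible variation is to stop the match at the first ". " instead of the last. The sketch below assumes the block to remove never contains a period of its own before the one that ends it, which holds for the sample data but may not hold for all 231 documents. The name result is just illustrative.

```r
x <- c("Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove and who know whether I manage to make it work, it could be and it could not be. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.")

# [^.]* cannot cross a period, so the match ends at the first ". "
# sub() is enough here because the pattern is anchored with ^
result <- sub("^[^.]*\\. ", "", x)
print(result)
```

An abbreviation such as "e.g. " inside the opening block would end the match too early, so this is worth checking against the real documents.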
Split on ". " and then take the last sentence:
sapply(strsplit(x, ". ", fixed = TRUE), tail, n = 1)
# [1] "I hope that stackoverflow will sort it out."
# [2] "I hope that stackoverflow will sort it out."
# [3] "I hope that stackoverflow will sort it out."
# [4] "I hope that stackoverflow will sort it out."
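The same splitting idea can probably be adapted to keep everything after the first sentence rather than only the last one: drop the first piece and glue the rest back together. This is a sketch under the assumption that the first ". " marks the end of the block to remove; parts and result are illustrative names.

```r
x <- c("Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
       "Text that I wish to remove and who know whether I manage to make it work, it could be and it could not be. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.")

# Split each string into sentences on the literal separator ". "
parts <- strsplit(x, ". ", fixed = TRUE)

# Drop the first piece and rejoin the remaining ones with the same separator
result <- vapply(parts, function(p) paste(p[-1], collapse = ". "), character(1))
print(result)
```

A string that contains no ". " at all comes back as an empty string with this approach, since nothing is left after the first piece is dropped.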