How can I remove text blocks of varying length from the texts in a character vector?

Question (votes: 1, answers: 2)

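I have a character vector containing 231 documents (231 rows by one column). Each document starts with a large block of text, and I want to remove that block from every one of the 231 documents. The problem is that the length of this block differs from document to document.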

I have tried several options without success. Take the following example, in which every text begins with "Text that I wish to remove":

x <- c("Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.", 
  "Text that I wish to remove and who know whether I manage to make it work, it could be and it could not be. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.")

If the blocks to remove all had the same length, I would simply do the following, as someone suggested in an earlier post:

strings <- substring(x, 60)

However, because the length of the block differs from text to text, I am now stuck.

Ideally, I would like to obtain:

[1] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[2] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[3] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
[4] "I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."

Can anyone help me?

Thank you very much!

r regex
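2 Answers
2 votes

You can use the following code, which removes everything up to and including the last ". " in each string: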

  gsub("^.+\\. ", "", x)

[1] "I hope that stackoverflow will sort it out."
[2] "I hope that stackoverflow will sort it out."
[3] "I hope that stackoverflow will sort it out."
[4] "I hope that stackoverflow will sort it out."
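
Because `.+` is greedy, the pattern above also removes the middle sentence, so the result is shorter than the output asked for in the question. A minimal sketch of a lazy variant follows, assuming the block to remove is exactly the first sentence and that it ends at the first ". ". The vector holds two of the example strings from the question, and the name `trimmed` is only illustrative.

```r
# Two of the example strings from the question.
x <- c(
  "Text that I wish to remove because I don't like it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
  "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
)

# sub() replaces only the first match in each string.
# ".*?" is lazy, so the match stops at the first ". " rather than the last.
# perl = TRUE selects PCRE, where lazy quantifiers are well defined.
trimmed <- sub("^.*?\\. ", "", x, perl = TRUE)
print(trimmed)
```

Strings that contain no ". " at all are returned unchanged, since `sub()` leaves non-matching elements as they are.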

0 votes

Split on ". " and then take the last sentence:

sapply(strsplit(x, ". ", fixed = TRUE), tail, n = 1)
# [1] "I hope that stackoverflow will sort it out."
# [2] "I hope that stackoverflow will sort it out."
# [3] "I hope that stackoverflow will sort it out."
# [4] "I hope that stackoverflow will sort it out."
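
If the goal is to keep everything after the first sentence rather than only the last one, the same split can be reused by dropping the first piece and joining the rest again. This is a rough sketch under the assumption that the unwanted block is the first ". "-delimited piece; `pieces` and `rest` are illustrative names, and the vector holds two of the example strings from the question.

```r
# Two of the example strings from the question.
x <- c(
  "Text that I wish to remove and I will remove it because some great data analyst will help me solve it. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out.",
  "Text that I wish to remove. I really want to remove the text but I cannot do it. I hope that stackoverflow will sort it out."
)

# Split each string on the literal separator ". ".
pieces <- strsplit(x, ". ", fixed = TRUE)

# Drop the first piece of each string and glue the remaining pieces
# back together with the same separator.
rest <- vapply(pieces, function(p) paste(p[-1], collapse = ". "), character(1))
print(rest)
```

Note that a string without any ". " has only one piece, so this version returns an empty string for it.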