How to split a paragraph into lines in R [closed]

Question  Votes: -1  Answers: 1

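I'm a beginner in R and still learning the basics. I'm trying to retrieve the sentences that contain certain words. I read the file with readLines() and used grep to pull out the matching sentences, but what comes back is the complete paragraph containing the word.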

x<- readLines(filepath)
grep("processor",x,value=TRUE,ignore.case=TRUE)

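If my word is "processor", the complete paragraph containing "processor" is retrieved.

Output: A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business.

But I want only the one sentence, i.e. A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment.

How can I split the paragraph into lines so that I get only the sentences containing the specific word, ideally in a way that is easy to use?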

r
1 Answer
0
votes

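The quanteda package can be used to tokenize text input into sentences. Once the document is split into sentences, grep() can be used to extract the sentences containing the word "processor" into a vector. We'll take the original text, which holds two paragraphs, tokenize it with quanteda, and extract the sentences containing the word "processor".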

rawText <- "A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment. Coming from a brand like HP this offers you the status value and corporate services that you might need while conducting business.
Intel® Celeron® processor N3160. Entry-level quad-core processor for general e-mail, Internet and productivity tasks. 4GB system memory for basic multitasking: Adequate high-bandwidth RAM to smoothly run multiple applications and browser tabs all at once."

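library(quanteda)
# Tokenize the text into sentences rather than words
sentences <- tokens(rawText, what = "sentence")
# Keep only the sentences that contain "processor"
unlist(lapply(sentences, function(x) {
     grep("processor", x, value = TRUE)
}))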

...and the output:

> unlist(lapply(sentences,function(x){
+      grep("processor",x,value=TRUE)
+ }))
text11 
"A 5th Gen Core i3 processor, 8GB RAM, 2GB graphics processor, 1TB HDD, 15.6-inch 720p HD antireflective display, this laptop is a premium offering in this segment." 
text12 
"Intel® Celeron® processor N3160." 
text13 
"Entry-level quad-core processor for general e-mail, Internet and productivity tasks." 
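
For context on why the original grep() call returned whole paragraphs: readLines() produces one vector element per line of the file, and grep(value=TRUE) returns every element that matches, in full. The sketch below reproduces this in base R. The two input lines and the textConnection() stand-in for the file are assumptions made for illustration, not the asker's data.

```r
# Hypothetical two-line input standing in for the file read via readLines(filepath)
con <- textConnection(c(
  "Entry-level quad-core processor for basic tasks. 4GB system memory for basic multitasking.",
  "Coming from a brand like HP this offers you the status value."
))
x <- readLines(con)
close(con)

# grep() tests each element (here: each line/paragraph) as a whole,
# so a match returns the entire line, including the unrelated second sentence
hits <- grep("processor", x, value = TRUE, ignore.case = TRUE)
print(length(x))
print(hits)
```

This is why the text has to be split into sentences first, so that each element grep() sees is a single sentence.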

Another approach is to use stringi::stri_detect_fixed() to find the string.

# stringi::stri_detect_fixed() approach 
library(stringi)
unlist(lapply(sentences,function(x){
      x[stri_detect_fixed(x,"processor")]
}))
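
If installing a package is not an option, a rough base-R sketch of the same idea is to split on sentence-ending punctuation with strsplit() and then filter with grep(). The helper name split_sentences and the sample text are hypothetical, and the splitting rule is deliberately naive: it will break wrongly on abbreviations such as "e.g. " and is not a substitute for a proper sentence tokenizer.

```r
# Hypothetical helper: split text after ., ! or ? when followed by whitespace.
# The lookbehind keeps the punctuation attached to the preceding sentence.
split_sentences <- function(text) {
  parts <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  parts[nzchar(parts)]
}

# Sample text shortened from the example above (an assumption for illustration)
sample_text <- "Intel Celeron processor N3160. Entry-level quad-core processor for general e-mail, Internet and productivity tasks. 4GB system memory for basic multitasking."

sentences_base <- split_sentences(sample_text)

# Case-insensitive match, as in the original question
hits <- grep("processor", sentences_base, value = TRUE, ignore.case = TRUE)
print(hits)
```

With a file, the same helper could be applied to the vector returned by readLines(), since strsplit() is vectorized and unlist() flattens the result.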