How do I extract sentences containing specific text from a spreadsheet?

Question

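I have a spreadsheet that looks like this. I want to keep the File column but extract only the sentences that contain the word "India". Is there a way to do this? I would prefer KNIME or R, but I am happy with any solution.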

Extract only the sentences containing "India", but keep the File column

Answer 1

This can be done with filter() from the dplyr package and str_detect() from stringr. Note that the pattern in the code below matches both "India" and the incorrectly lowercased "india", if present:

library(dplyr)
library(stringr)

# Some example data
df <- data.frame(File = c(1356, 1548, 1600, 1601),
                 Text = c("Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i",
                          "The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti",
                          "Some other text",
                          "This string has india without a capital I."))

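df <- df %>%
  # No spaces around "|": spaces inside a regex are matched literally,
  # so "India | india" would miss "India." at the end of a sentence
  filter(str_detect(Text, "India|india"))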

df
#   File   Text
# 1 1356   Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i
# 2 1548   The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti
# 3 1601   This string has india without a capital I.
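
The code above keeps whole rows. If only the individual sentences mentioning India are wanted, one possible approach is to split each cell into sentences first and then filter. The following is a minimal sketch, not a definitive solution. It assumes the tidyr package is available (it is not used in the original answer), it uses made-up example data, and it treats ".", "!" or "?" followed by whitespace as a sentence boundary, which breaks on abbreviations such as "e.g.".

```r
library(dplyr)
library(stringr)
library(tidyr)  # assumption: not part of the original answer

# Hypothetical example data: each Text cell holds several sentences
df <- data.frame(
  File = c(1356, 1600),
  Text = c("Digital India is an initiative. It improves online infrastructure. Services in india are going online.",
           "Some other text. Nothing relevant here."),
  stringsAsFactors = FALSE
)

sentences <- df %>%
  # Naive sentence split: break after ., ! or ? when followed by whitespace
  mutate(Sentence = str_split(Text, "(?<=[.!?])\\s+")) %>%
  # One row per sentence; File is repeated for each sentence
  unnest(cols = Sentence) %>%
  # Keep sentences containing the whole word "india", ignoring case
  filter(str_detect(Sentence, regex("\\bindia\\b", ignore_case = TRUE))) %>%
  select(File, Sentence)

sentences
```

The word boundaries (\\b) keep words such as "Indian" or "Indiana" from matching; drop them if those should be included.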
