Replacing elements of a multi-word list in a corpus with gsub()

Question

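I have a corpus of 233 documents (ecb_corpus) and a list of multi-word expressions (ecb_final). I would like to replace every bigram and trigram from the multi-word list in my corpus.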

This is my multi-word list:

1   euro_area
2   monetary_policy
3   price_stability
4   interest_rates
5   second_question
6   medium_term
7   first_question
8   central_banks
9   inflation_expectations
10  structural_reforms

So far I have only managed to do this for a single case, using gsub:

ecb_ready <- gsub(pattern = "interest rate", replacement= "interest_rates", ecb_corpus, ignore.case = TRUE, perl = FALSE, fixed = TRUE)

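To get the result I want, pattern should contain any matching phrase from the corpus (ecb_corpus) and replacement should contain my multi-word list (ecb_final). I have been trying to write a loop, but it failed completely (I am fairly new to R and unfortunately cannot get it to work yet).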

Can anyone help me with the loop?
Many thanks!

Answer 1

stringr::str_replace_all() can do this directly. This is what the help file is trying to convey with the terse phrase "Vectorised over string, pattern and replacement".

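Here I assume your corpus is stored in a character vector, but it could also be a list of character strings. If it is more complicated (for example, JSON ...), you may need some preprocessing before feeding it into str_replace_all().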

Note that the result drops the names of the input elements, but restoring them is easy.
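One way to restore the names is sketched below on a two-document stand-in corpus. The documents and the shortened replacement_pairs vector are illustrative assumptions, not the asker's data; setNames() is base R (stats) and simply re-attaches the names of the input.

```r
library(stringr)

# Hypothetical two-document corpus with named elements
ecb_corpus <- c(
  doc_1 = "lorem ipsum interest rate gobbledygook",
  doc_2 = "lorem dolor central bank foobar"
)

# Named vector: names are the patterns, values are the replacements
replacement_pairs <- c(
  "interest rate" = "interest_rates",
  "central bank"  = "central_banks"
)

# Re-attach the input names to the result of str_replace_all()
ecb_ready <- setNames(
  str_replace_all(ecb_corpus, replacement_pairs),
  names(ecb_corpus)
)

ecb_ready
```

Wrapping the call in setNames() is harmless if the installed stringr version already keeps the names.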

library(tidyverse)

(ecb_corpus <- c(
  doc_1 = c("lorem ipsum interest rate gobbledygook"),
  doc_2 = c("lorem dolor central bank foobar")
))
#>                                    doc_1
#> "lorem ipsum interest rate gobbledygook"
#>                                    doc_2
#>        "lorem dolor central bank foobar"

replacements <- c("euro_area", "monetary_policy", "price_stability",
                  "interest_rates", "second_question", "medium_term",
                  "first_question", "central_banks", "inflation_expectations",
                  "structural_reforms")

targets <- replacements %>%
  str_replace_all("_", " ") %>%
  str_remove("s$")

(replacement_pairs <- replacements %>%
  set_names(targets))
#>                euro area          monetary policy          price stability
#>              "euro_area"        "monetary_policy"        "price_stability"
#>            interest rate          second question              medium term
#>         "interest_rates"        "second_question"            "medium_term"
#>           first question             central bank    inflation expectation
#>         "first_question"          "central_banks" "inflation_expectations"
#>        structural reform
#>     "structural_reforms"

(ecb_ready <- ecb_corpus %>%
  str_replace_all(replacement_pairs))
#> [1] "lorem ipsum interest_rates gobbledygook"
#> [2] "lorem dolor central_banks foobar"

Created on 2019-09-28 by the reprex package (v0.3.0)
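For readers who prefer the loop the question asks about, the same idea can be sketched in base R with one gsub() call per phrase. This is a minimal sketch, assuming the corpus is a plain character vector (a tm corpus object would need different handling, which is not covered here); the two documents are made up, and only three entries of ecb_final are used. It leaves out fixed = TRUE because R ignores ignore.case, with a warning, when fixed = TRUE is set.

```r
# Hypothetical stand-in corpus; the real ecb_corpus is assumed to be a character vector
ecb_corpus <- c(
  doc_1 = "lorem ipsum interest rate gobbledygook",
  doc_2 = "lorem dolor Central Bank foobar"
)

# A few entries of the multi-word list from the question
ecb_final <- c("interest_rates", "central_banks", "euro_area")

# Derive the search phrases: underscores become spaces, a trailing "s" is dropped
targets <- sub("s$", "", gsub("_", " ", ecb_final, fixed = TRUE))

# Apply one gsub() per phrase, feeding each result into the next pass
ecb_ready <- ecb_corpus
for (i in seq_along(ecb_final)) {
  ecb_ready <- gsub(targets[i], ecb_final[i], ecb_ready, ignore.case = TRUE)
}

ecb_ready
```

The patterns are treated as regular expressions here, which is fine for plain phrases like these; phrases containing regex metacharacters would need escaping first.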
