How to create a custom trade/law dictionary for text analysis in R

Question

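I plan to run a text analysis in R with my own custom dictionary, organized around a "trade" versus "law" logic, much like a sentiment analysis.

I have all the words needed for the dictionary in an Excel file. It looks like this: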

> % 1 Trade 2 Law %
> business 1
> exchange 1
> industry 1
> rule 2
> settlement 2
> umpire 2
> court 2
> tribunal 2
> lawsuit 2
> bench 2
> courthouse 2
> courtroom 2

What steps do I need to take to convert this into a format R can use and apply it to my text corpus?

Thanks for your help!

Answer 1

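Create a data.frame with two columns (the word and its sector) and store it as an .rds file, a database object, or an Excel file, so you can load it whenever you need it.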
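A minimal sketch of that step, using only base R. The word/sector pairs are typed in by hand from the listing in the question; the file name `trade_law_dictionary.rds` and the temporary directory are placeholders chosen for illustration. If the dictionary stays in Excel, something like `readxl::read_excel()` could produce the same two-column data.frame, assuming the sheet has one column of words and one column of codes.

```r
# Two-column dictionary: one row per word, with its sector code
# (1 = Trade, 2 = Law, following the listing in the question).
my_scoring_df <- data.frame(
  word = c("business", "exchange", "industry", "rule", "settlement", "umpire",
           "court", "tribunal", "lawsuit", "bench", "courthouse", "courtroom"),
  sector = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical file location; replace with a path inside your project.
dict_path <- file.path(tempdir(), "trade_law_dictionary.rds")

# Save the dictionary once ...
saveRDS(my_scoring_df, dict_path)

# ... and reload it in any later session.
my_scoring_df <- readRDS(dict_path)
str(my_scoring_df)
```

An .rds file preserves the column types exactly, which is why it is a convenient choice when the dictionary is only ever used from R.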

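Once you have the data in a data.frame, you can match it against the words in your text corpus with a join (a dictionary lookup). In the scoring data.frame I use 1 and 2 to denote the sectors, but you could also use words.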
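As a rough illustration of both points, the sketch below swaps the numeric codes for labels and does the lookup with base R's `merge()`. The three-word dictionary and the `tokens` data.frame are made up for this example, and the labels "Trade" and "Law" are taken from the header of the listing in the question.

```r
# A shortened dictionary, for illustration only.
my_scoring_df <- data.frame(
  word = c("business", "rule", "court"),
  sector = c(1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Replace the numeric codes with readable labels.
my_scoring_df$sector <- factor(my_scoring_df$sector,
                               levels = c(1L, 2L),
                               labels = c("Trade", "Law"))

# Hypothetical tokens, one word per row, as a tokenizer would produce them.
tokens <- data.frame(
  id = c(1L, 1L, 2L),
  word = c("business", "mining", "court"),
  stringsAsFactors = FALSE
)

# Inner join on "word": tokens missing from the dictionary ("mining") drop out.
matched <- merge(tokens, my_scoring_df, by = "word")
print(matched)
```

Labels make the joined result easier to read, while integer codes are slightly more compact; either works with the join.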

See the example below using tidytext, but do read up on sentiment analysis and use whichever package you prefer.

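library(tidytext)
library(dplyr)

# Two example documents; stringsAsFactors = FALSE keeps the text as
# character (not factor) on R versions before 4.0
text_df <- data.frame(id = 1:2,
                      text = c("The business is in the mining industry and has a settlement.",
                               "The court ordered the business owner to settle the lawsuit."),
                      stringsAsFactors = FALSE)

# Split each text into one lowercase word per row, then keep only the words
# found in the dictionary (my_scoring_df, defined at the end of this answer)
text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(my_scoring_df)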

Joining, by = "word"
  id       word sector
1  1   business      1
2  1   industry      1
3  1 settlement      2
4  2      court      2
5  2   business      1
6  2    lawsuit      2
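If a per-document summary is wanted, as in a sentiment analysis, the matched words can be counted by document and sector. This is only a sketch of one possible next step, not part of the original answer; it repeats the example data so that it runs on its own, and the name `sector_counts` is made up for illustration.

```r
library(tidytext)
library(dplyr)

# Same dictionary and example texts as in the answer.
my_scoring_df <- data.frame(
  word = c("business", "exchange", "industry", "rule", "settlement", "umpire",
           "court", "tribunal", "lawsuit", "bench", "courthouse", "courtroom"),
  sector = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
  stringsAsFactors = FALSE
)

text_df <- data.frame(
  id = 1:2,
  text = c("The business is in the mining industry and has a settlement.",
           "The court ordered the business owner to settle the lawsuit."),
  stringsAsFactors = FALSE
)

# Tokenize, match against the dictionary, then count hits per document
# and sector (1 = Trade, 2 = Law).
sector_counts <- text_df %>%
  unnest_tokens(word, text) %>%
  inner_join(my_scoring_df, by = "word") %>%
  count(id, sector)

print(sector_counts)
```

Note that the join only matches exact word forms, so "settle" in the second text does not match "settlement"; stemming or adding word variants to the dictionary would be needed to catch such cases.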

Data:

my_scoring_df <- structure(list(word = c("business", "exchange", "industry", "rule", 
"settlement", "umpire", "court", "tribunal", "lawsuit", "bench", 
"courthouse", "courtroom"), sector = c(1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-12L))
