I can't get past data(stop_words) when analyzing text in text mining


This is my first attempt at text mining, and I'm stuck. This is what I have done so far:

library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)

text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")

text1 
newtext1 <- data_frame(line = 1:16, text = text1)
newtext1

newtext1 %>%
  unnest_tokens(word, text)

data(stop_words)

newtext1 <- newtext1 %>%
  anti_join(newtext1)

newtext1 %>%
  count(newtext1, sort = TRUE)

I have not been able to get past data(stop_words). Thanks in advance.

罗汉

r dplyr nlp text-mining tidy
1 answer

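You can use read_lines to put each line of the text into its own row of the data frame (rather than repeating the whole text in every row). Make sure you save the unnested tokens before you try to anti_join the stop words.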

library(tidyverse)
library(tidytext)

text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")

new_text <- read_lines(text1) %>% 
  as_tibble() %>% 
  unnest_tokens(word, value) %>% 
  anti_join(stop_words)
#> Joining with `by = join_by(word)`

new_text %>% 
  count(word, sort = TRUE)
#> # A tibble: 46 × 2
#>    word         n
#>    <chr>    <int>
#>  1 land         7
#>  2 free         5
#>  3 guyana       5
#>  4 dear         3
#>  5 mother       2
#>  6 bondsman     1
#>  7 bones        1
#>  8 born         1
#>  9 children     1
#> 10 day          1
#> # ℹ 36 more rows

Created on 2024-04-14 with reprex v2.1.0
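For comparison, here is a minimal sketch of the question's own pipeline with the three problems fixed, in case you would rather stay close to your original code. It assumes only dplyr and tidytext, and uses strsplit instead of read_lines to split the text. The two-line text1 is a shortened stand-in for the full text, and the names tidy_text and word_counts are only illustrative.

```r
library(dplyr)
library(tidytext)

# Shortened stand-in for the full text (assumption: the real text1 is one
# string with embedded newlines, as in the question)
text1 <- "Dear land of Guyana, of rivers and plains,\nMade rich by the sunshine, and lush by the rains,"

# One row per line of verse; data_frame(line = 1:16, text = text1) would
# instead recycle the whole string into all 16 rows
newtext1 <- tibble(text = strsplit(text1, "\n")[[1]]) %>%
  mutate(line = row_number())

# Problem 1: the result of unnest_tokens() has to be assigned,
# otherwise the tokens are printed and then lost
tidy_text <- newtext1 %>%
  unnest_tokens(word, text)

# Problem 2: anti_join against stop_words, not against the data itself
# (anti-joining a table with itself leaves zero rows)
tidy_text <- tidy_text %>%
  anti_join(stop_words, by = "word")

# Problem 3: count() takes the column to count, not the data frame
word_counts <- tidy_text %>%
  count(word, sort = TRUE)

word_counts
```

Passing by = "word" explicitly also silences the "Joining with" message shown in the output above.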
