This is my first attempt at text mining, and I'm stuck. Here is what I have done so far:
library(tm)
library(tidytext)
library(dplyr)
library(ggplot2)
text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")
text1
newtext1 <- data_frame(line = 1:16, text = text1)
newtext1
newtext1 %>%
unnest_tokens(word, text)
data(stop_words)
newtext1 <- newtext1 %>%
anti_join(newtext1)
newtext1 %>%
count(newtext1, sort = TRUE)
I haven't been able to get past data(stop_words). Thanks in advance.
Rohan
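You can use read_lines to put each line of the text into its own row of the data frame (rather than repeating the entire text in every row). Also make sure you save the unnested tokens before you try to anti-join the stop words.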
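As a small illustration of the first point, here is a sketch that contrasts the two behaviours. The two-line string `two_lines` and the names `recycled` and `split_rows` are stand-ins invented for this example, not part of the original code.

```r
library(readr)
library(tibble)

# Hypothetical two-line stand-in for the full poem
two_lines <- "Dear land of Guyana, of rivers and plains,\nMade rich by the sunshine, and lush by the rains,"

# A length-1 character vector is recycled, so every row holds the whole text
recycled <- tibble(line = 1:2, text = two_lines)

# read_lines() treats a string that contains a newline as literal data
# and returns one element per line
split_rows <- tibble(text = read_lines(two_lines))
split_rows$line <- seq_len(nrow(split_rows))
```

With `recycled`, tokenizing would count every word once per row, which inflates the counts; with `split_rows`, each word is counted only where it actually appears.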
library(tidyverse)
library(tidytext)
text1 <- c("Dear land of Guyana, of rivers and plains,
Made rich by the sunshine, and lush by the rains,
Set gem-like and fair between mounts and sea-
Your children salute you. dear land of the free.
Green land of Guyana, our heroes of yore,
Both bondsman and free, laid their bones on your shore,
This soil so they hallowed, and from them are we,
All sons of one mother, Guyana the free
Great land of Guyana, diverse though our strains,
We are born of their sacrifice, heirs of their pains,
And ours is the glory their eyes did not see –
One Land of six peoples, united and free.
Dear Land of Guyana, to you will we give
Our homage, our service each day that we live;
God guard you, great Mother, and make us to be
More worthy our heritage – land of the free.")
new_text <- read_lines(text1) %>%
  as_tibble() %>%
  unnest_tokens(word, value) %>%
  anti_join(stop_words)
#> Joining with `by = join_by(word)`
new_text %>%
  count(word, sort = TRUE)
#> # A tibble: 46 × 2
#> word n
#> <chr> <int>
#> 1 land 7
#> 2 free 5
#> 3 guyana 5
#> 4 dear 3
#> 5 mother 2
#> 6 bondsman 1
#> 7 bones 1
#> 8 born 1
#> 9 children 1
#> 10 day 1
#> # ℹ 36 more rows
Created on 2024-04-14 with reprex v2.1.0
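If you would rather stay close to the code in the question and keep a line number column, the following is one possible sketch. It changes two things relative to the question: the result of unnest_tokens is assigned to a variable, and anti_join is given stop_words rather than the table itself. The vector `poem_lines` (shortened to two lines here) and the names `tidy_words` and `word_counts` are assumptions made for the example; tibble() is used in place of data_frame(), which is deprecated.

```r
library(dplyr)
library(tidytext)

# Two lines stand in for the full poem; one string per line of text
poem_lines <- c("Dear land of Guyana, of rivers and plains,",
                "Made rich by the sunshine, and lush by the rains,")

tidy_words <- tibble(line = seq_along(poem_lines), text = poem_lines) %>%
  unnest_tokens(word, text) %>%        # one row per lower-cased word; result is assigned
  anti_join(stop_words, by = "word")   # drop rows whose word appears in stop_words

# Count the remaining words, most frequent first
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

Because the `line` column survives the tokenizing step, you could also count words per line with count(line, word) if that is useful later.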