What is making the text in this word cloud lowercase, and how can I make it uppercase?

Problem description

I am trying to build a word cloud in R, but it only returns lowercase text.

library(readxl)
library(tm)
library(wordcloud)
library(RColorBrewer)
sheet <- read_excel('list_products.xls', skip = 4)
products <- c(sheet$Cod)
products <- Corpus(VectorSource(products))
c_words <- brewer.pal(8, 'Set2')
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

I tried adding the following line before the wordcloud call, but it did not work:

products <- tm_map(products, content_transformer(toupper))

What is making the text lowercase, and what should I do to make it uppercase?

1 Answer

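As you can see from the question "Make all words uppercase in Wordcloud in R", TermDocumentMatrix(CORPUS) converts words to lowercase by default. Indeed, trace(wordcloud) shows that when the freq argument is missing, tdm <- tm::TermDocumentMatrix(corpus) is executed, so your words end up lowercase. That is why applying toupper to the corpus beforehand has no visible effect: the terms are lowercased again when the matrix is built.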
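To make the default behaviour concrete, here is a minimal sketch that builds the term-document matrix twice, once with the default control list and once with tolower = FALSE. The two-document corpus and the variable names are made up for illustration; only Corpus, VectorSource, TermDocumentMatrix and Terms come from tm.

```r
library(tm)

# Hypothetical two-document corpus, already in uppercase
corp <- Corpus(VectorSource(c("APPLE BANANA", "APPLE CHERRY")))

# Default control list: terms are lowercased while the matrix is built
terms_default <- Terms(TermDocumentMatrix(corp))

# tolower = FALSE keeps the case found in the corpus
terms_kept <- Terms(TermDocumentMatrix(corp, control = list(tolower = FALSE)))

print(sort(terms_default))
print(sort(terms_kept))
```

Comparing the two printed vectors should show lowercase terms in the first case and the original uppercase terms in the second.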

There are two ways to fix this. The first is to pass the words and their frequencies to wordcloud instead of the corpus:

library(tm)
library(wordcloud)
library(RColorBrewer)
# Sample text, used because the question does not include a reproducible example
filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
# content_transformer keeps the documents valid corpus elements
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# tolower = FALSE stops TermDocumentMatrix from lowercasing the terms
tdm <- tm::TermDocumentMatrix(products, control = list(tolower = FALSE))
freq_corpus <- slam::row_sums(tdm)
wordcloud(names(freq_corpus), freq_corpus, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

You will get a word cloud with the words in uppercase.
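Before plotting, it can help to inspect the frequency vector that is handed to wordcloud, since its names are exactly the labels that get drawn. The sketch below assumes a small made-up set of product codes (codes, corp and freq are illustrative names, not taken from the question) and uses the same tolower = FALSE control as the first option.

```r
library(tm)
library(slam)

# Hypothetical stand-in for the product codes in the question
codes <- c("ABC100 ABC100 XYZ200", "ABC100 XYZ200 QRS300", "ABC100")
corp <- Corpus(VectorSource(codes))

# Build the matrix without lowercasing the terms
tdm <- TermDocumentMatrix(corp, control = list(tolower = FALSE))

# Total count of each term across all documents, most frequent first
freq <- sort(slam::row_sums(tdm), decreasing = TRUE)
print(freq)
```

If the names printed here are already uppercase, passing names(freq) and freq to wordcloud should draw them that way.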

The second option is to modify wordcloud itself:

First run trace(wordcloud, edit = TRUE), then replace line 21 (the line that reads tdm <- tm::TermDocumentMatrix(corpus)) with:

tdm <- tm::TermDocumentMatrix(corpus, control = list(tolower = FALSE))

Click "Save", then run:

filePath <- "http://www.sthda.com/sthda/RDoc/example-files/martin-luther-king-i-have-a-dream-speech.txt"
text <- readLines(filePath)
products <- Corpus(VectorSource(text))
products <- tm_map(products, content_transformer(toupper))
c_words <- brewer.pal(8, 'Set2')
# Pass the corpus itself so that the edited TermDocumentMatrix call inside wordcloud is used
wordcloud(products, min.freq = 10, max.words = 30, scale = c(7,1), colors = c_words)

You will get a similar result.
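The position of the TermDocumentMatrix call inside wordcloud may differ depending on the installed version of the package, so it can be safer to search for the call than to rely on a fixed line number. A minimal sketch, assuming the wordcloud package is installed; src and hits are illustrative names:

```r
library(wordcloud)

# Deparse the function body into lines of source text
src <- deparse(body(wordcloud::wordcloud))

# Keep only the lines that call TermDocumentMatrix
hits <- grep("TermDocumentMatrix", src, fixed = TRUE, value = TRUE)
print(trimws(hits))
```

The printed text is what to look for in the editor opened by trace; the line index in the deparsed body need not match the one the editor shows. Once you are done, untrace(wordcloud) restores the original function.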
