Removing newline characters in R

Question

I am following the code from this tutorial: https://www.youtube.com/watch?v=JyMBwydhYR8

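library(pdftools)  # pdf_text()
library(tm)        # Corpus(), VectorSource(), removePunctuation()
library(stringr)   # %>%, fixed(), str_squish(), str_split()

All_Files <- list.files(pattern = "pdf$")
All_opinions <- lapply(All_Files, pdf_text)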

document <-  Corpus(VectorSource(All_opinions))

social_sentences <- document %>%
    tolower() %>%
    paste0(collapse= " ") %>%
    stringr::str_squish() %>%
    stringr::str_split(fixed(".")) %>%
    unlist() %>%
    tm::removePunctuation()

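However, after creating the vector "social_sentences", the newline characters have not been removed.

Instead, once the punctuation is removed, only the letter "n" is left behind, attached to the adjacent word.

Even in the tutorial itself, the word "hilln" is visible.

The str_squish() function is already part of the code, and I even moved it to a different position to see whether that would solve the problem. I also tried the gsub() and str_replace_all() functions.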

Answer 1

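That is true, the extra "n" characters do appear in the video. In fact, though, the pipeline does remove \n completely when it is applied to a plain character string.

Try this: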

library(stringr)  # attaches %>% and fixed()

text <- "\n\nString with excess,  trailing and: leading! white   space\n\n"
text %>%
  tolower() %>%
  paste0(collapse= " ") %>%
  stringr::str_split(fixed(".")) %>%
  unlist() %>%
  tm::removePunctuation() %>%
  stringr::str_squish()  # squish last, so gaps left by removed punctuation collapse too

The result is:

[1] "string with excess trailing and leading white space"
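
One possible reason the stray "n" shows up in the question's pipeline but not in the example above (this is an assumption, not something confirmed in the thread): tolower() there receives a Corpus, which is a list-like object rather than a character vector, so R first coerces it with as.character(). For list elements that hold more than one string, that coercion deparses them, and a real newline becomes the two literal characters backslash and n. str_squish() does not treat those as whitespace, and removePunctuation() then strips the backslash and leaves the n. A minimal base-R sketch of that effect, using a made-up `pages` list in place of the pdf_text() output:

```r
# Hypothetical stand-in for lapply(All_Files, pdf_text): one document, two pages.
pages <- list(c("Capitol Hill\nis busy.", "Second page\nof text."))

# tolower() coerces non-character input with as.character(), which deparses
# the list element, so each "\n" becomes a backslash followed by "n".
coerced <- tolower(pages)
has_real_newline   <- grepl("\n", coerced, fixed = TRUE)   # actual newline present?
has_literal_escape <- grepl("\\n", coerced, fixed = TRUE)  # backslash + n present?

# Flattening to a plain character vector first keeps the real newlines,
# which whitespace-based cleaning can then collapse.
flat <- tolower(paste(unlist(pages), collapse = " "))
sentences <- unlist(strsplit(flat, ".", fixed = TRUE))
sentences <- gsub("[[:punct:]]", "", sentences)
sentences <- trimws(gsub("\\s+", " ", sentences))
print(sentences)
```

If this is indeed the cause, the same idea applied to the question's data would mean passing unlist(All_opinions) into the pipeline instead of the Corpus object.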
