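I want to represent a conversation as a tibble, convert it to a .txt file that can be edited by hand in a text editor, and then read it back into a tibble for processing.
The main challenge I've run into is separating the text blocks so that, after editing, they can be re-imported in a similar format while keeping the "Speaker" labels.
Speed matters, because there are many files and each text segment is long.
Here is the input tibble: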
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"are.", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"has", 2L,
"15", 2L
)
Here is the desired output in the .txt file:
###Speaker 1###
been going on and what your goals are.
###Speaker 2###
Yeah, so so John has 15
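One possible way to produce this layout is sketched below in base R. The helper name `write_transcript` is hypothetical, and the sketch assumes that a new "###Speaker N###" header should start whenever `speakerTag` changes between consecutive rows, so a speaker who talks twice gets two separate blocks.

```r
# Hypothetical helper: write one "###Speaker N###" header per speaker turn,
# followed by that turn's words joined with single spaces.
write_transcript <- function(df, path) {
  runs   <- rle(df$speakerTag)                          # consecutive runs of the same speaker
  run_id <- rep(seq_along(runs$lengths), runs$lengths)  # turn index for every word
  text   <- vapply(split(df$word, run_id), paste, character(1), collapse = " ")
  headers <- paste0("###Speaker ", runs$values, "###")
  # rbind() + as.vector() interleaves: header 1, text 1, header 2, text 2, ...
  writeLines(as.vector(rbind(headers, text)), path)
}

# Usage with the example data (a plain data.frame here; a tibble works the same way)
df <- data.frame(
  word = c("been", "going", "on", "and", "what", "your", "goals", "are.",
           "Yeah,", "so", "so", "John", "has", "15"),
  speakerTag = rep(c(1L, 2L), times = c(8L, 6L)),
  stringsAsFactors = FALSE
)
out <- tempfile(fileext = ".txt")
write_transcript(df, out)
cat(readLines(out), sep = "\n")
```

Using `rle()` rather than grouping by `speakerTag` keeps the turns in their original order and avoids merging separate turns by the same speaker; it is also vectorised, which should help with large files.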
Here is the desired result after the errors have been corrected by hand:
tibble::tribble(
~word, ~speakerTag,
"been", 1L,
"going", 1L,
"on", 1L,
"and", 1L,
"what", 1L,
"your", 1L,
"goals", 1L,
"in", 1L,
"r", 1L,
"Yeah,", 2L,
"so", 2L,
"so", 2L,
"John", 2L,
"hates", 2L,
"50", 2L
)
One way is to append "\n" to the last word of each speakerTag group:
library(dplyr)
df1 <- df %>%
group_by(speakerTag) %>%
mutate(word = replace(word, n(), paste0(last(word), "\n\n")))
and then write it to a text file:
writeLines(paste(df1$word, collapse = " "), 'temp.txt')
It looks like this:
cat(paste(df1$word, collapse = " "))
#been going on and what your goals are.
# Yeah, so so John has 15
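For the return trip, a minimal base R sketch is given below. It assumes the edited file uses the "###Speaker N###" header lines shown in the question, not the blank-line-separated output above, which carries no speaker labels. The function name `read_transcript` is hypothetical, and the result is a plain data.frame that can be passed to `tibble::as_tibble()` if a tibble is needed.

```r
# Hypothetical helper: parse a file of "###Speaker N###" headers and text lines
# back into one row per word with its speakerTag.
read_transcript <- function(path) {
  lines     <- readLines(path)
  is_header <- grepl("^###Speaker [0-9]+###$", lines)
  # Speaker number taken from each header line
  tag   <- as.integer(sub("^###Speaker ([0-9]+)###$", "\\1", lines[is_header]))
  block <- cumsum(is_header)   # index of the most recent header for every line
  # Keep non-empty text lines that come after at least one header
  keep  <- !is_header & nzchar(trimws(lines)) & block > 0
  words <- strsplit(trimws(lines[keep]), "[[:space:]]+")
  data.frame(
    word       = as.character(unlist(words)),
    speakerTag = rep(tag[block[keep]], lengths(words)),
    stringsAsFactors = FALSE
  )
}

# Usage: a hand-edited file matching the corrected example from the question
edited <- tempfile(fileext = ".txt")
writeLines(c("###Speaker 1###",
             "been going on and what your goals in r",
             "###Speaker 2###",
             "Yeah, so so John hates 50"), edited)
df2 <- read_transcript(edited)
print(df2)
```

Because every text line inherits the tag of the nearest header above it, the editor can add or remove words, or break a turn across several lines, without losing the speaker assignment.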