How do I replace multiple occurrences of a repeated word in a .txt file with new words?


I have a .txt file containing multiple data frames. It looks similar to the following example:

$stim
rt 82289.8878539, 82294.8309221, 82299.3357436, 82304.1822179
category 1, 2, 1, 1
orient 263, 313, 266, 253

$stim
rt 82289.887000, 82294.8309333, 82299.3357444, 82304.1822179
category 1, 2, 2, 2
orient 263, 310, 360, 250

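Each data frame in this .txt file corresponds to a file name, and I have stored the file names in a list. I want to replace each $stim with the corresponding file name. I built a for loop to do this, as follows: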

library(stringr)
text <- readLines("filepath")

wrong_words <-  ("$stim")
new_words <- (filenames)
for (i in seq_along(wrong_words)) {
  text <- str_replace_all(text, wrong_words[i], new_words[i])
}
text
writeLines(text, con="filepath")

However, when I run this loop, nothing changes, and I get exactly the same .txt file as before. What am I doing wrong?

r for-loop replace text-files
1 Answer

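You want grep here, looping over the matches. stringi::stri_replace_all_regex and similar functions instead replace with a dictionary, i.e. every "$stim" would be replaced with the same new word. We can wrap this in a function that still includes the dictionary feature. To avoid feature creep, we leave out readLines/writeLines.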
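As a side note on why the loop in the question changes nothing: stringr patterns (and grep by default) are regular expressions, in which $ anchors the end of the string, so the pattern "$stim" can never match. A minimal base-R sketch of the difference; the variable names are illustrative only:

```r
# A header line as it appears in the file (illustrative value).
line <- "$stim"

# Unescaped: `$` is the end-of-string anchor, so nothing can follow it.
unescaped <- grepl("$stim", line)

# Escaped: "\\$" matches a literal dollar sign.
escaped <- grepl("\\$stim", line)

# fixed = TRUE treats the whole pattern as a literal string.
literal <- grepl("$stim", line, fixed = TRUE)

print(c(unescaped = unescaped, escaped = escaped, literal = literal))
```

This is why the usage examples pass "\\$stim" rather than "$stim". In stringr, wrapping the pattern in fixed() should have the same effect as fixed = TRUE here.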

> batch_replace <- \(text, wrong_words, new_words) {
+   len0 <- length(wrong_words)
+   len1 <- length(new_words)
+   ## positions of the lines matching any of the patterns
+   len2 <- length(pos <- grep(paste(wrong_words, collapse='|'), text))
+   if (len0 != 1L && len0 != len1) {
+     stop(sprintf("Counts of wrong_words (%s) must be 1 or match with new_words (%s).",
+                  len0, len1))
+   }
+   if (len2 == 0L) {
+     message('No matches found.')
+     return(text)
+   }
+   else if (len0 == 1L) {
+     ## single pattern: each matching line is overwritten by the next new word
+     return(replace(text, pos, new_words))
+   }
+   else if (len0 == len2) {
+     ## dictionary: the i-th matching line gets the i-th new word
+     text[pos] <- new_words
+     return(text)
+   } else {
+     stop(sprintf("Counts of found words (%s) and new_words (%s) must match.",
+                  len2, len1))
+   }
+ }

Usage

> text <- readLines('foo.txt')
> wrong_words <- c("\\$stim")
> new_words <- c("## WORD1", "## WORD2")
> batch_replace(text, wrong_words, new_words)
 [1] "## WORD1"                                                     
 [2] "rt 82289.8878539, 82294.8309221, 82299.3357436, 82304.1822179"
 [3] "category 1, 2, 1, 1"                                          
 [4] "orient 263, 313, 266, 253"                                    
 [5] ""                                                             
 [6] "## WORD2"                                                     
 [7] "rt 82289.887000, 82294.8309333, 82299.3357444, 82304.1822179" 
 [8] "category 1, 2, 2, 2"                                          
 [9] "orient 263, 310, 360, 250"                                    
[10] ""  

You can also supply a dictionary.

> batch_replace(text, c("\\$stim", "\\$stim"), c("## WORD1", "## WORD2"))

Full approach

> readLines('foo.txt') |> 
+   batch_replace(c("\\$stim"), c("## WORD1", "## WORD2")) |> 
+   writeLines('foo2.txt')
>
> readLines('foo2.txt')  ## check
 [1] "## WORD1"                                                     
 [2] "rt 82289.8878539, 82294.8309221, 82299.3357436, 82304.1822179"
 [3] "category 1, 2, 1, 1"                                          
 [4] "orient 263, 313, 266, 253"                                    
 [5] ""                                                             
 [6] "## WORD2"                                                     
 [7] "rt 82289.887000, 82294.8309333, 82299.3357444, 82304.1822179" 
 [8] "category 1, 2, 2, 2"                                          
 [9] "orient 263, 310, 360, 250"                                    
[10] ""                             
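
For the situation in the question, where the replacements are file names held in a list, the same idea can be sketched without the helper function. The file names, the sample lines and the temporary files below are hypothetical stand-ins, and like batch_replace the sketch overwrites each matching line as a whole:

```r
# Hypothetical input: two blocks, each headed by "$stim".
text <- c("$stim", "category 1, 2, 1, 1", "",
          "$stim", "category 1, 2, 2, 2")
# Hypothetical file names, stored in a list as in the question.
filenames <- list("subject01.csv", "subject02.csv")

# Temporary files stand in for the real input and output paths.
in_file  <- tempfile(fileext = ".txt")
out_file <- tempfile(fileext = ".txt")
writeLines(text, in_file)

lines <- readLines(in_file)
# fixed = TRUE matches "$stim" literally, so no escaping is needed.
pos <- grep("$stim", lines, fixed = TRUE)
# Require one file name per header; otherwise the pairing is ambiguous.
stopifnot(length(pos) == length(filenames))
# Overwrite the i-th header line with the i-th file name.
lines[pos] <- unlist(filenames)
writeLines(lines, out_file)

result <- readLines(out_file)
print(result)
```

Writing to a separate output file, as in the answer's foo2.txt, keeps the original intact until the result has been checked.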