Iteratively replacing strings in a data frame column with purrr

Question

I'd like to use purrr to run several string replacements iteratively on a data frame column with the gsub() function.

Here is the example data frame:

df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5), 
                          rep("a bb", 3), 
                          rep("a cc", 2)))

> df
   Year Text
1  2019 a aa
2  2019 a aa
3  2019 a aa
4  2019 a aa
5  2019 a aa
6  2019 a bb
7  2019 a bb
8  2019 a bb
9  2019 a cc
10 2019 a cc

This is how I normally run the string replacements, along with the desired result.

df$Text <- gsub("aa", "One", df$Text, fixed = T)
df$Text <- gsub("bb", "Two", df$Text, fixed = T)
df$Text <- gsub("cc", "Three", df$Text, fixed = T)

> df
   Year    Text
1  2019   a One
2  2019   a One
3  2019   a One
4  2019   a One
5  2019   a One
6  2019   a Two
7  2019   a Two
8  2019   a Two
9  2019 a Three
10 2019 a Three

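But as the list of string replacements grows, this becomes impractical. So I tried to use purrr with vectors of patterns and replacements to iterate over the changes, but I've only managed to produce warning messages. I want the code to step through text_pattern and text_replacement and, for each pattern/replacement pair, run gsub on df$Text. Below are the example and the warnings.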

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

walk2(text_pattern, text_replacement, function(...){
  gsub(text_pattern, text_replacement, df$Text, fixed = F)
  }
)

Warning messages:
1: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
2: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
3: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
4: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used
5: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'pattern' has length > 1 and only the first element will be used
6: In gsub(text_former, text_replace, df$Text, fixed = F) :
  argument 'replacement' has length > 1 and only the first element will be used

Can this be done with purrr functions? Or am I using the wrong tool, and should I be using a different function?

Answer 1

We can use reduce2, which passes the result of each replacement on as the input to the next one:

library(purrr)
library(stringr)
df$Text <- reduce2(text_pattern, text_replacement, ~ str_replace(..1, ..2, ..3), 
           .init = df$Text)
df$Text
#[1] "a One"   "a One"   "a One"   "a One"   "a One"   "a Two"   "a Two"   "a Two"   "a Three" "a Three"

Or, without the anonymous function:

reduce2(text_pattern, text_replacement, .init = df$Text, str_replace)
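One thing to keep in mind about str_replace(): it replaces only the first match in each string and treats the pattern as a regular expression, whereas the question's gsub(..., fixed = TRUE) replaces every match and matches literally. That makes no difference for this example data. The sketch below, assuming the same df, text_pattern and text_replacement as in the question, shows two variants that stay closer to gsub(); the names out_reduce and out_named are only illustrative.

```r
library(purrr)
library(stringr)

df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5), rep("a bb", 3), rep("a cc", 2)),
                 stringsAsFactors = FALSE)  # keep Text as character on R < 4.0
text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# Variant 1: reduce2() with str_replace_all() and fixed(), the closest
# stringr counterpart of gsub(..., fixed = TRUE): every occurrence is
# replaced and the pattern is matched literally, not as a regex
out_reduce <- reduce2(text_pattern, text_replacement,
                      function(acc, pattern, replacement) {
                        str_replace_all(acc, fixed(pattern), replacement)
                      },
                      .init = df$Text)

# Variant 2: no purrr at all; str_replace_all() also accepts a named vector
# of pattern = replacement pairs (the patterns are treated as regexes here)
out_named <- str_replace_all(df$Text, setNames(text_replacement, text_pattern))

identical(out_reduce, out_named)  # TRUE for this data
```

If a pattern could occur twice in one string (say "aa aa"), str_replace() would give "One aa", while both variants above give "One One".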

Answer 2

@akrun's answer is great, but you may find some of the intermediate steps useful for understanding purrr better.

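walk2 doesn't return the output of your function; it only returns its first input vector.
From the docs:
walk() calls .f for its side-effect and returns the input .x.
The closest analogue to what you are trying to do is map2, but see below for why even that isn't quite what you need.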
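A small sketch of that difference in return values, using a three-element stand-in for df$Text (the names txt, w and m are only illustrative):

```r
library(purrr)

text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")
txt <- c("a aa", "a bb", "a cc")  # small stand-in for df$Text

# walk2() calls .f only for its side effects and invisibly returns .x,
# so the strings produced by gsub() are discarded
w <- walk2(text_pattern, text_replacement,
           function(x, y) gsub(x, y, txt, fixed = TRUE))
identical(w, text_pattern)  # TRUE

# map2() keeps whatever .f returns: a list with one element per pair
m <- map2(text_pattern, text_replacement,
          function(x, y) gsub(x, y, txt, fixed = TRUE))
m[[1]]  # c("a One", "a bb", "a cc"): only the first pair was applied
```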
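Inside the function you pass to purrr functions such as map and walk, the arguments are generic placeholders for the current elements of the vectors being iterated over, not for the whole vectors. That is why the function(...) in the question triggers the "length > 1" warnings: its body refers to the full text_pattern and text_replacement vectors instead of the current elements.
You have two options for referring to the input vectors. One is to name the arguments in function(...). For example, function(x, y) produces output without any warnings: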
```
map2(text_pattern, text_replacement, function(x, y){
  gsub(x, y, df$Text, fixed = F)
}
)  # switching to map2() because walk2 gives silent output
```
You can also use the ~ (formula) syntax and refer to the iterated inputs as .x and .y:
```
map2(text_pattern, text_replacement, ~gsub(.x, .y, df$Text, fixed = F))
```
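The output still isn't what you expected.
purrr functions such as map and walk apply the function to the whole df$Text vector once per pattern, independently of the other patterns. The output of both snippets above is: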
```
[[1]]
 [1] "a One" "a One" "a One" "a One" "a One" "a bb"  "a bb"  "a bb"  "a cc"  "a cc" 

[[2]]
 [1] "a aa"  "a aa"  "a aa"  "a aa"  "a aa"  "a Two" "a Two" "a Two" "a cc"  "a cc" 

[[3]]
 [1] "a aa"    "a aa"    "a aa"    "a aa"    "a aa"    "a bb"    "a bb"    "a bb"   
 [9] "a Three" "a Three"  
```
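So even with the syntax fixed, you still get a list of three elements, each holding the result of a single text_pattern/text_replacement replacement. Merging them into one vector with all the replacements applied would still take considerable extra work. That's why @akrun switched to reduce2, which feeds the result of each replacement into the next one.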
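To make the reduce2 idea concrete, here is a minimal base R sketch, assuming the question's df, text_pattern and text_replacement, that does the same threading by hand with a for loop: each gsub() call works on the output of the previous one rather than on the original df$Text.

```r
df <- data.frame(Year = "2019",
                 Text = c(rep("a aa", 5), rep("a bb", 3), rep("a cc", 2)),
                 stringsAsFactors = FALSE)
text_pattern <- c("aa", "bb", "cc")
text_replacement <- c("One", "Two", "Three")

# acc plays the role of reduce2()'s accumulator (its .init is df$Text)
acc <- df$Text
for (i in seq_along(text_pattern)) {
  # each pair is applied to the already-updated strings, so the
  # replacements add up instead of ending in separate list elements
  acc <- gsub(text_pattern[i], text_replacement[i], acc, fixed = TRUE)
}
df$Text <- acc
df$Text
```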
