Easily replacing multiple words in R; str_replace_all gives a warning about two objects of unequal length


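I'm trying to use str_replace_all to replace many different values (e.g. "Mod", "M2", "M3", "Interviewer") with one consistent string (e.g. "Moderator:"). I'm doing this for several different categories, and I'd like to avoid writing out every unique value, because there are a lot of them.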

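So I made a CSV file containing all the unique values I want to standardize, read it in, and pulled out each column as a vector (there are 5 columns in total; only 2 are shown here for simplicity):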

speak_names <- read_csv("speak_names.csv")
speak_namesMisc <- dplyr::pull(speak_names, Misc)
speak_namesMod <- dplyr::pull(speak_names, Moderator)

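For the replacement values, I created character vectors of the same length as the vectors above, since I understand that the replacement and the pattern must be the same length: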

Misc <- rep("Misc:", 2)
Mod <- rep("Moderator:", 28)

When I run this code for Misc, it works fine:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMisc, Misc)

But when I try the same thing for the Moderator version (even if I run it before Misc), I get a warning message:

atas_clean$speaker <- str_replace_all(atas_clean$speaker, speak_namesMod, 
Mod)

Warning message:
In stri_replace_all_regex(string, pattern, fix_replacement(replacement),  :
longer object length is not a multiple of shorter object length

I don't understand why I'm getting this message, because the following returns TRUE:

identical(length(speak_namesMod), length(Mod))

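The data frame I'm working with has 16,244 rows, in case that affects the pattern or replacement. I'm stuck, and I'd like to know why this doesn't work and/or find another solution that doesn't involve typing out every element of the vector.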

Thanks!

r string character str-replace substitution
1 Answer
library('dplyr') # load the dplyr package
library('stringr') # load the stringr package

Here is a sample of my own dataset that I'll use to answer your question.

The dput() of my data gives:

abc<-as.data.frame(
structure(list(Name = c("ME-9_ 005", "ME-9_ 004", "ME-9_ 003", 
                        "ME-9_ 002", "ME-9_ 001", "ME-9_ 000", "ME-8_ 005", "ME-8_ 004", 
                        "ME-8_ 003", "ME-8_ 002", "ME-8_ 001", "ME-8_ 000", "ME-7_ 005", 
                        "ME-7_ 004", "ME-7_ 003", "ME-7_ 002", "ME-7_ 001", "ME-7_ 000"
), Mg = c(0.411058647473409, 0.361611969040526, 0.435757145931429, 
          0.36656632349025, 0.312782034685408, 0.357913661160629, 0.414639893651842, 
          0.460992875568015, 0.554803107534663, 0.418743792959099, 0.499114614445091, 
          0.475374442706501, 0.564660334010035, 0.502678818989733, 0.417617035801997, 
          0.488463005872639, 0.484776757286094, 0.424850010858818),
Al = c(0.575667101719941,  0.586351493923602, 0.574053324307634, 0.628497798862674, 0.552234153060378, 
       0.580547408629286, 1.05746950789483, 1.07094531357244, 1.11340157804305, 
       1.03043684466386, 1.02899468191215, 1.07222457991059, 1.5276908007952, 
       1.66549994904359, 1.43287302441973, 1.37434198093964, 1.55835986529032, 
       1.66902429579112), 
Si = c(0.495188340689301, 0.513374456164654, 
       0.51809643007659, 0.569128515813393, 0.542590350648068, 0.516673370168739, 
       1.72437228079744, 1.59076392020817, 1.77327433861292, 1.76671780355934, 
       1.60625706442694, 1.92449284567535, 3.27248599245035, 3.23739024834759, 
       2.84115179036218, 2.51112086010829, 2.98829002803169, 2.93347114563903
), 
P = c(0.222881184902066, 0.258237982165306, 0.230235867213535, 
      0.262379290809071, 0.230438623604524, 0.238615393939999, 0.260241811918024, 
      0.238785817517132, 0.248589968755681, 0.248270048794532, 0.272489046130942, 
      0.266707140244041, 0.25935282543278, 0.258801008935983, 0.250692297246152, 
      0.246890941447243, 0.277698144829677, 0.274197618349091)), 
row.names = c(NA, 
              -18L), class = c("tbl_df", "tbl", "data.frame")))

Here is how my data looked before cleaning:

head(abc,10)

For your specific question, you can pass a named vector of pattern = replacement pairs:

abc$Name <- str_replace_all(
  abc$Name, # column we want to search
  c("001" = "","002" = "","003" = "","004" = "","005" = "","000" = "",
    "-" = " ","_" = "") # each pattern (name) is paired with its replacement (value)
)
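Applied to the vectors in the question, the same named-vector form might look like the sketch below. The `speaker` and `speak_namesMod` values here are made-up stand-ins for the real data, and the sketch assumes the names contain no regex metacharacters and that each speaker cell holds exactly one of the listed names. The patterns are anchored with `^` and `$` because a named vector is applied one pattern after another, so an unanchored "Mod" could otherwise match again inside an already replaced "Moderator:".

```r
library(stringr)

# Hypothetical stand-ins for atas_clean$speaker and the Moderator column of the CSV
speaker        <- c("Mod", "M2", "P1", "Interviewer", "M3", "P2", "Mod")
speak_namesMod <- c("Mod", "M2", "M3", "Interviewer")

# Named vector: names are the (anchored) patterns, values are the replacements
mod_map <- setNames(
  rep("Moderator:", length(speak_namesMod)),
  paste0("^", speak_namesMod, "$")
)

# Every pattern in mod_map is tried against every element of speaker
cleaned <- str_replace_all(speaker, mod_map)
cleaned

# Alternative without regex, for exact whole-value matches only
cleaned_base <- speaker
cleaned_base[cleaned_base %in% speak_namesMod] <- "Moderator:"
```

With this form there is no need for a separate replacement vector, and the length of the data no longer has to line up with the number of patterns.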

Here is how my data looked after cleaning:

head(abc,10)
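As for the warning itself: when the pattern is an unnamed vector, it is matched against the strings in parallel (string 1 against pattern 1, string 2 against pattern 2, and so on), with the shorter vector recycled, rather than every pattern being tried on every string. The sketch below illustrates this with made-up values, calling `stringi::stri_replace_all_regex` directly since that is the function named in the warning. Newer stringr releases are stricter about recycling and may raise an error instead of a warning.

```r
library(stringi)

x        <- c("M2", "Mod", "M2", "Mod")  # 4 strings (hypothetical values)
patterns <- c("Mod", "M2")               # 2 patterns, recycled to length 4

# Parallel matching: x[i] is only tested against patterns[i]
paired <- stri_replace_all_regex(x, patterns, "Moderator:")
paired  # nothing is replaced, and no warning, because 4 is a multiple of 2

# Row count from the question against the two pattern lengths
16244 %% 2   # 0: recycles evenly, so the Misc call ran silently
16244 %% 28  # 4: does not recycle evenly, hence the warning for Moderator
```

This also suggests that the Misc call ran without a warning only because 16,244 happens to be a multiple of 2, so its result is worth double-checking.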

I hope this helps.
