
Question (votes: 2, answers: 1)


twitterdata_cleaning$bio = gsub('air force','airforce',twitterdata_cleaning$bio)

The line of code above converts "proud member of the air force" to "proud member of the airforce". I have been able to do this successfully for dozens of two-word phrases.



    column1               column2
    san francisco         sanfrancisco
    new york              newyork
    las vegas             lasvegas
    san diego             sandiego
    new hampshire         newhampshire
    good bye              goodbye
    air force             airforce
    video game            videogame
    high school           school
    middle school         school
    elementary school     school

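I want to use gsub in an expression that searches the data frame for every term in column1 and replaces it with the corresponding term in column2, using something like the following: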

twitterdata_df$tweet = gsub('textfile$column1','textfile$column2',twitterdata_df$tweet)


Desired result:

i love sanfrancisco
can not wait to go to newyork
what happens in lasvegas stays there
at the beach in sandiego
can beat the autumn leave in newhampshire
so done with all the drama goodbye
proud member of the airforce
love this videogame so much
playing at the school tonight
so sick of school
school was the best and i miss it


r text text-files gsub data-cleaning




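library(stringr)  # provides str_replace_all()

# Lookup table: each string in `old` is replaced by the matching string in `new`
df <- data.frame(old = c("five", "six", "seven"),
                 new = as.character(5:7),
                 stringsAsFactors = FALSE)

text <- c("I am a vector with numbers six and other text five",
          "another vector seven six text five")

# A named vector (names = patterns, values = replacements) applies every pair
str_replace_all(text, setNames(df$new, df$old))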


[1] "I am a vector with numbers 6 and other text 5" "another vector 7 6 text 5" 
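For readers who would rather stay with base R's gsub, as in the question, the same lookup-table idea can be sketched as a loop over the rows of the table. This is a minimal sketch, not the answer's method: the helper name `replace_from_lookup` and the `lookup` data frame are hypothetical, and `fixed = TRUE` assumes the phrases should be matched literally rather than as regular expressions.

```r
# Hypothetical lookup table and input, mirroring the shapes used above
lookup <- data.frame(old = c("five", "six", "seven"),
                     new = c("5", "6", "7"),
                     stringsAsFactors = FALSE)
text <- c("I am a vector with numbers six and other text five",
          "another vector seven six text five")

# Hypothetical helper: apply each old -> new pair in turn with gsub()
replace_from_lookup <- function(x, old, new) {
  for (i in seq_along(old)) {
    # fixed = TRUE treats old[i] as a literal string, not a regular expression
    x <- gsub(old[i], new[i], x, fixed = TRUE)
  }
  x
}

result <- replace_from_lookup(text, lookup$old, lookup$new)
print(result)
```

Because the pairs are applied one after another, a later pattern can match text produced by an earlier replacement, so the order of the rows may matter for overlapping phrases.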




# strip.white = TRUE drops the space after each comma, so column2 holds
# "sanfrancisco" rather than " sanfrancisco"
textfile <- read.csv(textConnection("column1, column2
san francisco, sanfrancisco
new york, newyork
las vegas, lasvegas
san diego, sandiego
new hampshire, newhampshire
good bye, goodbye
air force, airforce
video game, videogame
high school, school
middle school, school
elementary school, school"), stringsAsFactors = FALSE, strip.white = TRUE)


twitterdata_df <- data.frame(id = 1:11)
twitterdata_df$tweet <- c("i love san francisco",
                          "can not wait to go to new york",
                          "what happens in las vegas stays there",
                          "at the beach in san diego",
                          "can beat the autumn leave in new hampshire",
                          "so done with all the drama goodbye",
                          "proud member of the air force",
                          "love this video game so much",
                          "playing at the high school tonight",
                          "so sick of middle school",
                          "elementary school was the best and i miss it")


# Names are the phrases to find (column1); values are their replacements (column2)
twitterdata_df$tweet2 <- str_replace_all(twitterdata_df$tweet, setNames(textfile$column2, textfile$column1))



   id                                        tweet                                    tweet2
1   1                         i love san francisco                       i love sanfrancisco
2   2               can not wait to go to new york             can not wait to go to newyork
3   3        what happens in las vegas stays there      what happens in lasvegas stays there
4   4                    at the beach in san diego                  at the beach in sandiego
5   5   can beat the autumn leave in new hampshire can beat the autumn leave in newhampshire
6   6           so done with all the drama goodbye        so done with all the drama goodbye
7   7                proud member of the air force              proud member of the airforce
8   8                 love this video game so much               love this videogame so much
9   9           playing at the high school tonight             playing at the school tonight
10 10                     so sick of middle school                         so sick of school
11 11 elementary school was the best and i miss it         school was the best and i miss it
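One caveat worth illustrating: str_replace_all treats the names of the vector as regular expressions, so a phrase can also match inside longer words. The sketch below wraps each phrase in `\\b` word boundaries to restrict matches to whole phrases. The `lookup` table and the second tweet are made up for illustration, and the approach assumes the phrases contain no regex metacharacters.

```r
library(stringr)

# Hypothetical two-row lookup table
lookup <- data.frame(old = c("new york", "air force"),
                     new = c("newyork", "airforce"),
                     stringsAsFactors = FALSE)

# The second string is a made-up example where "new york" occurs inside other words
tweets <- c("can not wait to go to new york",
            "renew yorkshire pudding recipe")

# Plain patterns: "new york" also matches inside "renew yorkshire"
loose <- str_replace_all(tweets, setNames(lookup$new, lookup$old))

# Patterns wrapped in \\b only match when the phrase starts and ends at a word boundary
patterns <- paste0("\\b", lookup$old, "\\b")
strict <- str_replace_all(tweets, setNames(lookup$new, patterns))

print(loose)
print(strict)
```

With the boundaries in place the second string is left unchanged, whereas the plain patterns turn it into "renewyorkshire pudding recipe".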