Match a vector of substrings to the corresponding elements in a vector of strings


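I have two character vectors, one of which (bowlers in the example below) contains substrings of the other (full_names). I want to replace each element of bowlers with the matching full name. However, not every entry in full_names appears in bowlers, and some entries in bowlers are short enough that they are already complete names rather than just substrings. The same element can also occur multiple times in bowlers.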

An inefficient way to do this would be to build a vector of matches by hand, but I want to be able to apply this to multiple datasets.

Example data:

bowlers <- c("Dilon Heylige", "Siddarth Mata", "Dilon Heylige", "Muhammad Sadi", "Adnesh Tondal", "Muhammad Sadi", "Timil Patel", "Siddarth Mata", "Timil Patel", "Marty Kain", "Muhammad Sadi", "Marty Kain", "Siddarth Mata", "Marty Kain", "Dilon Heylige", "Timil Patel", "Adnesh Tondal", "Muhammad Sadi", "Adnesh Tondal", "Dilon Heylige", "Neeraj Goel", "Sheryar Khan", "Neeraj Goel", "Sheryar Khan", "Hammad Azam", "Sheryar Khan", "Hammad Azam", "Vatsal Vaghel", "Hammad Azam", "Vatsal Vaghel", "Mohit Nataraj", "Zia Muhammad ", "Sheryar Khan", "Sami Aslam", "Neeraj Goel", "Zia Muhammad ", "Neeraj Goel", "Zia Muhammad ", "Vatsal Vaghel", "Zia Muhammad ")

full_names <- c("Karan Chandel", "Sami Aslam", "Neeraj Goel", "Zia Muhammad Shahzad", "Shivam Mishra", "Hammad Azam", "Mohit Nataraj", "Aditya Srinivas", "Sheryar Khan", "Vatsal Vaghela", "Saideep Ganesh", "Dilon Heyliger", "Siddarth Matani", "Muhammad Sadiq", "Adnesh Tondale", "Timil Patel", "Marty Kain", "Mrunal Patel", "Sri Krishna Anantha Raju", "Abhinay Reddy", "Ravi Timbawala", "Devam Shrivastava")

The closest I have come is

grepl(paste(full_names, collapse = "|"), bowlers)
, which returns a vector of TRUE and FALSE values.
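To see why that call only yields TRUE/FALSE flags, here is a small sketch using two made-up vectors (`full_demo` and `short_demo` are illustrative names, not part of the data above). It searches for the full names inside the shortened names, which is the reverse of what is needed, so only entries that are already complete names come back TRUE.

```r
# Illustrative data only: one truncated name and one complete name.
full_demo  <- c("Dilon Heyliger", "Timil Patel")
short_demo <- c("Dilon Heylige", "Timil Patel")

# The pattern is "Dilon Heyliger|Timil Patel", tested against each short name.
# A truncated name cannot contain its own longer full name.
flags <- grepl(paste(full_demo, collapse = "|"), short_demo)
print(flags)
```

Only the second element is flagged, and even then the result says nothing about which full name matched.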

Tags: r, stringr, grepl

1 Answer

Use grep(), iterating over bowlers with sapply():

sapply(bowlers, \(x) grep(x, full_names, value = TRUE))
         Dilon Heylige          Siddarth Mata          Dilon Heylige 
      "Dilon Heyliger"      "Siddarth Matani"       "Dilon Heyliger" 
         Muhammad Sadi          Adnesh Tondal          Muhammad Sadi 
      "Muhammad Sadiq"       "Adnesh Tondale"       "Muhammad Sadiq" 
           Timil Patel          Siddarth Mata            Timil Patel 
         "Timil Patel"      "Siddarth Matani"          "Timil Patel" 
            Marty Kain          Muhammad Sadi             Marty Kain 
          "Marty Kain"       "Muhammad Sadiq"           "Marty Kain" 
         Siddarth Mata             Marty Kain          Dilon Heylige 
     "Siddarth Matani"           "Marty Kain"       "Dilon Heyliger" 
           Timil Patel          Adnesh Tondal          Muhammad Sadi 
         "Timil Patel"       "Adnesh Tondale"       "Muhammad Sadiq" 
         Adnesh Tondal          Dilon Heylige            Neeraj Goel 
      "Adnesh Tondale"       "Dilon Heyliger"          "Neeraj Goel" 
          Sheryar Khan            Neeraj Goel           Sheryar Khan 
        "Sheryar Khan"          "Neeraj Goel"         "Sheryar Khan" 
           Hammad Azam           Sheryar Khan            Hammad Azam 
         "Hammad Azam"         "Sheryar Khan"          "Hammad Azam" 
         Vatsal Vaghel            Hammad Azam          Vatsal Vaghel 
      "Vatsal Vaghela"          "Hammad Azam"       "Vatsal Vaghela" 
         Mohit Nataraj          Zia Muhammad            Sheryar Khan 
       "Mohit Nataraj" "Zia Muhammad Shahzad"         "Sheryar Khan" 
            Sami Aslam            Neeraj Goel          Zia Muhammad  
          "Sami Aslam"          "Neeraj Goel" "Zia Muhammad Shahzad" 
           Neeraj Goel          Zia Muhammad           Vatsal Vaghel 
         "Neeraj Goel" "Zia Muhammad Shahzad"       "Vatsal Vaghela" 
         Zia Muhammad  
"Zia Muhammad Shahzad" 

(You can use unname() to drop the names; I left them in to demonstrate the solution.)
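Two caveats apply to the sapply() call: grep() treats each shortened name as a regular expression, and sapply() returns a list rather than a character vector as soon as any name has zero or several matches. A more defensive variant might look like the sketch below. `expand_names` is a hypothetical helper, not part of the original answer, and it assumes that an unmatched or ambiguous name should become NA and that surrounding whitespace (as in "Zia Muhammad ") should be ignored.

```r
# Hypothetical helper, not part of the original answer.
# Maps each shortened name to the single full name that contains it.
expand_names <- function(short, full) {
  cleaned <- trimws(short)             # "Zia Muhammad " -> "Zia Muhammad"
  keys <- unique(cleaned)              # search each distinct name only once
  lookup <- vapply(keys, function(k) {
    # fixed = TRUE: literal substring match, no regex interpretation
    hits <- grep(k, full, fixed = TRUE, value = TRUE)
    # Assumption: no match or more than one match yields NA
    if (length(hits) == 1L) hits else NA_character_
  }, character(1))
  unname(lookup[cleaned])              # expand back to the original order
}

# Illustrative subset of the data, plus an ambiguous entry ("Patel").
full  <- c("Dilon Heyliger", "Zia Muhammad Shahzad", "Timil Patel", "Mrunal Patel")
short <- c("Dilon Heylige", "Zia Muhammad ", "Timil Patel", "Patel", "Dilon Heylige")
expanded <- expand_names(short, full)
print(expanded)
```

Because the lookup is built from unique(), repeated names are searched only once, and the function can be reused across datasets by passing different vectors. Here "Patel" becomes NA because it occurs in two full names.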
