Extract strings from a data frame column that match patterns in a character vector


I have a dataset with several columns, one of which is basically a quote that contains a state name. Here is an example:

library(tidyverse)
df <- tibble(num = c(11,12,13), quote = c("In Ohio, there are plenty of hobos","Georgia, where the peaches are peachy","Oregon, no, we did not die of dysentery"))

I want to create a column that holds the state mentioned in each quote.

This is what I tried:

states <- state.name
df <- df %>% mutate(state = na.omit(as.vector(str_match(quote,states)))[[1]])

which produces this error:

Error in `mutate()`:
ℹ In argument: `state = na.omit(as.vector(str_match(quote, states)))[[1]]`.
Caused by error in `str_match()`:
! Can't recycle `string` (size 3) to match `pattern` (size 50).
Tags: r, stringr
1 Answer

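You need to collapse the state names into a single pattern string, with the names separated by "|", and then use str_extract() to pull the state name out of each quote. The original attempt fails because str_match() is vectorised over both its string and pattern arguments, so a column of 3 quotes cannot be recycled against the 50 names in state.name.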

library(dplyr)
library(stringr)

df %>% 
  mutate(state = str_extract(quote,str_c(state.name, collapse = "|")))

#    num quote                                   state  
#  <dbl> <chr>                                   <chr>  
#1    11 In Ohio, there are plenty of hobos      Ohio   
#2    12 Georgia, where the peaches are peachy   Georgia
#3    13 Oregon, no, we did not die of dysentery Oregon 
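
One possible refinement, offered as a minimal sketch and not as part of the answer above: the plain alternation would also match a state name inside a longer word (for example "Georgian"), and str_extract() returns only the first match per quote. Assuming you want whole-word matches and want to keep every state a quote mentions, you could wrap the alternation in \b word boundaries and add str_extract_all(). The names state_pattern, states_all, and res are hypothetical and chosen only for this illustration.

```r
library(dplyr)
library(stringr)
library(tibble)

# Same example data as in the question
df <- tibble(
  num = c(11, 12, 13),
  quote = c("In Ohio, there are plenty of hobos",
            "Georgia, where the peaches are peachy",
            "Oregon, no, we did not die of dysentery")
)

# Build one regex of the form \b(?:Alabama|Alaska|...|Wyoming)\b
# (state_pattern is a hypothetical name; \b is an assumption about
# wanting whole-word matches only)
state_pattern <- str_c("\\b(?:", str_c(state.name, collapse = "|"), ")\\b")

res <- df %>%
  mutate(
    # first state found in each quote, or NA if none
    state = str_extract(quote, state_pattern),
    # list column holding every state found in each quote
    states_all = str_extract_all(quote, state_pattern)
  )

res
```

Note that matching is still case-sensitive, so "ohio" in lower case would not be picked up. If that matters, wrapping the pattern in regex(state_pattern, ignore_case = TRUE) is one option.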