stringr

Problem description

I am working on this problem in R:

states = c("Masassachusetts, USA", "Buffalo, NY", "Flint, MI","California, USA", "Idaho, USA", "Orlando, FL"...)

I need to get a new vector that looks like this:

state_name = (Massachusetts, NY, MI, California, Idaho, FL..)

I tried using stringr, but I can't figure out how to return the state name or the abbreviation.

ifelse(str_detect(states, " [A-Z][A-Z]"), # need to figure out what to write here to get the abbreviated state
       ifelse(str_detect(states, "USA"), # code to return the full state name
              other))
r stringr
1 Answer
library(stringr)

The separator is "\\, " rather than "\\,", on the assumption that each string has a space after the comma (the leading space before USA).

state_name <- ifelse(word(states,2,sep = "\\, ")=="USA", word(states,1,sep = "\\,"), 
                 word(states,2,sep = "\\, "))

The code above is not robust when there is no space before USA; in that case it will print "USA". The code below works on a mix of strings with and without that space.

state_name <- ifelse(word(states, -1) =="USA", word(states,1,sep = "\\,"), 
                     word(states,2,sep = "\\, "))

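Note that the separator in the third word() call is still "\\, ", to account for the leading space. You can also change it to "\\," and then strip the whitespace from the output.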

state_name <- ifelse(word(states, -1) =="USA", word(states,1,sep = "\\,"), 
                     word(states,2,sep = "\\,"))

state_name <- trimws(state_name, which = "left")

state_name
#[1] "Masassachusetts" "NY"              "MI"              "California"      "Idaho"           "FL" 
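
As a cross-check on the spacing issue discussed above, here is a minimal base-R sketch that splits at the last comma and trims whitespace afterwards, so "Idaho, USA" and "Idaho,USA" are treated the same way. This is an alternative technique, not the approach used in the answer; the variable names `before_comma` and `after_comma` are made up for illustration, and the input is the sample vector from the question without the elided entries.

```r
# Sample input copied from the question (spelling kept as-is).
states <- c("Masassachusetts, USA", "Buffalo, NY", "Flint, MI",
            "California, USA", "Idaho, USA", "Orlando, FL")

# Split each string at its last comma.
# before_comma: everything up to (not including) the last comma.
# after_comma: everything after the last comma, with whitespace trimmed.
before_comma <- sub(",[^,]*$", "", states)
after_comma <- trimws(sub("^.*,", "", states))

# Keep the part before the comma for "..., USA" entries,
# otherwise keep the part after the comma (the abbreviation).
state_name <- ifelse(after_comma == "USA", before_comma, after_comma)
state_name
```

A string with no comma at all passes through unchanged in this sketch, so it would still need to be checked against a list of known states.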

Edit: to address the comment about NAs, suppose the dataset looks like this:

states <- c("Masassachusetts, USA", "SUNNY Buffalo",
            "Buffalo, NY", "Flint, MI", "California, USA",
            "Idaho, USA", "Orlando, FL", "Shanghai, China")

In this case, my suggestion is to keep a list of state names and their abbreviations (attached at the end of this answer).

%in% can then be used to check whether the extracted strings are state names.

library(dplyr)

state_name <- ifelse(word(states, -1) =="USA", word(states,1,sep = "\\,"),
                     ifelse(word(states, 2, sep = "\\, ") %in% stl, 
                            word(states, 2, sep = "\\, "), NA))

state_name
#[1] "Masassachusetts" NA                "NY"              "MI"              "California"      "Idaho"          
#[7] "FL"              NA
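
The NA entries follow from how %in% treats values that are missing or not in the list. The small sketch below illustrates this with a deliberately shortened, hypothetical lookup vector `known` and a hypothetical input `candidates`; neither name comes from the answer.

```r
# Hypothetical, shortened lookup vector and candidate values.
known <- c("NY", "FL")
candidates <- c("NY", "China", NA)

# %in% returns FALSE (not NA) for values that are missing or unknown.
is_state <- candidates %in% known
is_state

# Values that fail the check are replaced with NA.
checked <- ifelse(is_state, candidates, NA)
checked
```

Because %in% never returns NA, the inner ifelse() always has a definite TRUE or FALSE to branch on.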

List of state names and their abbreviations:

stl <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado", 
         "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho", 
         "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", 
         "Maine", "Maryland", "Massachusetts", "Michigan", "Minnesota", 
         "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", 
         "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota", 
         "Ohio", "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", 
         "South Carolina", "South Dakota", "Tennessee", "Texas", "Utah", 
         "Vermont", "Virginia", "Washington", "West Virginia", "Wisconsin", 
         "Wyoming", "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", 
         "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", 
         "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", 
         "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", 
         "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY")
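
If typing the list by hand is inconvenient, the same kind of lookup vector can probably be built from the `state.name` and `state.abb` vectors that ship with R's built-in datasets package. This is a sketch rather than part of the original answer; it assumes the 50 states are all that is needed (no District of Columbia or territories).

```r
# state.name and state.abb are built-in character vectors (datasets package),
# each holding 50 entries in matching order.
stl <- c(state.name, state.abb)

length(stl)
head(stl)
```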