Replace an entire string based on its beginning

Question (votes: 0, answers: 3)

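I am trying to change the elements of a string vector based on their first characters. I need to do it this way, rather than matching the whole string, because I scrape this data frequently and the latter part of each string changes often.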

Here is an example of the strings I have:

teams <- structure(c(3L, 14L, 4L, 5L, 15L, 10L, 7L, 2L, 12L, 13L, 9L, 
        8L, 1L, 11L, 6L, 21L, 29L, 17L, 16L, 30L, 23L, 19L, 20L, 25L, 
        22L, 26L, 28L, 27L, 24L, 18L), .Label = c("Dallas Mavericks (13)Â", 
                                                  "Denver Nuggets (8)Â", "Golden State Warriors (1)Â", "Houston Rockets (3)Â", 
                                                  "Los Angeles Clippers (4)Â", "Los Angeles Lakers (15)Â", "Memphis Grizzlies (7)Â", 
                                                  "Minnesota Timberwolves (12)Â", "New Orleans Pelicans (11)Â", 
                                                  "Oklahoma City Thunder (6)Â", "Phoenix Suns (14)Â", "Portland Trail Blazers (9)Â", 
                                                  "Sacramento Kings (10)Â", "San Antonio Spurs (2)Â", "Utah Jazz (5)Â", 
                                                  "Atlanta Hawks (4)Â", "Boston Celtics (3)Â", "Brooklyn Nets (15)Â", 
                                                  "Charlotte Hornets (7)Â", "Chicago Bulls (8)Â", "Cleveland Cavaliers (1)Â", 
                                                  "Detroit Pistons (10)Â", "Indiana Pacers (6)Â", "Miami Heat (14)Â", 
                                                  "Milwaukee Bucks (9)Â", "New York Knicks (11)Â", "Orlando Magic (13)Â", 
                                                  "Philadelphia 76ers (12)Â", "Toronto Raptors (2)Â", "Washington Wizards (5)Â"
        ), class = "factor")

I would like to change, for example, "Golden State Warriors (1)Â" to GSW. To do this I tried:

teams <- gsub("Golden", "GSW", teams)

将该字符串转换为

"GSW State Warriors (1)Â"
仅捕获字符串元素的第一部分而不是整个字符串,我也尝试过使用
sub
,以及当我调用
?gsub
时找到的每个函数(例如
grep
grepl
),但显然我不太了解正则表达式。
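A minimal sketch of why the attempt above changes only the first word: gsub("Golden", "GSW", teams) replaces just the matched text. If the pattern is anchored at the start (^) and extended with .* to the end, the match covers the whole element, so the whole string is replaced. The sample vector is simplified (the trailing "Â" is omitted), and the lookup table with its prefixes is hypothetical, for illustration only.

```r
# Sample names, simplified from the question (trailing "Â" omitted)
teams <- c("Golden State Warriors (1)", "San Antonio Spurs (2)",
           "Houston Rockets (3)")

# ^ anchors at the start and .* runs to the end, so the match
# covers the whole element and the whole string is replaced
one <- sub("^Golden.*", "GSW", teams)

# Several prefixes at once via a hypothetical lookup table
lookup <- c("Golden" = "GSW", "San Antonio" = "SAS")
for (prefix in names(lookup)) {
  # startsWith() compares the beginning of each string literally (no regex)
  teams[startsWith(teams, prefix)] <- lookup[[prefix]]
}
print(one)
print(teams)
```

Using startsWith() keeps the prefix match literal, so names containing regex metacharacters would not need escaping.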

r regex string
3 Answers

3 votes

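We can use sub and capture the first 3 characters as a group (^(.{3})), followed by the remaining characters (.*), and replace the whole match with the backreference to that capture group (\\1):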

sub("^(.{3}).*", "\\1", teams)
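To make the behaviour of that call concrete, here is a small check on two simplified names (an assumed sample, trailing "Â" omitted). Note that it returns the first three characters, not an acronym such as GSW, which is why the update below takes a different approach.

```r
x <- c("Golden State Warriors (1)", "Houston Rockets (3)")
# ^(.{3}) captures the first three characters as group 1,
# .* consumes the rest, and "\\1" keeps only the captured group
first3 <- sub("^(.{3}).*", "\\1", x)
print(first3)
```

For this particular case, substr(x, 1, 3) gives the same result without a regex.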

Update

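Based on the new information, we use a regex lookbehind to match one or more non-uppercase characters ([^A-Z]+) that follow an uppercase letter at a word boundary ((?<=\\b[A-Z])), and replace them with an empty string (""):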

gsub("(?<=\\b[A-Z])[^A-Z]+", "", teams, perl = TRUE)
#[1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT" "MG"  "DN"  "PTB" "SK"  "NOP" 
#[12] "MT"  "DM"  "PS"  "LAL" "CC"  "TR"  "BC"  "AH"  "WW"  "IP"  "CH" 
#[23] "CB"  "MB"  "DP"  "NYK" "P"   "OM"  "MH"  "BN" 
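A small sketch that breaks the pattern into its two parts and runs it on three simplified names (an assumed sample, trailing "Â" omitted). It also shows why "Philadelphia 76ers" collapses to "P": "76ers" does not begin with an uppercase letter, so it is swallowed by the run of non-uppercase characters after the "P".

```r
x <- c("Golden State Warriors (1)", "New York Knicks (11)",
       "Philadelphia 76ers (12)")
# (?<=\\b[A-Z]) : position right after an uppercase letter that starts a word
# [^A-Z]+       : the run of non-uppercase characters following that position
# Deleting those runs leaves only the word-initial capitals
abbr <- gsub("(?<=\\b[A-Z])[^A-Z]+", "", x, perl = TRUE)
print(abbr)
```

perl = TRUE is required here because the default (TRE) engine used by gsub does not support lookbehind.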

1 vote

Here is an idea using stringi:
library(stringi)
sapply(strsplit(stri_replace_last_regex(teams, '\\s+', ''), ' '),
       function(i) paste(substring(i, 1, 1), collapse = ''))
#[1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT" "MG"  "DN"  "PTB" "SK"  "NOP" "MT"  "DM"  "PS"  "LAL" "CC"  "TR"  "BC"  "AH"  "WW"  "IP"  "CH"  "CB" 
#[24] "MB"  "DP"  "NYK" "P7"  "OM"  "MH"  "BN" 
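For readers without stringi, here is a base-R sketch of the same idea. It is a swapped-in equivalent, not the answer's code: sub("\\s+(\\S*)$", "\\1", x) is assumed to stand in for stri_replace_last_regex(x, '\\s+', ''), which holds as long as the string does not end in whitespace. Removing the last whitespace run glues the "(seed)" to the last word, so only real words contribute a first letter.

```r
x <- c("Golden State Warriors (1)", "Philadelphia 76ers (12)")
# Drop the last run of whitespace so "(seed)" sticks to the last word
joined <- sub("\\s+(\\S*)$", "\\1", x)
# Split on spaces and paste together the first character of each word
abbr <- vapply(strsplit(joined, " "),
               function(w) paste(substring(w, 1, 1), collapse = ""),
               character(1))
print(abbr)
```

This reproduces the "P7" seen above for Philadelphia, since "76ers" is treated as a word.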

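Or, to get your desired output, replace the first word of each name with the corresponding element of ind, a vector of replacement strings with one entry per team:

mapply(stri_replace_first_regex, teams, '\\w+', ind)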
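A base-R sketch of what the mapply() call above does, assuming a hypothetical ind vector (the values below are made up for illustration). Base R's sub replaces only the first match, like stri_replace_first_regex, so each string gets its first word swapped for its paired replacement and the rest is kept.

```r
x <- c("Golden State Warriors (1)", "Houston Rockets (3)")
ind <- c("GSW", "HR")  # hypothetical replacements, one per element of x
# For each (string, replacement) pair, replace only the first word (\\w+)
out <- mapply(function(s, r) sub("\\w+", r, s), x, ind, USE.NAMES = FALSE)
print(out)
```

If the goal is to replace the whole string, the pattern would need to cover it, for example "^.*" instead of "\\w+".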

1 vote

Initial response:

To answer your question with a regular expression:

gsub("\\Â .*", "", teams)

## Store in object and print
teams2 <- gsub("\\Â .*", "", teams)
head(teams2)
## [1] "Golden State Warriors" "San Antonio Spurs"     "Houston Rockets"       "Los Angeles Clippers" 
## [5] "Utah Jazz"             "Oklahoma City Thunder"
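The same "find the common part and drop it" strategy can key on the seed instead of the "Â". This is a swapped-in pattern and a minimal sketch, assuming every name is followed by a space, a parenthesised number, and possibly more text.

```r
x <- c("Golden State Warriors (1)", "Utah Jazz (5)")
# Delete from the space before "(digits)" to the end of the string
names_only <- sub(" \\(\\d+\\).*$", "", x)
print(names_only)
```

Because it anchors on the "(n)" part, this still works if the trailing characters after the seed change between scrapes.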

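You were on the right track, but rather than changing what you have, my strategy is to 1) find the common element (Â, the character followed by a space), and then 2) drop that common element together with everything after it.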

Keep in mind that you can run gsub multiple times if necessary. For example:

teams <- gsub("\\Â .*", "", teams)
teams <- gsub("PATTERN2", "", teams)
teams <- gsub("PATTERN3", "", teams)
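When there are several patterns, the repeated gsub calls can be folded into one step with Reduce(). A sketch with a hypothetical pattern list: the first pattern drops the seed and anything after it, the second keeps only uppercase letters. [^[:upper:]] is used instead of [^A-Z] because bracket ranges are locale-dependent in R.

```r
x <- c("Golden State Warriors (1)", "Los Angeles Lakers (15)")
# Hypothetical list of patterns, applied one after another
patterns <- c(" \\(\\d+\\).*$",  # drop the seed and anything after it
              "[^[:upper:]]")     # then delete everything except uppercase letters
# Left fold: gsub(patterns[1], "", x), then gsub(patterns[2], "", result)
out <- Reduce(function(s, p) gsub(p, "", s), patterns, x)
print(out)
```

Adding a new cleanup step then only means appending another pattern to the vector.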

And so on.

Update:

To return only the abbreviated form of the strings, I took the "multiple gsub" approach I suggested in my first post, as follows:

teams <- gsub("\\Â .*", "", teams)
teams <- abbreviate(teams, named = FALSE) # useful function to consider
teams <- gsub("[a-z]", "", teams)
## continue as needed
head(teams)
## [1] "GSW" "SAS" "HR"  "LAC" "UJ"  "OCT"