Remove all digits and commas that follow at least 2 spaces to the right of a word

Question (votes: 0, answers: 2)

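I am trying to scrape this table of regions that support the Microsoft Speech service. I managed to get the following character vector: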

region <- c("southafricanorth 6", "eastasia 5", "southeastasia 1,2,3,4,5", 
"australiaeast 1,2,3,4", "centralindia 1,2,3,4,5", "japaneast 2,5", 
"japanwest", "koreacentral 2", "canadacentral 1", "northeurope 1,2,4,5", 
"westeurope 1,2,3,4,5", "francecentral", "germanywestcentral", 
"norwayeast", "switzerlandnorth 6", "switzerlandwest", "uksouth 1,2,3,4", 
"uaenorth 6", "brazilsouth 6", "centralus", "eastus 1,2,3,4,5", 
"eastus2 1,2,4,5", "northcentralus 4,6", "southcentralus 1,2,3,4,5,6", 
"westcentralus 5", "westus 2,5", "westus2 1,2,4,5", "westus3"
)

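What regular expression removes all the digits and commas that follow at least 2 spaces to the right of the word? For example, I only want "westus2", not "westus2 1,2,4,5".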

I tried this, to no avail:

gsub("\\s{2,}\\d+.*", "", region)

r regex stringr
2 Answers
1 vote

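In the HTML, the region names (without the superscripts) are contained in <code> tags. So you can avoid the regex altogether by modifying your scraping code to something like this: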

library(rvest)

url <- "https://learn.microsoft.com/en-us/azure/cognitive-services/Speech-Service/regions"

regions <- read_html(url) %>% 
  # first table only
  html_element("table") %>% 
  html_elements("code") %>% 
  html_text()

regions

 [1] "southafricanorth"   "eastasia"           "southeastasia"      "australiaeast"     
 [5] "centralindia"       "japaneast"          "japanwest"          "koreacentral"      
 [9] "canadacentral"      "northeurope"        "westeurope"         "francecentral"     
[13] "germanywestcentral" "norwayeast"         "switzerlandnorth"   "switzerlandwest"   
[17] "uksouth"            "uaenorth"           "brazilsouth"        "centralus"         
[21] "eastus"             "eastus2"            "northcentralus"     "southcentralus"    
[25] "westcentralus"      "westus"             "westus2"            "westus3"
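
To see why selecting the <code> elements drops the numbers, here is a minimal offline sketch. The tiny table below is a hypothetical stand-in for the real page: it assumes each cell holds the region name in a <code> tag followed by the footnote numbers in a <sup> tag, which may not match the live markup exactly.

```r
library(rvest)

# Hypothetical miniature of the regions table (assumed markup, not the real page)
page <- minimal_html('
  <table>
    <tr><td><code>westus2</code> <sup>1,2,4,5</sup></td></tr>
    <tr><td><code>westus3</code></td></tr>
  </table>
')

# Selecting only the <code> elements leaves the superscripts behind
code_only <- page %>%
  html_element("table") %>%
  html_elements("code") %>%
  html_text()

code_only
#> [1] "westus2" "westus3"
```

Reading the text of the whole cell instead (for example with html_elements("td")) would pull in the footnote numbers too, which is where the original vector came from.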

0 votes

Another elegant solution is the word() function from the stringr package.

By default it extracts the first word, as its signature shows:

word(string, start = 1L, end = start, sep = fixed(" "))

library(stringr)

word(region)

 [1] "southafricanorth"   "eastasia"           "southeastasia"      "australiaeast"     
 [5] "centralindia"       "japaneast"          "japanwest"          "koreacentral"      
 [9] "canadacentral"      "northeurope"        "westeurope"         "francecentral"     
[13] "germanywestcentral" "norwayeast"         "switzerlandnorth"   "switzerlandwest"   
[17] "uksouth"            "uaenorth"           "brazilsouth"        "centralus"         
[21] "eastus"             "eastus2"            "northcentralus"     "southcentralus"    
[25] "westcentralus"      "westus"             "westus2"            "westus3"
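
Since the question asked for a regular expression, here is a base R sketch as an alternative. In the vector as printed in the question, each name is separated from its numbers by a single space, so a pattern that requires \\s{2,} cannot match. The pattern below is an assumption about what is wanted: it accepts one or more whitespace characters and removes only a trailing run of digits and commas.

```r
# A few elements taken from the question's vector
region <- c("southeastasia 1,2,3,4,5", "japanwest", "eastus2 1,2,4,5", "westus3")

# Remove whitespace followed by digits/commas at the end of each string.
# Names that end in a digit (eastus2, westus3) are kept intact because
# the digit run must be preceded by whitespace.
cleaned <- sub("\\s+[0-9,]+$", "", region)

cleaned
#> [1] "southeastasia" "japanwest"     "eastus2"       "westus3"
```

Anchoring with $ keeps the match at the end of the string, and sub() is sufficient because each element has at most one such suffix.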