How to parse a string using wildcards?

Question  Votes: 0  Answers: 3

How can I parse/extract information from a string in R using wildcards?

head(df)
set           type
a             [OutofArea]:[type:"928"]:[idnum:"27"]
a             [WithinRange]:[type:"029":[...
a             [OutofArea]:[type:"928"]:[...
a             [OutofArea]:[type:"274"]:[...
a             [OutofArea]:[type:"210"]:[...
a             [OutofArea]:[type:"199"]"[...

I only need the type number, so just 928, 029, and so on. In this case the number is the wildcard: anything after type:" and before the next ".
r
3 Answers
1 vote

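We can use str_extract to extract the digits that follow the type:" string. The (?<=type:") part is a lookbehind, so the prefix must be present but is not included in the result.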

library(stringr)
library(dplyr)
df %>%
   mutate(new = str_extract(type, '(?<=type:")\\d+'))
#  set                                   type new
#1   a  [OutofArea]:[type:"928"]:[idnum:"27"] 928
#2   a [WithinRange]:[type:"029":[idnum:"27"] 029
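
For readers who prefer to avoid extra packages, the same lookbehind idea can be sketched in base R. This is only an illustration under assumptions: the vector type and its third element (a row with no type field) are made up here to show how non-matches behave; they are not part of the original data.

```r
# Hypothetical sample vector mirroring the question's `type` column.
# The third element is an invented row with no type field.
type <- c('[OutofArea]:[type:"928"]:[idnum:"27"]',
          '[WithinRange]:[type:"029":[idnum:"27"]',
          '[NoType]:[idnum:"27"]')

# (?<=type:") is a lookbehind: the text type:" must come immediately
# before the match but is not part of it; \\d+ then matches the digits.
pattern <- '(?<=type:")\\d+'

# regexpr() needs perl = TRUE for lookbehind support.
# It returns -1 for elements where nothing matched.
m <- regexpr(pattern, type, perl = TRUE)

# Start from NA so non-matching rows stay NA, as with str_extract.
new <- rep(NA_character_, length(type))
new[m != -1] <- regmatches(type, m)
new
```

The result is a character vector, so the leading zero of 029 is preserved.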

Data

df <- structure(list(set = c("a", "a"), type = c("[OutofArea]:[type:\"928\"]:[idnum:\"27\"]", 
"[WithinRange]:[type:\"029\":[idnum:\"27\"]")), class = "data.frame", row.names = c(NA, 
-2L))

0 votes

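Assuming the data shown reproducibly in the note at the end of this answer, we can use read.table with sep set to a double quote and then select the second field. This returns the values as numbers; if you want them as character instead, add colClasses = "character" to the read.table arguments. No packages or regular expressions are used.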

read.table(text = df$type, sep = '"', quote = '', fill = TRUE)[[2]]
## [1] 928  29 928 274 210 199
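
Because read.table converts the second field to numbers by default, the leading zero of 029 is lost (it prints as 29 above). A minimal sketch of the colClasses variant mentioned in this answer, using a shortened two-row copy of the data as an assumption:

```r
# Shortened, hypothetical copy of the `type` column (two rows only).
type <- c('[OutofArea]:[type:"928"]:[idnum:"27"]',
          '[WithinRange]:[type:"029":[idnum:"27"]')

# Split each string on the double quote; quote = '' stops read.table
# from treating " as a quoting character, and fill = TRUE pads rows
# that have fewer fields. colClasses = "character" keeps every field
# as text, so no numeric conversion happens.
fields <- read.table(text = type, sep = '"', quote = '', fill = TRUE,
                     colClasses = "character")

# The second field is the text between the first pair of quotes.
fields[[2]]
```

Whether to keep the values as text or numbers depends on whether 029 is an identifier or a quantity; for identifiers, character is usually the safer choice.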

Lines <- 'set           type
a             [OutofArea]:[type:"928"]:[idnum:"27"]
a             [WithinRange]:[type:"029":[...
a             [OutofArea]:[type:"928"]:[...
a             [OutofArea]:[type:"274"]:[...
a             [OutofArea]:[type:"210"]:[...
a             [OutofArea]:[type:"199"]"[...'
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)

0 votes

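We can use sub, capturing the digits between the quotes and replacing the whole string with that captured group: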

sub('.*type:"(\\d+)".*', '\\1', df$type)
#[1] "928" "029" "928" "274" "210" "199"
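
One caveat worth sketching: sub returns the input unchanged when the pattern does not match, so a row with no type field would come back as the full string rather than a missing value. The example below assumes a hypothetical second element without a type field and guards the call with grepl; it is one possible approach, not part of the original answer.

```r
# Hypothetical sample: the second element has no type field.
type <- c('[OutofArea]:[type:"928"]:[idnum:"27"]',
          '[NoType]:[idnum:"27"]')
pattern <- '.*type:"(\\d+)".*'

# Plain sub(): the non-matching element is returned as-is.
plain <- sub(pattern, '\\1', type)

# Guarded version: NA where the pattern is absent.
guarded <- ifelse(grepl(pattern, type),
                  sub(pattern, '\\1', type),
                  NA_character_)
guarded
```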

Data

df <- structure(list(set = c("a", "a", "a", "a", "a", "a"), 
     type = c("[OutofArea]:[type:\"928\"]:[idnum:\"27\"]", 
"[WithinRange]:[type:\"029\":", "[OutofArea]:[type:\"928\"]:", 
"[OutofArea]:[type:\"274\"]:", "[OutofArea]:[type:\"210\"]:", 
"[OutofArea]:[type:\"199\"]\"")), class = "data.frame", row.names = c(NA,-6L))