How can I parse/extract information from a string in R using a wildcard?
head(df)
set type
a [OutofArea]:[type:"928"]:[idnum:"27"]
a [WithinRange]:[type:"029":[...
a [OutofArea]:[type:"928"]:[...
a [OutofArea]:[type:"274"]:[...
a [OutofArea]:[type:"210"]:[...
a [OutofArea]:[type:"199"]"[...
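I only need the numbers, so just 928, 029, etc. In this case the number is the wildcard: anything after type:" and before the next ".
We can use str_extract to extract the digits that follow the string type:"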
library(stringr)
library(dplyr)
df %>%
  mutate(new = str_extract(type, '(?<=type:")\\d+'))
# set type new
#1 a [OutofArea]:[type:"928"]:[idnum:"27"] 928
#2 a [WithinRange]:[type:"029":[idnum:"27"] 029
df <- structure(list(set = c("a", "a"), type = c("[OutofArea]:[type:\"928\"]:[idnum:\"27\"]",
"[WithinRange]:[type:\"029\":[idnum:\"27\"]")), class = "data.frame", row.names = c(NA,
-2L))
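The same lookbehind pattern also works in base R without stringr. The sketch below is an illustration, not part of the original answer. It uses regexpr() with perl = TRUE, because lookbehind assertions such as (?<=type:") need the PCRE engine. Since regmatches() silently drops strings that do not match, the result is filled back into a full-length vector. The third input string is a hypothetical row with no type field, added only to show the NA handling.

```r
# Base R version of the lookbehind extraction (no packages needed).
x <- c('[OutofArea]:[type:"928"]:[idnum:"27"]',
       '[WithinRange]:[type:"029":[idnum:"27"]',
       '[OutofArea]:[idnum:"27"]')  # hypothetical row without a type

# perl = TRUE enables the (?<=...) lookbehind
m <- regexpr('(?<=type:")\\d+', x, perl = TRUE)

# regmatches() returns only the successful matches, so start from NA
# and fill in the positions where regexpr() found something (m != -1)
nums <- rep(NA_character_, length(x))
nums[m != -1] <- regmatches(x, m)
nums
# -> "928" "029" NA
```

Like str_extract, this keeps the result as character, so the leading zero in "029" is preserved.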
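Assuming the reproducible data shown in the Note at the end, we can use read.table with sep equal to a double quote and then pick the second field. This returns the numbers as numeric (so 029 becomes 29), but if you want them as character, add colClasses = "character" to the read.table arguments. No packages or regular expressions are used.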
read.table(text = df$type, sep = '"', quote = '', fill = TRUE)[[2]]
## [1] 928 29 928 274 210 199
Note

Lines <- 'set type
a [OutofArea]:[type:"928"]:[idnum:"27"]
a [WithinRange]:[type:"029":[...
a [OutofArea]:[type:"928"]:[...
a [OutofArea]:[type:"274"]:[...
a [OutofArea]:[type:"210"]:[...
a [OutofArea]:[type:"199"]"[...'
df <- read.table(text = Lines, header = TRUE, as.is = TRUE)
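To make the numeric-versus-character point concrete, here is a small sketch that runs the same read.table call on two of the sample strings, once with the default type conversion and once with colClasses = "character". The vector x is just a stand-in for df$type.

```r
# Two sample strings standing in for df$type
x <- c('[OutofArea]:[type:"928"]:[idnum:"27"]',
       '[WithinRange]:[type:"029":[idnum:"27"]')

# Splitting on '"' puts the number in the 2nd field of every line.
# quote = '' stops read.table from treating '"' as a quote character.
num <- read.table(text = x, sep = '"', quote = '', fill = TRUE)[[2]]
num  # converted to integer, so the leading zero is lost: 928 29

# colClasses = "character" (recycled to every column) skips conversion
chr <- read.table(text = x, sep = '"', quote = '', fill = TRUE,
                  colClasses = "character")[[2]]
chr  # "928" "029"
```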
We can use sub with a capture group:
sub('.*type:"(\\d+)".*', '\\1', df$type)
#[1] "928" "029" "928" "274" "210" "199"
data
df <- structure(list(set = c("a", "a", "a", "a", "a", "a"),
type = c("[OutofArea]:[type:\"928\"]:[idnum:\"27\"]",
"[WithinRange]:[type:\"029\":", "[OutofArea]:[type:\"928\"]:",
"[OutofArea]:[type:\"274\"]:", "[OutofArea]:[type:\"210\"]:",
"[OutofArea]:[type:\"199\"]\"")), class = "data.frame", row.names = c(NA,-6L))