Regex look-behind limitation in R

Question (votes: 0, answers: 1)

I am trying to extract the numeric value next to the "High" keyword in the text below (the items in bold). However, I get an error:

Error in stri_extract_first_regex(string, pattern, opts_regex = opts(pattern)) : Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)

The regex I am using is:

"(?<=High\\s*>?=?\\s?)[\\d\\.]+[\\s\\-\\d\\.]+(?=\\s)"

This works in an online regex tester, but when I run the same thing in RStudio I get the error above.
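The error is raised by ICU, the regex engine that stringi (and therefore stringr) uses. ICU only accepts a look-behind whose possible matches have a bounded maximum length, so `\\s*` inside `(?<=...)` is rejected, while some online testers (for example, ones running the JavaScript flavor, which allows unbounded look-behind) accept it. A minimal sketch of one workaround replaces `\\s*` with the bounded `\\s{0,50}`. It assumes at most 50 whitespace characters separate "High" from its value, and it borrows a `[^A-Za-z] ` prefix (an assumption that a standalone "High" is preceded by a non-letter and a space) to skip "Borderline High". The value part is also rewritten, because `[\\s\\-\\d\\.]+(?=\\s)` needs at least one more character plus a trailing space, which the second string (ending in `6.2`) does not have.

```r
library(stringr)

# The question's strings (bold markers removed; "Borderline High" kept as typed)
v <- c(
  "Optimal             <2.6  Desirable           2.6 - 3.3  Borderline high     3.4 - 4.0  High                4.1 - 4.8  Very high           >=4.9",
  "Desirable       <5.2  Borderline high 5.2 - 6.1  High            >= 6.2",
  "Desirable   <1.7  Borderline High 1.7 - 2.2  High      2.3 - 4.4  Very high >=4.5"
)

# Original pattern: \\s* gives the look-behind no maximum length, so ICU rejects it
bad <- "(?<=High\\s*>?=?\\s?)[\\d\\.]+[\\s\\-\\d\\.]+(?=\\s)"
err <- tryCatch(str_extract(v, bad), error = function(e) conditionMessage(e))
print(err)  # the message mentions U_REGEX_LOOK_BEHIND_LIMIT

# Bounded look-behind: at most 1 + 5 + 50 + 1 + 1 + 1 = 59 characters.
# ASSUMPTION: at most 50 whitespace characters between "High" and its value,
# and a standalone "High" is preceded by a non-letter plus a space.
good <- "(?<=[^A-Za-z] High\\s{0,50}>?=?\\s?)\\d[\\d.]*(?:\\s*-\\s*\\d[\\d.]*)?"
high <- str_extract(v, good)
high  # c("4.1 - 4.8", "6.2", "2.3 - 4.4")
```

The bound of 50 is arbitrary; any finite repetition count satisfies ICU, so pick one that comfortably covers the padding in your real data.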

The text is:

 Optimal             <2.6  Desirable           2.6 - 3.3  Borderline high     3.4 - 4.0  High                ***4.1 - 4.8***  Very high           >=4.9

 Desirable       <5.2  Borderline high 5.2 - 6.1  High            >= ***6.2***

 Desirable   <1.7  Borderline High 1.7 - 2.2  High      ***2.3 - 4.4***  Very high >=4.5

Note that I use double backslashes in R, but Stack Overflow displays only a single backslash.
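Because the pattern is an ordinary R string literal, each regex backslash is written twice and R removes one level of escaping before the regex engine sees it. A small sketch showing what the engine actually receives, plus the equivalent raw-string form `r"(...)"` (this assumes R 4.0.0 or later, where raw strings were introduced):

```r
# The asker's pattern as an ordinary R string literal: every regex "\" is doubled
p_escaped <- "(?<=High\\s*>?=?\\s?)[\\d\\.]+[\\s\\-\\d\\.]+(?=\\s)"

# The same pattern as a raw string: backslashes are taken literally
p_raw <- r"((?<=High\s*>?=?\s?)[\d\.]+[\s\-\d\.]+(?=\s))"

# writeLines() prints the characters the regex engine receives:
# (?<=High\s*>?=?\s?)[\d\.]+[\s\-\d\.]+(?=\s)
writeLines(p_escaped)

# Both literals produce exactly the same string
stopifnot(identical(p_escaped, p_raw))
```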

Can you help me?

r regex regex-lookarounds
Answer (votes: 0)

Sample data

I changed one 'Borderline High' to 'Borderline high', assuming it was a typo.

v <- c("Optimal             <2.6  Desirable           2.6 - 3.3  Borderline high     3.4 - 4.0  High                4.1 - 4.8  Very high           >=4.9",
       "Desirable       <5.2  Borderline high 5.2 - 6.1  High            >= 6.2",
       "Desirable   <1.7  Borderline high 1.7 - 2.2  High      2.3 - 4.4  Very high >=4.5")

library(dplyr)
library(stringr)
data.frame( text = v, stringsAsFactors = FALSE ) %>%
  # Extract the text between "High" and "Very", then trim whitespace
  dplyr::mutate( High = trimws( stringr::str_extract(text, "(?<=High).*(?=Very)") ) ) %>%
  # If nothing was extracted (no "Very"), take everything after "High" up to the end
  dplyr::mutate( High = ifelse( is.na( High ), trimws( stringr::str_extract(text, "(?<=High).*(?=$)") ), High ) ) %>%
  dplyr::select( High )

Output

#        High
# 1 4.1 - 4.8
# 2    >= 6.2
# 3 2.3 - 4.4
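A look-behind is not strictly needed here. As an alternative sketch (not the answer's method), you can match "High" itself and capture the value in a group with `stringr::str_match()`, which returns the whole match in column 1 and each capture group in the following columns. It runs on the question's original strings, including the capitalized "Borderline High", and assumes a standalone "High" is not directly preceded by a letter and a space. The negative look-behind `(?<![A-Za-z] )` always spans two characters, so ICU accepts it.

```r
library(stringr)

# The question's original strings ("Borderline High" capitalized in the third)
v <- c(
  "Optimal             <2.6  Desirable           2.6 - 3.3  Borderline high     3.4 - 4.0  High                4.1 - 4.8  Very high           >=4.9",
  "Desirable       <5.2  Borderline high 5.2 - 6.1  High            >= 6.2",
  "Desirable   <1.7  Borderline High 1.7 - 2.2  High      2.3 - 4.4  Very high >=4.5"
)

# (?<![A-Za-z] )   fixed-length negative look-behind: skips "Borderline High"
# \\s*(>=)?\\s*     optional comparison sign, captured as group 1
# (\\d[\\d.]*...)   the value: a number, optionally "- number" (group 2)
pattern <- "(?<![A-Za-z] )High\\s*(>=)?\\s*(\\d[\\d.]*(?:\\s*-\\s*\\d[\\d.]*)?)"

m <- str_match(v, pattern)
high <- m[, 3]  # column 1 = whole match, 2 = sign, 3 = value
high  # c("4.1 - 4.8", "6.2", "2.3 - 4.4")
```

Keeping the sign in its own group lets you decide separately whether to report `>=` alongside the number.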

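Update

Only take the value after "High" when the character in front of " High" is not a letter ([^a-zA-Z]), so that "Borderline High" is not matched.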

data.frame( text = v, stringsAsFactors = FALSE ) %>%
  # Extract the text between a standalone "High" and "Very", then trim whitespace
  dplyr::mutate( High = trimws( stringr::str_extract(text, "(?<=[^a-zA-Z] High).*(?=Very)") ) ) %>%
  # If nothing was extracted (no "Very"), take everything after "High" up to the end
  dplyr::mutate( High = ifelse( is.na( High ), trimws( stringr::str_extract(text, "(?<=[^a-zA-Z] High).*(?=$)") ), High ) ) %>%
  dplyr::select( High )

Output

#        High
# 1 4.1 - 4.8
# 2    >= 6.2
# 3 2.3 - 4.4
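The update's look-behind, `(?<=[^a-zA-Z] High)`, always spans exactly six characters, so ICU accepts it. As a variant that is not part of the original answer, the two `str_extract()` calls and the `ifelse()` fallback can be folded into one lazy pattern that stops at the first "Very" or, failing that, at the end of the string. This sketch assumes "Very" only appears after the "High" range, as in the sample, and it uses the question's original strings so the capitalized "Borderline High" is exercised.

```r
library(stringr)

# The question's original strings ("Borderline High" capitalized in the third)
v <- c(
  "Optimal             <2.6  Desirable           2.6 - 3.3  Borderline high     3.4 - 4.0  High                4.1 - 4.8  Very high           >=4.9",
  "Desirable       <5.2  Borderline high 5.2 - 6.1  High            >= 6.2",
  "Desirable   <1.7  Borderline High 1.7 - 2.2  High      2.3 - 4.4  Very high >=4.5"
)

# (?<=[^a-zA-Z] High)  fixed-length look-behind: a non-letter, a space, "High"
# .*?                  lazily take characters...
# (?=Very|$)           ...up to the first "Very" or the end of the string
high <- trimws(str_extract(v, "(?<=[^a-zA-Z] High).*?(?=Very|$)"))
high  # c("4.1 - 4.8", ">= 6.2", "2.3 - 4.4")
```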