How to implement optional lookbehind and lookahead in R?

Question

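I want to extract the text between "de" and "en", as well as the strings that contain neither "de" nor "en". I am not very good with regex, but after reading about lookaheads and lookbehinds I managed to get part of what I want. Now I have to make them optional, but whatever I try, I cannot get it right.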

library(stringr)
(sstring = c('{\"de\":\"extract this one\",\"en\":\"some text\"}',     'extract this one',     '{\"de\":\"extract this one\",\"en\":\"some text\"}', "p (340) extract this one"))
#> [1] "{\"de\":\"extract this one\",\"en\":\"some text\"}"
#> [2] "extract this one"                                  
#> [3] "{\"de\":\"extract this one\",\"en\":\"some text\"}"
#> [4] "p (340) extract this one"

str_extract_all(string = sstring, pattern = "(?<=.de\":\").*(?=.,\"en\":)")
#> [[1]]
#> [1] "extract this one"
#> 
#> [[2]]
#> character(0)
#> 
#> [[3]]
#> [1] "extract this one"
#> 
#> [[4]]
#> character(0)

Desired output:

#> [1] "extract this one"         "extract this one"        
#> [3] "extract this one"         "p (340) extract this one"

Created on 2020-05-08 by the reprex package (v0.3.0)

Answer 1

In base R:

gsub('.*de\":\"(.*)\",\"en.*',"\\1",sstring)


[1] "extract this one"        
[2] "extract this one"        
[3] "extract this one"        
[4] "p (340) extract this one"

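where

.* matches any character, any number of times
(...) parentheses capture whatever is inside them, and "\\1" recalls that capture later. In essence, the whole string matched by the pattern is replaced with only the text we want.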
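The reason this also handles the plain strings is that sub() and gsub() return an element unchanged when the pattern does not match it. A minimal sketch of both cases, using a shortened two-element input and a variable named pattern (both chosen here for illustration, not taken from the answer):

```r
# Shortened input: one JSON-like string and one plain string
sstring <- c('{"de":"extract this one","en":"some text"}',
             'p (340) extract this one')

# Same pattern as above, written without the redundant backslashes
pattern <- '.*de":"(.*)","en.*'

# Only the first element matches the pattern
matched <- grepl(pattern, sstring)
matched
#> [1]  TRUE FALSE

# Matching elements are replaced by group 1; the rest pass through untouched
result <- sub(pattern, "\\1", sstring)
result
#> [1] "extract this one"         "p (340) extract this one"
```

Since the pattern consumes the whole string, it can match at most once per element, so sub() and gsub() give the same result here.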

Answer 2

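I suggest using a pattern that matches either 1+ characters other than " that follow {"de":", or any whole string that does not contain the {"de":" substring: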

(?<=\{"de":")[^"]+|^(?!.*\{"de":").+

See the regex demo.

Details

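(?<=\{"de":") - a positive lookbehind that matches a location immediately preceded by {"de":"
[^"]+ - then extracts 1+ characters other than "
| - or
^ - at the start of the string
(?!.*\{"de":") - a negative lookahead that makes sure there is no {"de":" anywhere in the string, and
.+ - extracts 1+ characters other than line break characters, as many as possible.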

See an R demo online:

library(stringr)
sstring = c('{\"de\":\"extract this one\",\"en\":\"some text\"}',     'extract this one',     '{\"de\":\"extract this one\",\"en\":\"some text\"}', "p (340) extract this one")
str_extract( sstring, '(?<=\\{"de":")[^"]+|^(?!.*\\{"de":").+')
# => [1] "extract this one"         "extract this one"        
#    [3] "extract this one"         "p (340) extract this one"
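If stringr is not available, the same pattern should also work in base R, assuming the PCRE engine is selected with perl = TRUE (the default engine does not support lookarounds). This is a sketch rather than part of the original answer; the names pattern and res are chosen here for illustration:

```r
sstring <- c('{"de":"extract this one","en":"some text"}',
             'extract this one',
             '{"de":"extract this one","en":"some text"}',
             'p (340) extract this one')

# Same alternation as above: text after {"de":" or a whole string without it
pattern <- '(?<=\\{"de":")[^"]+|^(?!.*\\{"de":").+'

# regexpr() locates the first match per element; regmatches() extracts it
res <- regmatches(sstring, regexpr(pattern, sstring, perl = TRUE))
res
#> [1] "extract this one"         "extract this one"
#> [3] "extract this one"         "p (340) extract this one"
```

Note that regmatches() silently drops elements with no match, so the result is only aligned with the input when every element matches, as it does here.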
