Regular expression quantifier: skipping non-matching expressions without resetting the count

How can I build a regular expression that removes the beginning of an input string, up through the first two words that are not Stack-Overflowers?

defaultCase = '1.2.3.4 Hello\ - my name is Bob'
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question'
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps'

Ideally, every one of these cases would output the rest of the string, starting with 'my name is Bob'.

The default case is fairly easy to handle:

`%Returns 'my name is Bob'`
`matchedString = regexpi(defaultCase,'(?<=^(\S*\w\S*\s[\s\W]*){2})\w.*','match','once')`

The two non-default cases seem to need some use of negative lookarounds.

matlab regex-lookarounds
2 answers
0 votes

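In my opinion, this is the kind of problem that will drive you crazy if you try to solve it with regular expressions alone. They are a very powerful tool, but they are not the only one available, and fortunately Matlab offers a wide range of others. Here is my baby-steps suggestion: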

clear all;

defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob';
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob';

a = GetText(defaultCase);
b = GetText(nonDefault1,'Stack-Overflowers');
c = GetText(nonDefault2,'Stack-Overflowers');

function res = GetText(input,exc)
    if (nargin < 2)
        exc = {''};
    end

    if (ischar(exc))
        exc = {exc};
    end

    % split the string into chunks
    res = strsplit(input,' ');

    % flag the chunks to keep: those not made up only of non-word characters
    rem_reg = cellfun(@(x)isempty(regexp(x,'^\W+$','once')),res,'UniformOutput',true);

    % flag the chunks to keep: those that are not among the excluded words
    % (ismember also works when exc lists more than one word)
    rem_exc = ~ismember(res,exc);

    % keep only the chunks that satisfy both criteria
    res = res(rem_reg & rem_exc);

    % return the final result
    res = strjoin(res(3:end),' ');
end
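
One side effect of filtering the chunks is that punctuation-only chunks and the excluded word are also dropped from the part of the string you keep. If that matters, a variation is to use the chunks only for counting and then cut the original string at the right index. The following is a minimal sketch of that idea, not the code above; `skipWord`, `nWords` and `rest` are names I made up for illustration, and it assumes words are separated by whitespace.

```matlab
% Sketch only: skipWord, nWords and rest are illustrative names.
cases = { ...
    '1.2.3.4 Hello\ - my name is Bob', ...
    '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question', ...
    '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps'};

skipWord = 'Stack-Overflowers';   % word that must not be counted
nWords   = 2;                     % number of counted words to drop from the front
rest     = cell(size(cases));

for k = 1:numel(cases)
    str = cases{k};

    % whitespace-delimited tokens and the index at which each one starts
    [tokens, starts] = regexp(str, '\S+', 'match', 'start');

    % a token is counted if it contains a letter or digit and is not the skip word
    hasAlnum = cellfun(@(t) ~isempty(regexp(t, '[a-zA-Z0-9]', 'once')), tokens);
    counted  = hasAlnum & ~strcmp(tokens, skipWord);

    % positions of the first nWords + 1 counted tokens
    idx = find(counted, nWords + 1);

    if numel(idx) > nWords
        % keep the original text, untouched, from the next counted token onwards
        rest{k} = str(starts(idx(end)):end);
    else
        % fewer than nWords + 1 counted tokens: nothing left to return
        rest{k} = '';
    end
end
```

Here `rest{2}` keeps the comma in 'my name is Bob, I have a question' because the original string is sliced rather than rebuilt from chunks.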

0 votes

This can be done by building up the regular expression piece by piece.

%Demonstration Cases
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question';
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps';

%Word to skip during counting
skipBasic = 'Stack-Overflowers';

%Set up the regular expression
word = '(\S*[a-zA-Z0-9]+\S*)';
space = '(\s[\W\s_]*)';
skipWord = ['(\S*' skipBasic '\S*)'];
skipWordSpace = ['(',skipWord space '?)'];
wordSpace = ['(',word space '?)'];
nonSkipWord = ['(\<(?!' skipWord ')' word '\>)'];
pairedWord = ['(' skipWordSpace '*' nonSkipWord ')'];
firstTwoPairedWords = ['^(' pairedWord space '){2}'];
unwantedFirstPart = ['(' firstTwoPairedWords,skipWordSpace,'*)'];
wantedPart = ['(?<=' unwantedFirstPart ')' nonSkipWord space wordSpace '*'];

%Create the parser 
endString = @(inputString) regexpi(inputString,wantedPart,'match','once');

%Apply the parser to the examples
disp(endString(defaultCase))
disp(endString(nonDefault1))
disp(endString(nonDefault2))

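Breaking the regular expression into manageable pieces makes it much easier to understand. This is the final result, which I could never have written by hand: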

'(?<=(^((((\S*Stack-Overflowers\S*)(\s[\W\s_]*)?)*(\<(?!(\S*Stack-Overflowers\S*))(\S*[a-zA-Z0-9]+\S*)\>))(\s[\W\s_]*)){2}((\S*Stack-Overflowers\S*)(\s[\W\s_]*)?)*))(\<(?!(\S*Stack-Overflowers\S*))(\S*[a-zA-Z0-9]+\S*)\>)(\s[\W\s_]*)((\S*[a-zA-Z0-9]+\S*)(\s[\W\s_]*)?)*'
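
For comparison, the same "count only the words that are not skipped" idea can be sketched more compactly by deleting the unwanted prefix with `regexprep` instead of matching the wanted part behind a lookbehind. This is a simplified variant under my own assumptions, not an equivalent of the expression above: it skips a token only when it is exactly the skip word or consists of punctuation only, it is case-sensitive (unlike `regexpi`), and the names `skipped`, `counted`, `prefix` and `rest` are illustrative. The skip word is passed through `regexptranslate('escape', ...)` so that a word containing characters such as `+` or `.` is still treated literally.

```matlab
% Sketch only: skipped, counted, prefix and rest are illustrative names.
cases = { ...
    '1.2.3.4 Hello\ - my name is Bob', ...
    '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question', ...
    '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps'};

% escape the literal so that regex metacharacters in it are matched as plain text
skipWord = regexptranslate('escape', 'Stack-Overflowers');

% zero or more tokens that are consumed without being counted:
% the skip word itself, or a token made up only of punctuation
skipped = ['(?:(?:' skipWord '|[^\w\s]+)\s+)*'];

% one counted token: contains a word character and is not the skip word
counted = ['(?!' skipWord '\s)\S*\w\S*\s+'];

% two counted tokens, each optionally preceded by skipped tokens,
% followed by any skipped tokens that come before the wanted text
prefix = ['^(?:' skipped counted '){2}' skipped];

% delete the prefix; regexprep accepts a cell array and returns one
rest = regexprep(cases, prefix, '', 'once');
```

Because the skipped tokens sit inside the group that carries the `{2}` quantifier, they are consumed without advancing the count, which is the behaviour the question title asks for.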
