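How do I construct a regular expression that drops the beginning of an input string, up to and including the first two words that are not Stack-Overflowers?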
defaultCase = '1.2.3.4 Hello\ - my name is Bob'
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question'
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps'
Ideally, all of these cases would output the remainder of the string, beginning with 'my name is Bob'.
The default case is fairly easy to handle:
`%Returns 'my name is Bob'`
`matchedString = regexpi(defaultCase,'(?<=^(\S*\w\S*\s[\s\W]*){2})\w.*','match','once')`
The two non-default cases seem to require some use of negative lookaround.
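In my opinion, this is the kind of problem that will drive you crazy if you try to solve it with regular expressions alone. They are a very powerful tool, but they are not the only one, and fortunately MATLAB offers a wide variety of others. Here is my baby-steps suggestion: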
clear all;
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob';
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob';
a = GetText(defaultCase);
b = GetText(nonDefault1,'Stack-Overflowers');
c = GetText(nonDefault2,'Stack-Overflowers');
function res = GetText(input,exc)
if (nargin < 2)
exc = {''};
end
if (ischar(exc))
exc = {exc};
end
% split the string into chunks
res = strsplit(input,' ');
% detect the chunks that contain only special characters
rem_reg = cellfun(@(x)isempty(regexp(x,'^\W+$','once')),res,'UniformOutput',true);
% detect the words that should be excluded
rem_exc = ~ismember(res,exc); % ismember also handles more than one excluded word
% filter the original array of chunks based on the above criteria
res = res(rem_reg & rem_exc);
% return the final result
res = strjoin(res(3:end),' ');
end
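One caveat with the approach above, sketched under my own assumptions: because the filter runs over every chunk, a later `-` or excluded word is removed from the returned tail as well. The variant below uses the filter only to locate the third counted word and then keeps the original text from there on. The input string and the variable names (`chunks`, `keep`, `idx`, `filteredTail`, `untouchedTail`) are hypothetical and not part of the original answer.

```matlab
% Hypothetical input (not from the original post): punctuation and the
% excluded word also appear after the part we want to keep.
s   = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob - a Stack-Overflowers fan';
exc = {'Stack-Overflowers'};

chunks  = strsplit(s, ' ');
% true for chunks made up only of non-word characters, e.g. '-'
isPunct = ~cellfun(@isempty, regexp(chunks, '^\W+$', 'once'));
% true for chunks that should not be counted as words
isSkip  = ismember(chunks, exc);
keep    = ~isPunct & ~isSkip;

% Filtering every chunk, as GetText does, also drops the later '-' and skip word
kept         = chunks(keep);
filteredTail = strjoin(kept(3:end), ' ');

% Variant: find the third counted chunk and keep the original text from there
idx = find(keep, 3);
if numel(idx) < 3
    untouchedTail = '';   % fewer than three counted words: nothing to return
else
    untouchedTail = strjoin(chunks(idx(3):end), ' ');
end

disp(filteredTail)    % my name is Bob a fan
disp(untouchedTail)   % my name is Bob - a Stack-Overflowers fan
```

Which behaviour is preferable depends on whether the tail should be cleaned up or returned verbatim; the question does not say.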
This can be done by building up the regular expression step by step, one piece at a time.
%Demonstration Cases
defaultCase = '1.2.3.4 Hello\ - my name is Bob';
nonDefault1 = '1.2.3.4 Hello Stack-Overflowers - my name is Bob, I have a question';
nonDefault2 = '1.2.3.4 Stack-Overflowers - Hello - my name is Bob and I like regexps';
%Word to skip during counting
skipBasic = 'Stack-Overflowers';
%Set up the regular expression
word = '(\S*[a-zA-Z0-9]+\S*)';
space = '(\s[\W\s_]*)';
skipWord = ['(\S*' skipBasic '\S*)'];
skipWordSpace = ['(',skipWord space '?)'];
wordSpace = ['(',word space '?)'];
nonSkipWord = ['(\<(?!' skipWord ')' word '\>)'];
pairedWord = ['(' skipWordSpace '*' nonSkipWord ')'];
firstTwoPairedWords = ['^(' pairedWord space '){2}'];
unwantedFirstPart = ['(' firstTwoPairedWords,skipWordSpace,'*)'];
wantedPart = ['(?<=' unwantedFirstPart ')' nonSkipWord space wordSpace '*'];
%Create the parser
endString = @(inputString) regexpi(inputString,wantedPart,'match','once');
%Apply the parser to the examples
disp(endString(defaultCase))
disp(endString(nonDefault1))
disp(endString(nonDefault2))
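Breaking the regular expression into manageable pieces makes it much easier to understand. This is the final result, which I could never have written by hand: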
'(?<=(^((((\S*Stack-Overflowers\S*)(\s[\W\s_]*)?)*(\<(?!(\S*Stack-Overflowers\S*))(\S*[a-zA-Z0-9]+\S*)\>))(\s[\W\s_]*)){2}((\S*Stack-Overflowers\S*)(\s[\W\s_]*)?)*))(\<(?!(\S*Stack-Overflowers\S*))(\S*[a-zA-Z0-9]+\S*)\>)(\s[\W\s_]*)((\S*[a-zA-Z0-9]+\S*)(\s[\W\s_]*)?)*'
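To see the central building block in isolation, here is a reduced sketch of the negative-lookahead idea behind `nonSkipWord`. It is a simplification of mine, not the pattern above: it anchors tokens with `(?<!\S)` rather than `\<`, uses `\w` instead of `[a-zA-Z0-9]`, and the names `tok` and `counted` are hypothetical.

```matlab
skipBasic = 'Stack-Overflowers';
% A whitespace-delimited token that contains at least one word character
% and does not start with the skip word:
%   (?<!\S)  the token starts the string or follows whitespace
%   (?!...)  negative lookahead rejecting the skip word at this position
tok = ['(?<!\S)(?!' skipBasic ')\S*\w\S*'];
counted = regexp('1.2.3.4 Stack-Overflowers - Hello - my name is Bob', tok, 'match');
disp(strjoin(counted, ', '))   % 1.2.3.4, Hello, my, name, is, Bob
```

The third element of `counted` is where the wanted text begins, which is what the `{2}` quantifier in `firstTwoPairedWords` expresses inside the full pattern.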