我的程序挂在某个点上的字符串可能是由于正则表达式中的无限循环。
我的字符串值为
String input="The contents of this Office Memorandum may also be brought to the notice";
我的代码是
String pat = "(the|The)[ ]?((([A-Za-z0-9])+([ ]| , )?([A-Za-z0-9]))+ [ ]?)+([<][ ]?[A-Za-z0-9- ]+ [,]?[A-Za-z0-9- ]+[ ]?[>])*[ ]?(Act|Rules|Rule|Bill|Regulations)[ ]?[,]?[ ]?([0-9]+)?";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(input);
while (m.find()) {
String sMatch = m.group();
String temp1=sMatch;
String temp2=sMatch;
}
它会进入while循环,但不会退出循环。我认为正则表达式有一些问题。我不明白-这是什么问题?有什么可以解决的。当调试器进入while(m.find())
时,它不在外面。
您遇到了灾难性的回溯,当找到一个[
(([A-Za-z0-9])+([ ]| , )?([A-Za-z0-9]))+
它是字母/数字的一个或多个,可选的空格,然后是字母/数字,以及一个或多个。您可以看到,当输入长度很大时,每个“一个或多个”的数量组合很大。
String input = "The contents of this Office Memorandum may also be brought to the notice Act";
String pat = "((the|The) ?((\\w|\\d)+[ ,]?(\\w|\\d)+ ?)+([<] ?[A-Za-z0-9 ]+ ,?[A-Za-z0-9 ]+[ ]?[>])* ?)";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(input);
while (m.find()) {
int end = m.end();
String sMatch = m.group();
System.out.println(sMatch);
if (end < input.length()) {
String next = input.substring(end);
String pat1 = "(Act|Rules|Rule|Bill|Regulations)[ ]?[,]?[ ]?([0-9]+)?";
Pattern p1 = Pattern.compile(pat1);
Matcher m1 = p1.matcher(next);
if (m1.find()) {
System.out.println("found");
}
}
}
}
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class RegexTest
{
public static void main(String[] args)
{
String input="The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 states that everone must recognise the Corn Act and the Hoopety Fling Regulations 2006 or face death by oranges.";
String pat = "(?i)(the\\b)(?> +)((?>(?>(?>\\w|\\)|\\()+(?:,| )*))+?)\\b(Act|Rules|Rule|Bill|Regulations)\\b(?: |,)*(?>(\\d{4})(?:\\D|$))?";
Pattern p = Pattern.compile(pat);
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group(0).trim().replace(' ', '-'));
}
}
}
// Will Output:
// The-Persons-with-Disabilities-(Equal-Opportunities,-Protection-of-Rights-and-Full-Participation)-Act,-1995
// the-Corn-Act
// the-Hoopety-Fling-Regulations-2006
正如其他人所说的那样,这种表达遭受了灾难性的回溯more info on catastrophic backtracking我认为表达式应该可以满足您的需求:
(?i)(the\b)(?> +)((?>(?>\w+(?:,| )*))+?)\b(Act|Rules|Rule|Bill|Regulations)\b(?: |,)*(?>(\d{4})(?:\D|$))?
注意原子基团(?>)
的使用将有效地“忘记”回溯的可能性。让我知道您是否有任何问题。更新为包括括号:
(?i)(the\b)(?> +)((?>(?>(?>\w|\)|\()+(?:,| )*))+?)\b(Act|Rules|Rule|Bill|Regulations)\b(?: |,)*(?>(\d{4})(?:\D|$))?
示例字符串:
'The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 states that everone must recognise the Corn Act and the Hoopety Fling Regulations 2006 or face death by oranges.'
将产生结果:
[Match number 1] Matched: 'The Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) Act, 1995 ' at character 1 [Capture Group 1] 'The' found at character 1 [Capture Group 2] 'Persons with Disabilities (Equal Opportunities, Protection of Rights and Full Participation) ' found at character 5 [Capture Group 3] 'Act' found at character 98 [Capture Group 4] '1995' found at character 103 [Match number 2] Matched: 'the Corn Act ' at character 143 [Capture Group 1] 'the' found at character 143 [Capture Group 2] 'Corn ' found at character 147 [Capture Group 3] 'Act' found at character 152 [Capture Group 4] '' found at character 1 [Match number 3] Matched: 'the Hoopety Fling Regulations 2006 ' at character 160 [Capture Group 1] 'the' found at character 160 [Capture Group 2] 'Hoopety Fling ' found at character 164 [Capture Group 3] 'Regulations' found at character 178 [Capture Group 4] '2006' found at character 190