Why does matcher.find() never return a result, and why does it freeze?


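I am building an email scraper. When I run it against one particular URL, matcher.find() never returns a boolean result. As far as I can tell, it freezes. For some other URLs the same code works fine.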

Here is my code:

private Matcher matcher;
private Pattern pattern = null;
private final String emailPattern = "([\\w\\-]([\\.\\w])+[\\w]+@([\\w\\-]+\\.)+[A-Za-z]{2,4})";

public void scrape() {
   pattern = Pattern.compile(emailPattern);

   Document documentTwo = null;

   try {
      documentTwo = Jsoup.connect("https://www.mercurynews.com/2020/03/21/how-can-i-get-tested-for-covid-19-in-the-bay-area/")
              .ignoreHttpErrors(true)
              .userAgent(RandomUserAgent.getRandomUserAgent())
              .header("Content-Language", "en-US")
              .get();
   } catch (IOException ex) {
      return; // 'break' does not compile here: there is no enclosing loop
   }

   // Full HTML of the page, including the contents of <script> and <style> tags
   String pageBody = documentTwo.toString();

   matcher = pattern.matcher(pageBody);

   while (matcher.find()) {
      // this never executes for the web address above
   }
}

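To check, I added System.out.println(matcher.find()); above the while loop, and it got stuck there without printing anything. What am I doing wrong? I have tried many different email regex patterns, and the one above is a valid one. Can anyone help? I would really appreciate it. Thanks.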

java regex matcher
1 Answer

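The problem is your regex. In ([\w\-]([\.\w])+[\w]+@..., the quantified pieces ([\.\w])+ and [\w]+ can match the same word characters. When a long run of word characters is not followed by @, the engine can end up trying many ways of dividing that run between them before failing, and then it starts over at the next position. The work grows much faster than the length of the run, so on a large page with long unbroken runs (for example inside inline scripts) find() can take so long that it looks frozen; pages without such runs finish normally, which is likely why other URLs worked. The code below uses a regex whose local part is a single character class bounded to {1,256}, so each attempt does a limited amount of work: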

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class Main {
    public static void main(String[] args) {
        Document documentTwo;
        try {
            documentTwo = Jsoup
                    .connect(
                            "https://www.mercurynews.com/2020/03/21/how-can-i-get-tested-for-covid-19-in-the-bay-area/")
                    .header("Content-Language", "en-US").get();
        } catch (IOException e) {
            e.printStackTrace();
            return; // without this, documentTwo would be null below
        }

        String pageBody = documentTwo.toString();
        // Local part is one bounded character class ({1,256}) followed by a literal @,
        // so there are no competing quantifiers to split a run of characters between.
        Pattern pattern = Pattern.compile(
                "([a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}\\@[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}(\\.[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25})+)");
        Matcher matcher = pattern.matcher(pageBody);
        while (matcher.find()) {
            System.out.println(matcher.group()); // print each address found
        }
    }
}

Output:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
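If you cannot audit every pattern in advance, one defensive option is to make the match abortable. java.util.regex has no timeout setting and Matcher.find() does not react to Thread.interrupt(), but the engine reads its input through CharSequence.charAt(). The following is a minimal sketch, assuming Java 8 or later, that wraps the text in a CharSequence whose charAt() throws once a deadline has passed, so a runaway find() fails with an exception instead of hanging. The names GuardedEmailScan, DeadlineCharSequence and findEmails are hypothetical (not part of java.util.regex or Jsoup); the pattern is the one from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuardedEmailScan {

    /** Hypothetical wrapper: every charAt() checks a deadline, so a runaway regex aborts. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos; // absolute System.nanoTime() value

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The regex engine reads input through charAt(), so this check runs on
            // every character it examines, including while backtracking.
            if (System.nanoTime() - deadlineNanos > 0) { // overflow-safe comparison
                throw new IllegalStateException("regex match timed out");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            // Delegate so extracted matches come back as plain strings.
            return inner.toString();
        }
    }

    // The pattern from the answer above.
    static final Pattern EMAIL = Pattern.compile(
            "([a-zA-Z0-9\\+\\.\\_\\%\\-\\+]{1,256}\\@[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}(\\.[a-zA-Z0-9][a-zA-Z0-9\\-]{0,25})+)");

    /** Collects all matches, or throws IllegalStateException once timeoutMillis has elapsed. */
    static List<String> findEmails(Pattern pattern, String text, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher matcher = pattern.matcher(new DeadlineCharSequence(text, deadline));
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "Contact: jane.doe@example.com or info@mercurynews.com.";
        // prints [jane.doe@example.com, info@mercurynews.com]
        System.out.println(findEmails(EMAIL, sample, 1_000));
    }
}
```

In the question's scrape() method, the matcher loop could be replaced by something like findEmails(pattern, pageBody, 5_000) inside a try/catch for IllegalStateException; the 5-second limit is an arbitrary choice. The guard does not make a bad pattern fast, it only turns an apparent freeze into a detectable failure.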
© www.soinside.com 2019 - 2024. All rights reserved.