Using ParserDelegator to find the first occurrence of /wiki/Geographic_coordinate_system in the input page or its children


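Given an input such as Linux, the program looks for the first instance of /wiki/Geographic_coordinate_system on that Wikipedia page. If the link is not found there, it should check the page's direct children (the pages it links to) for it.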
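The question does not show `MyParserCallback`, so the following is only a sketch of what a non-recursive callback for this search could look like. The class name `LinkCollector`, the `scan` helper, the fragment stripping, and the rule that skips namespaced links such as `/wiki/File:...` are all assumptions made for illustration. `hasFound()` and `visitedLinks` mirror the names used in the question's `main`. The key point is that the callback only records what it sees on one page and never fetches other pages itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: scans ONE page, records its /wiki/ links, and
// flags whether the target link appears. It does not follow any link.
class LinkCollector extends HTMLEditorKit.ParserCallback {
    static final String TARGET = "/wiki/Geographic_coordinate_system";

    // Insertion-ordered so children are later visited in page order.
    final Set<String> visitedLinks = new LinkedHashSet<>();
    private boolean found;

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag != HTML.Tag.A) {
            return;
        }
        Object value = attrs.getAttribute(HTML.Attribute.HREF);
        if (value == null) {
            return;
        }
        String href = value.toString();
        // Assumption: a link to a section of the target page counts as a hit.
        int hash = href.indexOf('#');
        if (hash >= 0) {
            href = href.substring(0, hash);
        }
        if (href.equals(TARGET)) {
            found = true;
        } else if (href.startsWith("/wiki/") && !href.contains(":")) {
            // Assumption: skip namespaced pages such as /wiki/File:... or /wiki/Help:...
            visitedLinks.add(href);
        }
    }

    boolean hasFound() {
        return found;
    }

    // Convenience helper (hypothetical): parse an HTML string and return the filled callback.
    static LinkCollector scan(String html) {
        LinkCollector collector = new LinkCollector();
        try {
            new ParserDelegator().parse(new StringReader(html), collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector;
    }
}
```

With a callback shaped like this, the decision about when to descend into children stays in the caller, which can print its own progress messages between the two phases.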

My expected output is:

Searching: Linux - Wikipedia
Checking children:
Found in: Bell Labs - Wikipedia

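The second line is never printed. I think this is because the code recurses through the child nodes without returning to main. How can I return to main before checking the children, so that execution passes through my if statement?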

public static void main(String args[]) throws Exception {
        String subject = args[0].replace(" ", "_");
        System.out.println("Searching: " + subject + " - Wikipedia");

        URL url = new URL("https://en.wikipedia.org/wiki/" + subject);

        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
            StringBuilder sb = new StringBuilder();
            String input;
            while ((input = in.readLine()) != null) {
                sb.append(input);
            }
            in.close();

            ParserDelegator parser = new ParserDelegator();
            MyParserCallback callback = new MyParserCallback();
            parser.parse(new StringReader(sb.toString()), callback, true);

            if (!callback.hasFound()) {
                System.out.println("Checking children:");

                Set<String> visitedLinks = new HashSet<>();
                visitedLinks.add("/wiki/" + subject);

                for (String href : callback.visitedLinks) {
                    if (!visitedLinks.contains(href)) {
                        visitedLinks.add(href);

                        String childUrl = "https://en.wikipedia.org" + href;
                        BufferedReader childIn = new BufferedReader(new InputStreamReader(new URL(childUrl).openStream()));
                        StringBuilder childSb = new StringBuilder();
                        while ((input = childIn.readLine()) != null) {
                            childSb.append(input);
                        }
                        childIn.close();

                        ParserDelegator childParser = new ParserDelegator();
                        MyParserCallback childCallback = new MyParserCallback();
                        childParser.parse(new StringReader(childSb.toString()), childCallback, true);

                        if (childCallback.hasFound()) {
                            return;
                        }
                    }
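                }
            }
        } catch (IOException e) {
            // Assumed handler: the original snippet is cut off before its catch block.
            e.printStackTrace();
        }
}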
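One way to keep control in the caller is to separate "scan one page" from "decide which page to scan next". The sketch below illustrates that control flow only. `ChildSearch`, `PageScan`, and `findFirst` are hypothetical names, and the page scanner is passed in as a function so the logic can be tried without network access. In the real program that function would presumably fetch the page and run it through `ParserDelegator`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical driver: checks the root page first, then each direct child once.
class ChildSearch {

    // Result of scanning a single page: was the target link seen, and which pages does it link to?
    static final class PageScan {
        final boolean found;
        final Set<String> links;

        PageScan(boolean found, Set<String> links) {
            this.found = found;
            this.links = links;
        }
    }

    // Returns the href of the first page containing the target (the root or a
    // direct child), or null if neither the root nor any child contains it.
    static String findFirst(String rootHref, Function<String, PageScan> scanner) {
        PageScan root = scanner.apply(rootHref);
        if (root.found) {
            return rootHref;
        }

        // Printed here, in the caller, before any child page is scanned.
        System.out.println("Checking children:");

        Set<String> visited = new HashSet<>();
        visited.add(rootHref);
        for (String href : root.links) {
            if (!visited.add(href)) {
                continue; // already scanned
            }
            if (scanner.apply(href).found) {
                return href;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Fake pages standing in for real Wikipedia fetches.
        Map<String, PageScan> pages = new HashMap<>();
        pages.put("/wiki/Linux", new PageScan(false,
                new LinkedHashSet<>(Arrays.asList("/wiki/Unix", "/wiki/Bell_Labs"))));
        pages.put("/wiki/Unix", new PageScan(false, Collections.<String>emptySet()));
        pages.put("/wiki/Bell_Labs", new PageScan(true, Collections.<String>emptySet()));

        String hit = findFirst("/wiki/Linux", pages::get);
        System.out.println(hit == null ? "Not found" : "Found in: " + hit);
    }
}
```

Because the scanner never calls `findFirst` itself, the search stops at the direct children, and the "Checking children:" message is always printed before the first child is examined.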
java html-parsing wikipedia