[Solved] Extracting the text after a tag using Jsoup

Question   Votes: -1   Answers: 2

The code below gives me output like this:

<a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.

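I am trying to extract the text that follows the </a> tag.

Here is my code. Is there a method in jsoup that can do this, or is there something else I am missing?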

try {
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        String description = element.select("description").text();
        System.out.println(description);
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}

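Expected output: British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.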

Actual output: <a href="https://timesofindia.indiatimes.com/india/uk-envoy-lays-wreath-at-jallianwala-bagh-memorial-expresses-deep-regret/articleshow/68860078.cms"><img border="0" hspace="10" align="left" style="margin-top:3px;margin-right:5px;" src="https://timesofindia.indiatimes.com/photo/68860078.cms" /></a>British High Commissioner to India Sir Dominic Asquith laid a wreath at the Jallianwala Bagh memorial here on Saturday on the centenary of the massacre and said Britain "deeply regretted" the suffering caused to the victims.

java jsoup
2 Answers
1
vote

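Element has a nextSibling() method (inherited from Node), which should work here. It returns a Node, so parse the description as HTML first and cast the sibling to TextNode to read its text. This assumes the description contains an <a> that is directly followed by text:

Element a = Jsoup.parse(element.select("description").text()).selectFirst("a");
String text = ((TextNode) a.nextSibling()).text();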
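A self-contained sketch of the nextSibling() approach, kept offline so it can be run without the feed. The class name AfterAnchor, the helper textAfterFirstAnchor and the example.com sample string are illustrative assumptions, not part of the original answer. It assumes the description has already been read as a string and that the wanted text is the node immediately after the first <a>.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class AfterAnchor {

    // Hypothetical helper: returns the text node that directly follows the
    // first <a>, or an empty string if there is no anchor or no trailing text.
    static String textAfterFirstAnchor(String descriptionHtml) {
        // The RSS description holds HTML as a string, so parse it as HTML first.
        Document doc = Jsoup.parse(descriptionHtml);
        Element anchor = doc.selectFirst("a");
        if (anchor == null) {
            return "";
        }
        // nextSibling() returns a Node; only a TextNode exposes text().
        Node next = anchor.nextSibling();
        if (next instanceof TextNode) {
            return ((TextNode) next).text().trim();
        }
        return "";
    }

    public static void main(String[] args) {
        // Made-up sample with the same shape as the feed's description.
        String desc = "<a href=\"https://example.com/story\">"
                + "<img src=\"https://example.com/photo.jpg\" /></a>"
                + "Text after the link.";
        System.out.println(textAfterFirstAnchor(desc));
    }
}
```

The null and instanceof checks are there because a feed item may have no link or no trailing text; without them the cast in a one-line version would throw.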

0
votes

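I solved this with my own workaround. Here is the code.

Solution: what does this code do? I create a new Document object from the description, remove the <a> tag, and then print the remaining text. It is not the best way, but it works.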

Document d = Jsoup.parse(desc);
Elements a = d.select("a");
a.remove();
System.out.println(d.body().text());

Full code

try {
    // Fetch the RSS feed and parse it as XML.
    Document document = Jsoup.connect("https://timesofindia.indiatimes.com/rssfeeds/-2128936835.cms").parser(Parser.xmlParser()).get();
    Elements items = document.getElementsByTag("item");
    for (Element element : items) {
        String title = element.select("title").text();
        String link = element.select("link").text();
        String time = element.select("pubDate").text();
        // The description comes back as an HTML string, not as parsed elements.
        String desc = element.select("description").text();
        // Parse that string as HTML, drop the <a> element, and print what is left.
        Document d = Jsoup.parse(desc);
        Elements a = d.select("a");
        a.remove();
        System.out.println(d.body().text());
    }
} catch (IOException ex) {
    Logger.getLogger(TimesOfIndia.class.getName()).log(Level.SEVERE, null, ex);
}
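
A possible variant of the workaround that does not modify the parsed document. It is a sketch, assuming the wanted text is a direct child of <body> once the description is parsed. jsoup's Element.ownText() returns only the text owned by the element itself and skips text inside child elements such as the link. The class name DescriptionText, the helper ownTextOfDescription and the sample string are illustrative assumptions.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DescriptionText {

    // Hypothetical helper: returns the text that belongs directly to <body>,
    // ignoring any text nested inside child elements such as <a>.
    static String ownTextOfDescription(String descriptionHtml) {
        Document doc = Jsoup.parse(descriptionHtml);
        return doc.body().ownText();
    }

    public static void main(String[] args) {
        // Made-up sample: the link carries a caption that should be ignored.
        String desc = "<a href=\"https://example.com/story\">caption"
                + "<img src=\"https://example.com/photo.jpg\" /></a>"
                + "Text after the link.";
        System.out.println(ownTextOfDescription(desc));
    }
}
```

Unlike the nextSibling() approach, this collects every text node directly under <body>, so text that appears before the link would be included as well.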