Error when scraping a web page with HtmlUnit 2.18

Votes: 0 · Answers: 2

I have the following code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlAnchor;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

WebClient webClient = new WebClient(BrowserVersion.getDefault());
HtmlPage page;
List<HtmlAnchor> anchor = new ArrayList<HtmlAnchor>();

try {
    System.out.println("Querying");
    page = webClient.getPage("https://www.amazon.com/gp/goldbox");
    anchor = page.getAnchors();
    for (HtmlAnchor s : anchor) {
        System.out.println(s.getAttribute("href"));
    }
    System.out.println("Success");
} catch (IOException e) {
    e.printStackTrace();
}

Running this prints "Querying" and then fails with:

Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE
    at org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:52)
    at org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<init>(DefaultHttpRequestWriterFactory.java:56)
    at org.apache.http.impl.io.DefaultHttpRequestWriterFactory.<clinit>(DefaultHttpRequestWriterFactory.java:46)
    at org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:82)
    at org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:95)
    at org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<init>(ManagedHttpClientConnectionFactory.java:104)
    at org.apache.http.impl.conn.ManagedHttpClientConnectionFactory.<clinit>(ManagedHttpClientConnectionFactory.java:62)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$InternalConnectionFactory.<init>(PoolingHttpClientConnectionManager.java:572)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:174)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:158)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:149)
    at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.<init>(PoolingHttpClientConnectionManager.java:125)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.createConnectionManager(HttpWebConnection.java:972)
    at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(HttpWebConnection.java:161)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebClient.java:1321)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebClient.java:1238)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:346)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:415)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:400)
    at crawler.HtmlUnitCrawl.main(HtmlUnitCrawl.java:29)

What could be causing this error?

java web-scraping web-crawler htmlunit
2 Answers

0 votes

You have a CLASSPATH conflict: your code works fine against a clean installation. A NoSuchFieldError: INSTANCE from HttpClient classes typically means an older HttpComponents jar is being loaded ahead of the one HtmlUnit expects.

Please remove all other HttpComponents .jars and use the ones bundled with HtmlUnit.
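If it is not obvious where the stray jar comes from, one quick (hypothetical) check is to scan the runtime classpath for every HttpComponents artifact. This is only a sketch and assumes the application runs from a plain classpath rather than a fat jar or an application server:

    import java.io.File;

    public class FindHttpJars {
        public static void main(String[] args) {
            // List every classpath entry that looks like an Apache HttpComponents jar.
            // Seeing more than one httpcore-*.jar or httpclient-*.jar indicates a conflict.
            for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
                String name = new File(entry).getName().toLowerCase();
                if (name.contains("httpcore") || name.contains("httpclient") || name.contains("httpmime")) {
                    System.out.println(entry);
                }
            }
        }
    }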

Additionally, you can check which version is actually being picked up:

    // Prints the jar (code source) that DefaultHttpRequestWriterFactory was loaded from
    Class<?> klass = DefaultHttpRequestWriterFactory.class;
    String location = klass.getProtectionDomain().getCodeSource().getLocation().toString();
    System.out.println(location);

In your case this should print the location of httpcore-4.4.1.jar.
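The same check can be pointed at the client side of the library as well; if the two locations differ, or an older jar shows up, that is the conflict. This is a hypothetical extension of the snippet above, not part of the original answer (HttpClientBuilder is only available from HttpClient 4.3 onward):

    // Where does the HttpClient implementation come from?
    Class<?> clientKlass = org.apache.http.impl.client.HttpClientBuilder.class;
    System.out.println(clientKlass.getProtectionDomain().getCodeSource().getLocation());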


0 votes

I confirmed that HtmlUnit pulls in an HttpClient version different from the one my project was already using. Once I chose an HtmlUnit release compatible with my project's HttpClient version, everything worked.

httpclient-4.2.1 conflicts with HtmlUnit 2.21 (which ships with httpclient-4.5.2.jar), so I switched to HtmlUnit 2.10 (which uses httpclient-4.2.1) and it works fine.

Check which libraries in your project are in conflict.
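To double-check which combination actually ends up on the runtime classpath after such a change, a small diagnostic like the sketch below can print both versions. It is not part of the answer above and assumes HtmlUnit's com.gargoylesoftware.htmlunit.Version class and HttpCore's org.apache.http.util.VersionInfo are available:

    import org.apache.http.util.VersionInfo;
    import com.gargoylesoftware.htmlunit.Version;

    public class VersionCheck {
        public static void main(String[] args) {
            ClassLoader cl = VersionCheck.class.getClassLoader();

            // HtmlUnit's self-reported release number
            System.out.println("HtmlUnit:   " + Version.getProductVersion());

            // Versions of the HttpClient / HttpCore jars actually loaded
            VersionInfo client = VersionInfo.loadVersionInfo("org.apache.http.client", cl);
            VersionInfo core = VersionInfo.loadVersionInfo("org.apache.http", cl);
            System.out.println("HttpClient: " + (client != null ? client.getRelease() : "not found"));
            System.out.println("HttpCore:   " + (core != null ? core.getRelease() : "not found"));
        }
    }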
