Nutch error: Illegal to have multiple roots (start tag in epilog?)

Question  Votes: 0  Answers: 1
$ bin/nutch inject crawl/crawldb urls
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3092)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3041)
        at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2914)
        at org.apache.nutch.crawl.Injector.main(Injector.java:533)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:488)
        ... 13 more

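I tried different configurations of nutch-site.xml using the default values from nutch-default.xml. I am using Cygwin on Windows 10. I also tried troubleshooting environment variables and so on, but nothing worked. Any ideas on how to resolve this error?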

java lucene nutch
1 Answer
0
votes

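The file nutch-site.xml must be a well-formed XML document. The error message indicates that it contains more than one root element. For example, the error can be reproduced with the following nutch-site.xml: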

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
<configuration>
</configuration>

Once the XML syntax is fixed, Nutch should be able to read the configuration file.
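
As a minimal sketch, a well-formed version of the example above keeps exactly one <configuration> root and places every <property> inside it. The property shown is only the one from the reproduction; the actual contents of the file in the question are not known, so treat this as an assumption about its intended shape. In the reproduction, the second <configuration> start tag sits on line 9, which matches the row 9 reported in the stack trace.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
<!-- Any further <property> elements go here, inside the single root. -->
</configuration>
```

Nothing other than whitespace or comments should follow the closing </configuration> tag.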
