$ bin/nutch inject crawl/crawldb urls
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3092)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3041)
at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2914)
at org.apache.nutch.crawl.Injector.main(Injector.java:533)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:488)
... 13 more
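I have tried different configurations of nutch-site.xml using the default values from nutch-default. I am using Cygwin on Windows 10. I also tried troubleshooting environment variables and so on, but nothing worked. Any ideas on how to resolve this error?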
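The file nutch-site.xml must be a well-formed XML document. The error message ("Illegal to have multiple roots") indicates that the file contains more than one root element, and the reported position (row 9, column 2) is where the parser ran into the extra start tag. For example, the error can be reproduced with the following nutch-site.xml: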
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>http.agent.name</name>
<value>my-first-web-crawler</value>
</property>
</configuration>
<configuration>
</configuration>
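Once the XML syntax is fixed, so that the file has exactly one <configuration> root element, Nutch should be able to read the configuration file.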
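
To check the file before running Nutch again, a quick well-formedness test can help. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module, not anything that ships with Nutch; the function name `find_xml_error` and the constants `FIXED` and `BROKEN` are made up for illustration. `FIXED` is the sample above with the second `<configuration>` element removed, and `BROKEN` puts it back to reproduce the problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the nutch-site.xml from above with a single root element.
FIXED = """\
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>http.agent.name</name>
<value>my-first-web-crawler</value>
</property>
</configuration>"""

# Appending a second root element reproduces the "multiple roots" problem.
BROKEN = FIXED + "\n<configuration>\n</configuration>"


def find_xml_error(xml_data):
    """Return None if xml_data (str or bytes) is well-formed,
    otherwise a (line, column, message) tuple describing the first error."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(find_xml_error(FIXED))   # None: a single root element parses cleanly
print(find_xml_error(BROKEN))  # error tuple pointing at the second root on line 9
```

To check the real file, read it and pass the contents in, for example `find_xml_error(open("conf/nutch-site.xml", "rb").read())`, assuming the command is run from the Nutch installation directory. The line number should match the row in the Nutch stack trace; the column may be counted differently by different parsers.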