Nutch error: Illegal to have multiple roots (start tag in epilog?)

disho6za posted 8 months ago in Lucene
$ bin/nutch inject crawl/crawldb urls
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3092)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3041)
        at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2914)
        at org.apache.nutch.crawl.Injector.main(Injector.java:533)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:488)
        ... 13 more

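I have tried different configurations of nutch-site.xml, using both the default file and nutch-default, and I am running Nutch under Cygwin on Windows 10. I also tried troubleshooting the environment variables and so on, but nothing worked. Is there any way to fix this error?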

0wi1tuuw


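The file nutch-site.xml must be a well-formed XML document, which means it can have only one root element. The error message indicates that the parser found more than one, and the [row,col] values in it (here 9,2) give the position where the extra start tag was encountered. For example, the error can be reproduced with the following nutch-site.xml: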

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
<configuration>
</configuration>

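Once the XML syntax is fixed, so that all properties sit inside a single <configuration> element, Nutch should be able to read the configuration file.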
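If it helps, the file can be checked for well-formedness before running Nutch again. The following is only a minimal sketch, assuming Python 3 is available. It uses Python's standard-library parser rather than the Woodstox parser that Hadoop uses, so the wording of the message and the column numbering may differ from the Nutch stack trace. The function names and the BROKEN and FIXED sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def find_xml_error(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the expat parser
        return line, column, str(err)
    return None


def find_xml_error_in_file(path):
    """Same check for a file on disk, e.g. conf/nutch-site.xml."""
    with open(path, "rb") as handle:
        return find_xml_error(handle.read())


# Two root elements, like the example above: not well-formed.
BROKEN = """<?xml version="1.0"?>
<configuration>
</configuration>
<configuration>
</configuration>
"""

# A single <configuration> root holding every property: well-formed.
FIXED = """<?xml version="1.0"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
"""

if __name__ == "__main__":
    for label, text in (("broken", BROKEN), ("fixed", FIXED)):
        problem = find_xml_error(text)
        print(label, "->", "OK" if problem is None else problem)
```

To check the real file, call find_xml_error_in_file with the path to your own conf/nutch-site.xml. A result of None means the file parsed, and anything else gives the line where the parser stopped.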
