Fatal error received while running the crawler


I want to index binary files (PDF, Word, text) into Elasticsearch. I am using FSCrawler, and I get the error below when I run it.

I followed this guide: https://fscrawler.readthedocs.io/en/latest/user/getting_started.html

Configuration file (YAML)

---
name: "hello"
fs:
  url: "/home/gowtham/Documents"
  update_rate: "15m"
  excludes:
  - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
elasticsearch:
  nodes:
  - url: "http://10.0.2.2:9200"
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
  index: "hello"

There is a PDF file in /home/gowtham/Documents.

I get the following error:


12:46:22,477 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV6] failed to create index [hello], disabling crawler...
12:46:22,478 FATAL [f.p.e.c.f.c.FsCrawlerCli] Fatal error received while running the crawler: [Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]]
12:46:22,478 DEBUG [f.p.e.c.f.c.FsCrawlerCli] error caught
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=illegal_argument_exception, reason=request [/hello] contains unrecognized parameter: [include_type_name]]
    at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2053) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2030) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1777) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
    at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
    Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:936) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:233) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1764) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1734) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1696) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:191) ~[elasticsearch-rest-high-level-client-6.7.1.jar:6.7.1]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:240) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndex(ElasticsearchClientV6.java:603) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.v6.ElasticsearchClientV6.createIndices(ElasticsearchClientV6.java:436) ~[fscrawler-elasticsearch-client-v6-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:161) ~[fscrawler-core-2.7-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:270) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
    Caused by: org.elasticsearch.client.ResponseException: method [PUT], host [http://10.0.2.2:9200], URI [/hello?master_timeout=30s&include_type_name=true&timeout=30s], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"}],"type":"illegal_argument_exception","reason":"request [/hello] contains unrecognized parameter: [include_type_name]"},"status":400}
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:552) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.elasticsearch.client.RestClient$1.completed(RestClient.java:537) ~[elasticsearch-rest-client-6.7.1.jar:6.7.1]
        at org.apache.http.concurrent.BasicFuture.completed(BasicFuture.java:119) ~[httpcore-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.responseCompleted(DefaultClientExchangeHandlerImpl.java:177) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.processResponse(HttpAsyncRequestExecutor.java:436) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.inputReady(HttpAsyncRequestExecutor.java:326) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:265) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[httpasyncclient-4.1.2.jar:4.1.2]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) ~[httpcore-nio-4.4.5.jar:4.4.5]
        at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_201]
12:46:22,484 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [hello]
12:46:22,485 DEBUG [f.p.e.c.f.c.v.ElasticsearchClientV6] Closing Elasticsearch client manager
12:46:22,486 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
12:46:22,487 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [hello] stopped

Please help me resolve this issue.

Thanks in advance.

elasticsearch configuration-files elasticsearch-plugin
1 Answer

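I was running Elasticsearch 6.4 and had to upgrade to 6.7 to resolve this. The stack trace shows FSCrawler using the 6.7.1 client libraries and sending include_type_name=true on the index-creation request, which the 6.4 server rejected as an unrecognized parameter.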

Credit to @David: https://github.com/dadoonet/fscrawler/issues/713
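
To confirm a version mismatch like this before starting the crawler, it may help to ask the node which version it is running. The sketch below is only an illustration, not part of FSCrawler: the function names are hypothetical, and the (6, 7) threshold is an assumption taken from the answer above rather than from an official compatibility table. It reads version.number from the node's root endpoint and compares it to that threshold.

```python
import json
import urllib.request

# Assumption (from the answer above): servers older than 6.7 reject
# the include_type_name parameter sent by the 6.7.1 client libraries.
MIN_SERVER_VERSION = (6, 7)


def parse_version(number):
    """Turn a version string such as '6.4.0' or '6.7.1-SNAPSHOT' into (major, minor)."""
    major, minor = number.split(".")[:2]
    # Drop any qualifier such as '-SNAPSHOT' that may follow the minor number.
    return int(major), int(minor.split("-")[0])


def server_is_at_least(number, minimum=MIN_SERVER_VERSION):
    """Return True if the reported server version meets the assumed minimum."""
    return parse_version(number) >= minimum


def fetch_server_version(node_url):
    """Read version.number from the JSON returned by the node's root endpoint."""
    with urllib.request.urlopen(node_url, timeout=5) as resp:
        return json.load(resp)["version"]["number"]


# Example usage against the node from the question (requires network access):
#   number = fetch_server_version("http://10.0.2.2:9200")
#   if not server_is_at_least(number):
#       print("Elasticsearch", number, "is older than the assumed minimum; upgrade the server.")
```

The comparison is kept separate from the HTTP call so it can be checked without a running cluster.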
