Elasticsearch reindex fails silently

Question (votes: 0, answers: 1)

I am following this tutorial to try out semantic search in Elasticsearch.

When I copy one index into another (reindexing) with the following command:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "collection"
  },
  "dest": {
    "index": "collection-with-embeddings",
    "pipeline": "text-embeddings"
  }
}

some documents are missing from the new index, and I don't know why. I would like to find out the cause.

For context, this is the ingest pipeline:

PUT _ingest/pipeline/text-embeddings
{
  "description": "Text embedding pipeline",
  "processors": [
    {
      "inference": {
        "model_id": "sentence-transformers__msmarco-minilm-l-12-v3",
        "target_field": "text_embedding",
        "field_map": {
          "text": "text_field"
        }
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "description": "Index document to 'failed-<index>'",
        "field": "_index",
        "value": "failed-{{{_index}}}"
      }
    },
    {
      "set": {
        "description": "Set error message",
        "field": "ingest.failure",
        "value": "{{_ingest.on_failure_message}}"
      }
    }
  ]
}

Here are the task details:

{
    "completed": true,
    "task": {
        "node": "YgR8udaSSMqClwCGWOBGBw",
        "id": 5946104,
        "type": "transport",
        "action": "indices:data/write/reindex",
        "status": {
            "total": 2414,
            "updated": 1346,
            "created": 1068,
            "deleted": 0,
            "batches": 3,
            "version_conflicts": 0,
            "noops": 0,
            "retries": {
                "bulk": 0,
                "search": 0
            },
            "throttled_millis": 0,
            "requests_per_second": -1.0,
            "throttled_until_millis": 0
        },
        "description": "reindex from [source_index] to [destination_index]",
        "start_time_in_millis": 1680795982705,
        "running_time_in_nanos": 22702121635,
        "cancellable": true,
        "cancelled": false,
        "headers": {}
    },
    "response": {
        "took": 22699,
        "timed_out": false,
        "total": 2414,
        "updated": 1346,
        "created": 1068,
        "deleted": 0,
        "batches": 3,
        "version_conflicts": 0,
        "noops": 0,
        "retries": {
            "bulk": 0,
            "search": 0
        },
        "throttled": "0s",
        "throttled_millis": 0,
        "requests_per_second": -1.0,
        "throttled_until": "0s",
        "throttled_until_millis": 0,
        "failures": []
    }
}

My data is different, but the configuration is similar. About 75% of the data was not copied.

I am using the sentence-transformers__msmarco-minilm-l-12-v3 model from Elasticsearch.

Any help?

elasticsearch elastic-stack
1 Answer
1 vote

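You probably don't have enough processing capacity for the inference processor, so some documents end up in the failed-collection-with-embeddings index, with the reason recorded in the ingest.failure field. Because the pipeline's on_failure handler reroutes those documents instead of rejecting them, the reindex task still counts them as created or updated and reports an empty failures list, which is why the failure looks silent.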
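
To check whether this is what happened, the rerouted documents can be inspected directly. The requests below are a minimal sketch: they assume the failure index is named failed-collection-with-embeddings (what the on_failure handler in the question would produce for this reindex) and that it exists in your cluster. The "size" of 5 is only an illustrative sample size.

```console
# Assumption: the on_failure handler wrote failed documents to this index.
# Count how many documents were rerouted.
GET failed-collection-with-embeddings/_count

# Look at a few failure messages (sample size of 5 is arbitrary).
GET failed-collection-with-embeddings/_search
{
  "size": 5,
  "_source": ["ingest.failure"]
}
```

If the count is close to the number of documents missing from collection-with-embeddings, the ingest.failure messages should indicate what the inference processor rejected.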

What you can do is use smaller batches (specify a smaller size in source) or use request throttling.
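
As a sketch of both suggestions combined, the reindex request from the question could be adjusted as follows. The values 50 and 100 are illustrative assumptions, not recommendations from the answer; suitable numbers depend on how much inference capacity the cluster has.

```console
# Assumption: 50 documents per batch and 100 requests per second are
# placeholder values to be tuned for your cluster.
# "size" inside "source" sets the batch size (the default is 1000).
# requests_per_second throttles the reindex by adding wait time between batches.
POST _reindex?wait_for_completion=false&requests_per_second=100
{
  "source": {
    "index": "collection",
    "size": 50
  },
  "dest": {
    "index": "collection-with-embeddings",
    "pipeline": "text-embeddings"
  }
}
```

Smaller batches reduce how many documents reach the inference processor at once, and throttling spaces the batches out, so either setting alone may already be enough.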
