How to use reindex, ingest pipelines and processors to build an inverted 1:n Elasticsearch index

Question

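I've started experimenting with Elasticsearch ingest pipelines and processors as a possibly faster way to build what I can only describe as an "inverted index".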

Here is what I'm trying to do: I have a documents index. Each document looks similar to the following:

{
  "id": "DOC1",
  "title": "Quiz no. 1",
  "questions": [
    {
      "question": "Who was the first person to walk on the Moon?",
      "choices": [
        { "answer": "Michael Jackson", "correct": false },
        { "answer": "Neil Armstrong", "correct": true }
      ]
    },
    {
      "question": "Who wrote the Macbeth?",
      "choices": [
        { "answer": "William Shakespeare", "correct": true },
        { "answer": "Dante Alighieri", "correct": false },
        { "answer": "Arthur Conan Doyle", "correct": false }
      ]
    }
  ]
}

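I'm trying to find out whether there is some magic combination of reindex, pipelines and processors that would let me build a questions index automatically. Here is an example of that index: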

[
  {
    "question_id": "<randomly-generated-value-1>",
    "document_id": "DOC1",
    "question": "Who was the first person to walk on the Moon?",
    "choices": [
      { "answer": "Michael Jackson", "correct": false },
      { "answer": "Neil Armstrong", "correct": true }
    ]
  },
  {
    "question_id": "<randomly-generated-value-2>",
    "document_id": "DOC1",
    "question": "Who wrote the Macbeth?",
    "choices": [
      { "answer": "William Shakespeare", "correct": true },
      { "answer": "Dante Alighieri", "correct": false },
      { "answer": "Arthur Conan Doyle", "correct": false }
    ]
  }
]

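The Elasticsearch documentation mentions that a REINDEX can be performed using a specific pipeline. Looking at the simulate pipeline docs, I've been trying out a few processors, including foreach, but I can't work out whether the documents produced by the pipeline are still 1:1 with the original index, or whether 1 source document can generate multiple destination documents (which is what I need).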

Here is the simulate pipeline I'm experimenting with:

{
  "pipeline": {
    "description": "Inverts the documents index into a questions index",
    "processors": [
      {
        "rename": {
          "field": "id",
          "target_field": "document_id",
          "ignore_missing": false
        }
      },
      {
        "foreach": {
          "field": "questions",
          "processor": {
            "rename": {
              "field": "_ingest._value.question",
              "target_field": "question"
            }
          }
        }
      },
      {
        "foreach": {
          "field": "questions",
          "processor": {
            "rename": {
              "field": "_ingest._value.choices",
              "target_field": "choices"
            }
          }
        }
      },
      {
        "remove": {
          "field": "questions"
        }
      }
    ]
  }
}

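This is almost working. The problem with this approach is that there is only one resulting document, corresponding to the first question. The second question is absent from the simulate pipeline output, so I'm wondering whether a pipeline of processors can output multiple destination documents from 1 source document, or whether we are forced to keep a 1:1 relationship.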

Answer 1

IIUC, this could be a use case for the clone processor, as described in this answer.
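
As far as I know, an ingest pipeline works on one document at a time and cannot emit extra documents, so the 1:n fan-out probably has to happen outside the pipeline. Below is a minimal client-side sketch of that idea in Python, not a definitive implementation. The function names `explode_questions` and `to_bulk_body` and the target index name `questions` are assumptions made for illustration. It only builds an NDJSON body in the shape the `_bulk` API expects; reading the source documents and sending the request are left out.

```python
import json
import uuid
from typing import Any, Dict, Iterable, Iterator, List


def explode_questions(doc: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one question document per entry in doc["questions"] (hypothetical helper)."""
    for question in doc.get("questions", []):
        yield {
            # Randomly generated ID, as in the desired output shown in the question.
            "question_id": uuid.uuid4().hex,
            "document_id": doc["id"],
            "question": question["question"],
            "choices": question["choices"],
        }


def to_bulk_body(docs: Iterable[Dict[str, Any]], index: str = "questions") -> str:
    """Build an NDJSON body: one action line plus one source line per question."""
    lines: List[str] = []
    for doc in docs:
        for q in explode_questions(doc):
            # Action line telling Elasticsearch where to index the next source line.
            lines.append(json.dumps({"index": {"_index": index, "_id": q["question_id"]}}))
            lines.append(json.dumps(q))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

Calling `to_bulk_body([quiz_doc])` on the sample quiz document would yield four NDJSON lines (two action/source pairs), which could then be POSTed to `_bulk`. The trade-off is that the data makes a round trip through the client instead of staying inside the cluster as it would with a pure reindex.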
