How to add field mappings for a nested JSON array in an Azure AI Search indexer


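I want to use Azure AI Search to run full-text search over JSON documents stored in Azure Blob Storage. Everything works except the field mappings for a nested JSON array. Below is the structure of the JSON documents I'm working with: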

{
  "conversation": {
    "datetime": "2023-11-27T09:45:00",
    "userDetails": {
      "userId": "98765",
      "username": "ProjectPro"
    },
    "messages": [
      {
        "sender": "user",
        "message": "Good morning! I have a question about the upcoming project."
      },
      {
        "sender": "assistant",
        "message": "Good morning! I'm here to help. What do you need assistance with regarding the project?"
      }
      //Other messages...
    ]
  }
}

I have configured the search index as follows:

{
  "name": "conversation-index",
  "fields": [
    {"name": "datetime", "type": "Edm.String", "searchable": false, "filterable": true, "sortable": true, "facetable": false},
    {"name": "userId", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": false},
    {"name": "username", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": false},
    {"name": "message", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false},
    {"name": "sender", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": false, "facetable": false}
  ]
}

and the search indexer as follows:

{
  "name": "conversation-indexer",
  "dataSourceName": "conversation-datasource",
  "targetIndexName": "conversation-index",
  "schedule": { "interval": "PT1H" },
  "parameters": { "configuration": { "dataToExtract": "contentAndMetadata",  "parsingMode": "json" } },
  "fieldMappings": [
    {"sourceFieldName": "/conversation/datetime", "targetFieldName": "datetime"},
    {"sourceFieldName": "/conversation/userDetails/userId", "targetFieldName": "userId"},
    {"sourceFieldName": "/conversation/userDetails/username", "targetFieldName": "username"},
    {"sourceFieldName": "/conversation/messages[].message", "targetFieldName": "message"},
    {"sourceFieldName": "/conversation/messages[].sender", "targetFieldName": "sender"}
  ]
}

The indexer field mappings for message and sender don't work: search returns null for both fields. What is the correct way to index a nested JSON array?

azure azure-blob-storage azure-cognitive-search blobstore azure-search-.net-sdk
1 Answer
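The correct way to select all values from an array is to use *. In your case, sourceFieldName should be "/conversation/messages/*/sender" (and, by the same logic, "/conversation/messages/*/message" for the message field).

The catch is that, because messages is an array, the output of that mapping is also an array: an array of strings, since you are selecting only the "sender" property from each object inside the array. Your index definition declares "sender" as Edm.String, so the mapping still won't work; the field needs to be Collection(Edm.String) instead.

If you want each object in the "messages" array to produce its own search document in the index (so that "sender" can remain a single Edm.String, as you currently define it), I recommend looking at our new preview feature, index projections, which lets you achieve exactly that.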
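Applied to the configuration in the question, the answer's first suggestion would change only the two array-derived fields and their mappings. The sketch below is a hedged illustration, assuming the `/*/` wildcard path behaves as the answer describes and that every other part of the index and indexer definitions stays exactly as in the question. First, the two index fields become string collections (collection fields cannot be sortable, which both already are not):

```json
{
  "fields": [
    {"name": "message", "type": "Collection(Edm.String)", "searchable": true, "filterable": false, "sortable": false, "facetable": false},
    {"name": "sender", "type": "Collection(Edm.String)", "searchable": true, "filterable": true, "sortable": false, "facetable": false}
  ]
}
```

Second, the indexer's field mappings switch from the `[]` notation to the `*` wildcard:

```json
{
  "fieldMappings": [
    {"sourceFieldName": "/conversation/messages/*/message", "targetFieldName": "message"},
    {"sourceFieldName": "/conversation/messages/*/sender", "targetFieldName": "sender"}
  ]
}
```

With this layout each conversation remains one search document holding two parallel collections, so a filter such as `sender/any(s: s eq 'user')` matches any conversation containing at least one user message, but it cannot tie a particular message text to its sender. Keeping that pairing is the case the index projections approach mentioned in the answer is meant to address.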
