I have uploaded fewer than 20 HTML documents to a Blob container in an Azure Storage account. Each file has two tags: source_url and document_type.
I imported and vectorized the data (using the corresponding wizard on the Overview blade), which created an Azure AI Search data source, index, and indexer.
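I then updated the index definition, adding two retrievable and searchable fields whose names exactly match the tag names. This is the definition as shown in the Azure portal: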
{
"name": "source_url",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"retrievable": true,
"sortable": false,
"facetable": false,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "standard.lucene",
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"synonymMaps": []
},
{
"name": "document_type",
"type": "Edm.String",
"searchable": true,
"filterable": false,
"retrievable": true,
"sortable": true,
"facetable": false,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "standard.lucene",
"normalizer": null,
"dimensions": null,
"vectorSearchProfile": null,
"synonymMaps": []
}
When I search for a known document title (after rebuilding the index), both fields are always null:
{
"@odata.context": "https://reg-srch-eu-dev.search.windows.net/indexes('vector-docs-json-2')/$metadata#docs(*)",
"@search.answers": [],
"value": [
{
"@search.score": 0.016393441706895828,
"@search.rerankerScore": 2.7293198108673096,
"@search.captions": [
{
"text": "hua da trading, inc. - 664359 - 12_20_2023 _ fda.json. that...",
"highlights": "<em>hua da trading, inc. - 664359</em> - 12_20_2023 _ fda.json. that..."
}
],
"chunk_id": "436dc57017d5_aHR0cHM6Ly9yZWdzYWV1ZGV2LmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzLWpzb24vZmRhL0h1YSUyMERhJTIwVHJhZGluZywlMjBJbmMuJTIwLSUyMDY2NDM1OSUyMC0lMjAxMl8yMF8yMDIzJTIwXyUyMEZEQS5qc29u0_pages_8",
"parent_id": "aHR0cHM6Ly9yZWdzYWV1ZGV2LmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzLWpzb24vZmRhL0h1YSUyMERhJTIwVHJhZGluZywlMjBJbmMuJTIwLSUyMDY2NDM1OSUyMC0lMjAxMl8yMF8yMDIzJTIwXyUyMEZEQS5qc29u0",
"chunk": "that you recalled 300 boxes of your “WeFun,” lot numbers 18520168 and 09/30/2026, due to presence of undeclared sildenafil in August 2023...",
"title": "Hua Da Trading, Inc. - 664359 - 12_20_2023 _ FDA.json",
"source_url": null,
"document_type": null
},
//...
}
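I have configured this Azure AI Search index in the Azure OpenAI Playground and set the source URL as an index field. The chat can retrieve documents by date (and answers other questions), but it cannot provide the source URL: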
The requested information is not available in the retrieved data. Please try another query or topic.
What am I missing? What do I need to do to get the Blob tags mapped correctly into the search index?
According to the Microsoft documentation, only blob metadata is mapped, not blob index tags:
Currently, this indexer does not support indexing blob index tags.
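Since the indexer reads blob metadata but not index tags, one possible workaround is to copy the two tag values into each blob's user metadata before the indexer runs. The following is a minimal sketch, not a confirmed fix. It assumes the azure-storage-blob package is installed and that the caller may read tags and write metadata. The helper names (merge_tags_into_metadata, copy_tags_to_metadata, copy_tags_for_container) and the connection string and container name arguments are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional


def merge_tags_into_metadata(
    tags: Mapping[str, str],
    metadata: Optional[Mapping[str, str]] = None,
    wanted: Iterable[str] = ("source_url", "document_type"),
) -> Dict[str, str]:
    """Return a copy of `metadata` with the wanted tag values added."""
    merged = dict(metadata or {})
    for key in wanted:
        if key in tags:
            merged[key] = tags[key]
    return merged


def copy_tags_to_metadata(blob_client) -> Dict[str, str]:
    """Copy selected index tags of one blob into its user metadata.

    `blob_client` is expected to behave like azure.storage.blob.BlobClient.
    """
    tags = blob_client.get_blob_tags()
    current = blob_client.get_blob_properties().metadata
    merged = merge_tags_into_metadata(tags, current)
    if merged != dict(current or {}):
        # set_blob_metadata replaces all existing metadata,
        # which is why the existing values are merged in first.
        blob_client.set_blob_metadata(metadata=merged)
    return merged


def copy_tags_for_container(connection_string: str, container_name: str) -> None:
    """Apply copy_tags_to_metadata to every blob in a container (assumed setup)."""
    # Imported here so the helpers above work without the SDK installed.
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        connection_string, container_name=container_name
    )
    for blob in container.list_blobs():
        client = container.get_blob_client(blob.name)
        print(blob.name, copy_tags_to_metadata(client))
```

After the metadata is written, the indexer would need to be run again (or reset and run). Because the wizard creates chunked documents through a skillset, the source_url and document_type fields may additionally need a mapping in the skillset's index projections; that part is an assumption to verify against the generated skillset definition.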