Retrieving individual pages in Azure AI Search using index projections

Question

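I'm trying to use Azure AI Search to return the specific pages, from a set of PDFs, that match a search query. Right now I use the "generateNormalizedImagePerPage" image action to convert each page to an image, and then OcrSkill to read the text from the generated images. This lets me split the content, but when I query the index it returns the whole PDF document instead of only the specific pages that matched.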

I thought I could use index projections to make each page of a PDF a separate document in the search index.
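Conceptually, an index projection whose source context is `/document/normalized_images/*` turns one enriched parent document into one search document per page image, each carrying the parent's key. The plain C# sketch below models that fan-out in memory so the shape of the result is visible. It does not call Azure AI Search; the page texts, the `ProjectPages` helper, and the `_p{n}` key suffix are hypothetical (the service generates its own child keys).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for /document/normalized_images/*: one entry per page,
// with the OCR text that OcrSkill would have produced for that page image.
var pages = new List<(int PageNumber, string Text)>
{
    (1, "Introduction to the warranty terms"),
    (2, "Claims must be filed within 30 days"),
    (3, "Contact information"),
};

// In-memory model of the fan-out: one child document per element of the source context,
// each carrying the parent key (parentKeyFieldName = "content") plus the mapped fields.
List<Dictionary<string, object>> ProjectPages(string parentKey, List<(int PageNumber, string Text)> images) =>
    images.Select(img => new Dictionary<string, object>
    {
        ["id"] = $"{parentKey}_p{img.PageNumber}", // placeholder; the service generates real child keys
        ["content"] = parentKey,                   // parent key field
        ["pagetext"] = img.Text,                   // from /document/normalized_images/*/text
        ["pagenumber"] = img.PageNumber,           // from /document/normalized_images/*/pageNumber
    }).ToList();

var children = ProjectPages("warranty.pdf", pages);

// A naive case-insensitive term match over the children now yields individual pages.
var hits = children
    .Where(d => ((string)d["pagetext"]).Contains("claims", StringComparison.OrdinalIgnoreCase))
    .ToList();

foreach (var hit in hits)
    Console.WriteLine($"{hit["content"]} page {hit["pagenumber"]}: {hit["pagetext"]}");
```

Only page 2 matches, so a single line is printed for that page rather than one result for the whole PDF, which is the behavior the question is after.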

Here's what I tried. First, I created the index:

var index = new SearchIndex(name: "myindex")
{
    Fields =
    [
        new SearchField (name: "id", type: SearchFieldDataType.String) 
            { IsSearchable = true, IsKey = true, },
        new SearchField (name: "content", type: SearchFieldDataType.String) 
            { IsFilterable = true, IsKey = false },
        new SearchField (name: "pagetext", type: SearchFieldDataType.String) 
            { IsSearchable = true },
        new SearchField (name: "pagenumber", type: SearchFieldDataType.String) 
            { IsSearchable = true }
    ]
};

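Then I created the index projections, with the projection mode set to skip indexing the parent documents. I set parentKeyFieldName to "content" because the article I was following says this field must be Edm.String, cannot be the key field, and must have filterable set to true.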

var mappings = new List<InputFieldMappingEntry>
{
    new (name: "pagetext")
    {
        Source = "/document/normalized_images/*/text"
    },
    new (name: "pagenumber")
    {
        Source = "/document/normalized_images/*/pageNumber"
    }
};

var selectors = new List<SearchIndexerIndexProjectionSelector>
{
    new (targetIndexName: "myindex",
         parentKeyFieldName: "content",
         sourceContext: "/document/normalized_images/*",
         mappings: mappings)
};

var indexProjections = new SearchIndexerIndexProjections(selectors)
{
    Parameters = new SearchIndexerIndexProjectionsParameters
    {
        ProjectionMode = IndexProjectionMode.SkipIndexingParentDocuments
    }
};

My problem is that I get an error when I try to create the skillset:

One or more index projection selectors are invalid. 
Details: Index 'myindex' must contain field 'content', it must be of type Edm.String, 
cannot be the key field and it must be filterable.

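This error confuses me, because I thought I had met every requirement the article lists for targetIndexName:

- Must already be created on the search service before the skillset containing the index projections definition is created.
- Must contain a field with the name defined in the parentKeyFieldName parameter. This field must be of type Edm.String, cannot be the key field, and must have filterable set to true.
- The key field must have searchable set to true and be defined with the keyword analyzer.
- Must have fields defined for each of the names defined in mappings, none of which can be the key field.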
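The rules in that list can be checked locally before calling the service. The sketch below is plain C# with no Azure SDK dependency: each index field is modeled as a tuple, and the hypothetical `CheckProjectionTarget` helper returns one message per violated rule. The analyzer is modeled as the string "keyword" (the SDK counterpart is `LexicalAnalyzerName.Keyword`). It cannot check the first rule, that the index already exists on the service with this definition; that still has to be confirmed against the deployed index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: checks a local description of the target index against the
// index-projection rules and returns one message per violated rule.
// Each field is (Name, EDM type, IsKey, IsSearchable, IsFilterable, Analyzer).
List<string> CheckProjectionTarget(
    List<(string Name, string Type, bool IsKey, bool IsSearchable, bool IsFilterable, string? Analyzer)> fields,
    string parentKeyFieldName,
    IEnumerable<string> mappingNames)
{
    var problems = new List<string>();

    // Parent key field: present, Edm.String, not the key, filterable.
    var parent = fields.FirstOrDefault(f => f.Name == parentKeyFieldName);
    if (parent.Name is null)
        problems.Add($"missing parent key field '{parentKeyFieldName}'");
    else
    {
        if (parent.Type != "Edm.String") problems.Add($"'{parentKeyFieldName}' must be Edm.String");
        if (parent.IsKey) problems.Add($"'{parentKeyFieldName}' cannot be the key field");
        if (!parent.IsFilterable) problems.Add($"'{parentKeyFieldName}' must be filterable");
    }

    // Key field: searchable and using the keyword analyzer.
    foreach (var key in fields.Where(f => f.IsKey))
    {
        if (!key.IsSearchable) problems.Add($"key '{key.Name}' must be searchable");
        if (key.Analyzer != "keyword") problems.Add($"key '{key.Name}' must use the keyword analyzer");
    }

    // Every mapping target: present and not the key field.
    foreach (var name in mappingNames)
    {
        var target = fields.FirstOrDefault(f => f.Name == name);
        if (target.Name is null) problems.Add($"missing mapped field '{name}'");
        else if (target.IsKey) problems.Add($"mapped field '{name}' cannot be the key field");
    }

    return problems;
}

var schema = new List<(string Name, string Type, bool IsKey, bool IsSearchable, bool IsFilterable, string? Analyzer)>
{
    ("id", "Edm.String", true, true, false, "keyword"),
    ("content", "Edm.String", false, true, true, null),
    ("pagetext", "Edm.String", false, true, false, null),
    ("pagenumber", "Edm.Int32", false, false, true, null),
};
var mapped = new[] { "pagetext", "pagenumber" };

var ok = CheckProjectionTarget(schema, "content", mapped);
Console.WriteLine(ok.Count == 0 ? "schema satisfies the listed rules" : string.Join(Environment.NewLine, ok));

// Same schema with "content" made non-filterable, which breaks the rule named in the error.
var broken = schema.ToList();
var idx = broken.FindIndex(f => f.Name == "content");
var contentField = broken[idx];
contentField.IsFilterable = false;
broken[idx] = contentField;

var brokenProblems = CheckProjectionTarget(broken, "content", mapped);
foreach (var message in brokenProblems) Console.WriteLine(message);
```

Returning every violation at once, instead of stopping at the first, mirrors how the service's error text bundles several conditions into one message.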

Answer 1

The content field in the index does not meet the requirements stated in the error message. The index needs to be defined like this:


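var index = new SearchIndex("myindex")
{
    Fields =
    {
        // Key field: searchable and defined with the keyword analyzer, as the requirements state
        new SearchField("id", SearchFieldDataType.String) { IsSearchable = true, IsKey = true, AnalyzerName = LexicalAnalyzerName.Keyword },
        // Parent key field for the projection: Edm.String, not the key, filterable
        new SearchField("content", SearchFieldDataType.String) { IsSearchable = true, IsFilterable = true },
        new SearchField("pagetext", SearchFieldDataType.String) { IsSearchable = true },
        // pageNumber of a normalized image is an integer, so this field is Int32
        new SearchField("pagenumber", SearchFieldDataType.Int32) { IsFilterable = true }
    }
};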
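Because the parent key field (`content`) is filterable, all page documents projected from one PDF can be selected with an OData filter on it, and the Int32 `pagenumber` field can narrow that to a single page. The sketch below only builds the filter strings; `ParentFilter` and `PageFilter` are hypothetical helpers, and the key values are made up (in practice the parent key is whatever key the indexer assigned to the source document). The resulting string would go into `SearchOptions.Filter`; the .NET SDK's `SearchFilter.Create` can also handle literal escaping.

```csharp
using System;

// Hypothetical helper: OData $filter selecting every page document projected from one parent.
// Single quotes inside an OData string literal are escaped by doubling them.
string ParentFilter(string parentKeyField, string parentKey) =>
    $"{parentKeyField} eq '{parentKey.Replace("'", "''")}'";

// Narrow to one page of that parent; pagenumber is Edm.Int32 and filterable in the index above.
string PageFilter(string parentKey, int pageNumber) =>
    $"{ParentFilter("content", parentKey)} and pagenumber eq {pageNumber}";

Console.WriteLine(ParentFilter("content", "O'Brien report.pdf")); // content eq 'O''Brien report.pdf'
Console.WriteLine(PageFilter("doc1", 3));                         // content eq 'doc1' and pagenumber eq 3
```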

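Modify the skillset creation so that it includes the index projections along with the skills. The CreateOrUpdateDemoSkillSetWithIndexProjections method now takes indexProjections as an additional parameter and sets it on the skillset.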

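Note:

The Entity Recognition skill (v2) (Microsoft.Skills.Text.EntityRecognitionSkill) has been discontinued and replaced by Microsoft.Skills.Text.V3.EntityRecognitionSkill. Follow the guidance in Deprecated skills to migrate to a supported skill.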

Code adapted from git:

 private static SearchIndexerSkillset CreateOrUpdateDemoSkillSetWithIndexProjections(
     SearchIndexerClient indexerClient,
     IList<SearchIndexerSkill> skills,
     string azureAiServicesKey,
     SearchIndexerIndexProjections indexProjections)
 {
     // Azure AI services was formerly known as Cognitive Services.
     // The APIs still use the old name, so we need to create a CognitiveServicesAccountKey object
     SearchIndexerSkillset skillset = new SearchIndexerSkillset("demoskillset", skills)
     {
         Description = "Demo skillset",
         CognitiveServicesAccount = new CognitiveServicesAccountKey(azureAiServicesKey),
         // Attach the index projections so each page becomes its own search document.
         // Assumption: property name as in the preview SDK that exposes SearchIndexerIndexProjections;
         // newer SDK versions may name it differently.
         IndexProjections = indexProjections
     };

     // Create the skillset in your search service.
     // The skillset does not need to be deleted if it was already created
     // since we are using the CreateOrUpdate method
     try
     {
         indexerClient.CreateOrUpdateSkillset(skillset);
     }
     catch (RequestFailedException ex)
     {
         Console.WriteLine("Failed to create the skillset\n Exception message: {0}\n", ex.Message);
         ExitProgram("Cannot continue without a skillset");
     }

     return skillset;
 }

