ElasticSearch - Indexing and searching nested JSON records where the nesting depth of keys varies from document to document


I have a set of JSON records with the following structure.

{
     "_root": [
       {
         "Text": "IMPORTANT NOTICE",
         "Page": 0,
         "Type": "Header3",
         "Child": [
           {
             "Text": "IMPORTANT NOTICE FOR BUYERS",
             "Page": 0,
             "Type": "Header2",
             "Child": [
               {
                 "Text": "IMPORTANT NOTICE FOR SELLERS",
                 "Page": 0,
                 "Type": "Header4",
                 "Child": [
                   {
                     "Text": "IMPORTANT INFORMATION",
                     "Page": 0,
                     "Type": "Header5",
                     "Child": [
                       {
                         "Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
                         "Page": 0
                       }
                     ]
                   }
                 ]
               }
             ]
           }
         ]
       }
     ],
     "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}
{
     "_root": [
       {
         "Text": "IMPORTANT NOTICE",
         "Page": 0,
         "Type": "Header2",
         "Child": [
           {
             "Text": "IMPORTANT NOTICE FOR BUYERS",
             "Page": 0,
             "Type": "Header4",
             "Child": [
               {
                 "Text": "IMPORTANT NOTICE FOR SELLERS",
                 "Page": 0,
                 "Type": "Header5",
                 "Child": [
                   {
                     "Text": "IMPORTANT INFORMATION",
                     "Page": 0,
                     "Type": "Header6",
                     "Child": [
                       {
                         "Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
                         "Page": 0
                       }
                     ]
                   }
                 ]
               }
             ]
           }
         ]
       }
     ],
     "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}
{
     "_root": [
       {
         "Text": "IMPORTANT NOTICE",
         "Page": 0,
         "Type": "Header1",
         "Child": [
           {
             "Text": "IMPORTANT NOTICE FOR BUYERS",
             "Page": 0,
             "Type": "Header2",
             "Child": [
               {
                 "Text": "IMPORTANT NOTICE FOR SELLERS",
                 "Page": 0,
                 "Type": "Header3",
                 "Child": [
                   {
                     "Text": "IMPORTANT INFORMATION",
                     "Page": 0,
                     "Type": "Header4",
                     "Child": [
                       {
                         "Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
                         "Page": 0
                       }
                     ]
                   }
                 ]
               }
             ]
           }
         ]
       }
     ],
     "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS"
}

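I store these records in Elasticsearch, and I then need to search each JSON record for a specific keyword. The keyword may or may not appear at a given level of the nested JSON structure. In other words, the following query returns a result, but the one after it does not: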

{
  "query": { "match": {"_root.Child.Child.Child.Child.Text" : "OFFERING" } }
}

This one returns no results:

{
  "query": { "match": {"_root.Child.Child.Child.Text" : "OFFERING" } }
}

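How can I make the search return the correct results when the nesting depth and the key identifiers differ between JSON documents? Also, at indexing time I do not have a fixed mapping that defines each record.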

Note: I am reposting this (improved) question because a colleague posted a similar question earlier, but it was closed.

Tags: json, elasticsearch, lucene

Answer (votes: 0)

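Given your sample documents, it makes sense that the second query returns nothing: in all three records, "_root.Child.Child.Child.Text" holds "IMPORTANT INFORMATION", so there is no OFFERING at that path! The word only appears one level deeper, at "_root.Child.Child.Child.Child.Text".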
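To see why, here is a rough local model in Python of how the default object mapping treats these documents (an illustration only, not Elasticsearch itself, and it ignores text analysis): arrays of objects are flattened, so a dotted path such as "_root.Child.Child.Child.Text" collects the Text values at exactly that depth and nowhere else. The function values_at and the SAMPLE constant (the first record from the question) are names introduced here for illustration.

```python
# SAMPLE is the first record from the question.
SAMPLE = {
    "_root": [{
        "Text": "IMPORTANT NOTICE", "Page": 0, "Type": "Header3",
        "Child": [{
            "Text": "IMPORTANT NOTICE FOR BUYERS", "Page": 0, "Type": "Header2",
            "Child": [{
                "Text": "IMPORTANT NOTICE FOR SELLERS", "Page": 0, "Type": "Header4",
                "Child": [{
                    "Text": "IMPORTANT INFORMATION", "Page": 0, "Type": "Header5",
                    "Child": [{
                        "Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
                        "Page": 0,
                    }],
                }],
            }],
        }],
    }],
    "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
}


def values_at(value, path):
    """Return every value reachable through a dotted field path.

    Arrays are flattened at each step, roughly like Elasticsearch's default
    object mapping: "_root.Child.Text" collects the Text of every direct child
    of every _root entry, and nothing from deeper levels.
    """
    if isinstance(value, list):
        return [found for item in value for found in values_at(item, path)]
    if not path:
        return [value]
    if not isinstance(value, dict):
        return []
    head, _, rest = path.partition(".")
    return values_at(value[head], rest) if head in value else []


for depth in range(5):
    path = ".".join(["_root"] + ["Child"] * depth + ["Text"])
    print(path, "->", values_at(SAMPLE, path))
```

Running it prints one line per depth; only the deepest path (four Child segments) contains OFFERING.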

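Suggestion: each of your Child arrays tends to contain exactly one child object, so flattening the whole structure should not be too hard. You would still keep the Type identifiers and so on, but the structure would be much easier to work with, and your query would shrink to a single match query, possibly combined with "_root.children.Type": "Header6" or similar.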
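A minimal sketch of that flattening in Python, done on each record before it is sent to Elasticsearch. The field names "children" and "depth" are hypothetical choices ("children" follows the "_root.children.Type" path suggested above), and "depth" is added so the nesting level is not lost; SAMPLE is the first record from the question.

```python
import json

# SAMPLE is the first record from the question.
SAMPLE = {
    "_root": [{
        "Text": "IMPORTANT NOTICE", "Page": 0, "Type": "Header3",
        "Child": [{
            "Text": "IMPORTANT NOTICE FOR BUYERS", "Page": 0, "Type": "Header2",
            "Child": [{
                "Text": "IMPORTANT NOTICE FOR SELLERS", "Page": 0, "Type": "Header4",
                "Child": [{
                    "Text": "IMPORTANT INFORMATION", "Page": 0, "Type": "Header5",
                    "Child": [{
                        "Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
                        "Page": 0,
                    }],
                }],
            }],
        }],
    }],
    "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
}


def flatten_nodes(nodes, depth=0):
    """Flatten a list of nested nodes depth-first into a single list.

    Text, Page and Type are copied when present; "depth" (a hypothetical
    field) records how many Child levels below _root the node was found.
    """
    flat = []
    for node in nodes:
        entry = {key: node[key] for key in ("Text", "Page", "Type") if key in node}
        entry["depth"] = depth
        flat.append(entry)
        # Children come right after their parent, one level deeper.
        flat.extend(flatten_nodes(node.get("Child", []), depth + 1))
    return flat


def flatten_record(record):
    """Copy a record, replacing the nested "_root" array with {"children": [...]}."""
    flat = {key: value for key, value in record.items() if key != "_root"}
    flat["_root"] = {"children": flatten_nodes(record.get("_root", []))}
    return flat


# After flattening, every Text lives at the same path, so one match query
# covers all depths.
QUERY = {"query": {"match": {"_root.children.Text": "OFFERING"}}}

print(json.dumps(flatten_record(SAMPLE), indent=2))
```

One caveat if you later combine conditions on Text and Type: with the default object mapping, Elasticsearch does not keep the fields of each array entry together, so a query could match the Text of one entry and the Type of another. Mapping "_root.children" as a nested field and using a nested query avoids that.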

Second-best option: keep the nested structure and add one clause per level, down to the deepest level:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "_root.Child.Text": "OFFERING"
          }
        },
        {
          "match": {
            "_root.Child.Child.Text": "OFFERING"
          }
        },
        {
          "match": {
            "_root.Child.Child.Child.Text": "OFFERING"
          }
        },
        {
          "match": {
            "_root.Child.Child.Child.Child.Text": "OFFERING"
          }
        }
      ]
    }
  }
}
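Writing these clauses by hand gets tedious as the depth grows. Below is a small Python sketch that generates them, assuming you know (or can safely over-estimate) the maximum depth; the function name depth_should_query is hypothetical. Unlike the hand-written query above, it also includes depth 0 ("_root.Text").

```python
import json


def depth_should_query(keyword, max_depth):
    """One match clause per nesting level, from _root.Text down to max_depth Child levels."""
    clauses = []
    for depth in range(max_depth + 1):
        path = ".".join(["_root"] + ["Child"] * depth + ["Text"])
        clauses.append({"match": {path: keyword}})
    # A bool query with only should clauses needs at least one of them to match.
    return {"query": {"bool": {"should": clauses}}}


print(json.dumps(depth_should_query("OFFERING", 4), indent=2))
```

Clauses for paths that do not exist in the index should simply match nothing, so over-estimating max_depth only costs a few extra clauses.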

Answer (votes: 0)

Why not use multi_match instead?

{
    "query": {
        "multi_match": {
            "query": "OFFERING",
            "fields": [
                "_root.Child.Text",
                "_root.Child.Child.Text",
                "_root.Child.Child.Child.Text",
                "_root.Child.Child.Child.Child.Text"
            ]
        }
    }
}
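Since there is no fixed mapping, the field list does not have to be typed in by hand. One option, sketched below in Python, is to collect every dotted path ending in "Text" from the documents themselves and pass that list to multi_match. The helper names and the two trimmed-down RECORDS are hypothetical; reading the field names from the index mapping (GET /<index>/_mapping) would be an alternative.

```python
import json

# Two hypothetical, trimmed-down records with different nesting depths.
RECORDS = [
    {
        "_root": [{
            "Text": "IMPORTANT NOTICE", "Page": 0, "Type": "Header1",
            "Child": [{"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS", "Page": 0}],
        }],
        "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
    },
    {
        "_root": [{
            "Text": "IMPORTANT NOTICE", "Page": 0, "Type": "Header2",
            "Child": [{
                "Text": "IMPORTANT INFORMATION", "Page": 0, "Type": "Header3",
                "Child": [{"Text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS", "Page": 0}],
            }],
        }],
        "_text": "THIS OFFERING IS AVAILABLE ONLY TO INVESTORS",
    },
]


def text_paths(value, prefix=""):
    """Collect every dotted field path ending in "Text" within a JSON value.

    Array positions do not add a segment, matching how Elasticsearch names
    fields inside arrays of objects.
    """
    paths = set()
    if isinstance(value, dict):
        for key, sub in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if key == "Text":
                paths.add(path)
            else:
                paths |= text_paths(sub, path)
    elif isinstance(value, list):
        for item in value:
            paths |= text_paths(item, prefix)
    return paths


def multi_match_query(keyword, records):
    """Build a multi_match query over every Text path seen in the records."""
    fields = sorted(set().union(*(text_paths(record) for record in records)))
    return {"query": {"multi_match": {"query": keyword, "fields": fields}}}


print(json.dumps(multi_match_query("OFFERING", RECORDS), indent=2))
```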