How do I access the body of a text field in a Painless script in the filter context of an Elasticsearch query?

Question (votes: 1, answers: 1)

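I built an Elasticsearch query with a filter, and in the filter context I am writing a Painless script that filters documents based on the body of a text field. However, when I access the text field, I get a list of terms rather than the original text. I am looking for a way to access the original text body in the Painless script instead of the term list. Alternatively, if the text body cannot be accessed, I would like to access the document's term frequency vector.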

For example, if I run this query:

GET twitter/_search
{
  "query": {
      "bool": { 
      "must":{
        "term" : { "body" : "spark" }
      },
      "filter": [
        {
        "script" : {
                    "script" : {
                        "lang": "painless",
                        "source": """
                          String text = doc['body'].toString();
                          Debug.explain(text);
                         return true;
                      """

                    }
                }
      }
      ]
      } 

    }
}

I get this response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 2,
        "index" : "twitter",
        "node" : "AClIunrSRUKb1gbhBz-JoQ",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "painless_class" : "java.lang.String",
          "to_string" : "[and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]",
          "java_class" : "java.lang.String",
          "script_stack" : [
            "Debug.explain(text);\n                         ",
            "              ^---- HERE"
          ],
          "script" : """
                          String text = doc['body'].toString();
                          Debug.explain(text);
                         return true;
                      """,
          "lang" : "painless",
          "caused_by" : {
            "type" : "painless_explain_error",
            "reason" : null
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

As you can see, the debug output shows that doc['body'].toString() is actually a list of terms: [and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]. What I want is access to the original text, which in this example is "body" : "The Lucene was made by Doug Cutting and the hadoop was made by Jim and Spark was made by jack"

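Note: I have set "fielddata": true and "store": true on this field, and I have also indexed the document into a body.exact field so that the terms are not analyzed. My problem is that I cannot access the original text from the script in the filter context; I always get the list of unique terms.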
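The term list is consistent with how an analyzed text field exposes its values to scripts: with fielddata enabled, doc['body'] holds the terms indexed for the document, deduplicated and sorted, rather than the original string. The Python sketch below is only a rough model of that behaviour. The tokenizer (lowercase, split on non-alphanumeric characters) is a simplified stand-in for the standard analyzer, and `analyzed_doc_values` is a hypothetical helper name, not an Elasticsearch API.

```python
import re


def analyzed_doc_values(text):
    """Approximate what doc['body'] exposes for an analyzed text field.

    Assumption: lowercasing and splitting on non-alphanumeric characters
    is a simplified stand-in for the standard analyzer.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Per-document values behave like a sorted set:
    # duplicates and the original word order are lost.
    return sorted(set(tokens))


body = ("The Lucene was made by Doug Cutting and the hadoop was made by Jim "
        "and Spark was made by jack")
terms = analyzed_doc_values(body)

# Render the values the way the to_string field in the error response does.
print("[" + ", ".join(terms) + "]")
```

Under these assumptions the model reproduces the list from the response, which suggests the original sentence cannot be rebuilt from doc['body'] alone.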

Thank you very much for your help.

elasticsearch full-text-search elasticsearch-painless
1 answer

1 vote

You can use the keyword data type:

PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "body": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "body": "spark"
        }
      },
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                          String text = doc['body.keyword'].toString();
                          Debug.explain(text);
                         return true;
"""
            }
          }
        }
      ]
    }
  }
}

which yields:

"painless_class" : "java.lang.String",
          "to_string" : "[The Lucene was made by Doug Cutting and the hadoop was made by Jim and Spark was made by jack]",
          "java_class" : "java.lang.String",
          "script_stack" : [
            "Debug.explain(text);\n                         ",
            "              ^---- HERE"
          ],
          "script" : """
                          String text = doc['body.keyword'].toString();
                          Debug.explain(text);
                         return true;
""",
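A plausible reading of why this works: a keyword field is indexed without analysis, so the document's only doc value is the whole original string, and calling toString() on the values list wraps it in brackets. The Python model below is only an illustration of that idea; `keyword_doc_values` and `values_to_string` are hypothetical names, not part of Elasticsearch.

```python
def keyword_doc_values(text):
    # Assumption: a keyword field keeps the input as one unanalyzed term.
    return [text]


def values_to_string(values):
    # Mimics the bracketed list rendering seen in the to_string output above.
    return "[" + ", ".join(values) + "]"


body = ("The Lucene was made by Doug Cutting and the hadoop was made by Jim "
        "and Spark was made by jack")
values = keyword_doc_values(body)

# Bracketed, analogous to doc['body.keyword'].toString()
print(values_to_string(values))
# Bare string, analogous to doc['body.keyword'].value
print(values[0])
```

For a single-valued field, reading `.value` in the script is likely more convenient than `toString()`, since it returns the string without the surrounding brackets.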

0 votes

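One solution I have found so far is to use multi-fields with a sub-field, for example body.raw, indexed as keyword. In that case, calling doc['body.raw'].value.toString(); returns the original text. I am still looking for a solution that does not require indexing two fields and instead reads the text from _source or something similar.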
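The multi-field approach can be sketched as two request bodies, built here as plain Python dictionaries so they can be inspected or sent with any HTTP client. This is a sketch under assumptions: the helper names, the `phrase` script parameter, and the `contains` check are hypothetical illustrations, and the mapping shape (including the `_doc` type) simply follows the one used in the answer above.

```python
import json


def multi_field_mapping(sub_field="raw"):
    """Body for PUT twitter: a text field plus a keyword sub-field."""
    return {
        "mappings": {
            "_doc": {
                "properties": {
                    "body": {
                        "type": "text",
                        "fields": {sub_field: {"type": "keyword"}},
                    }
                }
            }
        }
    }


def raw_text_filter_query(term, phrase, sub_field="raw"):
    """Body for GET twitter/_search: term query plus a script filter.

    The script reads the unanalyzed sub-field and keeps documents whose
    original text contains the given phrase (hypothetical example filter).
    """
    source = f"doc['body.{sub_field}'].value.contains(params.phrase)"
    return {
        "query": {
            "bool": {
                "must": {"term": {"body": term}},
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                "source": source,
                                "params": {"phrase": phrase},
                            }
                        }
                    }
                ],
            }
        }
    }


mapping = multi_field_mapping()
query = raw_text_filter_query("spark", "Doug Cutting")
print(json.dumps(mapping, indent=2))
print(json.dumps(query, indent=2))
```

Passing the phrase through `params` rather than concatenating it into the script source keeps the script text constant across requests. Note that the script assumes every matching document has a value in body.raw.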
