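I built an Elasticsearch query with a filter. In the filter context I am writing a Painless script that filters documents based on the body of a text field. However, when I access the text field, I get a list of terms rather than the original text. I am looking for a way to access the original text in a Painless script instead of the list of terms. If the original text cannot be accessed, I would like to access the document's term frequency vector instead.
For example, if I run this query: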
GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "body": "spark" }
      },
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                String text = doc['body'].toString();
                Debug.explain(text);
                return true;
              """
            }
          }
        }
      ]
    }
  }
}
I get this response:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [
      {
        "shard" : 2,
        "index" : "twitter",
        "node" : "AClIunrSRUKb1gbhBz-JoQ",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "painless_class" : "java.lang.String",
          "to_string" : "[and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]",
          "java_class" : "java.lang.String",
          "script_stack" : [
            "Debug.explain(text);\n ",
            " ^---- HERE"
          ],
          "script" : """
            String text = doc['body'].toString();
            Debug.explain(text);
            return true;
          """,
          "lang" : "painless",
          "caused_by" : {
            "type" : "painless_explain_error",
            "reason" : null
          }
        }
      }
    ]
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
As you can see, the debug output shows that doc['body'].toString() is actually a list of terms: [and, by, cutting, doug, hadoop, jack, jim, lucene, made, spark, the, was]. What I want is access to the original text, which in this example is "body" : "The Lucene was made by Doug Cutting and the hadoop was made by Jim and Spark was made by jack".
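As a rough illustration of why the output looks like this: with fielddata enabled on a text field, the script sees the analyzed tokens of the field, deduplicated and sorted, rather than the stored string. The Python sketch below only imitates that behaviour. It assumes an analyzer that lowercases and splits on non-alphanumeric characters, which is a simplification of the standard analyzer, and the function names are hypothetical.

```python
import re


def analyze_like_standard(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def fielddata_view(text):
    """Unique terms in sorted order, mirroring the list shown by Debug.explain."""
    return sorted(set(analyze_like_standard(text)))


body = ("The Lucene was made by Doug Cutting and the hadoop was made by Jim "
        "and Spark was made by jack")
terms = fielddata_view(body)
print(terms)
```

Word order, repetition, and casing are all lost in this view, so the original sentence cannot be rebuilt from it inside the script.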
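Note: I have set "fielddata": true and "store": true on this field, and I have also indexed the document with a body.exact field so that its terms are not analyzed. My problem is that I cannot access the original text from a script in the filter context; I always get the list of unique terms.
Thank you very much for your help.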
You can use the keyword data type:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "body": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

GET twitter/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "body": "spark"
        }
      },
      "filter": [
        {
          "script": {
            "script": {
              "lang": "painless",
              "source": """
                String text = doc['body.keyword'].toString();
                Debug.explain(text);
                return true;
              """
            }
          }
        }
      ]
    }
  }
}
This yields:
"painless_class" : "java.lang.String",
"to_string" : "[The Lucene was made by Doug Cutting and the hadoop was made by Jim and Spark was made by jack]",
"java_class" : "java.lang.String",
"script_stack" : [
  "Debug.explain(text);\n ",
  " ^---- HERE"
],
"script" : """
  String text = doc['body.keyword'].toString();
  Debug.explain(text);
  return true;
""",
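One solution I have found so far is to use multi-fields with a sub-field, for example body.raw, indexed as keyword. In that case, calling doc['body.raw'].value.toString(); returns the original text. I would still like to find a solution that does not require indexing two fields and instead reads the text from _source or something similar.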
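For reference, a minimal sketch of the multi-field workaround with an actual filter condition instead of Debug.explain. It is written as Python dictionaries that serialize to the JSON request bodies, so it can be sent with any HTTP client. The sub-field name body.raw is the one mentioned above; the phrase "Doug Cutting", the params.phrase parameter, and the variable names are illustrative assumptions, and the _doc mapping type assumes the same Elasticsearch version as the mapping shown earlier.

```python
import json

# Mapping: analyzed "body" plus a non-analyzed "body.raw" keyword sub-field.
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "body": {
                    "type": "text",
                    "fields": {"raw": {"type": "keyword"}},
                }
            }
        }
    }
}

# Painless filter: keep documents whose original text contains a phrase.
# The size() check skips documents that have no value for body.raw.
painless_source = (
    "doc['body.raw'].size() > 0 && "
    "doc['body.raw'].value.contains(params.phrase)"
)

search = {
    "query": {
        "bool": {
            "must": {"term": {"body": "spark"}},
            "filter": [
                {
                    "script": {
                        "script": {
                            "lang": "painless",
                            "source": painless_source,
                            "params": {"phrase": "Doug Cutting"},
                        }
                    }
                }
            ],
        }
    }
}

print(json.dumps(mapping, indent=2))  # request body for: PUT twitter
print(json.dumps(search, indent=2))   # request body for: GET twitter/_search
```

Passing the phrase through params rather than concatenating it into the script source keeps the script text constant across requests.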