How to get the sum of similar-tag scores for a text with Elasticsearch


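I am trying to use Elasticsearch (version 6.8) to find the tags most similar to a piece of text, and I want each document's score to be the sum of its per-tag similarities rather than the result of Elasticsearch's default scoring formula.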

For example, I create my_test_index and insert three documents:

POST my_test_index/_doc/17
{
  "id": 17,
  "tags": ["devops", "server", "hardware"]
}

POST my_test_index/_doc/20
{
  "id": 20,
  "tags": ["software", "application", "developer", "develop"]
}

POST my_test_index/_doc/21
{
  "id": 21,
  "tags": ["electronic", "electric"]
}

There is no explicit mapping; the index uses the defaults.

Then I run the following query:

GET my_test_index/_search
{
  "query": {
    "more_like_this": {
      "fields": [
        "tags"
      ],
      "like": [
        "i like electric devices and develop some softwares."
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

And I get this response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "my_test_index",
        "_type" : "_doc",
        "_id" : "21",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 21,
          "tags" : [
            "electronic",
            "electric"
          ]
        }
      },
      {
        "_index" : "my_test_index",
        "_type" : "_doc",
        "_id" : "20",
        "_score" : 0.2876821,
        "_source" : {
          "id" : 20,
          "tags" : [
            "software",
            "application",
            "developer",
            "develop"
          ]
        }
      }
    ]
  }
}

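However, this is not what I want. I want to sum the scores of similar tags, like this: the word "electric" in the text equals the "electric" tag and scores 1.0, and it is similar to the "electronic" tag and scores ~0.7. The word "develop" in the text equals the "develop" tag and scores 1.0, it is similar to the "developer" tag and scores ~0.8, the "software" tag is similar to a word in the text and scores ~0.9, and so on...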

So I expect a result like this: _id 20 with a total of ~2.7, _id 21 with a total of ~1.7, and so on.
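The scoring described above can be sketched outside Elasticsearch to make the intent concrete. This is only a client-side illustration, not an Elasticsearch feature: the helper names (`tokenize`, `tag_similarity`, `summed_tag_score`), the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 cutoff are all assumptions, so the numbers differ from the approximate values quoted in the question.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_similarity(tag, tokens):
    """Best similarity between one tag and any token (1.0 means identical)."""
    return max(
        (SequenceMatcher(None, tag, tok).ratio() for tok in tokens),
        default=0.0,
    )


def summed_tag_score(tags, text, threshold=0.8):
    """Sum the best-match similarity of every tag that reaches the threshold.

    The threshold is an arbitrary, hypothetical cutoff that keeps weakly
    related tags from contributing to the total.
    """
    tokens = tokenize(text)
    sims = (tag_similarity(tag, tokens) for tag in tags)
    return sum(s for s in sims if s >= threshold)


# The three documents from the question, keyed by their id.
docs = {
    17: ["devops", "server", "hardware"],
    20: ["software", "application", "developer", "develop"],
    21: ["electronic", "electric"],
}
text = "i like electric devices and develop some softwares."

scores = {doc_id: summed_tag_score(tags, text) for doc_id, tags in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

With these assumptions document 20 totals about 2.82 (develop, developer, software), document 21 about 1.89 (electric, electronic), and document 17 scores nothing, which gives the ordering the question asks for. Swapping in a different similarity measure only requires changing `tag_similarity`.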

I hope someone can provide an example of how to do this, or at least point me in the right direction.

Thanks.

elasticsearch lucene text-mining
1 answer

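I think you have not defined the tags field as a text field in your mapping, which is why IDs 20 and 21 get the same score. I defined it as text in my mapping and got a higher score for ID 21, which is what I would expect.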

Below is my solution.

Index definition (note that tags is mapped as text)

{
    "mappings": {
        "properties": {
            "id": {
                "type": "integer"
            },
            "tags" : {
                "type" : "text"
            }
        }
    }
}

I indexed the sample documents you provided and used the same search query.

Search query

{
  "query": {
    "more_like_this": {
      "fields": [
        "tags"
      ],
      "like": [
        "i like electric devices and develop some softwares."
      ],
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

Search result

 "hits": [
         {
            "_index": "so_array",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.135697, --> note score
            "_source": {
               "id": 21,
               "tags": [
                  "electronic",
                  "electric"
               ]
            }
         },
         {
            "_index": "so_array",
            "_type": "_doc",
            "_id": "2",
            "_score": 0.86312973, --> note score
            "_source": {
               "id": 20,
               "tags": [
                  "software",
                  "application",
                  "developer",
                  "develop"
               ]
            }
         }
      ]
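
As a rough check on where these scores come from, the arithmetic can be reproduced by hand. The sketch below assumes the default BM25 similarity with k1 = 1.2 and b = 0.75 in the form used by Lucene 7, a single matching query term per document, and a particular shard placement: each document alone on its shard for the question's response, and all three documents on one shard for the result above. The function name `bm25_term_score` is made up for illustration.

```python
import math

K1, B = 1.2, 0.75  # assumed default BM25 parameters


def bm25_term_score(doc_count, doc_freq, term_freq, field_len, avg_field_len):
    """Score contributed by one matching term under BM25 (Lucene 7 form)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = K1 * (1 - B + B * field_len / avg_field_len)
    return idf * term_freq * (K1 + 1) / (term_freq + length_norm)


# Question's response: assume the matching document is the only one on its
# shard, so doc_count = 1 and its field length equals the shard average.
alone = bm25_term_score(1, 1, 1, 2, 2)

# Result above: assume all three documents share one shard.
# Tag counts are 3, 4 and 2, so the average field length is 3.
avg_len = (3 + 4 + 2) / 3
doc21 = bm25_term_score(3, 1, 1, 2, avg_len)  # matches "electric"
doc20 = bm25_term_score(3, 1, 1, 4, avg_len)  # matches "develop"
```

Under these assumptions `alone` comes out at about 0.2876821, `doc21` at about 1.135697 and `doc20` at about 0.86312973, so both sets of scores are consistent with plain BM25 on a single matched term. The shorter tags field of document 21 is what pushes it ahead once the documents share term statistics.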