Handling concatenated words in Elasticsearch


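I feel this should be a very simple problem, but for some reason I can't solve it.

I want to build a product search engine with Elasticsearch. I'm having trouble with concatenated words. For example, I want to search for a smart watch, so I run two different queries: (1) "smartwatch" and (2) "smart watch".

In (1), I get results whose product titles contain either "smartwatch" or "smart watch". In (2), however, I only get products with "smart watch" in the title; I don't get the variant without a space between smart and watch.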

Here is my index configuration:

config = {
 "settings": {
    "analysis": {
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter":["html_strip","custom_char_filter","space_maker_2", "space_maker_3" ],
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "char_filter": ["space_maker_2", "space_maker_3"],
          "filter": [
            "lowercase",
            "asciifolding",
            "synonym_apply",
            "special_stopwards"
          ]
        }
      },
      "char_filter": {
        "custom_char_filter": {
          "type": "mapping",
          "mappings": [
            "$ => dollar"
          ]
        },
        "space_maker_1": {
          "type": "pattern_replace", 
          "pattern": "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[a-z])",
          "replacement": " "
        },
        "space_maker_2": {
          "type": "pattern_replace",
          "pattern": "(?<=\\p{Digit})(?=\\p{Alpha})|(?<=\\p{Alpha})(?=\\p{Digit})",
          "replacement": " "
        },
        "space_maker_3": {
          "type": "pattern_replace",
          "pattern": "(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])|(?<=[^a-zA-Z0-9])(?=[a-zA-Z0-9])",
          "replacement": " "
        }
      },
       "filter": {
        "nGram_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        },
        "synonym_apply": {
            "type": "synonym",
            "lenient": "true",
            "synonyms": [ "kilo, kilogram => kg",
            "buck, dollar => usd"
            ]
          },
        "special_stopwards": {
            "type": "stop",
            "stopwords": [ "ass", "butt" ]
          }
      }
    }
  },

    "mappings": {
        "properties": {
            "brand": {
                "type": "keyword"
            },
            "category": {
                "type": "keyword"
            },
            "tags": {
                "type": "keyword"
            },
            "domain": {
                "type": "keyword"
            },
            "image": {
                "type": "text"
            },
            "purchases": {
                "type": "double"
            },
            "views": {
                "type": "double"
            },
            "price": {
                "type": "double"
            },
            "product_id": {
                "type": "text"
            },
            "product_url": {
                "type": "text"
            },
            "title": {
                "type": "text",
                "analyzer": "nGram_analyzer",
                "search_analyzer": "nGram_analyzer",
            },
            "description": {
                "type": "text"
            },
            "country": {
                "type": "integer"
            },
            "last_seen_date": {
                "type": "text"
            }
        }
    }
}

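At the moment I only use a simple match query on the product title.

How can I change the query or the index to solve this? Or can it be solved at all?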

elasticsearch search e-commerce opensearch
1 Answer

Your problem is the nGram_analyzer. It is used at both index time and query time.

Suppose a document has the title "smart watch". The nGram_filter splits the title into these tokens:

sm, sma, smar, smart, wa, wat, watc, watch

The query text "smartwatch" is also split into tokens by the nGram_filter:

sm, sma, smar, smart, smartw, smartwa, smartwat, smartwatc, smartwatch

Elasticsearch finds 4 matching tokens (termFreq): sm, sma, smar, smart. That is enough to add the document titled "smart watch" to the hits.
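The token overlap described above can be reproduced outside Elasticsearch. The following is a minimal Python sketch, not the real analyzer: the helper names `edge_ngrams` and `analyze` are made up for illustration, and the sketch only imitates the whitespace split, lowercasing, and edge n-gram steps (min_gram 2, max_gram 20), ignoring the char filters and asciifolding.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text):
    """Rough stand-in for nGram_analyzer: split on whitespace, lowercase, edge n-grams."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word))
    return tokens


indexed = analyze("smart watch")   # tokens stored for the document title
queried = analyze("smartwatch")    # tokens produced from the query text
shared = [t for t in queried if t in indexed]

print(indexed)
print(queried)
print(shared)
```

The `shared` list holds the four prefixes that both sides produce, which is why the "smart watch" document is returned for the query "smartwatch".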

To verify this, run the query below and look for termFreq=4.0 in the response:

POST /<your index>/_explain/<id of document with title "smart watch">
{
  "query": {
    "match": {
      "title": "smartwatch"
    }
  }
}
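The explain response is a nested tree in which each node has a "value", a "description", and a list of "details". A small helper can walk that tree instead of searching the raw text. This is only a sketch: `find_term_freqs` is a hypothetical name, the `sample` dictionary is hand-written to exercise the function and is not real Elasticsearch output, and the exact wording of the descriptions differs between versions (some print "termFreq=...", others start with "freq").

```python
def find_term_freqs(explanation):
    """Recursively collect (description, value) pairs that mention term frequency."""
    found = []
    description = explanation.get("description", "")
    # Assumption: the description contains "termFreq" or starts with "freq",
    # depending on the Elasticsearch version.
    if "termFreq" in description or description.startswith("freq"):
        found.append((description, explanation.get("value")))
    for child in explanation.get("details", []):
        found.extend(find_term_freqs(child))
    return found


# Hand-written stand-in for the "explanation" part of a response (illustrative only).
sample = {
    "value": 1.0,
    "description": "weight(...)",
    "details": [
        {"value": 4.0, "description": "termFreq=4.0", "details": []},
    ],
}

print(find_term_freqs(sample))
```

With a real response parsed into a dictionary, the call would be `find_term_freqs(response["explanation"])`.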

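Yes, this can be solved by replacing the search analyzer. For example, override it at query time with the keyword analyzer, which keeps the query text as a single token: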

POST /<your index>/_search
{
  "query": {
    "match": {
      "title": {
        "query": "smartwatch",
        "analyzer": "keyword"
      }
    }
  }
}
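The effect of swapping the search analyzer can be imitated in plain Python. This is a rough sketch under stated assumptions: `edge_ngrams`, `index_tokens`, and `matches` are hypothetical helpers, the index side only imitates whitespace splitting, lowercasing, and edge n-grams, and a match is counted as soon as one query token equals one indexed token (the default OR behaviour of a match query).

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Return the prefixes of `token` with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_tokens(title):
    """Index-time stand-in for nGram_analyzer: whitespace, lowercase, edge n-grams."""
    return {gram for word in title.lower().split() for gram in edge_ngrams(word)}


def matches(query, title, search_with_ngrams):
    """Return True if at least one query token equals an indexed title token."""
    if search_with_ngrams:
        query_tokens = index_tokens(query)   # same analyzer on both sides
    else:
        query_tokens = {query}               # keyword analyzer: one unchanged token
    return bool(query_tokens & index_tokens(title))


print(matches("smartwatch", "smart watch", search_with_ngrams=True))
print(matches("smartwatch", "smart watch", search_with_ngrams=False))
print(matches("smartwatch", "smartwatch", search_with_ngrams=False))
print(matches("Smartwatch", "smartwatch", search_with_ngrams=False))
```

The last call shows one trade-off of the keyword analyzer: it does not lowercase the query, so a capitalised query no longer matches the lowercased indexed tokens.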