Elasticsearch: `Procter&Gamble` and `Procter & Gamble` issue

Question description (votes: 0, answers: 2)

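My task is to:
* make procter&gamble and procter & gamble produce the same results, including the score
* keep it generic rather than relying on synonyms, because it could be any other Somehow&Somewhat
* highlight procter&gamble or procter & gamble as a whole, not as separate tokens, when the phrase matches
* use simple_query_string, because I allow search operators
* make AT&T searchable as well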

Here is my snippet. The problem is that searching for procter&gamble and for procter & gamble produces different scores and returns different documents, whereas users expect the same results for both.

DELETE /english_example
PUT /english_example
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type":       "stop",
          "stopwords":  "_english_" 
        },
        "english_keywords": {
          "type":       "keyword_marker",
          "keywords":   ["example"] 
        },
        "english_stemmer": {
          "type":       "stemmer",
          "language":   "english"
        },
        "english_possessive_stemmer": {
          "type":       "stemmer",
          "language":   "possessive_english"
        },
        "acronymns": {
          "type": "word_delimiter_graph",
          "catenate_all" : true,
          "preserve_original":true
        },
        "acronymns_": {
          "type": "word_delimiter_graph",
          "catenate_all" : true,
          "preserve_original":true
        },
        "custom_stop_words_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": [ "t" ]
        }

      },
      "analyzer": {
        "default": {
          "tokenizer":  "whitespace",
          "char_filter": [
           "ampersand_filter"
          ],
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "acronymns",
            "flatten_graph",
            "english_stop",
            "custom_stop_words_filter",
            "english_keywords",
            "english_stemmer"
          ]
        }
      },
      "char_filter": {
        "ampersand_filter": {
          "type": "pattern_replace",
          "pattern": "(?=[^&]*)( {0,}& {0,})(?=[^&]*)",
          "replacement": "_and_"
        },
        "ampersand_filter2": {
          "type": "mapping",
          "mappings": [
            "& => _and_"
          ]
        }
      }
    }
  }
}
PUT /english_example/_bulk 
{ "index" : { "_id" : "1" } }
{ "description" : "wi-fi AT&T BB&T Procter & Gamble, some\nOther $500 games with Peter's", "contents" : "Much text with somewhere I meet Procter or Gamble" }
{ "index" : { "_id" : "2" } }
{ "description" : "Procter & Gamble", "contents" : "Much text with somewhere I meet Procter and Gamble" }
{ "index" : { "_id" : "3" } }
{ "description" : "Procter&Gamble", "contents" : "Much text with somewhere I meet Procter & Gamble" }
{ "index" : { "_id" : "4" } }
{ "description" : "Come Procter&Gamble", "contents" : "Much text with somewhere I meet Procter&Gamble" }
{ "index" : { "_id" : "5" } }
{ "description" : "Tome Procter & Gamble", "contents" : "Much text with somewhere I don't meet AT&T" }


# "query": "procter & gamble",
GET english_example/_search
{
    "query": {
      "simple_query_string": {
          "query": "procter & gamble",
          "default_operator": "or",
          "fields": [
            "description^2",
            "contents^80"
          ]
      }
    },
    "highlight": {
      "fields": {
        "description": {},
        "contents": {}
      }
    }
}


# "query": "procter&gamble",
GET english_example/_search
{
    "query": {
      "simple_query_string": {
          "query": "procter&gamble",
          "default_operator": "or",
          "fields": [
            "description^2",
            "contents^80"
          ]
      }
    },
    "highlight": {
      "fields": {
        "description": {},
        "contents": {}
      }
    }
}


# "query": "at&t",
GET english_example/_search
{
    "query": {
      "simple_query_string": {
          "query": "at&t",
          "default_operator": "or",
          "fields": [
            "description^2",
            "contents^80"
          ]
      }
    },
    "highlight": {
      "fields": {
        "description": {},
        "contents": {}
      }
    }
}

In my snippet I also redefined the default analyzer, using the whitespace tokenizer together with a word_delimiter_graph filter, so that AT&T can be matched.
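To make the intent of the `ampersand_filter` char_filter easier to check, here is a minimal Python sketch of its first two stages only: the `pattern_replace` step followed by whitespace tokenization. It assumes Python's `re` module treats this particular pattern the same way as the Java regex engine Elasticsearch uses, and the helper names (`char_filter`, `whitespace_tokenize`, `preview`) are hypothetical. It does not model the token filters (`word_delimiter_graph`, stop words, stemming), nor any splitting that `simple_query_string` may perform on the query text before analysis, so it is an illustration and not a substitute for running the `_analyze` API against the index.

```python
import re

# Same pattern and replacement as the "ampersand_filter" char_filter in the settings.
# Assumption: Python's regex semantics match Java's for this pattern.
AMPERSAND_PATTERN = re.compile(r"(?=[^&]*)( {0,}& {0,})(?=[^&]*)")
REPLACEMENT = "_and_"


def char_filter(text):
    """Approximate the pattern_replace char_filter: rewrite '&' plus surrounding spaces."""
    return AMPERSAND_PATTERN.sub(REPLACEMENT, text)


def whitespace_tokenize(text):
    """Approximate the whitespace tokenizer: split on runs of whitespace."""
    return text.split()


def preview(text):
    """Char filter followed by tokenizer; token filters are deliberately left out."""
    return whitespace_tokenize(char_filter(text))


if __name__ == "__main__":
    for sample in ["Procter & Gamble", "Procter&Gamble", "AT&T", "Tome Procter & Gamble"]:
        print(sample, "->", preview(sample))
```

When the whole string reaches the char_filter, both spellings collapse to the same single token before any token filter runs. If the two searches still score differently, the difference is more likely to come from a later stage, or from how the query text is split before the analyzer sees it.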


elasticsearch elasticsearch-analyzers