I need to include special characters in Elasticsearch


I have created an index with this analyzer:

{
  "settings": {
    "analysis": {
      "filter": {
        "specialCharFilter": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 30
        }
      },
      "analyzer": {
        "specialChar": {
          "type": "custom",
          "tokenizer": "custom_tokenizer",
          "filter": [
            "lowercase",
            "specialCharFilter"
          ]
        }
      },
      "tokenizer": {
        "custom_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 30,
          "token_chars": [
            "letter",
            "digit",
            "symbol",
            "punctuation"
          ]
        }
      }
    },
    "index.max_ngram_diff": 30
  },
  "mappings": {
    "properties": {
      "partyName": {
        "type": "keyword",
        "analyzer": "specialChar",
        "search_analyzer": "standard"
      }
    }
  }
} 


[
  {
    "partyName": "FLYJAC LOGISTICS PVT LTD-TPTBLR ."
  },
  {
    "partyName": "L&T GEOSTRUCTURE PRIVATE LIMITED"
  }
]

If I query with {"query": {"match": {"partyName": "L&T"}}}

I want the following object as output: {"partyName" : "L&T GEOSTRUCTURE PRIVATE LIMITED"}

amazon-web-services elasticsearch kibana amazon-elasticsearch
1 Answer

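First, it makes no sense to have both an ngram tokenizer and an ngram token filter: this generates far too many useless tokens and needlessly increases the index size.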

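Next, your search for L&T returns nothing because the standard search-time analyzer drops the & symbol and searches only for the tokens "l" and "t". Neither can match, because you only index tokens with a minimum length of 2.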
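One way to check this, as a minimal sketch: the _analyze API returns the tokens an analyzer produces for a given text. Running the built-in standard analyzer over the query text (no index is needed for this request) should return just two single-character tokens, l and t.

```
POST _analyze
{
  "analyzer": "standard",
  "text": "L&T"
}
```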

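I suggest the following analyzer instead. It uses the whitespace tokenizer to split the text on whitespace only, then runs edge_ngram on each token, so you can search for any prefix (minimum length 2) of any indexed token. At search time, the specialCharSearch analyzer only splits on whitespace and lowercases, so L&T is searched as the single token l&t. Also, the partyName field must be of type text (not keyword) if you want its content to be analyzed:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "specialCharFilter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 30
        }
      },
      "analyzer": {
        "specialChar": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "specialCharFilter"
          ]
        },
        "specialCharSearch": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase"
          ]
        }
      }
    },
    "index.max_ngram_diff": 30
  },
  "mappings": {
    "properties": {
      "partyName": {
        "type": "text",
        "analyzer": "specialChar",
        "search_analyzer": "specialCharSearch"
      }
    }
  }
}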
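To confirm what actually gets indexed, the same _analyze API can be run against the index created above (named test here) with its specialChar analyzer. This is only a sketch for inspection; for the text L&T it should return the two prefixes l& and l&t, which is why a search term that is kept whole and lowercased (l&t) can match.

```
POST test/_analyze
{
  "analyzer": "specialChar",
  "text": "L&T"
}
```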

We can then index your sample data:

PUT test/_doc/1
{
  "partyName": "FLYJAC LOGISTICS PVT LTD-TPTBLR ."
}
  
PUT test/_doc/2
{
  "partyName": "L&T GEOSTRUCTURE PRIVATE LIMITED"
}

Searching with the query you provided then returns the second document:

POST test/_search
{
  "query": {
    "match": {
      "partyName": "L&T"
    }
  }
}
=>

"hits": [
  {
    "_index": "test",
    "_id": "2",
    "_score": 1.0538965,
    "_source": {
      "partyName": "L&T GEOSTRUCTURE PRIVATE LIMITED"
    }
  }
]