Elasticsearch ngram analyzer/tokenizer not working?


It seems the ngram tokenizer isn't working, or I'm misunderstanding or misusing it.

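My tokenizer's min_gram is 3 and its max_gram is 5 (gram lengths in characters). I can find the term with other approaches (the simple analyzer and the like), but not with the ngram analyzer.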

What I want to achieve with ngrams is to find names while tolerating misspellings.
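To illustrate why ngrams can tolerate misspellings, here is a small pure-Python approximation of what `ngramAnalyzer` emits with min_gram 3 and max_gram 5. It is not Elasticsearch's implementation: `ngrams` is a hypothetical helper, and it assumes the nGram tokenizer's default of keeping every character. Because no separate search analyzer is configured, the query text goes through the same analyzer, and a `match` query with its default `or` operator matches when any gram is shared.

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Return all character n-grams of length min_gram..max_gram.

    Hypothetical approximation of the nGram tokenizer (default token_chars,
    i.e. every character kept) followed by the lowercase filter.
    """
    text = text.lower()
    grams = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            if start + size <= len(text):
                grams.append(text[start:start + size])
    return grams


indexed = set(ngrams("Madonna"))  # grams stored for the document value
query = set(ngrams("madona"))     # grams produced from a misspelled query

shared = sorted(indexed & query)
print(len(indexed), len(query), shared)
# prints: 12 9 ['ado', 'adon', 'don', 'mad', 'mado', 'madon']
```

The misspelled `madona` still shares six grams with `madonna`, so an ngram field can find the name despite the typo, once the field is actually populated.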

Below are simplified versions of my settings, my mapping and my query. If you have any ideas, please let me know; this is driving me crazy!

Settings:

{
   "myindex": {
      "settings": {
         "index": {
            "analysis": {
               "analyzer": {                  
                  "ngramAnalyzer": {
                     "type": "custom",
                     "filter": [
                        "lowercase"
                     ],
                     "tokenizer": "nGramTokenizer"
                  }  
               },
               "tokenizer": {
                  "nGramTokenizer": {
                     "type": "nGram",
                     "min_gram": "3",
                     "max_gram": "5"
                  }
               }
            },
            "number_of_shards": "5",
            "number_of_replicas": "1",
            "version": {
               "created": "1020199"
            },
            "uuid": "60ggSr6TREaDTItkaNUagg"
         }
      }
   }
}

Mapping:

{
   "myindex": {
      "mappings": {
         "mytype": {
            "properties": { 
               "artists.name": {
                  "type": "string",
                  "analyzer": "simple",
                  "fields": {
                     "ngram": {
                        "type": "string",
                        "analyzer": "ngramAnalyzer"
                     },
                     "raw": {
                        "type": "string",
                        "index": "not_analyzed"
                     }
                  }
               }
            }
         }
      }
   }
}

Query:

{"query": {"match": {"artists.name.ngram": "madonna"}}}

Document:

{
   "_index": "myindex",
   "_type": "mytype",
   "_id": "602537592951",
   "_version": 1,
   "found": true,
   "_source": {
      "artists": [
         {
            "name": "Madonna",
            "id": "P    64565"
         }
      ]
   }
}

Edit: by the way, this query works (without ngram):

{"query": {"match": {"artists.name": "madonna"}}}

This apparently has to do with the inner (nested) object here: I'm evidently not applying the ngram analyzer to that object correctly.
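Elasticsearch indexes a value found inside an object under its dotted path, so `Madonna` in the document above lands in `name` under the `artists` object, i.e. `artists.name`. As a sketch, the hypothetical helper `field_paths` below walks a `_source` the same way, treating an array of objects as its inner objects (as a plain object field, not the `nested` type, does). The assumption here is that the question's mapping, which declares a top-level property literally named `artists.name`, is presumably a different field from `name` under `artists`, so `artists.name.ngram` never receives any values.

```python
def field_paths(source, prefix=""):
    """Map each leaf value in a _source dict to its dotted field path.

    Hypothetical helper: arrays of objects are flattened into the same
    path, mirroring how an object (non-nested) field is indexed.
    """
    paths = {}
    for key, value in source.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Recurse into the inner object with the extended prefix
                for sub_path, sub_values in field_paths(item, path + ".").items():
                    paths.setdefault(sub_path, []).extend(sub_values)
            else:
                paths.setdefault(path, []).append(item)
    return paths


doc = {"artists": [{"name": "Madonna", "id": "P    64565"}]}
print(field_paths(doc))
```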

Any ideas?

elasticsearch nest
Answer:

OK, I figured it out. I really hope this helps someone; it was driving me crazy.

Here is what my mapping ended up looking like:

{
   "myindex": {
      "mappings": {
         "mytype": {
            "properties": {               
               "artists": {
                  "properties": {
                     "id": {
                        "type": "string"
                     },
                     "name": {
                        "type": "string",
                        "analyzer": "ngramAnalyzer",
                        "fields": {
                           "raw": {
                              "type": "string",
                              "index": "not_analyzed"
                           }
                        }
                     }
                  }
               }
            }
         }
      }
   }
}
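The key difference from the question's mapping is that `name` sits under the `properties` of an `artists` object, instead of being a top-level property whose name happens to contain a dot. As a sketch, the hypothetical helper `mapping_for_path` below expands a dotted path into that nested shape; it handles one field at a time and does not merge several paths into one mapping.

```python
import json


def mapping_for_path(dotted_path, field_mapping):
    """Expand a dotted path like "artists.name" into nested object properties.

    Hypothetical helper: every segment except the last becomes an object
    with its own "properties"; the last segment receives field_mapping.
    """
    *parents, leaf = dotted_path.split(".")
    properties = {leaf: field_mapping}
    for parent in reversed(parents):
        properties = {parent: {"properties": properties}}
    return {"properties": properties}


name_mapping = {
    "type": "string",
    "analyzer": "ngramAnalyzer",
    "fields": {"raw": {"type": "string", "index": "not_analyzed"}},
}
print(json.dumps(mapping_for_path("artists.name", name_mapping), indent=2))
```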

And here is how I did it with NEST syntax.

First, I have a child class (a POCO) called Person that holds the name and the ID. It looks like this:

[Serializable]
public class Person
{
    public string Name { get; set; }
    [ElasticProperty(Analyzer = "fullTerm", Index = FieldIndexOption.not_analyzed)]
    public string Id { get; set; }
}

Then my mapping looks like this:

.AddMapping<MyIndex>(m => m
    .MapFromAttributes()
    .Properties(props => props
        // Map "artists" as an inner object whose properties come from Person
        .Object<Person>(x => x
            .Name("artists")
            .Properties(pp => pp
                // "name" is analyzed with ngramAnalyzer; "name.raw" keeps the exact value
                .MultiField(mf => mf
                    .Name(s => s.Name)
                    .Fields(f => f
                        .String(s => s.Name(o => o.Name).Analyzer("ngramAnalyzer"))
                        .String(s => s.Name(o => o.Name.Suffix("raw")).Index(FieldIndexOption.not_analyzed))
                    )
                )
            )
        )
    )
)

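Note: .Object<Person>(...) here marks artists as another object inside my type, so name is mapped as a property of the artists object rather than as a top-level field.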

Thanks, me!!!

Edit: with curl, the mapping might look something like this:

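# Assumes myindex already defines ngramAnalyzer in its settings. An existing
# field's analyzer cannot be changed in place, so use a fresh index (or reindex).
curl -XPUT "http://localhost:9200/myindex/_mapping/mytype" -H 'Content-Type: application/json' -d '{
  "mytype": {
    "properties": {
      "artists": {
        "properties": {
          "id": { "type": "string" },
          "name": {
            "type": "string",
            "analyzer": "ngramAnalyzer",
            "fields": {
              "raw": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }
    }
  }
}'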
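Whichever way the mapping is applied, `ngramAnalyzer` has to be defined in the index's analysis settings first, otherwise Elasticsearch rejects the mapping. A minimal Python check along those lines, using the hypothetical helpers `referenced_analyzers` and `undefined_analyzers`. It assumes a create-index body that keeps `analysis` directly under `settings`, and it knows only a partial list of built-in analyzer names.

```python
# Partial list of built-in analyzer names (assumption, not exhaustive)
BUILTIN_ANALYZERS = {"standard", "simple", "whitespace", "stop", "keyword", "pattern"}


def referenced_analyzers(node):
    """Yield every string value of an "analyzer" key anywhere in a mapping."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "analyzer" and isinstance(value, str):
                yield value
            else:
                # Also covers a property that happens to be named "analyzer"
                yield from referenced_analyzers(value)
    elif isinstance(node, list):
        for item in node:
            yield from referenced_analyzers(item)


def undefined_analyzers(index_body):
    """Return analyzer names used in "mappings" but neither defined nor built in."""
    defined = set(
        index_body.get("settings", {}).get("analysis", {}).get("analyzer", {})
    )
    used = set(referenced_analyzers(index_body.get("mappings", {})))
    return sorted(used - defined - BUILTIN_ANALYZERS)


body = {
    "settings": {"analysis": {
        "analyzer": {"ngramAnalyzer": {
            "type": "custom", "tokenizer": "nGramTokenizer", "filter": ["lowercase"]}},
        "tokenizer": {"nGramTokenizer": {"type": "nGram", "min_gram": 3, "max_gram": 5}},
    }},
    "mappings": {"mytype": {"properties": {"artists": {"properties": {
        "name": {"type": "string", "analyzer": "ngramAnalyzer"}}}}}},
}
print(undefined_analyzers(body))  # prints: []
```

Running the same check on a body without the `settings` block would report `['ngramAnalyzer']`, which is the situation the mapping call has to avoid.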