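Can someone help with the mapping when the aggregate map contains more than one nested-type property? We are on version 8.0 and use Logstash to sync data from a database into an ES index.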
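Problem: when I map more than one nested-type property in the Logstash config file, the document ends up with duplicate data in those nested-type properties. Let me try to explain it better with the example below.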
Index mapping
PUT test
{
  "settings": {
    "index.mapping.coerce": false
  },
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "agreementId": {
        "type": "text",
        "copy_to": [
          "primaryFields"
        ]
      },
      "primaryFields": {
        "type": "text"
      },
      "customers": {
        "properties": {
          "customerId": {
            "type": "keyword",
            "index": false,
            "doc_values": false
          },
          "customerAddresses": {
            "type": "nested",
            "properties": {
              "custAddress": {
                "type": "text"
              },
              "custAddressType": {
                "type": "keyword",
                "doc_values": false
              }
            }
          },
          "phones": {
            "properties": {
              "phonenumber": {
                "type": "text",
                "copy_to": [
                  "primaryFields"
                ]
              },
              "phonetype": {
                "type": "keyword",
                "doc_values": false
              }
            }
          }
        }
      }
    }
  }
}
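In our database the agreement number is the primary key, and one agreement can have more than one customer profile (in this scenario we use one). Each customer can have multiple phones and multiple addresses. The query output looks like this: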
| agreement | customer | Address | Addresstype | Contact | Contacttype |
|---|---|---|---|---|---|
| 123456789 | 10 | 123 Main St. | Mailing | 1111111111 | Home |
| 123456789 | 10 | 123 Main St. | Mailing | 2222222222 | Cell |
| 123456789 | 10 | 456 South | Billing | 1111111111 | Home |
| 123456789 | 10 | 456 South | Billing | 2222222222 | Cell |
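For context, the row count follows from the query itself: each customer is joined to both its addresses and its phones, so the result is the cross product of 2 addresses × 2 phones = 4 rows, and every phone and every address appears in two of them. The plain-Ruby sketch below rebuilds those rows from the table (the actual SQL is not shown, so the join is an assumption) and counts what appending one phone and one address per row produces: 4 of each, of which only 2 are distinct.

```ruby
# Rebuild the query rows for agreement 123456789 / customer 10 from the table.
addresses = [
  { 'Address' => '123 Main St.', 'Addresstype' => 'Mailing' },
  { 'Address' => '456 South',    'Addresstype' => 'Billing' }
]
phones = [
  { 'Contact' => '1111111111', 'Contacttype' => 'Home' },
  { 'Contact' => '2222222222', 'Contacttype' => 'Cell' }
]

# Array#product yields every address/phone pair, in the same order as the table.
rows = addresses.product(phones).map do |address, phone|
  { 'agreement' => '123456789', 'customer' => '10' }.merge(address).merge(phone)
end

# Appending one phone and one address per row gives 4 entries of each.
naive_phones    = rows.map { |r| [r['Contact'], r['Contacttype']] }
naive_addresses = rows.map { |r| [r['Address'], r['Addresstype']] }

puts rows.size                 # 4
puts naive_phones.size         # 4
puts naive_phones.uniq.size    # 2
puts naive_addresses.uniq.size # 2
```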
Once the document is created in the index, a search returns it like this:
{
  "took": 474,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "test",
        "_id": "123456789",
        "_score": null,
        "_source": {
          "agreementId": 123456789,
          "customers": [
            {
              "phones": [
                {
                  "phonetype": "Cell",
                  "phonenumber": "2222222222"
                },
                {
                  "phonetype": "Cell",
                  "phonenumber": "2222222222"
                },
                {
                  "phonetype": "Home",
                  "phonenumber": "1111111111"
                },
                {
                  "phonetype": "Home",
                  "phonenumber": "1111111111"
                }
              ],
              "customerAddresses": [
                {
                  "custAddressType": "Mailing",
                  "custAddress": "123 Main St."
                },
                {
                  "custAddressType": "Billing",
                  "custAddress": "456 South"
                },
                {
                  "custAddressType": "Mailing",
                  "custAddress": "123 Main St."
                },
                {
                  "custAddressType": "Billing",
                  "custAddress": "456 South"
                }
              ]
            }
          ]
        },
        "sort": [
          1713679200000
        ]
      }
    ]
  }
}
As you can see, the phones and customer addresses are duplicated. This is how the mapping is defined in the Logstash config file:
filter {
  aggregate {
    task_id => "%{agreement}"
    code => "
      map['agreementId'] = event.get('agreement')
      map['customers'] ||= []
      if event.get('customer') != nil
        # add the customer only if it is not already in this agreement's map
        customer_found = false
        map['customers'].each { |cus|
          if cus['customerId'] == event.get('customer')
            customer_found = true
          end
        }
        if !customer_found
          map['customers'] << {
            'customerId' => event.get('customer')
          }
        end
        # one phone entry is appended for every row
        map['customers'].each { |cus|
          if cus['customerId'] == event.get('customer') && event.get('Contact') != nil
            cus['phones'] ||= []
            cus['phones'] << {
              'phonenumber' => event.get('Contact'),
              'phonetype' => event.get('Contacttype')
            }
          end
        }
        # one address entry is appended for every row
        map['customers'].each { |cus|
          if cus['customerId'] == event.get('customer') && event.get('Address') != nil
            cus['customerAddresses'] ||= []
            cus['customerAddresses'] << {
              'custAddress' => event.get('Address'),
              'custAddressType' => event.get('Addresstype')
            }
          end
        }
      end
      event.cancel()
    "
    push_previous_map_as_event => true
    timeout => 5
    timeout_tags => ['aggregated']
  }
  if "aggregated" not in [tags] {
    drop {}
  }
}
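A minimal sketch of one possible fix, assuming the duplicates come only from the repeated rows: before appending a phone or an address, check whether an identical hash is already in that customer's array. The code replays the four rows from the table through the aggregate logic in plain Ruby. `FakeEvent`, `add_unique` and `aggregate_row` are hypothetical names introduced only so it runs outside Logstash; in the real `code =>` block the same statements would operate on `event` and `map` directly.

```ruby
# Hypothetical stand-in for a Logstash event so the sketch runs as plain Ruby;
# inside the aggregate filter, `event` is provided by Logstash.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end
end

# Append `entry` only if an equal hash is not already in `list`.
def add_unique(list, entry)
  list << entry unless list.include?(entry)
end

# Same steps as the aggregate `code =>` block, with duplicate checks added.
def aggregate_row(map, event)
  map['agreementId'] = event.get('agreement')
  map['customers'] ||= []
  customer_id = event.get('customer')
  return if customer_id.nil?

  # Find the customer entry, or create it the first time the customer appears.
  cus = map['customers'].find { |c| c['customerId'] == customer_id }
  if cus.nil?
    cus = { 'customerId' => customer_id }
    map['customers'] << cus
  end

  unless event.get('Contact').nil?
    cus['phones'] ||= []
    add_unique(cus['phones'], { 'phonenumber' => event.get('Contact'),
                                'phonetype' => event.get('Contacttype') })
  end

  unless event.get('Address').nil?
    cus['customerAddresses'] ||= []
    add_unique(cus['customerAddresses'], { 'custAddress' => event.get('Address'),
                                           'custAddressType' => event.get('Addresstype') })
  end
end

# The four rows from the query output (address, type, contact, type).
rows = [
  ['123 Main St.', 'Mailing', '1111111111', 'Home'],
  ['123 Main St.', 'Mailing', '2222222222', 'Cell'],
  ['456 South',    'Billing', '1111111111', 'Home'],
  ['456 South',    'Billing', '2222222222', 'Cell']
]

map = {}
rows.each do |address, address_type, contact, contact_type|
  aggregate_row(map, FakeEvent.new({
    'agreement' => '123456789', 'customer' => '10',
    'Address' => address, 'Addresstype' => address_type,
    'Contact' => contact, 'Contacttype' => contact_type
  }))
end

customer = map['customers'].first
puts customer['phones'].size            # 2
puts customer['customerAddresses'].size # 2
```

Ruby compares hashes by content, so `include?` matches an entry with the same number and type (or the same address and type); the linear scan is cheap for a few entries per customer. Separately, the aggregate filter documentation recommends a single pipeline worker (`pipeline.workers: 1`), and `push_previous_map_as_event` only works if the rows for one agreement arrive consecutively, for example with `ORDER BY agreement` in the query.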
I even tried declaring the `phones` property as a nested type, but the duplicates remained.
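Declaring `phones` as `nested` only changes how Elasticsearch indexes the array (each object is indexed as a separate hidden document, so its fields stay paired in queries); `_source` is stored exactly as Logstash sends it, so duplicates in the event appear either way. If changing the aggregate code is inconvenient, another option is to de-duplicate the arrays after the aggregated event has been pushed, for example in a `ruby` filter placed after `aggregate` that reads `customers` with `event.get` and writes the result back with `event.set`. The sketch below shows only the de-duplication step in plain Ruby, applied to the `customers` value from the response above; `dedupe_customers` and `DEDUPE_KEYS` are hypothetical names, and wiring it into a `ruby` filter is left as an assumption.

```ruby
# Hypothetical list of array fields to de-duplicate (names from the mapping).
DEDUPE_KEYS = %w[phones customerAddresses].freeze

# Return a copy of `customers` where each listed array keeps only the first
# occurrence of every identical hash (Array#uniq compares hashes by content).
def dedupe_customers(customers)
  customers.map do |cus|
    cus.map do |key, value|
      if DEDUPE_KEYS.include?(key) && value.is_a?(Array)
        [key, value.uniq]
      else
        [key, value]
      end
    end.to_h
  end
end

# The duplicated `customers` value from the search response.
customers = [
  {
    'phones' => [
      { 'phonetype' => 'Cell', 'phonenumber' => '2222222222' },
      { 'phonetype' => 'Cell', 'phonenumber' => '2222222222' },
      { 'phonetype' => 'Home', 'phonenumber' => '1111111111' },
      { 'phonetype' => 'Home', 'phonenumber' => '1111111111' }
    ],
    'customerAddresses' => [
      { 'custAddressType' => 'Mailing', 'custAddress' => '123 Main St.' },
      { 'custAddressType' => 'Billing', 'custAddress' => '456 South' },
      { 'custAddressType' => 'Mailing', 'custAddress' => '123 Main St.' },
      { 'custAddressType' => 'Billing', 'custAddress' => '456 South' }
    ]
  }
]

deduped = dedupe_customers(customers)
puts deduped.first['phones'].size            # 2
puts deduped.first['customerAddresses'].size # 2
```

Because this runs after aggregation, the map still grows to one entry per row before being trimmed; checking before appending avoids that, at the cost of touching the aggregate code.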