How can I iterate over indexed data in an Elasticsearch script filter query?

Question

I have the following indexed data:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7992,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_id": "33952",
        "default_fee": 12,
        "custom_dates": [
          {
            "date": "2023-11-01",
            "price": 100
          },
          {
            "date": "2023-11-02",
            "price": 50
          }
        ],
        "options": [
          {
            "id": 95,
            "cost": 5,
            "type": [
              "Car"
            ]
          }
        ]
      }
    ]
  }
}

I added a script field named total that computes the total at query time, as shown below:

{
  script_fields: {
    total: {
      script: {
        source: "
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = params.total_stay;

          def custom_price_dates = [];
          if (params['_source']['custom_dates'] != null && !params['_source']['custom_dates'].isEmpty()) {
            custom_price_dates = params['_source']['custom_dates'].stream()
            .filter(filter_doc -> {
              def date = LocalDate.parse(filter_doc.start_date, formatter);
              return !date.isBefore(from) && !date.isAfter(to.minusDays(1));
            })
            .collect(Collectors.toList());
          }

          def custom_price = custom_price_dates.stream().mapToDouble(custom_doc -> custom_doc.price).sum();
          def default_price = stay == custom_price_dates.size() ? 0 : (stay - custom_price_dates.size()) * params['_source']['default_fee'];
          def calc_price = default_price + custom_price;
          return calc_price; 
        ",
        params: {
          checkin: Date.current.to_s,
          checkout: Date.current.to_s,
          total_stay: 2
        }
      }
    }
  },
  _source: ["*"]
}

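This returns the total as a script field. Now I want to filter on a range of that total. How can I do that? I tried using a script query, but it does not loop over custom_dates because that field is a nested type.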

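Also, I cannot index the total ahead of time, because the check-in and check-out dates are dynamic and the given dates may have custom prices. Please advise.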

Answer 1

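This can be done, but it is complicated. First, we need to understand that a search executes in two phases: query and fetch. In the query phase, each shard collects its top 10 hits using the sort key (_score by default). In the fetch phase, the coordinating node gathers these ids and sort keys from all shards, picks the top 10 among them, and asks each shard to return those documents. Script fields are computed in the fetch phase, so filters cannot access them.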

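To make matters worse, you indexed the custom dates as nested objects. Internally, nested objects are indexed as separate documents, and the only way to pass information from them to the main query is through _score. So, to achieve what you want with nested objects, you need to encode the price into _score. To simplify the calculation, we need to store the price difference in the nested object rather than the actual price. So if the default price is 12 and the special price is 100, we need to store 88.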
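
To make the arithmetic concrete, here is a small Python sketch of why storing the difference works. The function names are made up for illustration and are not part of Elasticsearch; the numbers come from the question's sample document, assuming a three-night stay in which two nights have special prices.

```python
def price_adjustment(default_fee, special_price):
    """Difference stored in the nested object instead of the raw price."""
    return special_price - default_fee


def total_price(default_fee, nights, adjustments):
    """Default fee for every night plus the adjustments of the special nights."""
    return default_fee * nights + sum(adjustments)


# Values from the question: default_fee 12, special prices 100 and 50.
adjustments = [price_adjustment(12, 100), price_adjustment(12, 50)]
print(adjustments)                      # [88, 38]
print(total_price(12, 3, adjustments))  # 162, same as 12 + 100 + 50
```

Because every night is first charged the default fee, the two parts can be computed independently and simply added, which is what makes the score-based approach possible.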

Then we can find all nested objects that fall within our date range:

           {
              "nested": {
                "path": "custom_dates",
                "query": {
                  "range": {
                    "custom_dates.start_date": {
                      "gte": "2023-10-31",
                      "lte": "2023-11-02"
                    }
                  }
                }
              }
            }

Then we can wrap this in a script_score, which replaces the score with the price adjustment:

            {
              "nested": {
                "path": "custom_dates",
                "query": {
                  "script_score": {
                    "script": {
                      "source": "doc['custom_dates.price_adjustment'].value"
                    },
                    "query": {
                      "range": {
                        "custom_dates.start_date": {
                          "gte": "2023-10-31",
                          "lte": "2023-11-02"
                        }
                      }
                    }
                  }
                },
                "score_mode": "sum"
              }
            }

Then we can use another script_score to compute the default price:

            {
              "script_score": {
                "script": {
                  "params": {
                    "total_stay": 3
                  },
                  "source": "doc['default_fee'].value * params.total_stay"
                },
                "query": {
                  "match_all": {}
                }
              }
            }

Then we can combine the two as should clauses of a bool query, whose scores are added together.

So now our _score equals the price assigned to each record. The last step is to filter records by _score, which can be done with another script_score that has a min_score parameter:

    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              .... default price calculation ....
            },
            {
              .... adjusted price calculation ....
            }
          ]
        }
      },
      "script": {
        "source": "if (_score >= params.min_price && _score <=params.max_price) { 1 } else { 0 }",
        "params": {
          "min_price": 100,
          "max_price": 200
        }
      },
      "min_score": 1
    }

Putting it all together, we get something like this:

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "default_fee": {
        "type": "double"
      },
      "custom_dates": {
        "type": "nested",
        "properties": {
          "start_date": {
            "type": "date"
          },
          "price_adjustment": {
            "type": "double"
          }
        }
      }
    } 
  }
}

PUT test/_doc/33952?refresh
{
  "default_fee": 12,
  "custom_dates": [
    {
      "start_date": "2023-11-01",
      "price_adjustment": 88
    },
    {
      "start_date": "2023-11-02",
      "price_adjustment": 38
    }
  ],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Car"
      ]
    }
  ]
}

PUT test/_doc/33953?refresh
{
  "default_fee": 24,
  "custom_dates": [
    {
      "start_date": "2023-11-01",
      "price_adjustment": 12
    },
    {
      "start_date": "2023-11-02",
      "price_adjustment": 1
    }
  ],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Truck"
      ]
    }
  ]
}

POST test/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              "script_score": {
                "script": {
                  "params": {
                    "total_stay": 3
                  },
                  "source": "doc['default_fee'].value * params.total_stay"
                },
                "query": {
                  "match_all": {}
                }
              }
            },
            {
              "nested": {
                "path": "custom_dates",
                "query": {
                  "script_score": {
                    "script": {
                      "source": "doc['custom_dates.price_adjustment'].value"
                    },
                    "query": {
                      "range": {
                        "custom_dates.start_date": {
                          "gte": "2023-10-31",
                          "lte": "2023-11-02"
                        }
                      }
                    }
                  }
                },
                "score_mode": "sum"
              }
            }
          ]
        }
      },
      "script": {
        "source": "if (_score >= params.min_price && _score <=params.max_price) { 1 } else { 0 }",
        "params": {
          "min_price": 100,
          "max_price": 200
        }
      },
      "min_score": 1
    }
  }
}
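
As a sanity check on the request above, the following Python sketch models the scoring by hand. It is only a model of the arithmetic described in this answer (the sum of the two should clauses, then a range test), not something Elasticsearch runs; the helper names and the in-memory DOCS dictionary are assumptions made for illustration.

```python
from datetime import date

# The two sample documents indexed above, reduced to the fields that matter.
DOCS = {
    "33952": {"default_fee": 12, "custom_dates": [("2023-11-01", 88), ("2023-11-02", 38)]},
    "33953": {"default_fee": 24, "custom_dates": [("2023-11-01", 12), ("2023-11-02", 1)]},
}


def modeled_score(doc, gte, lte, total_stay):
    """Default-price clause plus the summed adjustments of matching nested dates."""
    default_part = doc["default_fee"] * total_stay
    nested_part = sum(
        adjustment
        for day, adjustment in doc["custom_dates"]
        if gte <= date.fromisoformat(day) <= lte
    )
    return default_part + nested_part


def matching_ids(docs, gte, lte, total_stay, min_price, max_price):
    """Ids whose modeled score falls inside the requested price range."""
    return [
        doc_id
        for doc_id, doc in docs.items()
        if min_price <= modeled_score(doc, gte, lte, total_stay) <= max_price
    ]


gte, lte = date(2023, 10, 31), date(2023, 11, 2)
for doc_id, doc in DOCS.items():
    print(doc_id, modeled_score(doc, gte, lte, total_stay=3))
print(matching_ids(DOCS, gte, lte, 3, min_price=100, max_price=200))
```

Under this model the first document scores 162 and the second 85, so only 33952 falls within the 100 to 200 range.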

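Does this work? Yes, to some extent. In Elasticsearch, scores are non-negative 32-bit floating-point numbers. So there is not much precision to work with, and if your adjustments are negative, things get even more complicated.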
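
To give a feel for the precision limit, this Python sketch rounds a few values through a 32-bit float using the standard struct module. It only illustrates 32-bit float rounding in general; the helper name is made up, and the sample values are arbitrary.

```python
import struct


def to_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


print(to_float32(162.0))       # 162.0, small totals survive unchanged
print(to_float32(16777217.0))  # 16777216.0, whole numbers above 2**24 collide
print(to_float32(1234567.89))  # 1234567.875, the cents are no longer exact
```

For small whole-number prices this is harmless, but large totals or fractional prices may not compare exactly against min_price and max_price.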

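Would I do something like this in production? I would not. What I would do is store the special dates in the main document in some easily parseable format, so that I can access them in the query phase. Then I would parse them from the main document in both the script query and the script_fields. Yes, you need to parse them twice, but as I mentioned at the beginning of this answer, there is nothing we can do about that, because these operations execute in different phases. The simplest way is to store them as a multi-valued keyword field. Basically, you can do something like this: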

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "default_fee": {
        "type": "double"
      },
      "custom_dates": {
        "type": "keyword"
      }
    } 
  }
}

PUT test/_doc/33952?refresh
{
  "default_fee": 12,
  "custom_dates": ["2023-11-01:100", "2023-11-02:150"],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Car"
      ]
    }
  ]
}

PUT test/_doc/33953?refresh
{
  "default_fee": 24,
  "custom_dates": ["2023-11-01:12", "2023-11-02:1"],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Truck"
      ]
    }
  ]
}


POST test/_search
{
  "query": {
    "script": {
      "script": {
        "source": """
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = java.time.temporal.ChronoUnit.DAYS.between(from, to);

          def custom_prices = [];
          if (doc.containsKey('custom_dates')) {
            custom_prices = doc['custom_dates'].stream()
            .map(date_price -> {
              def date_price_parsed = date_price.splitOnToken(':');
              def date = LocalDate.parse(date_price_parsed[0], formatter);
              if (!date.isBefore(from) && !date.isAfter(to.minusDays(1))) {
                return Double.parseDouble(date_price_parsed[1]);
              } else {
                return -1;
              }
            })
            .filter(price -> {return price > 0;})
            .collect(Collectors.toList());
          }
          def custom_price = custom_prices.sum();
          def default_price = stay == custom_prices.size() ? 0 : (stay - custom_prices.size()) * doc['default_fee'].value;
          def calc_price = default_price + custom_price;
          return calc_price >= params.min_price && calc_price <= params.max_price; 

        """,
        "params": {
          "checkin": "2023-10-31",
          "checkout": "2023-11-02",
          "min_price": 100,
          "max_price": 200
        }
      }
    }
  },
  "script_fields": {
    "total": {
      "script": {
        "source": """
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = java.time.temporal.ChronoUnit.DAYS.between(from, to);
          
          def custom_prices = [];
          if (doc.containsKey('custom_dates')) {
            custom_prices = doc['custom_dates'].stream()
            .map(date_price -> {
              def date_price_parsed = date_price.splitOnToken(':');
              def date = LocalDate.parse(date_price_parsed[0], formatter);
              if (!date.isBefore(from) && !date.isAfter(to.minusDays(1))) {
                return Double.parseDouble(date_price_parsed[1]);
              } else {
                return -1;
              }
            })
            .filter(price -> {return price > 0;})
            .collect(Collectors.toList());
          }
          def custom_price = custom_prices.sum();
          def default_price = stay == custom_prices.size() ? 0 : (stay - custom_prices.size()) * doc['default_fee'].value;
          def calc_price = default_price + custom_price;
          return calc_price; 

        """,
        "params": {
          "checkin": "2023-10-31",
          "checkout": "2023-11-02"
        }
      }
    }
  },
  "_source": [
    "*"
  ]
}
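
For readers who want to check the expected totals without a cluster, here is a rough Python equivalent of the parsing logic in the scripts above. It is a sketch for illustration only: the function name is made up, and it assumes every keyword value has the form "yyyy-MM-dd:price".

```python
from datetime import date, timedelta


def stay_total(default_fee, custom_dates, checkin, checkout):
    """Total for a stay: custom prices where present, default fee otherwise."""
    start = date.fromisoformat(checkin)
    end = date.fromisoformat(checkout)
    nights = (end - start).days
    last_night = end - timedelta(days=1)  # the checkout day itself is not charged

    custom_prices = []
    for entry in custom_dates:
        day, price = entry.split(":")
        # Like the script, keep only positive prices for nights inside the stay.
        if start <= date.fromisoformat(day) <= last_night and float(price) > 0:
            custom_prices.append(float(price))

    default_nights = nights - len(custom_prices)
    return default_nights * default_fee + sum(custom_prices)


print(stay_total(12, ["2023-11-01:100", "2023-11-02:150"], "2023-10-31", "2023-11-02"))
print(stay_total(24, ["2023-11-01:12", "2023-11-02:1"], "2023-10-31", "2023-11-02"))
```

With the sample data this gives 112.0 for the first document and 36.0 for the second, so a 100 to 200 price range should keep only document 33952.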
