How do I loop over indexed data in a script filter query in Elasticsearch?

Asked by p8ekf7hl 7 months ago in Elasticsearch

I have the following indexed data:

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 7992,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_id": "33952",
        "default_fee": 12,
        "custom_dates": [
          {
            "date": "2023-11-01",
            "price": 100
          },
          {
            "date": "2023-11-02",
            "price": 50
          }
        ],
        "options": [
          {
            "id": 95,
            "cost": 5,
            "type": [
              "Car"
            ]
          }
        ]
      }
    ]
  }
}

I have added a script field named total that computes the total at runtime, as follows:

{
  script_fields: {
    total: {
      script: {
        source: "
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = params.total_stay;

          def custom_price_dates = [];
          if (params['_source']['custom_dates'] != null && !params['_source']['custom_dates'].isEmpty()) {
            custom_price_dates = params['_source']['custom_dates'].stream()
            .filter(filter_doc -> {
              def date = LocalDate.parse(filter_doc.start_date, formatter);
              return !date.isBefore(from) && !date.isAfter(to.minusDays(1));
            })
            .collect(Collectors.toList());
          }

          def custom_price = custom_price_dates.stream().mapToDouble(custom_doc -> custom_doc.price).sum();
          def default_price = stay == custom_price_dates.size() ? 0 : (stay - custom_price_dates.size()) * params['_source']['default_fee'];
          def calc_price = default_price + custom_price;
          return calc_price; 
        ",
        params: {
          checkin: Date.current.to_s,
          checkout: Date.current.to_s,
          total_stay: 2
        }
      }
    }
  },
  _source: ["*"]
}


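This returns the total in a script field. Now I want to filter on a range of that total. How can I achieve this? I tried using a script query as follows, but it does not loop through custom_dates because that field is a nested type.
I also cannot index the total in advance, because the check-in and check-out dates are dynamic and there may be custom prices within the given check-in and check-out dates. Please advise.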

Answer #1 (5f0d552i)

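It can be done, but it is complicated. First, we need to understand that a search is executed in two phases: query and fetch. In the query phase, each shard collects its top 10 hits (10 being the default size) together with their sort keys (by default, _score). In the fetch phase, the coordinating node gathers these IDs and sort keys from all shards, selects the overall top 10, and then asks each shard to return the documents it holds from that list. Script fields are computed in the fetch phase, so filters, which run in the query phase, have no access to them.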
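To make the ordering of the two phases concrete, here is a toy Python model. It is not Elasticsearch code: the shards, scores, and the placeholder script are all made up for illustration. It only shows that the hit list is already final by the time a script field is evaluated.

```python
from heapq import nlargest

# Hypothetical shards: each maps a document id to its _score.
shards = [
    {"a": 3.0, "b": 1.0},
    {"c": 2.0, "d": 5.0},
]

def query_phase(shard, size):
    """Each shard returns only (score, id) pairs for its own top `size` hits."""
    return nlargest(size, ((score, doc_id) for doc_id, score in shard.items()))

def fetch_phase(hits, script_field):
    """Script fields are computed here, after the hit list is already final."""
    return [
        {"_id": doc_id, "_score": score, "total": script_field(doc_id)}
        for score, doc_id in hits
    ]

size = 2
# The coordinating node merges the per-shard candidates and keeps the top `size`.
candidates = [hit for shard in shards for hit in query_phase(shard, size)]
top_hits = nlargest(size, candidates)

# Placeholder "script": any per-document computation would do here.
results = fetch_phase(top_hits, lambda doc_id: len(doc_id) * 10)
print([r["_id"] for r in results])
```

Anything that should influence which documents are returned therefore has to happen inside the query itself, which is what the rest of this answer works around.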
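Worse, you index the custom dates as nested objects. Internally, nested objects are indexed as separate documents, and the only way to pass information from them to the main query is through _score. So, basically, to achieve what you are trying to do with nested objects, you need to encode the price into _score. To simplify the calculation, we need to store the price difference in the nested objects rather than the actual price. So if the default price is 12 and the special price is 100, we need to store 88.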
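The reason the difference works can be checked with a few lines of arithmetic. The Python sketch below uses the prices from the sample document in the question; the function names and the three-night stay are only illustrative assumptions.

```python
default_fee = 12
# Special prices from the question's sample document.
custom_prices = {"2023-11-01": 100, "2023-11-02": 50}

# What would be stored per nested object: special price minus default price.
adjustments = {day: price - default_fee for day, price in custom_prices.items()}

def total_direct(stay_dates):
    """Per night: the special price if there is one, otherwise the default fee."""
    return sum(custom_prices.get(day, default_fee) for day in stay_dates)

def total_via_adjustments(stay_dates):
    """Default fee for every night, plus the adjustments of the special nights."""
    return default_fee * len(stay_dates) + sum(adjustments.get(day, 0) for day in stay_dates)

stay = ["2023-10-31", "2023-11-01", "2023-11-02"]
print(adjustments)                  # {'2023-11-01': 88, '2023-11-02': 38}
print(total_direct(stay))           # 162
print(total_via_adjustments(stay))  # 162
```

Because the second form is a plain sum, it can be expressed as a sum of scores: one term from the parent document and one term per matching nested object.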
Then we can find all nested objects that match our date range:

{
              "nested": {
                "path": "custom_dates",
                "query": {
                  "range": {
                    "custom_dates.start_date": {
                      "gte": "2023-10-31",
                      "lte": "2023-11-02"
                    }
                  }
                }
              }
            }

Then we can wrap it in a script_score, which replaces the score with the price adjustment:

{
              "nested": {
                "path": "custom_dates",
                "query": {
                  "script_score": {
                    "script": {
                      "source": "doc['custom_dates.price_adjustment'].value"
                    },
                    "query": {
                      "range": {
                        "custom_dates.start_date": {
                          "gte": "2023-10-31",
                          "lte": "2023-11-02"
                        }
                      }
                    }
                  }
                },
                "score_mode": "sum"
              }
            }


Then we can use another script_score to calculate the default price:

{
              "script_score": {
                "script": {
                  "params": {
                    "total_stay": 3
                  },
                  "source": "doc['default_fee'].value * params.total_stay"
                },
                "query": {
                  "match_all": {}
                }
              }
            }


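Then we can combine the two as should clauses of a bool query, which adds their scores together.
So now _score equals the price assigned to each record. The last step is to filter records by _score, which can be done with yet another script_score and its min_score parameter: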

"script_score": {
      "query": {
        "bool": {
          "should": [
            {
              .... default price calculation ....
            },
            {
              .... adjusted price calculation ....
            }
          ]
        }
      },
      "script": {
        "source": "if (_score >= params.min_price && _score <=params.max_price) { 1 } else { 0 }",
        "params": {
          "min_price": 100,
          "max_price": 200
        }
      },
      "min_score": 1
    }


If we put all of this together, we get something like this:

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "default_fee": {
        "type": "double"
      },
      "custom_dates": {
        "type": "nested",
        "properties": {
          "start_date": {
            "type": "date"
          },
          "price_adjustment": {
            "type": "double"
          }
        }
      }
    } 
  }
}

PUT test/_doc/33952?refresh
{
  "default_fee": 12,
  "custom_dates": [
    {
      "start_date": "2023-11-01",
      "price_adjustment": 88
    },
    {
      "start_date": "2023-11-02",
      "price_adjustment": 38
    }
  ],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Car"
      ]
    }
  ]
}

PUT test/_doc/33953?refresh
{
  "default_fee": 24,
  "custom_dates": [
    {
      "start_date": "2023-11-01",
      "price_adjustment": 12
    },
    {
      "start_date": "2023-11-02",
      "price_adjustment": 1
    }
  ],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Truck"
      ]
    }
  ]
}

POST test/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "should": [
            {
              "script_score": {
                "script": {
                  "params": {
                    "total_stay": 3
                  },
                  "source": "doc['default_fee'].value * params.total_stay"
                },
                "query": {
                  "match_all": {}
                }
              }
            },
            {
              "nested": {
                "path": "custom_dates",
                "query": {
                  "script_score": {
                    "script": {
                      "source": "doc['custom_dates.price_adjustment'].value"
                    },
                    "query": {
                      "range": {
                        "custom_dates.start_date": {
                          "gte": "2023-10-31",
                          "lte": "2023-11-02"
                        }
                      }
                    }
                  }
                },
                "score_mode": "sum"
              }
            }
          ]
        }
      },
      "script": {
        "source": "if (_score >= params.min_price && _score <=params.max_price) { 1 } else { 0 }",
        "params": {
          "min_price": 100,
          "max_price": 200
        }
      },
      "min_score": 1
    }
  }
}
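As a sanity check on what this request should return, the Python sketch below recomputes the scores by hand for the two sample documents. It assumes the scores of the matching should clauses are summed and that score_mode sum adds up the matching nested objects, as described above; the helper names are hypothetical.

```python
# Sample documents from the example above: (start_date, price_adjustment) pairs.
docs = {
    "33952": {"default_fee": 12, "custom_dates": [("2023-11-01", 88), ("2023-11-02", 38)]},
    "33953": {"default_fee": 24, "custom_dates": [("2023-11-01", 12), ("2023-11-02", 1)]},
}

def price_score(doc, gte, lte, total_stay):
    """Sum of the two should clauses: default price plus matching adjustments."""
    default_part = doc["default_fee"] * total_stay
    # ISO dates compare correctly as strings; both range bounds are inclusive.
    nested_part = sum(adj for day, adj in doc["custom_dates"] if gte <= day <= lte)
    return default_part + nested_part

def passes(score, min_price, max_price, min_score=1):
    """Outer script_score: 1 inside the price range, 0 outside, then min_score."""
    outer = 1 if min_price <= score <= max_price else 0
    return outer >= min_score

scores = {doc_id: price_score(doc, "2023-10-31", "2023-11-02", 3) for doc_id, doc in docs.items()}
hits = [doc_id for doc_id, score in scores.items() if passes(score, 100, 200)]
print(scores)  # {'33952': 162, '33953': 85}
print(hits)    # ['33952']
```

Under these assumptions only document 33952 falls inside the 100 to 200 range.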


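Does it work? Yes, to a certain extent. In Elasticsearch, scores are non-negative 32-bit floating point numbers. So there is not much precision available, and if your adjustments are negative, things become even more complicated.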
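The precision limit is easy to see outside Elasticsearch. This small Python sketch rounds a value to a 32-bit float and back using the standard struct module; the sample values are arbitrary and only meant to show where rounding starts to matter.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(to_float32(162.0))       # 162.0, small whole prices survive unchanged
print(to_float32(16777217.0))  # 16777216.0, integers above 2**24 are no longer exact
print(to_float32(1234567.89))  # 1234567.875, the fractional part is rounded
```

For small whole-number prices like the ones in this example the encoding is exact, but large totals or prices with cents can land on the wrong side of a range boundary.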
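Would I do something like this in production? I would not. What I would do instead is store the special dates in the main document in some easily parseable format, so that I can access them during the query phase. I would then parse them from the main document in both the script query and the script field. Yes, you need to parse them twice, but as I mentioned at the beginning of this answer, there is nothing we can do about that, because these operations are executed in different phases. The simplest way is to store them as a multi-valued keyword field. Basically, you can do something like this: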

DELETE test
PUT test
{
  "mappings": {
    "properties": {
      "default_fee": {
        "type": "double"
      },
      "custom_dates": {
        "type": "keyword"
      }
    } 
  }
}

PUT test/_doc/33952?refresh
{
  "default_fee": 12,
  "custom_dates": ["2023-11-01:100", "2023-11-02:150"],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Car"
      ]
    }
  ]
}

PUT test/_doc/33953?refresh
{
  "default_fee": 24,
  "custom_dates": ["2023-11-01:12", "2023-11-02:1"],
  "options": [
    {
      "id": 95,
      "cost": 5,
      "type": [
        "Truck"
      ]
    }
  ]
}

POST test/_search
{
  "query": {
    "script": {
      "script": {
        "source": """
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = java.time.temporal.ChronoUnit.DAYS.between(from, to);

          def custom_prices = [];
          if (doc.containsKey('custom_dates')) {
            custom_prices = doc['custom_dates'].stream()
            .map(date_price -> {
              def date_price_parsed = date_price.splitOnToken(':');
              def date = LocalDate.parse(date_price_parsed[0], formatter);
              if (!date.isBefore(from) && !date.isAfter(to.minusDays(1))) {
                return Double.parseDouble(date_price_parsed[1]);
              } else {
                return -1;
              }
            })
            .filter(price -> {return price > 0;})
            .collect(Collectors.toList());
          }
          def custom_price = custom_prices.sum();
          def default_price = stay == custom_prices.size() ? 0 : (stay - custom_prices.size()) * doc['default_fee'].value;
          def calc_price = default_price + custom_price;
          return calc_price >= params.min_price && calc_price <= params.max_price; 

        """,
        "params": {
          "checkin": "2023-10-31",
          "checkout": "2023-11-02",
          "min_price": 100,
          "max_price": 200
        }
      }
    }
  },
  "script_fields": {
    "total": {
      "script": {
        "source": """
          DateTimeFormatter formatter = DateTimeFormatter.ofPattern('yyyy-MM-dd');
          def from = LocalDate.parse(params.checkin, formatter);
          def to = LocalDate.parse(params.checkout, formatter);
          def stay = java.time.temporal.ChronoUnit.DAYS.between(from, to);
          
          def custom_prices = [];
          if (doc.containsKey('custom_dates')) {
            custom_prices = doc['custom_dates'].stream()
            .map(date_price -> {
              def date_price_parsed = date_price.splitOnToken(':');
              def date = LocalDate.parse(date_price_parsed[0], formatter);
              if (!date.isBefore(from) && !date.isAfter(to.minusDays(1))) {
                return Double.parseDouble(date_price_parsed[1]);
              } else {
                return -1;
              }
            })
            .filter(price -> {return price > 0;})
            .collect(Collectors.toList());
          }
          def custom_price = custom_prices.sum();
          def default_price = stay == custom_prices.size() ? 0 : (stay - custom_prices.size()) * doc['default_fee'].value;
          def calc_price = default_price + custom_price;
          return calc_price; 

        """,
        "params": {
          "checkin": "2023-10-31",
          "checkout": "2023-11-02"
        }
      }
    }
  },
  "_source": [
    "*"
  ]
}
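To check the expected result of this request by hand, here is a rough Python mirror of the Painless logic, run against the two sample documents. It is only a sketch of the same arithmetic: the function name is hypothetical, and it assumes a document with no custom dates starts from an empty price list.

```python
from datetime import date

def total_price(default_fee, custom_dates, checkin, checkout):
    """Mirror of the script: special prices for nights in [checkin, checkout),
    default fee for every remaining night."""
    start = date.fromisoformat(checkin)
    end = date.fromisoformat(checkout)
    stay = (end - start).days

    custom_prices = []
    for entry in custom_dates:
        day, price = entry.split(":")
        # Same window as the script: not before check-in, not after checkout - 1.
        # Like the script's filter, non-positive prices are ignored.
        if start <= date.fromisoformat(day) < end and float(price) > 0:
            custom_prices.append(float(price))

    default_price = (stay - len(custom_prices)) * default_fee
    return default_price + sum(custom_prices)

docs = {
    "33952": (12, ["2023-11-01:100", "2023-11-02:150"]),
    "33953": (24, ["2023-11-01:12", "2023-11-02:1"]),
}
totals = {doc_id: total_price(fee, dates, "2023-10-31", "2023-11-02")
          for doc_id, (fee, dates) in docs.items()}
hits = [doc_id for doc_id, total in totals.items() if 100 <= total <= 200]
print(totals)  # {'33952': 112.0, '33953': 36.0}
print(hits)    # ['33952']
```

With a two-night stay starting on 2023-10-31, only the 2023-11-01 entry falls inside the window, so document 33952 totals 112 and is the only one within the 100 to 200 range.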
