Filtering a result and manipulating the documents

Posted by efzxgjgh on 2021-06-09 in ElasticSearch


I have the following query, which works fine (this may not be the actual query):

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "location",
            "query": {
              "geo_distance": {
                "distance": "16090km",
                "distance_type": "arc",
                "location.point": {
                  "lat": "51.794177",
                  "lon": "-0.063055"
                }
              }
            }
          }
        },
        {
          "geo_distance": {
            "distance": "16090km",
            "distance_type": "arc",
            "location.point": {
              "lat": "51.794177",
              "lon": "-0.063055"
            }
          }
        }
      ]
    }
  }
}

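However, I would like to do the following (as part of the query, but without affecting the existing query):
- Find all documents that have field_name = 1
- On all documents that have field_name = 1, sort by geo distance
- Remove duplicates that have field_name = 1 and the same value of field_name_2 (for example field_name_2 = 2), keeping the closest item in the document results and removing the rest
Update (further explanation):
Aggregations cannot be used, because we want to manipulate the documents in the result.
The existing order of the documents must also be maintained, meaning:
If I have 20 documents sorted by some field, and 5 of them have field_name = 1, I would like to sort those 5 by distance and eliminate 4 of them, while still maintaining the first sort (possibly doing the geo distance sort and elimination before the actual query?).
I am not sure how to do this, and any help is appreciated. I am currently using elasticsearch-dsl-drf, but I can easily convert the query to elasticsearch-dsl.
Example documents (before manipulation):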

[{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 1,
"field_name_2": 2,
"location": ....
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]

Output (desired):

[{
"field_name": 1,
"field_name_2": 2,
"location": .... <- closest
},
{
"field_name": 55,
"field_name_5": 22,
"location": ....
}]
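
To make the requirement concrete, here is a minimal client-side sketch of the desired behaviour in Python. It is not an Elasticsearch feature and not part of the original question: the function names `haversine_km` and `keep_closest` are hypothetical, and it assumes each document's location is a single object shaped like {"point": {"lat": ..., "lon": ...}}, which the elided "location": .... above does not confirm.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def keep_closest(docs, origin, match_value=1):
    """Drop duplicates among docs whose field_name equals match_value.

    Duplicates are docs sharing the same field_name_2. Only the one
    closest to origin, a (lat, lon) tuple, survives. All other docs
    and the incoming order of the list are left untouched.
    """
    best = {}  # field_name_2 value -> (distance, index of closest doc)
    for i, doc in enumerate(docs):
        if doc.get("field_name") != match_value:
            continue
        point = doc["location"]["point"]  # assumed document shape
        dist = haversine_km(origin[0], origin[1],
                            float(point["lat"]), float(point["lon"]))
        key = doc.get("field_name_2")
        if key not in best or dist < best[key][0]:
            best[key] = (dist, i)
    winners = {index for _, index in best.values()}
    return [doc for i, doc in enumerate(docs)
            if doc.get("field_name") != match_value or i in winners]
```

Calling keep_closest(hits, (51.794177, -0.063055)) on an already sorted list keeps that sort: each surviving document stays at its own position, and only the farther duplicates are removed.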

elasticsearch elasticsearch-aggregation elasticsearch-dsl

Source: https://stackoverflow.com/questions/65089604/elasticsearch-filtering-a-result-and-manipulating-the-documents

2 Answers


67up9zun1#

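One way to achieve what you need is to keep the query part as it is (so you still get the hits you need) and add an aggregation part that retrieves the closest document with an additional condition on field_name. The aggregation part would consist of:
- a filter aggregation, to consider only the documents with field_name = 1
- a geo_distance aggregation with a very small distance
- a top_hits aggregation, to return the document with the closest distance
The aggregation part would look like this: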

{
  "query": {
    ...same as you have now...
  },
  "aggs": {
    "field_name": {
      "filter": {
        "term": {
          "field_name": 1           <--- only select desired documents
        }
      },
      "aggs": {
        "nearby": {                   <---- aggregation name (arbitrary)
          "geo_distance": {
            "field": "location.point",
            "unit": "km",
            "distance_type": "arc",
            "origin": {
              "lat": "51.794177",
              "lon": "-0.063055"
            },
            "ranges": [
              {
                "to": 1               <---- single bucket for docs < 1km (change as needed)
              }
            ]
          },
          "aggs": {
            "closest": {
              "top_hits": {
                "size": 1,            <---- closest document
                "sort": [
                  {
                    "_geo_distance": {
                      "location.point": {
                        "lat": "51.794177",
                        "lon": "-0.063055"
                      },
                      "order": "asc",
                      "unit": "km",
                      "mode": "min",
                      "distance_type": "arc",
                      "ignore_unmapped": true
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
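
For anyone assembling the request from Python, the sketch below shows one way the three aggregations could be nested as plain dicts, with the geo_distance aggregation given its own name so that top_hits can sit underneath it. The names `matching`, `nearby`, `closest_match_aggs` and `closest_hit` are assumptions made for illustration, and the sketch does not account for location being a nested field, which may require wrapping the aggregations in a nested aggregation.

```python
def closest_match_aggs(lat, lon, field="field_name", value=1, max_km=1):
    """Build the "aggs" section: filter -> geo_distance -> top_hits."""
    origin = {"lat": lat, "lon": lon}
    return {
        "matching": {  # hypothetical aggregation name
            # Only consider documents where `field` equals `value`.
            "filter": {"term": {field: value}},
            "aggs": {
                "nearby": {  # hypothetical aggregation name
                    # One bucket holding documents closer than max_km.
                    "geo_distance": {
                        "field": "location.point",
                        "unit": "km",
                        "distance_type": "arc",
                        "origin": origin,
                        "ranges": [{"to": max_km}],
                    },
                    "aggs": {
                        "closest": {
                            # Sorted ascending, so size 1 is the closest.
                            "top_hits": {
                                "size": 1,
                                "sort": [{
                                    "_geo_distance": {
                                        "location.point": origin,
                                        "order": "asc",
                                        "unit": "km",
                                        "distance_type": "arc",
                                    }
                                }],
                            }
                        }
                    },
                }
            },
        }
    }


def closest_hit(response):
    """Return the closest hit from a search response dict, or None."""
    buckets = response["aggregations"]["matching"]["nearby"]["buckets"]
    for bucket in buckets:
        hits = bucket["closest"]["hits"]["hits"]
        if hits:
            return hits[0]
    return None
```

The result of closest_match_aggs(51.794177, -0.063055) can be placed under the "aggs" key of the existing request body, and closest_hit can then be applied to the response after it has been converted to a dict.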


5jvtdoz22#

This can be done using Field Collapsing, which is the equivalent of grouping. Below is an example of how this can be achieved:

{"collapse": {"field": "vin",
              "inner_hits": {
                  "name": "closest_dealer",
                  "size": 1,
                  "sort": [
                      {
                          "_geo_distance": {
                              "location.point": {
                                  "lat": "latitude",
                                  "lon": "longitude"
                              },
                              "order": "asc",
                              "unit": "km",
                              "distance_type": "arc",
                              "nested_path": "location"
                          }
                      }
                  ]
              }
              }
 }

The collapse is done on the field vin, and inner_hits is used to sort the grouped items and get the closest one (size = 1).
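
As a rough illustration of how this could be wired into an existing request from Python, the sketch below adds a collapse clause to a search body and reads the closest item of each group back out of the response. The helper names `with_collapse` and `closest_per_group` are hypothetical. The inner sort uses ascending order so that the single inner hit is the closest one. The nested_path key is kept from the answer, although newer Elasticsearch versions may expect the nested object form instead. Collapsing also requires the group field to be a keyword or numeric field.

```python
import copy


def with_collapse(body, group_field, lat, lon, name="closest_dealer"):
    """Return a copy of a search body with a collapse clause added."""
    new_body = copy.deepcopy(body)  # leave the caller's body untouched
    new_body["collapse"] = {
        "field": group_field,
        "inner_hits": {
            "name": name,
            "size": 1,  # only the first (closest) item of each group
            "sort": [{
                "_geo_distance": {
                    "location.point": {"lat": lat, "lon": lon},
                    "order": "asc",  # nearest first
                    "unit": "km",
                    "distance_type": "arc",
                    "nested_path": "location",  # kept from the answer
                }
            }],
        },
    }
    return new_body


def closest_per_group(response, name="closest_dealer"):
    """Collect the single inner hit of every collapsed top-level hit."""
    return [hit["inner_hits"][name]["hits"]["hits"][0]
            for hit in response["hits"]["hits"]]
```

For example, with_collapse({"query": {"match_all": {}}}, "vin", 51.794177, -0.063055) produces a body that groups hits by vin. The top-level hits keep whatever sort the main query uses, and only the inner hits are ordered by distance.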

