Elasticsearch histogram field type: usage and caveats

x33g5p2x, reposted on 2022-02-20 in ElasticSearch

Histogram

First, the link to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/histogram.html

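Searching the web for "elasticsearch Histogram" turns up two different things:

  • the histogram field type
  • the histogram aggregation

There is plenty of material on the aggregation but very little on the field type, so this post records how to use the histogram field type and what to watch out for. P.S. A few points are still not understood and need further investigation, so this post will be updated over time.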

Histogram field type

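A histogram field is defined by two paired arrays, values and counts.
Keep the following in mind:

  • values holds doubles and must be in ascending order
  • counts holds integers, each of which must be positive or zero
  • the two arrays must have the same length, because their elements correspond one to one
  • nested arrays are not supported, and neither is sorting on the field

Histogram data is stored as binary doc values rather than being indexed, which makes aggregation faster. Its size in bytes is at most 13 * the length of the arrays.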
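The rules above can be checked on the client before a document is sent. The following is only a minimal sketch: validate_histogram and max_size_bytes are hypothetical helper names, not part of Elasticsearch or its clients, and the sketch assumes that only a strictly smaller neighbour breaks the ordering rule (whether Elasticsearch accepts equal neighbours is not verified here).

```python
from typing import Sequence


def validate_histogram(values: Sequence[float], counts: Sequence[int]) -> None:
    """Raise ValueError if the pair breaks one of the rules listed above.

    Hypothetical client-side check; Elasticsearch performs its own validation.
    """
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    # Assumption: only a strictly smaller neighbour counts as out of order.
    for prev, cur in zip(values, values[1:]):
        if cur < prev:
            raise ValueError(
                f"values must be in increasing order, got [{cur}] "
                f"but previous value was [{prev}]"
            )
    for c in counts:
        if not isinstance(c, int) or c < 0:
            raise ValueError(f"counts elements must be >= 0 but got {c}")


def max_size_bytes(values: Sequence[float]) -> int:
    """Upper bound on the stored size: 13 bytes per array entry."""
    return 13 * len(values)


values = [0.1, 0.2, 0.3, 0.4, 0.5]
counts = [3, 7, 23, 12, 6]
validate_histogram(values, counts)  # passes silently
print(max_size_bytes(values))       # 65
```

A five-entry histogram therefore takes at most 65 bytes, regardless of how many raw observations the counts represent.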

Quick start

Create the mapping

PUT histogram_test
{
  "mappings" : {
    "properties" : {
      "my_histogram" : {
        "type" : "histogram"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

Index some documents

PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}
PUT histogram_test/_doc/2
{
  "my_text" : "histogram_2",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}

Error example

Wrong: indexing a document whose values are not in increasing order

PUT histogram_test/_doc/1
{
  "my_text" : "histogram_1",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.1, 0.4, 0.5], 
      "counts" : [3, 7, 23, 12, 6] 
   }
}
 
***********result************** 
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [values] values must be in increasing order, got [0.1] but previous value was [0.2]"
    }
  },
  "status" : 400
}

Wrong: a counts entry that is less than 0

PUT histogram_test/_doc/3
{
  "my_text" : "histogram_3",
  "my_histogram" : {
      "values" : [0.1, 0.2, 0.3, 0.4, 1], 
      "counts" : [3, 7, 23, 12, -6] 
   }
}
 
***********result**************
 
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [my_histogram] of type [histogram]",
    "caused_by" : {
      "type" : "mapper_parsing_exception",
      "reason" : "error parsing field [my_histogram], [counts] elements must be >= 0 but got -6"
    }
  },
  "status" : 400
}

Aggregation

  • min aggregation
  • max aggregation
  • sum aggregation
  • value_count aggregation
  • avg aggregation
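  • percentiles aggregation (not yet understood; to be investigated)
  • percentile ranks aggregation (not yet understood; to be investigated)
  • boxplot aggregation (not yet understood; to be investigated)
  • histogram aggregation
  • range aggregation (not yet understood; to be investigated)

min aggregation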

Returns the smallest entry found in values across all histograms.

GET /histogram_test/_search
{
  "aggs": {
    "min_latency": {
      "min": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
 
 "aggregations" : {
    "min_latency" : {
      "value" : 0.1
    }
  }

max aggregation

Returns the largest entry found in values across all histograms.

GET /histogram_test/_search
{
  "aggs": {
    "max_histogram": {
      "max": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "max_histogram" : {
      "value" : 1.0
    }
  }

sum aggregation

Multiplies each entry in values by the matching entry in counts, then adds up all the products across every histogram.

GET /histogram_test/_search
{
  "aggs": {
    "sum_histogram": {
      "sum": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "sum_histogram" : {
      "value" : 35.8
    }
  }
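The 35.8 above can be reproduced by hand from the two documents indexed in the Quick start. This is only an illustrative sketch of the arithmetic, not how Elasticsearch computes it internally; docs and histogram_sum are names made up for the example.

```python
# The two documents from the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def histogram_sum(docs):
    """Sum of value * count over every entry of every histogram."""
    return sum(
        v * c
        for d in docs
        for v, c in zip(d["values"], d["counts"])
    )


total = histogram_sum(docs)
# Rounded only to hide floating-point noise in the printed value.
print(round(total, 6))  # 35.8
```

Document 1 contributes 16.4 and document 2 contributes 19.4, which together give 35.8.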

value_count aggregation

Adds up every entry in counts across all histograms.

GET /histogram_test/_search
{
  "aggs": {
    "count_histogram": {
      "value_count": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
  "aggregations" : {
    "count_histogram" : {
      "value" : 102
    }
  }
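
avg aggregation

Multiplies each number in the values array by its associated count in the counts array, then averages the results over all histograms. In other words, it is a weighted average: sum / value_count.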

GET /histogram_test/_search
{
  "aggs": {
    "avg_histogram": {
      "avg": {
        "field": "my_histogram"
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "avg_histogram" : {
      "value" : 0.3509803921568627
    }
  }
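The average can be checked the same way, as the weighted sum divided by the total count. Again this is just a sketch of the arithmetic over the two Quick start documents; the variable names are made up for the example.

```python
# The two documents from the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]

# value_count: every entry of every counts array added together.
count = sum(c for d in docs for c in d["counts"])

# sum: value * count over every entry of every histogram.
weighted_sum = sum(
    v * c for d in docs for v, c in zip(d["values"], d["counts"])
)

average = weighted_sum / count

print(count)              # 102
print(round(average, 6))  # 0.35098
```

Each document holds 51 observations, so the count is 102 and the average is 35.8 / 102.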
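
histogram aggregation

Counts how many observations fall into each interval of values.
interval is the width of each bucket. Note that doc_count here is the sum of the counts that fall into the bucket, not the number of documents.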

GET /histogram_test/_search
{
  "aggs": {
    "histogram_histogram": {
      "histogram": {
        "field": "my_histogram",
        "interval": 0.5
      }
    }
  }
}
**********************value********************
"aggregations" : {
    "histogram_histogram" : {
      "buckets" : [
        {
          "key" : 0.0,
          "doc_count" : 90
        },
        {
          "key" : 0.5,
          "doc_count" : 6
        },
        {
          "key" : 1.0,
          "doc_count" : 6
        }
      ]
    }
  }
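The buckets above can be reproduced with the bucket-key rule given in the histogram aggregation documentation, floor((value - offset) / interval) * interval + offset, with offset assumed to be 0 here. The sketch below is illustrative only, and bucketize is a hypothetical helper name.

```python
import math
from collections import defaultdict

# The two documents from the Quick start section.
docs = [
    {"values": [0.1, 0.2, 0.3, 0.4, 0.5], "counts": [3, 7, 23, 12, 6]},
    {"values": [0.1, 0.2, 0.3, 0.4, 1], "counts": [3, 7, 23, 12, 6]},
]


def bucketize(docs, interval, offset=0.0):
    """Add each count to the bucket whose key its value rounds down to."""
    buckets = defaultdict(int)
    for d in docs:
        for v, c in zip(d["values"], d["counts"]):
            key = math.floor((v - offset) / interval) * interval + offset
            buckets[key] += c
    return dict(buckets)


buckets = bucketize(docs, interval=0.5)
print(sorted(buckets.items()))  # [(0.0, 90), (0.5, 6), (1.0, 6)]
```

The values 0.1 to 0.4 from both documents land in bucket 0.0 (45 + 45 = 90), the 0.5 entry of document 1 lands in bucket 0.5, and the 1 entry of document 2 lands in bucket 1.0.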

Query

Only certain queries are supported on this field type.

exists query
GET /histogram_test/_search
{
  "query": {
    "exists": {
      "field": "my_histogram"
    }
  }
}

END

The parts of this post marked as still to be investigated will be filled in later. Feedback and discussion are welcome.
