Elasticsearch Queries

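Elasticsearch supports two ways of issuing a search request: a lightweight form that passes everything as URL parameters, and a full JSON request body, known as the structured query language (Query DSL). Because the DSL is more expressive and easier to read, it is the form most people use. A DSL query is a JSON document sent in the request body, usually with POST, which makes it flexible and lets it take many shapes. One caveat: many examples in the official documentation show only a fragment of the JSON and cannot be copied and pasted as-is. In most cases the fragment has to be wrapped in an outer object under the key "query".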
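The wrapping step described above can be sketched in a few lines of Python using only the standard library. The helper name wrap_query is hypothetical and is not part of any Elasticsearch client; the sketch only builds the JSON body and does not send it anywhere.

```python
import json


def wrap_query(fragment):
    """Wrap a bare query clause in the top-level "query" key.

    Hypothetical helper for illustration only. A fragment that already
    has a top-level "query" key is returned unchanged.
    """
    if "query" in fragment:
        return fragment
    return {"query": fragment}


# A clause as it often appears in documentation snippets.
fragment = {"match": {"title": "宝马"}}

# The body that would be sent to the _search endpoint.
body = wrap_query(fragment)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the wrapper separate from the clause makes it easier to reuse one clause inside a larger body, for example next to a "highlight" section.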

URI search

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html

The search is expressed entirely through URL query parameters. Commonly used parameters:

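q: the query string;
df: the default field to search when q does not name one;
sort: the sort order;
timeout: the maximum time to wait for the search to finish;
from, size: used for pagination.
For example:

# Find documents whose user field contains alfred, sort by age ascending, return hits 5 through 14, and time out if the search has not finished within 1s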
GET  /my_index/_search?q=alfred&df=user&sort=age:asc&from=4&size=10&timeout=1s
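To make the relationship between from, size and the returned hits concrete, here is a small Python sketch that rebuilds the request path above and computes which hits it covers. The function names are made up for this illustration; only the standard library is used, and nothing is sent to a server.

```python
from urllib.parse import urlencode


def uri_search_path(index, q, df, sort, offset, size, timeout):
    """Build the path of a URI search request (illustrative helper)."""
    params = {
        "q": q,
        "df": df,
        "sort": sort,
        "from": offset,  # zero-based offset of the first hit to return
        "size": size,    # number of hits to return
        "timeout": timeout,
    }
    # Keep ':' unescaped so that sort=age:asc stays readable.
    return f"/{index}/_search?{urlencode(params, safe=':')}"


def hit_range(offset, size):
    """1-based ranks of the first and last hit covered by from/size."""
    return offset + 1, offset + size


path = uri_search_path("my_index", "alfred", "user", "age:asc", 4, 10, "1s")
print(path)
print(hit_range(4, 10))  # (5, 14)
```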

Request body search

The search is expressed in the JSON request body.

(1) match query

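The match query is the standard full-text query and is sometimes loosely called a fuzzy search. It first runs the search text through the analyzer to split it into terms, then matches each resulting term in turn. Two closely related queries are match_phrase and multi_match.
Example: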

# Create the index and prepare the data
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}

PUT my_index/_doc/2
{
  "name" : "李四",
  "title" : "我的奥迪有220马力"
}

PUT my_index/_doc/3
{
  "name" : "王五",
  "title" : "我的玛莎拉蒂有250马力"
}
# match query
POST my_index/_doc/_search
{
  "query": {
    "match": {
      "title": "宝马玛力"
    }
  },
  "highlight":{
    "pre_tags":"<tag1>",
    "post_tags" : "</tag1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.970927,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.970927,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        },
        "highlight": {
          "title": [
            "我的<tag1>宝</tag1><tag1>马</tag1>有222<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.8630463,
        "_source": {
          "name": "王五",
          "title": "我的玛莎拉蒂有250马力"
        },
        "highlight": {
          "title": [
            "我的<tag1>玛</tag1>莎拉蒂有250<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "name": "李四",
          "title": "我的奥迪有220马力"
        },
        "highlight": {
          "title": [
            "我的奥迪有220<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      }
    ]
  }
}

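Note: the match query breaks the search text "宝马玛力" into individual terms ("宝", "马", "玛", "力"; with the default analyzer each Chinese character becomes its own term), matches each term separately, and returns every document that contains at least one of them.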
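The behaviour described above can be imitated with a toy model in Python. This is only a rough sketch: toy_tokens is a simplified stand-in for the default analyzer (an assumption of this example, not Elasticsearch code), and the sketch ignores scoring entirely.

```python
import re


def toy_tokens(text):
    """Rough stand-in for the default analyzer: a run of ASCII letters or
    digits becomes one lowercased token, and every other non-space
    character becomes its own token."""
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+|\S", text)]


def toy_match(query, text):
    """True if the text shares at least one token with the query (OR semantics)."""
    return bool(set(toy_tokens(query)) & set(toy_tokens(text)))


titles = {
    "1": "我的宝马有222马力",
    "2": "我的奥迪有220马力",
    "3": "我的玛莎拉蒂有250马力",
}

print(toy_tokens("宝马玛力"))  # ['宝', '马', '玛', '力']
for doc_id, title in titles.items():
    print(doc_id, toy_match("宝马玛力", title))  # True for all three documents
```

All three titles share at least "马" and "力" with the query, which is why the real response above lists three hits.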

(2) match_phrase query (phrase matching)

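Like the match query, match_phrase first analyzes the search text into a list of terms. It then searches for all of those terms, but keeps only the documents that contain every term, with the terms in adjacent positions. Put simply, a document must contain all the terms of the search text and, unless the restriction is relaxed, they must also sit next to each other.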

# Add one more document
PUT my_index/_doc/5
{
  "name" : "陈六",
  "title" : "我的宝玛有250马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": "宝马玛力"
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

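Note: nothing is returned because no document contains all of the query terms in adjacent positions.
Exact adjacency can be too strict. The slop parameter is an adjustable tolerance: it lets the terms sit a few positions away from where the phrase expects them and still match.
For example: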

# Add two more documents
PUT my_index/_doc/6
{
  "name" : "陈六",
  "title" : "我的宝马的玛力有250马力"
}
PUT my_index/_doc/7
{
  "name" : "陈六",
  "title" : "我的宝马的李玛力有250马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query":"宝马玛力",
        "slop" : 1
      }
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0359334,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0359334,
        "_source": {
          "name": "陈六",
          "title": "我的宝马的玛力有250马力"
        },
        "highlight": {
          "title": [
            "我的<h1>宝</h1><h1>马</h1>的<h1>玛</h1><h1>力</h1>有250马力"
          ]
        }
      }
    ]
  }
}

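Note: the title "我的宝马的玛力有250马力" contains all of the query terms, and its substring "宝马的玛力" differs from the query phrase by a single position, which slop 1 allows. Document 7 has two extra characters between "宝马" and "玛力", so it is not returned.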
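The effect of slop on the documents above can be reproduced with a simplified model in Python. This is a sketch under stated assumptions, not Lucene's actual algorithm: toy_tokens is a rough stand-in for the default analyzer, and the position rule below ignores edge cases such as repeated query terms.

```python
import re
from itertools import product


def toy_tokens(text):
    """Rough stand-in for the default analyzer (assumption of this sketch)."""
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+|\S", text)]


def toy_phrase_match(query, text, slop=0):
    """Simplified sloppy-phrase check.

    For each query term, take one of its positions in the text and subtract
    the term's position in the query. When the phrase lines up exactly, all
    differences are equal; the phrase matches when some choice of positions
    keeps the spread of differences within slop.
    """
    doc_tokens = toy_tokens(text)
    candidates = []
    for q_pos, term in enumerate(toy_tokens(query)):
        shifts = [d_pos - q_pos for d_pos, tok in enumerate(doc_tokens) if tok == term]
        if not shifts:
            return False  # a query term is missing from the text
        candidates.append(shifts)
    return any(max(combo) - min(combo) <= slop for combo in product(*candidates))


print(toy_phrase_match("宝马玛力", "我的宝马的玛力有250马力", slop=1))    # True  (document 6)
print(toy_phrase_match("宝马玛力", "我的宝马的李玛力有250马力", slop=1))  # False (document 7)
print(toy_phrase_match("宝马玛力", "我的宝玛有250马力", slop=1))          # False (document 5)
```

In this model document 7 would start to match at slop 2, because two positions separate "宝马" from "玛力".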

(3) multi_match query

To run the same search against several fields and accept a document as soon as any one of them matches, use multi_match.

# Add one more document
PUT my_index/_doc/9
{
  "name" : "玛力",
  "title" : "我有一辆红旗"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "multi_match": {
      "query":"玛力",
      "fields":["title","name"]
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "9",
        "_score": 0.2876821,
        "_source": {
          "name": "玛力",
          "title": "我有一辆红旗"
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        },
        "highlight": {
          "title": [
            "我的宝马有222马<h1>力</h1>"
          ]
        }
      }
    ]
  }
}

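multi_match, however, raises the question of how the per-field matches are combined into a score. The type parameter controls this:

best_fields (the default): the document's score comes from its single best-matching field; use it when one strongly matching field should dominate.
most_fields: the more fields that match, the higher the document scores.
cross_fields: use it when the terms of the search text are expected to be spread across different fields.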

POST my_index/_doc/_search
{
  "query": {
    "multi_match": {
      "query":"玛力",
      "fields":["title","name"],
      "type" : "best_fields"
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
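When the three types need to be compared side by side, the request bodies can be generated rather than typed by hand. The following Python sketch does that with the standard library only; multi_match_body is a hypothetical helper name, and the tuple covers only the three types discussed in this section.

```python
import json

# The three types discussed in this section (multi_match accepts others as well).
TYPES = ("best_fields", "most_fields", "cross_fields")


def multi_match_body(text, fields, match_type="best_fields"):
    """Build a multi_match search body (hypothetical helper for illustration)."""
    if match_type not in TYPES:
        raise ValueError(f"unsupported type for this sketch: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }


for match_type in TYPES:
    body = multi_match_body("玛力", ["title", "name"], match_type)
    print(json.dumps(body, ensure_ascii=False))
```

Sending each body to the same index and comparing the _score values is a simple way to see how the type changes the ranking.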

term query

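A term query is an exact match: the search text is not run through an analyzer, and the field must contain that exact term.
Before using term, check whether the field is analyzed; text fields are analyzed by default.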

DELETE my_index
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title":"宝马"
    }
  }
}
# Response
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

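Because "title" is of type "text", it is analyzed: the value is split into terms before it is indexed, and "宝马" is never stored as a single term. The term query therefore finds nothing.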

POST my_index/_doc/_search
{
  "query": {
    "term": {
      "name":"张三"
    }
  }
}
# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        }
      }
    ]
  }
}

The "name" field, by contrast, is of type keyword: it is not split and is indexed as a single term, so the term query finds it.
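The difference between the two field types can be pictured as two small inverted indexes. The Python sketch below is a toy model, not Elasticsearch internals: toy_tokens is a rough stand-in for the default analyzer, and the helper names are invented for this example.

```python
import re
from collections import defaultdict


def toy_tokens(text):
    """Rough stand-in for the default analyzer (assumption of this sketch)."""
    return [t.lower() for t in re.findall(r"[0-9A-Za-z]+|\S", text)]


def index_field(docs, field, analyzed):
    """Map each indexed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        # An analyzed field is split into terms; otherwise the whole value is one term.
        terms = toy_tokens(doc[field]) if analyzed else [doc[field]]
        for term in terms:
            index[term].add(doc_id)
    return index


def term_lookup(index, term):
    """Like a term query: look the search text up as-is, without analyzing it."""
    return sorted(index.get(term, set()))


docs = {"1": {"name": "张三", "title": "我的宝马有222马力"}}
title_index = index_field(docs, "title", analyzed=True)   # behaves like a text field
name_index = index_field(docs, "name", analyzed=False)    # behaves like a keyword field

print(term_lookup(title_index, "宝马"))  # []    only '宝' and '马' were indexed
print(term_lookup(title_index, "宝"))    # ['1']
print(term_lookup(name_index, "张三"))   # ['1']
```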
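Note: to let a term query find Chinese words in a "text" field, index the field with the "ik_max_word" analyzer (provided by the IK analysis plugin, which must be installed), so that whole words such as "宝马" are indexed as terms.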

DELETE my_index

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text",
          "analyzer":"ik_max_word"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}
# Search
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title":"宝马"
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        }
      }
    ]
  }
}

bool compound query: must, should, must_not

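For requirements such as "name" is "宝马" but "title" does not contain "宝马", a bool compound query is needed.
A compound query is built from three kinds of clauses: must, should and must_not.
They can be understood as follows:

must: the document must match the condition.
should: holds one or more conditions, and a document that matches more of them scores higher. When the bool query has no must (or filter) clause, at least one should condition has to match.
must_not: the document must not match the condition.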
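The way the three clause types combine can be written down as a small evaluator in Python. This is a toy model under stated assumptions: clauses are plain Python predicates, scoring and filter clauses are left out, and title_has uses a substring test as a stand-in for matching an analyzed term. The documents are the ones used in the bool example of this section.

```python
def toy_bool(doc, must=(), should=(), must_not=()):
    """Evaluate bool-style clauses given as predicates over a document.

    Follows the default rule that should clauses are required (at least
    one must match) only when there is no must clause.
    """
    if not all(clause(doc) for clause in must):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    if should and not must:
        return any(clause(doc) for clause in should)
    return True


def name_is(value):
    """Exact match on the keyword field name."""
    return lambda doc: doc["name"] == value


def title_has(word):
    """Substring test standing in for a term match on the analyzed title."""
    return lambda doc: word in doc["title"]


docs = {
    "1": {"name": "张三", "title": "我的宝马有222马力"},
    "2": {"name": "宝马", "title": "我的宝马x5有260马力"},
    "3": {"name": "宝马", "title": "我的奥迪有260马力"},
}

hits = [
    doc_id
    for doc_id, doc in docs.items()
    if toy_bool(doc, must=[name_is("宝马")], must_not=[title_has("宝马")])
]
print(hits)  # ['3']
```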

PUT my_index/_doc/2
{
  "name" : "宝马",
  "title" : "我的宝马x5有260马力"
}
PUT my_index/_doc/3
{
  "name" : "宝马",
  "title" : "我的奥迪有260马力"
}
# Search
POST my_index/_doc/_search
{
  "query":{
    "bool":{
      "must":{
        "term":{
          "name":"宝马"
        }
      },
      "must_not":{
        "term": {
          "title": "宝马"
        }
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": "宝马",
          "title": "我的奥迪有260马力"
        }
      }
    ]
  }
}

ElasticSearch 6.x Query Basics Explained (Part 1)