Elasticsearch 6.x Query Basics Explained (Part 1)

x33g5p2x, published 2021-03-14 under ElasticSearch

Elasticsearch queries

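Search requests in ES come in two forms: a lightweight search whose query is passed in the URL, and a full JSON request body, known as the structured query or Query DSL. Because the DSL is more intuitive and easier to work with, it is what most people use. A DSL query sends a JSON body (typically with POST), and because that body is JSON it is very flexible and can take many shapes. One thing to watch: many JSON examples in the official documentation show only a fragment of the request and cannot be pasted in as-is. You usually need to wrap them in an outer object under the "query" key.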
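A minimal Python sketch of that wrapping step. The helper name wrap_query is hypothetical and the fragment is only an example clause: the docs often show just the inner clause, and a complete request body puts it under "query".

```python
import json

def wrap_query(fragment: dict) -> dict:
    """Put a bare query clause (as shown in many doc snippets) under the "query" key."""
    return {"query": fragment}

# A clause as it often appears in the reference docs; not a complete request body.
fragment = {"match": {"title": "宝马玛力"}}
body = wrap_query(fragment)

# ensure_ascii=False keeps the Chinese characters readable instead of \u escapes.
print(json.dumps(body, ensure_ascii=False))
# {"query": {"match": {"title": "宝马玛力"}}}
```

The printed JSON is what would go into the body of a request such as POST my_index/_doc/_search.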

URI search

Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html

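The search is expressed entirely through URL query parameters. Commonly used parameters:

  • q: the query string;
  • df: the default field to search when q does not name one;
  • sort: sort order, e.g. age:asc;
  • timeout: the time limit for the search;
  • from, size: pagination (from is the number of hits to skip, size the number of hits to return)
    For example:
# Find documents whose user field contains alfred, sorted by age ascending, returning hits 5 to 14; if the search runs longer than 1s, it stops and returns the hits collected so far
GET /my_index/_search?q=alfred&df=user&sort=age:asc&from=4&size=10&timeout=1s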
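For illustration, a small Python sketch that assembles the same URI-search request line from its parameters and shows which hits from=4 and size=10 select. The helper names uri_search and returned_positions are hypothetical. The code only builds the string; nothing is sent to a cluster.

```python
from urllib.parse import urlencode

def uri_search(index: str, params: dict) -> str:
    """Build a URI-search request line; ':' is left unescaped so sort=age:asc stays readable."""
    return f"GET /{index}/_search?{urlencode(params, safe=':')}"

def returned_positions(from_: int, size: int) -> range:
    """1-based positions (in sorted order) of the hits returned for a from/size pair."""
    return range(from_ + 1, from_ + size + 1)

params = {
    "q": "alfred",      # query string
    "df": "user",       # default field for terms in q
    "sort": "age:asc",  # field:direction
    "from": 4,          # skip the first 4 hits
    "size": 10,         # return at most 10 hits
    "timeout": "1s",    # time limit for the search
}
print(uri_search("my_index", params))
# GET /my_index/_search?q=alfred&df=user&sort=age:asc&from=4&size=10&timeout=1s
pos = returned_positions(params["from"], params["size"])
print(pos[0], pos[-1])  # 5 14
```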

Request body search: the query is sent as JSON in the request body (Query DSL).

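(1) match query

The match query is sometimes loosely called a "fuzzy" query, but it is not Elasticsearch's separate fuzzy query (which matches by edit distance). A match query first runs the search text through the field's analyzer to split it into terms, then matches each of those terms; with the default operator "or", a document matches if it contains at least one of them. match has two close relatives: match_phrase and multi_match.
Example: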

# Create the index and load sample data
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}

PUT my_index/_doc/2
{
  "name" : "李四",
  "title" : "我的奥迪有220马力"
}

PUT my_index/_doc/3
{
  "name" : "王五",
  "title" : "我的玛莎拉蒂有250马力"
}
# match query
POST my_index/_doc/_search
{
  "query": {
    "match": {
      "title": "宝马玛力"
    }
  },
  "highlight":{
    "pre_tags":"<tag1>",
    "post_tags" : "</tag1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.970927,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.970927,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        },
        "highlight": {
          "title": [
            "我的<tag1>宝</tag1><tag1>马</tag1>有222<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.8630463,
        "_source": {
          "name": "王五",
          "title": "我的玛莎拉蒂有250马力"
        },
        "highlight": {
          "title": [
            "我的<tag1>玛</tag1>莎拉蒂有250<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "name": "李四",
          "title": "我的奥迪有220马力"
        },
        "highlight": {
          "title": [
            "我的奥迪有220<tag1>马</tag1><tag1>力</tag1>"
          ]
        }
      }
    ]
  }
}

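Explanation: title uses the default standard analyzer, which splits Chinese text into single characters. The match query therefore breaks "宝马玛力" into the terms "宝", "马", "玛" and "力", and returns every document containing at least one of them, ranked by relevance score.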
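The behavior described above can be approximated in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch code: analyze is a rough stand-in for the standard analyzer on this data (one token per CJK character, letter/digit runs kept together, lowercased), and match uses the default "or" operator. Scoring is left out entirely.

```python
import re

def analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer on this data (an assumption):
    each CJK character is one token, runs of letters/digits stay together,
    and everything is lowercased."""
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())

def match(docs: dict[str, str], query: str) -> list[str]:
    """Toy match query with operator "or": a document hits if it shares any term."""
    terms = set(analyze(query))
    return [doc_id for doc_id, text in docs.items() if terms & set(analyze(text))]

titles = {
    "1": "我的宝马有222马力",
    "2": "我的奥迪有220马力",
    "3": "我的玛莎拉蒂有250马力",
}
print(analyze("宝马玛力"))        # ['宝', '马', '玛', '力']
print(match(titles, "宝马玛力"))  # ['1', '2', '3']
```

Elasticsearch additionally ranks the hits by relevance, which is why the response above lists them in the order 1, 3, 2.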

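(2) match_phrase query (phrase matching)

Like match, match_phrase first analyzes the query string into a list of terms. It then searches for all of those terms but keeps only the documents that contain every one of them, at adjacent positions and in the same order. Put simply: the document must contain all of the query's terms, and unless slop relaxes this, they must sit right next to each other.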
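A toy Python check for the strict case (slop 0). It assumes, as a rough stand-in for the standard analyzer, one token per CJK character; the query terms must then appear as one consecutive, in-order run of tokens. The function names are hypothetical.

```python
import re

def analyze(text: str) -> list[str]:
    # Rough standard-analyzer assumption: one token per CJK character.
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())

def exact_phrase(text: str, query: str) -> bool:
    """slop 0: the query terms must occur in order at consecutive positions."""
    tokens, terms = analyze(text), analyze(query)
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))

print(exact_phrase("我的宝马有222马力", "宝马"))      # True
print(exact_phrase("我的宝马有222马力", "宝马玛力"))  # False: no "玛" at all
print(exact_phrase("我的宝玛有250马力", "宝马玛力"))  # False: all terms present, but not adjacent in order
```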

# Add one more document
PUT my_index/_doc/5
{
  "name" : "陈六",
  "title" : "我的宝玛有250马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": "宝马玛力"
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

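Explanation: no hits, because no document contains all of the query's terms at adjacent positions in order. (Document 5, "我的宝玛有250马力", contains all four characters, but not as the run "宝马玛力".)
Exact phrase matching can be too strict. To tolerate some distance between the terms, set slop: the maximum number of positions the terms may be moved to line up as the phrase. Every term must still be present; slop only relaxes how close together they have to be.
For example: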

# Add two more documents
PUT my_index/_doc/6
{
  "name" : "陈六",
  "title" : "我的宝马的玛力有250马力"
}
PUT my_index/_doc/7
{
  "name" : "陈六",
  "title" : "我的宝马的李玛力有250马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query":"宝马玛力",
        "slop" : 1
      }
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.0359334,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "6",
        "_score": 1.0359334,
        "_source": {
          "name": "陈六",
          "title": "我的宝马的玛力有250马力"
        },
        "highlight": {
          "title": [
            "我的<h1>宝</h1><h1>马</h1>的<h1>玛</h1><h1>力</h1>有250马力"
          ]
        }
      }
    ]
  }
}

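Explanation: in "我的宝马的玛力有250马力" (document 6), all the query terms are present, and in "宝马的玛力" only one extra character ("的") keeps them from forming the phrase, which slop 1 allows. Document 7, "我的宝马的李玛力有250马力", has two extra characters ("的李") in the gap and would need slop 2, so it is not returned.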
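The slop arithmetic can be sketched as follows. This brute-force Python function follows the idea behind Lucene's sloppy phrase matching (compare each term's document position with its offset in the query; the spread of those differences is the slop needed), but it is an illustration, not Lucene's implementation, and it assumes one token per CJK character. needed_slop and sloppy_phrase are hypothetical names.

```python
import re
from itertools import product
from typing import Optional

def analyze(text: str) -> list[str]:
    # Rough standard-analyzer assumption: one token per CJK character.
    return re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower())

def needed_slop(text: str, query: str) -> Optional[int]:
    """Smallest slop at which `query` matches `text` as a phrase; None if a term is missing."""
    tokens, terms = analyze(text), analyze(query)
    positions = [[p for p, tok in enumerate(tokens) if tok == term] for term in terms]
    if any(not ps for ps in positions):
        return None  # slop never compensates for a missing term
    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # two query terms cannot occupy the same position
        # For the query term at offset i found at position p, p - i is equal for
        # every term in an exact phrase; the spread is how far off the phrase is.
        shifts = [p - i for i, p in enumerate(combo)]
        spread = max(shifts) - min(shifts)
        best = spread if best is None else min(best, spread)
    return best

def sloppy_phrase(text: str, query: str, slop: int) -> bool:
    needed = needed_slop(text, query)
    return needed is not None and needed <= slop

for doc_id, title in {"6": "我的宝马的玛力有250马力", "7": "我的宝马的李玛力有250马力"}.items():
    print(doc_id, needed_slop(title, "宝马玛力"), sloppy_phrase(title, "宝马玛力", slop=1))
# 6 1 True
# 7 2 False
```

Under this model document 6 needs slop 1 and document 7 needs slop 2, which agrees with the response above. Swapping two adjacent words also costs 2, which is why a reversed two-word phrase needs slop 2.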

(3) multi_match query

To run a query against several fields, where a match in any one of them is enough, use multi_match.

# Add one more document
PUT my_index/_doc/9
{
  "name" : "玛力",
  "title" : "我有一辆红旗"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "multi_match": {
      "query":"玛力",
      "fields":["title","name"]
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
# Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "9",
        "_score": 0.2876821,
        "_source": {
          "name": "玛力",
          "title": "我有一辆红旗"
        }
      },
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        },
        "highlight": {
          "title": [
            "我的宝马有222马<h1>力</h1>"
          ]
        }
      }
    ]
  }
}

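multi_match also raises the question of how the per-field scores are combined; the "type" parameter controls this:

  • best_fields (the default): use it when documents that match well within a single field should score highest; the document gets the score of its best-matching field.
  • most_fields: use it when documents that match in more fields should score higher; the scores of all matching fields are added together.
  • cross_fields: use it when the query's terms are expected to be spread across different fields (for example a first name in one field and a last name in another); fields that share an analyzer are treated as one big field, and each term is looked for in any of them.
# multi_match with an explicit type
POST my_index/_doc/_search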
{
  "query": {
    "multi_match": {
      "query":"玛力",
      "fields":["title","name"],
      "type" : "best_fields"
    }
  },
  "highlight":{
    "pre_tags":"<h1>",
    "post_tags" : "</h1>",
    "fields":{"title":{}}
  }
}
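How best_fields and most_fields turn per-field scores into one document score can be sketched in Python. The per-field scores below are made up for illustration, and the function names are hypothetical. best_fields follows the dis_max rule (the best field's score, plus tie_breaker times each other matching field's score, with tie_breaker 0 by default), and most_fields adds the field scores up. cross_fields is not modeled.

```python
def best_fields_score(field_scores: dict[str, float], tie_breaker: float = 0.0) -> float:
    """dis_max style: best field's score plus tie_breaker times each other field's score."""
    if not field_scores:
        return 0.0
    scores = sorted(field_scores.values(), reverse=True)
    return scores[0] + tie_breaker * sum(scores[1:])

def most_fields_score(field_scores: dict[str, float]) -> float:
    """The scores of all matching fields are added together."""
    return sum(field_scores.values())

# Made-up per-field scores, for illustration only.
doc_a = {"title": 0.9}               # one strong match
doc_b = {"title": 0.5, "name": 0.5}  # two weaker matches

print(best_fields_score(doc_a), best_fields_score(doc_b))  # 0.9 0.5
print(most_fields_score(doc_a), most_fields_score(doc_b))  # 0.9 1.0
```

The one-strong-field document wins under best_fields, while the two-weaker-fields document wins under most_fields.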

term query

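term means exact matching: the search value is not analyzed, so the field's index must contain that exact term.
Before using term, check whether the field is analyzed. Strings are analyzed by default: a text field is split into terms, while a keyword field is indexed as-is.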

DELETE my_index
PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}
# Query
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title":"宝马"
    }
  }
}
# Response
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

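No hits: "title" is a text field, so it is analyzed and indexed as separate terms (with the default standard analyzer, one term per Chinese character: "宝", "马", and so on). The term "宝马" itself was never indexed, so term cannot find it.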

POST my_index/_doc/_search
{
  "query": {
    "term": {
      "name":"张三"
    }
  }
}
# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        }
      }
    ]
  }
}

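"name", on the other hand, is a keyword field: it is not analyzed, and the whole value "张三" is indexed as a single term, so term finds it.
Note: if you want term to find Chinese words in a text field, index that field with a Chinese word-segmenting analyzer such as "ik_max_word" from the IK analysis plugin (elasticsearch-analysis-ik, a third-party plugin that must be installed first), so that words like "宝马" are indexed as terms.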
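A toy Python model of why term finds "张三" in name but not "宝马" in title. The "index" is just a set of terms per field, and standard_terms is a rough stand-in for the standard analyzer (one term per CJK character); that stand-in and all names here are assumptions, not Elasticsearch's actual tokenizer or API.

```python
import re

def standard_terms(text: str) -> set[str]:
    # Rough standard-analyzer stand-in (an assumption): one term per CJK character,
    # runs of letters/digits kept together.
    return set(re.findall(r"[\u4e00-\u9fff]|[a-z0-9]+", text.lower()))

def keyword_terms(value: str) -> set[str]:
    # keyword: not analyzed, the whole value becomes a single term.
    return {value}

doc = {"name": "张三", "title": "我的宝马有222马力"}
# A toy inverted index for one document: field -> set of indexed terms.
index = {
    "title": standard_terms(doc["title"]),  # text field, analyzed
    "name": keyword_terms(doc["name"]),     # keyword field, not analyzed
}

def term_query(field: str, value: str) -> bool:
    """term does not analyze `value`; it looks for that exact term in the field."""
    return value in index[field]

print(term_query("title", "宝马"))  # False: only "宝" and "马" were indexed
print(term_query("title", "宝"))    # True
print(term_query("name", "张三"))   # True
```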

DELETE my_index

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic":"strict",
      "properties": {
        "title" : {
          "type":"text",
          "analyzer":"ik_max_word"
        },
        "name":{
          "type" : "keyword"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "name" : "张三",
  "title" : "我的宝马有222马力"
}
# Search
POST my_index/_doc/_search
{
  "query": {
    "term": {
      "title":"宝马"
    }
  }
}
# Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": "张三",
          "title": "我的宝马有222马力"
        }
      }
    ]
  }
}
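To see why the term query now succeeds, here is a toy dictionary-based segmenter in the spirit of ik_max_word, which splits text at the finest granularity and emits the possible word combinations, overlaps included. The mini-dictionary and the function are hypothetical; the real IK plugin uses its own large dictionary and algorithm.

```python
import re

# Hypothetical mini-dictionary; the real IK plugin ships its own, much larger one.
DICTIONARY = {"我", "的", "宝马", "有", "马力"}

def max_word_tokens(text: str, max_len: int = 4) -> list[str]:
    """Toy exhaustive segmentation: every dictionary word found at every start
    position (overlaps allowed), followed by runs of digits."""
    tokens = []
    for i in range(len(text)):
        for j in range(i + 1, min(i + max_len, len(text)) + 1):
            if text[i:j] in DICTIONARY:
                tokens.append(text[i:j])
    tokens += re.findall(r"[0-9]+", text)
    return tokens

terms = set(max_word_tokens("我的宝马有222马力"))
print(max_word_tokens("我的宝马有222马力"))  # ['我', '的', '宝马', '有', '马力', '222']
print("宝马" in terms)  # True: "宝马" is now an indexed term, so term can find it
```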

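bool compound query: must, should, must_not

For a requirement such as "name" is "宝马" but "title" does not contain "宝马", you need a bool compound query.
A bool query combines its clauses using three keywords: must, should and must_not.
They can be understood as follows: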

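  • must: the document has to match every clause under must; these clauses also contribute to the score;
  • should: usually holds one or more clauses, and a document satisfies should if it matches at least one of them. In query context, if the bool query also has a must or filter clause, the should clauses become optional and only raise the score, unless minimum_should_match is set;
  • must_not: the document must not match any of these clauses.
# Add two more documents
PUT my_index/_doc/2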
{
  "name" : "宝马",
  "title" : "我的宝马x5有260马力"
}
PUT my_index/_doc/3
{
  "name" : "宝马",
  "title" : "我的奥迪有260马力"
}
# Search
POST my_index/_doc/_search
{
  "query":{
    "bool":{
      "must":{
        "term":{
          "name":"宝马"
        }
      },
      "must_not":{
        "term": {
          "title": "宝马"
        }
      }
    }
  }
}
# Response
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": "宝马",
          "title": "我的奥迪有260马力"
        }
      }
    ]
  }
}
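The rules listed above can be modeled in Python as plain predicates over documents. This is a sketch, not how Elasticsearch evaluates queries: term_keyword models a term query on a keyword field, term_ik_text is a hypothetical stand-in (a substring test) for a term query on the IK-analyzed title, and scoring is ignored.

```python
from typing import Callable, Optional, Sequence

Doc = dict[str, str]
Clause = Callable[[Doc], bool]

def term_keyword(field: str, value: str) -> Clause:
    # keyword field: the whole value is the single indexed term.
    return lambda doc: doc.get(field) == value

def term_ik_text(field: str, word: str) -> Clause:
    # Assumption: stands in for "the IK analyzer indexed `word` as a term of this
    # text field"; modeled here as a plain substring test.
    return lambda doc: word in doc.get(field, "")

def bool_query(docs: dict[str, Doc],
               must: Sequence[Clause] = (),
               should: Sequence[Clause] = (),
               must_not: Sequence[Clause] = (),
               minimum_should_match: Optional[int] = None) -> list[str]:
    # Default: with must clauses present, should is optional (scoring only);
    # otherwise at least one should clause has to match.
    if minimum_should_match is None:
        minimum_should_match = 0 if must else (1 if should else 0)
    hits = []
    for doc_id, doc in docs.items():
        if not all(c(doc) for c in must):
            continue
        if any(c(doc) for c in must_not):
            continue
        if sum(c(doc) for c in should) < minimum_should_match:
            continue
        hits.append(doc_id)
    return hits

docs = {
    "1": {"name": "张三", "title": "我的宝马有222马力"},
    "2": {"name": "宝马", "title": "我的宝马x5有260马力"},
    "3": {"name": "宝马", "title": "我的奥迪有260马力"},
}
print(bool_query(docs,
                 must=[term_keyword("name", "宝马")],
                 must_not=[term_ik_text("title", "宝马")]))  # ['3']
print(bool_query(docs, should=[term_keyword("name", "张三")]))  # ['1']
```

The first call mirrors the request above; the second shows that with only should clauses, at least one of them has to match.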
