Reindexing in Elasticsearch does not return all documents

vyu0f0g1 · posted 2021-06-15 in ElasticSearch

I have about 1.5 million documents in Elasticsearch. I want to reindex them so that one index gets the documents containing certain keywords, and a separate "null" index gets the documents that contain none of the keywords specified for the other index. I don't understand why my reindex returns fewer documents than expected: in particular, I expected roughly 1.2 million documents in the null index, but only about 30,000 documents ended up in the new index. I would appreciate any pointers on what I am doing wrong!
This is how I reindex the documents that contain certain keywords in several fields:

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "should": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_keywords"
  }
}'

Then I create the other index with must_not, so that it contains the documents matching neither keyword1 nor keyword2:

curl --location --request POST 'http://abcdef2344:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "mydocs_email_*",
    "query": {
      "bool": {
        "filter": [
          {
            "bool": {
              "must_not": [
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword1"
                  }
                },
                {
                  "multi_match": {
                    "fields": [
                      "content",
                      "meta.raw.Message:Raw-Header:Subject"
                    ],
                    "query": "keyword2"
                  }
                }
              ]
            }
          }
        ]
      }
    }
  },
  "dest": {
    "index": "analysis_null"
  }
}'

The null index ended up with only about 29,755 documents, whereas the response below suggests I should expect 1,277,428. It also says I need to increase the number of fields allowed in the index, which I did after running the requests above, yet the document count stayed the same.

{"took":53251,"timed_out":false,"total":1277428,"updated":243,"created":29755,"deleted":0,"batches":30,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"analysis_null","type":"_doc","id":"/email/.......msg","cause":{"type":"illegal_argument_exception","reason":"Limit of total fields [1000] in index [analysis_null] has been exceeded"},"status":400}]

zpqajqem1#

The error means exactly what it says: during the reindex, the total number of fields in the destination index exceeded a hard limit.
Wouldn't changing that setting before reindexing fix the problem?

# Drop the partially populated destination index
DELETE analysis_null

# Recreate it with a higher total-fields limit before reindexing
PUT analysis_null
{
  "settings": {
    "index.mapping.total_fields.limit": 10000
  }
}
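
In curl form, to match the requests in the question, the same fix would look roughly like this (the 10000 limit is just an example value; pick something above the actual number of fields in the source documents):

# Drop the partially populated destination index
curl --location --request DELETE 'http://abcdef2344:9200/analysis_null'

# Recreate it with a higher total-fields limit
curl --location --request PUT 'http://abcdef2344:9200/analysis_null' \
--header 'Content-Type: application/json' \
--data-raw '{
  "settings": {
    "index.mapping.total_fields.limit": 10000
  }
}'

Since _reindex aborts when it hits indexing failures, the earlier run only copied the batches it had processed before the limit was reached; after recreating the index with the higher limit, re-running the original _reindex request should copy the remaining documents. For a source of this size it can also help to start the reindex with ?wait_for_completion=false and poll its progress through the _tasks API.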
