索引扫描期间命中+读取的缓冲区太多

6vl6ewon 于 2021-07-26 发布在 Java

关注(0)|答案(1)|浏览(254)

我有两张table User 以及 Info . 我正在编写一个带有内部连接的简单查询，并将结果插入一个未标记的表中。

INSERT INTO Result (
iProfileId,email,format,content
) 
SELECT
  COALESCE(N1.iprofileId, 0),
  Lower(N1.email),
  W0.format,
  W0.content
FROM
  Info W0,
  User N1
where
  (N1.iprofileId = W0.iId);
``` `Info` 表有30m行和 `User` 这张table有158m行。由于某些原因，此查询在我的一个产品设置上花费的时间太长。乍一看，它似乎读取/命中了太多缓冲区：

Insert on Result (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=5126063.502..5126063.502 rows=0 loops=1)
Buffers: shared hit=128815094 read=6103564 dirtied=599445 written=2088037
I/O Timings: read=2563306.517 write=570919.940
-> Merge Join (cost=152813.60..15012246.06 rows=31198136 width=1080) (actual time=0.097..5060947.922 rows=31191937 loops=1)
Merge Cond: (w0.iid = n1.iprofileid)
Buffers: shared hit=96480126 read=5574864 dirtied=70745 written=2009998
I/O Timings: read=2563298.981 write=562810.833
-> Index Scan using user_idx on info w0 (cost=0.56..2984094.60 rows=31198136 width=35) (actual time=0.012..246299.026 rows=31191937 loops=1)
Buffers: shared hit=481667 read=2490602 written=364347
I/O Timings: read=178000.987 write=38663.457
-> Index Scan using profile_id on user n1 (cost=0.57..14938848.88 rows=158842848 width=32) (actual time=0.020..4718272.082 rows=115378606 loops=1)
Buffers: shared hit=95998459 read=3084262 dirtied=70745 written=1645651
I/O Timings: read=2385297.994 write=524147.376
Planning Time: 11.531 ms
Execution Time: 5126063.577 ms

当我在一个不同的设置上运行这个查询，但是有相似的表和记录数时，profile\u id scan只使用5m页（以3m运行），而在这里它使用（read+hit）100m缓冲区（以1.45h运行）。当我检查使用真空详细这张表只有10万页。

INFO: "User": found 64647 removable, 109184385 nonremovable row versions in 6876625 out of 10546400 pages

这是一个很好的运行，但我们看到这个查询也需要4-5个小时。我的测试系统运行了不到3分钟也有 `iid` 分布于 `profile_id` 射程。但与prod系统相比，它的列和索引更少。为什么这么慢？

sql postgresql query-optimization

来源：https://stackoverflow.com/questions/62459331/too-many-buffers-hitread-during-index-scan

1条答案

按热度按时间

ecfsfe2w1#

你展示的执行计划有很多肮脏的书面页面。这表明表是新插入的，并且您的查询是第一个读取器。
在postgresql中，新表行的第一个读取器查阅提交日志，查看该行是否可见（创建该行的事务是否提交了？）。然后在行中设置标志（所谓的提示位）以避免下一个读卡器遇到这种麻烦。
设置提示位会修改行，因此块会被弄脏，最终必须写入磁盘。这种写入通常由检查点或后台编写器完成，但它们跟不上，因此查询必须自己清除许多脏页。
如果再运行一次查询，速度会更快。因此，这是一个好主意 VACUUM 批量加载后的表，这也将设置提示位。
然而，像这样的大型查询总是很慢的。您可以尝试进一步加快它的速度：
拥有大量ram并将表加载到共享缓冲区中 pg_prewarm 启动 work_mem 希望得到更快的散列连接 CLUSTER 这些表使用索引，因此堆获取变得更有效

赞(0）回复(0）举报 2021-07-26

我来回答

索引扫描期间命中+读取的缓冲区太多

1条答案

相关问题

热门标签

最新问答