不在子查询中与在上!=操作

kxeu7u2r  于 2021-06-18  发布在  Mysql
关注(0)|答案(1)|浏览(185)

我有两个名为ny_clean(3454602个条目)和picku_ids_temp_table(2739268个条目)的表,它们都有一个id char(11)列,它是主键,上面有一个btree索引(mysql 5.7)。
picku\u 0\u ids\u temp\u表中的“id”列是ny\u clean的子集,我想得到一个结果,它是ny\u clean,没有picku\u 0\u ids\u temp\u表中的id值。
方案1:

EXPLAIN
SELECT *
FROM pickup_0_ids_temp_table as t
JOIN ny_clean as n
ON n.id != t.id;
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table    | partitions | type  | possible_keys | key               | key_len | ref  | rows    | filtered | Extra                                                           |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | t        | NULL       | index | NULL          | PRIMARY           | 11      | NULL | 2734512 |   100.00 | Using index                                                     |
|  1 | SIMPLE      | ny_clean | NULL       | index | NULL          | btree_pk_ny_clean | 11      | NULL | 3445904 |    90.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+

方案2:

EXPLAIN
SELECT *
FROM ny_clean as n
WHERE n.id NOT IN (
    SELECT id 
    FROM pickup_0_ids_temp_table);
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| id | select_type        | table                   | partitions | type            | possible_keys          | key     | key_len | ref  | rows    | filtered | Extra       |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
|  1 | PRIMARY            | n                       | NULL       | ALL             | NULL                   | NULL    | NULL    | NULL | 3445904 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | pickup_0_ids_temp_table | NULL       | unique_subquery | PRIMARY,btree_pickup_0 | PRIMARY | 11      | func |       1 |   100.00 | Using index |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+

然后在这个较大的查询中使用其中一个选项

EXPLAIN
INSERT INTO y    
SELECT id, pickup_longitude, pickup_latitude 
FROM x
JOIN 
(OPTION 1 OR 2) as z
ON z.id =  x.id;

当我在较大的查询中使用选项1时,它运行了两天,但没有完成。另一方面,选项2在不到30分钟内完成了任务
我的问题是:为什么?遵循mysql文档(https://dev.mysql.com/doc/refman/5.7/en/subquery-materialization.html)我怀疑这是由于子查询的物化造成的,但我该如何检查呢?
我对解释输出的解释错了吗?因为从它来看,我希望选项1更快,因为它在两个表上都使用索引
或者它与更大的查询有关?
提前谢谢

hfyxw5xn

hfyxw5xn1#

你的选择1没有做你认为会做的事。
如果你有两张table

n.id            t.id
      1               1
      2               2 
      3               3
``` `ON n.id != t.id;` 你会得到:

1,2
1,3
2,1
2,3
3,1
3,2

这几乎是笛卡尔积。so 3.4毫米x 2.7毫米~9.18毫米行
然后尝试执行联接,因为materialized表没有索引,所以需要很长时间。

相关问题