Hive Tez reducers are running super slow

Posted by sulc1iza on 2021-05-27 in Hadoop


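I am joining multiple tables with about 25 billion rows in total, and I am aggregating on top of that. Below are the Hive settings I use to generate the final output. I am not sure how to tune the query to make it run faster. So far I have been using trial and error to see whether anything helps, but it does not seem to be working. The mappers run fast, but the reducers take forever to finish. Could anyone share their thoughts? Thank you.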

SET hive.execution.engine=tez;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.qubole.cleanup.partial.data.on.failure=true;
    SET hive.tez.container.size=8192;
    SET tez.task.resource.memory.mb=8192;
    SET tez.task.resource.cpu.vcores=2;
    SET hive.mapred.mode=nonstrict;
    SET hive.qubole.dynpart.use.prefix=true;
    SET hive.vectorized.execution.enabled=true;
    SET hive.vectorized.execution.reduce.enabled =true;
    SET hive.cbo.enable=true;
    SET hive.compute.query.using.stats=true;
    SET hive.stats.fetch.column.stats=true;
    SET hive.stats.fetch.partition.stats=true;
    SET mapred.reduce.tasks = -1;
    SET hive.auto.convert.join.noconditionaltask.size=2730;
    SET hive.auto.convert.join=true;
    SET hive.auto.convert.join.noconditionaltask=true;
    SET hive.auto.convert.join.noconditionaltask.size=8053063680;
    SET hive.compute.query.using.stats=true;
    SET hive.stats.fetch.column.stats=true;
    SET hive.stats.fetch.partition.stats=true;
    SET mapreduce.job.reduce.slowstart.completedmaps=0.8;
    set hive.tez.auto.reducer.parallelism = true;
    set hive.exec.reducers.max=100;
    set hive.exec.reducers.bytes.per.reducer=1024000000;

SQL:

SELECT D.d
      ,D.b
      ,COUNT(DISTINCT A.x)  AS cnt
      ,SUM(c)               AS sum
 FROM A
LEFT JOIN
       B
ON A.a = B.b
LEFT JOIN
       C 
ON B.b = C.c
JOIN
       D
 ON A.a >= D.d
AND A.a <= D.d
GROUP BY 1,2
CLUSTER BY D.d;

hadoop Hive query-optimization hiveql apache-tez

Source: https://stackoverflow.com/questions/54491233/hive-tez-reducers-are-running-super-slow

1 Answer

crcmnpdw1#

There is no query plan yet, so other things may also be involved, but these settings definitely limit reducer parallelism:

set hive.exec.reducers.max=100;
set hive.exec.reducers.bytes.per.reducer=1024000000;
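
The missing query plan can be produced by prefixing the statement with EXPLAIN. The sketch below simply wraps the query from the question; the reducer vertices and their estimated row counts in the output are the parts worth inspecting.

```sql
-- Print the Tez execution plan without running the query.
EXPLAIN
SELECT D.d
      ,D.b
      ,COUNT(DISTINCT A.x)  AS cnt
      ,SUM(c)               AS sum
FROM A
LEFT JOIN B ON A.a = B.b
LEFT JOIN C ON B.b = C.c
JOIN D ON A.a >= D.d AND A.a <= D.d
GROUP BY D.d, D.b;
```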

I suggest raising the maximum number of reducers and lowering the bytes per reducer, which will increase reducer parallelism:

set hive.exec.reducers.max=5000; 
set hive.exec.reducers.bytes.per.reducer=67108864;
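
As a rough illustration of why these two values matter, Hive's initial reducer estimate is approximately the input size divided by bytes per reducer, capped at the maximum. The 1 TB reducer input below is a hypothetical figure, not taken from the question, and Tez auto reducer parallelism may still adjust the final count at runtime.

```sql
-- Assumption: 1,000,000,000,000 bytes (about 1 TB) reach the reduce stage.
-- Simplified estimate: LEAST(reducers.max, CEIL(input_bytes / bytes.per.reducer))
SELECT
  LEAST(100,  CEIL(1000000000000 / 1024000000)) AS reducers_with_original_settings,
  LEAST(5000, CEIL(1000000000000 / 67108864))   AS reducers_with_suggested_settings;
```

Under that assumption the original settings cap the job at 100 reducers, while the suggested ones allow up to 5000.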

Also, Hive 1.2.0+ provides an automatic rewrite optimization for count(distinct). Check this setting; it should be true by default:

hive.optimize.distinct.rewrite=true;
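
Running SET with only the property name prints its current value. If the automatic rewrite does not apply to a given query, the same idea can be written by hand as a two-stage aggregation. The following is a sketch based on the query from the question, not a tested replacement; the alias names t and partial_sum are introduced here for illustration.

```sql
-- Show the current value of the setting.
SET hive.optimize.distinct.rewrite;

-- Manual two-stage form: first group by the distinct column as well,
-- then count the resulting groups and add up the partial sums.
SELECT t.d
      ,t.b
      ,COUNT(t.x)          AS cnt   -- one row per distinct non-NULL x
      ,SUM(t.partial_sum)  AS sum
FROM (
  SELECT D.d AS d, D.b AS b, A.x AS x, SUM(c) AS partial_sum
  FROM A
  LEFT JOIN B ON A.a = B.b
  LEFT JOIN C ON B.b = C.c
  JOIN D ON A.a >= D.d AND A.a <= D.d
  GROUP BY D.d, D.b, A.x
) t
GROUP BY t.d, t.b;
```

The inner query spreads the work across many (d, b, x) groups, so no single reducer has to deduplicate every x value for one key.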

If the query gets stuck on the last reducer, then there is skew in the join keys.
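
One way to check for skew is to count rows per join key and look at the heaviest keys. This sketch uses table A and column a from the question; the limit of 20 is an arbitrary choice.

```sql
-- List the most frequent join-key values in A.
-- A NULL key or a single value holding a large share of the rows points to skew.
SELECT a, COUNT(*) AS row_cnt
FROM A
GROUP BY a
ORDER BY row_cnt DESC
LIMIT 20;
```

The same check can be repeated for B.b and C.c, since a skewed key on either side of a join sends all of its rows to one reducer.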

2021-05-27
