Unacceptably slow Hive queries

Posted by 6gpjuf90 on 2021-05-29 in Hadoop

I am running Hive 0.14 on an HDP 2 cluster. My dataset was built with the Kite SDK and is registered in Hive as an external table.
My table layout is shown below:

hive> describe hivetweets;
OK
created_at              bigint                  from deserializer
id                      bigint                  from deserializer
in_reply_to_user_id     bigint                  from deserializer
in_reply_to_status_id   bigint                  from deserializer
lang                    string                  from deserializer
text                    string                  from deserializer
retweet_count           int                     from deserializer
year                    int                     Partition column derived from 'created_at' column, generated by Kite.
month                   int                     Partition column derived from 'created_at' column, generated by Kite.
day                     int                     Partition column derived from 'created_at' column, generated by Kite.
hour                    int                     Partition column derived from 'created_at' column, generated by Kite.

# Partition Information

# col_name              data_type               comment

year                    int                     Partition column derived from 'created_at' column, generated by Kite.
month                   int                     Partition column derived from 'created_at' column, generated by Kite.
day                     int                     Partition column derived from 'created_at' column, generated by Kite.
hour                    int                     Partition column derived from 'created_at' column, generated by Kite.
Time taken: 0.15 seconds, Fetched: 19 row(s)

My initial test query against this setup fetches just one row of the dataset (I removed the actual output from the example):

hive> select * from hivetweets limit 1;
OK
Time taken: 103.726 seconds, Fetched: 1 row(s)

104 seconds is far too long for this query.
It is probably not running distributed, so I tried testing with more data:

hive> select count(*) from hivetweets limit 100000;
Query ID = root_20150715132222_81e386ef-2990-4251-a61f-82ca8da4c48d
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.

Status: Running (Executing on YARN cluster with App id application_1436910684121_0006)

--------------------------------------------------------------------------------

VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED     19         19        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 567.52 s
--------------------------------------------------------------------------------
OK
197371741

Taking 10 minutes to count 100,000 records is a very long time.
I would appreciate any suggestions on how to debug this.
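
As a starting point for debugging, here is a minimal HiveQL sketch, not a confirmed fix. Note that `LIMIT` applies to the rows a query returns, and `count(*)` without `GROUP BY` returns a single row, so the query above scanned the whole table (the result was 197371741, not 100000). The partition values in the `WHERE` clause and the subquery alias `t` are hypothetical and only illustrate the idea.

```sql
-- List the partitions registered for the table in the metastore.
SHOW PARTITIONS hivetweets;

-- Show the plan for the single-row query to see whether it runs
-- as a simple fetch or launches a job on the cluster.
EXPLAIN SELECT * FROM hivetweets LIMIT 1;

-- Restrict the read to a single partition so that only its files are scanned.
-- The partition values below are hypothetical; pick one from SHOW PARTITIONS.
SELECT * FROM hivetweets
WHERE year = 2015 AND month = 7 AND day = 15 AND hour = 13
LIMIT 1;

-- Count over at most 100000 rows: the LIMIT sits inside the subquery,
-- so it bounds the rows being counted rather than the one-row result.
SELECT count(*) FROM (SELECT * FROM hivetweets LIMIT 100000) t;
```

Comparing the timing of the partition-restricted query with the unrestricted one may indicate whether the time goes into listing and opening many partition files or into reading the rows themselves.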
