如何在Hive中使用tez引擎进行动态分区?

mv1qrgav  于 2023-03-08  发布在  Hive
关注(0)|答案(1)|浏览(209)

我把数据从mysql复制到hdfs。

sqoop import --username aaa --password bbb --num-mappers 1 --connect "jdbc:sqlserver://xxx.xxx.xxx.xxx:1234;database=test_db" --query "select DATE, DATA01, DATA02, DATA03 FROM test_tbl where \$CONDITIONS"  --delete-target-dir --target-dir /data/test/raw_data/sqoop_tbl/ --as-textfile

并在数据路径上方的配置单元中创建表。

create external table sqoop_tbl (DATE TIMESTAMP, DATA01 FLOAT, DATA02 FLOAT, DATA03 FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' stored as textfile location '/data/test/raw_data/sqoop_tbl';

我在配置单元结果中检索上述表格:

2022-12-13 10:22:12     86.0    70.0    85.0
2022-12-13 10:22:20     86.0    70.0    85.0
2022-12-13 10:22:28     86.0    70.0    85.0
2022-12-13 10:22:38     86.0    70.0    85.0
2022-12-13 10:22:46     86.0    70.0    85.0

然后我创建分区表如下:

create table if not exists partitioned_data (DATA01 FLOAT, DATA02 FLOAT, DATA03 FLOAT) PARTITIONED BY(date STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

并将数据从test_tbl插入partitioned_data,如下所示:

insert into table partitioned_date partition (date) select substring(DATE,0,10), DATA01, DATA02, DATA03 from test_tbl;

这个工作很好,但是它的分区不是日期。

expected partition key,val:
date=2022-01-02
date=2022-01-03
date=2022-01-04
date=2022-01-05
date=2022-01-06
date=2022-01-07
date=2022-01-08
date=2022-01-09
...

result partition key,val:
date=0.0
date=0.1
date=0.2
date=0.3
date=0.4
date=0.5
date=0.6
date=0.7
date=0.8
date=0.9
date=1.0
date=1.1
date=1.2
...

为什么分区的瓦尔是错的??

bvjxkvbb

bvjxkvbb1#

好吧...这是我的错误。当我查询“插入~选择~",选择的列顺序是问题。我重新排序分区键到最后一个位置,它的工作!!!

相关问题