Hive: cannot INSERT OVERWRITE from a non-partitioned external table into a new partitioned table

Asked by 0g0grzrc on 2021-05-29 in Hadoop

In summary, this is what I did:
raw data -> select and save the filtered data in HDFS -> create an external table on the files saved in HDFS -> populate the empty partitioned table from the external table.
Judging by the exception, the problem seems to be related to the output formats of the two tables.
The details:
1) I have a table_log table (in database A) with a lot of data, structured as follows (partitioned by three columns):

CREATE TABLE `table_log`(
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
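
With three partition columns it can help to check which (dt, service_type, event_type) combinations actually exist before filtering; standard Hive DDL is enough for that:

-- List the existing partition values of the source table
SHOW PARTITIONS table_log;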

2) I filter the data by (dt, service_type, event_type) and save the result in HDFS as follows:

INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog'
SELECT * FROM table_log
WHERE dt >= '2016-05-01' AND dt <= '2016-05-31'
  AND service_type='xxxx_jp' AND event_type='vv';
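
Note that INSERT OVERWRITE DIRECTORY without a STORED AS clause writes plain delimited text files, not ORC, which is why the external table in step 3 has to be a text table. On reasonably recent Hive versions the extract could be kept in ORC instead (the external table reading the directory would then also need STORED AS ORC); a sketch:

-- Sketch: keep the intermediate extract in ORC rather than delimited text
INSERT OVERWRITE DIRECTORY '/user/atscale/filterd-ratlog'
STORED AS ORC
SELECT * FROM table_log
WHERE dt >= '2016-05-01' AND dt <= '2016-05-31'
  AND service_type='xxxx_jp' AND event_type='vv';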

3) Then I created an external table (table_log_filtered_ext, in database B) on top of the result above. Note that this table is not partitioned.

DROP TABLE IF EXISTS table_log_filtered_ext;
CREATE EXTERNAL TABLE `table_log_filtered_ext`(
  `e_id` string, 
  `member_id` string, 
  .
  .
  dt string,
  service_type string,
  event_type string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003'
LOCATION '/user/atscale/filterd-ratlog'
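
Because this DDL has no STORED AS clause, table_log_filtered_ext defaults to TEXTFILE with the LazySimpleSerDe, while table_log is ORC. A quick way to compare what SerDe and input/output formats Hive actually recorded for the two tables:

-- Compare the SerDe / InputFormat / OutputFormat recorded for each table
DESCRIBE FORMATTED table_log_filtered_ext;
DESCRIBE FORMATTED table_log;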

4) I created another new table (table_log_filtered) with a structure similar to table_log (same three partition columns), as follows:

CREATE TABLE `table_log_filtered` (
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\u0001' 
  COLLECTION ITEMS TERMINATED BY '\u0002' 
  MAP KEYS TERMINATED BY '\u0003' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

5) Now I want to populate table_log_filtered (with the same three partition columns as table_log) from the data in the external table table_log_filtered_ext:

SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.execution.engine=tez; 

INSERT OVERWRITE TABLE table_log_filtered PARTITION(dt, service_type, event_type) 
SELECT * FROM table_log_filtered_ext;
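
As a side note, some setups also need hive.exec.dynamic.partition=true, and SELECT * only works here because dt, service_type and event_type happen to be the last three columns of the external table; the partition columns must come last in the SELECT list. The same insert with that spelled out (a sketch, same table names as above):

SET hive.exec.dynamic.partition=true;        -- not always on by default
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE table_log_filtered PARTITION (dt, service_type, event_type)
SELECT * FROM table_log_filtered_ext;        -- dt, service_type, event_type are its last three columns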

But I get this java.lang.ClassCastException. Judging by the exception, there is some issue with the output formats of the two tables. Any suggestions?

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":
.
.
.
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
      at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
      at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:422)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
      at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
      at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
      ... 16 more
    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
      at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
      at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
      at org.apache.hadoop.hive.ql.exec.LimitOperator.process(LimitOperator.java:54)
      at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
      at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
      at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
      ... 17 more

Answer 1 (ut6juiuv):

In case anyone else runs into this: as @Samson Scharfrichter suggested, the fix was to specify STORED AS ORC for table_log_filtered:

CREATE TABLE `table_log_filtered` (
  `e_id` string, 
  `member_id` string, 
  .
  .
PARTITIONED BY ( 
  `dt` string, 
  `service_type` string, 
  `event_type` string)
STORED AS ORC
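
Presumably this works because STORED AS ORC also sets the ORC SerDe, whereas the original DDL only named the ORC input and output formats; rows were therefore serialized by the default LazySimpleSerDe as Text and then handed to OrcOutputFormat, which expects OrcSerdeRow, matching the ClassCastException above. On recent Hive versions the shorthand expands to roughly this explicit form, which should be an equivalent fix:

-- Roughly what STORED AS ORC stands for; the original DDL lacked the SerDe line
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'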
