hive#u bad#data:orc中字段#event#u time的类型timestamp与表架构中定义的类型varchar不兼容

brgchamk  于 2021-06-25  发布在  Hive
关注(0)|答案(0)|浏览(179)

最近,我们将hive数据归档到s3(数据格式为orc),但是雅典娜在查询数据时出现了一些错误。
当我执行 select * from ta_event_10 limit 1 ,雅典娜查询出错,错误如下:
hive#u bad#u data:orc中字段#event#u time的类型timestamp与表架构中定义的类型varchar不兼容
另外,有些sql可以运行,但结果不正确,例如 select "#event_time" from ta_event_10 limit 10 .
Hive show create table :

CREATE TABLE `ta.ta_event_10`(
  `#user_id` bigint,
  `#event_name` string,
  `#event_time` timestamp,
  `#app_id` string,
  `#account_id` string,
  `#distinct_id` string,
  `#server_time` timestamp,
  `#kafka_offset` bigint,
  `#device_id` string,
  `version_code` double,
  `#province` string,
  `#device_model` string,
  `#city` string,
  `channel` string,
  `launchsource` string,
  `flavor` string,
  `#app_version` string,
  `#os` string,
  `#country_code` string,
  `#ip` string,
  `#os_version` string,
  `#manufacturer` string,
  `#country` string,
  `version_name` string,
  `vpn` string,
  `parallelversion` string,
  `segmentid` string,
  `user_dimen` double,
  `packagename` string,
  `packageversion` string,
  `installsource` string,
  `packagelabel` string,
  `isrunning` string,
  `payment_platform` string,
  `buy_status` string,
  `productid` string,
  `orderid` string,
  `#zone_offset` double)
PARTITIONED BY (
  `$part_event` string,
  `$part_date` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://ta2/user/hive/warehouse/ta.db/ta_event_10'
TBLPROPERTIES (
  'presto_query_id'='20200113_034929_06651_2dhiw',
  'presto_version'='316-ta',
  'transient_lastDdlTime'='1578887369')

雅典娜 show create table :

CREATE EXTERNAL TABLE `ta_event_10`(
  `#user_id` bigint, 
  `#event_name` string, 
  `#event_time` timestamp, 
  `#app_id` string, 
  `#account_id` string, 
  `#distinct_id` string, 
  `#server_time` timestamp, 
  `#kafka_offset` bigint, 
  `#device_id` string, 
  `version_code` double, 
  `#province` string, 
  `#device_model` string, 
  `#city` string, 
  `channel` string, 
  `launchsource` string, 
  `flavor` string, 
  `#app_version` string, 
  `#os` string, 
  `#country_code` string, 
  `#ip` string, 
  `#os_version` string, 
  `#manufacturer` string, 
  `#country` string, 
  `version_name` string, 
  `vpn` string, 
  `parallelversion` string, 
  `segmentid` string, 
  `user_dimen` double, 
  `packagename` string, 
  `packageversion` string, 
  `installsource` string, 
  `packagelabel` string, 
  `isrunning` string, 
  `payment_platform` string, 
  `buy_status` string, 
  `productid` string, 
  `orderid` string, 
  `#zone_offset` double)
PARTITIONED BY ( 
  `$part_event` string, 
  `$part_date` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  's3://test-ta-export/ta_event_10'
TBLPROPERTIES (
  'transient_lastDdlTime'='1583763184')

“#事件#时间”字段都是时间戳类型。
我通过元数据查找一个orc文件 orc-metadata ,它们看起来像这样:

先用Hive和雅典娜。

暂无答案!

目前还没有任何答案,快来回答吧!

相关问题