Hive execution error

pgpifvop  posted on 2021-05-29  in  Hadoop

I am new to Avro and Hive, and one thing confused me while learning them: the use of tblproperties('avro.schema.url'='somewhereinHDFS/categories.avsc').
Suppose I run this create command:

create table categories (id Int , dep_Id Int , name String) 
stored as avrofile  
tblproperties('avro.schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')

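Why do I have to specify id Int, dep_Id Int in the command above when the avsc file I am pointing to already contains the complete schema? If I leave the column list out, the command fails: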

create table categories stored as avrofile
tblproperties('avro/schema.url'=
'hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc')
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. 
java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException 
Encountered AvroSerdeException determining schema. 
Returning signal schema to indicate problem: 
Neither avro.schema.literal nor avro.schema.url specified, 
can't determine table schema)

Why does Hive require the schema to be specified even though the avsc file is present and already contains the schema?

de90aj5v  1#

Can you try it like this?

CREATE TABLE categories
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='http://schema.avsc');

For more information, see https://cwiki.apache.org/confluence/display/hive/avroserde
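
The error message in the question names two properties, avro.schema.literal and avro.schema.url, and either one is enough for the AvroSerDe to determine the table schema. As a minimal sketch of the inline alternative, the statement below embeds the schema directly in the DDL. The record name and the three fields are assumptions taken from the column list in the question (id, dep_Id, name); the real categories.avsc may use different names or nullable union types.

```sql
-- Sketch only: the schema below is assumed from the question's column list,
-- not copied from the actual categories.avsc file.
CREATE TABLE categories
  STORED AS AVRO
  TBLPROPERTIES (
    'avro.schema.literal'='{
      "type": "record",
      "name": "categories",
      "fields": [
        {"name": "id",     "type": "int"},
        {"name": "dep_Id", "type": "int"},
        {"name": "name",   "type": "string"}
      ]
    }');
```

An inline literal keeps the table definition self-contained, while avro.schema.url lets several tables share one schema file.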

1szpjjfi  2#

Create the external Hive table orders_sqoop from the given Avro schema file and Avro data files:

hive> create external table if not exists orders_sqoop
        stored as avro
        location '/user/hive/warehouse/retail_stage.db/orders'
        tblproperties('avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc');

The create table command above runs successfully and creates the orders_sqoop table.
Verify the table structure:

hive> show create table orders_sqoop;
OK
CREATE EXTERNAL TABLE `orders_sqoop`(
  `order_id` int COMMENT '', 
  `order_date` bigint COMMENT '', 
  `order_customer_id` int COMMENT '', 
  `order_status` string COMMENT '')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION
  'hdfs://quickstart.cloudera:8020/user/hive/warehouse/retail_stage.db/orders'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false', 
  'avro.schema.url'='/user/hive/warehouse/retail_stage.db/orders_schema/orders.avsc', 
  'numFiles'='2', 
  'numRows'='-1', 
  'rawDataSize'='-1', 
  'totalSize'='660906', 
  'transient_lastDdlTime'='1563093902')
Time taken: 0.125 seconds, Fetched: 21 row(s)

The table was created as expected, with its columns derived from the Avro schema file.
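
The same pattern can be applied to the categories table from the question. This is a minimal sketch, assuming the avsc file at the path quoted in the question exists and is readable by Hive. Note that the property key is spelled avro.schema.url, with dots. The failing command in the question, as shown, uses 'avro/schema.url', which is a different key from the one the AvroSerDe looks up, and that would be consistent with the "Neither avro.schema.literal nor avro.schema.url specified" message.

```sql
-- Column list omitted: the AvroSerDe derives the columns from the schema file.
-- The path is the one quoted in the question; adjust it for your cluster.
CREATE TABLE categories
  STORED AS AVRO
  TBLPROPERTIES (
    'avro.schema.url'='hdfs://quickstart.cloudera/user/cloudera/data/retail_avro_avsc/categories.avsc');

-- Check which columns Hive derived from the schema.
DESCRIBE categories;
```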
