Converting a text file to Avro using a Pig script

xurqigkl, posted on 2021-06-25 in Pig

I am converting a text file to Avro using a Pig script.

I have a pipe-delimited text file at /user/hduser/pig_input/.dat:

1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546
1|8|123|985|659856|10000000002546

The schema file is on HDFS at /user/hduser/pig_schema_files/.avsc:

{
  "type" : "record",
  "name" : "import_dummy",
  "doc" : "import_123dummy",
  "fields" : [ {
    "name" : "ID",
    "type" : [ "string", "null" ],
    "columnName" : "ID",
    "sqlType" : "3"
  }, {
    "name" : "TRANS_O",
    "type" : [ "string", "null" ],
    "columnName" : "TRANS_O",
    "sqlType" : "3"
  }, {
    "name" : "CARD_O",
    "type" : [ "string", "null" ],
    "columnName" : "CARD_O",
    "sqlType" : "3"
  }, {
    "name" : "SEQ_O",
    "type" : [ "string", "null" ],
    "columnName" : "SEQ_O",
    "sqlType" : "1"
  }, {
    "name" : "DATE_O",
    "type" : [ "string", "null" ],
    "columnName" : "DATE_O",
    "sqlType" : "3"
  } ],
  "tableName" : "123dummy"
}

Below is the script I wrote:

REGISTER /app/cloudera/parcels/CDH/lib/pig/piggybank.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/avro-1.3.7.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-core-asl.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-mapper-asl.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/json-simple.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/snappy-java.jar

textfile = load 'user/hduser/pig_input/abc.dat' using pigStorage('|');
STORE textfile INTO '/user/hduser/pig_output/' 
    USING org.apache.pig.piggybank.storage.avro.AvroStorage('schema_file','/user/hduser/pig_schema_files/abc.avsc');

After running the script I get the following error:

2015-02-03 09:46:56,369 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6000: <file script.pig, line 9, column 0>
    Output Location Validation Failed for: '/user/hduser/pig_output/
    More info to follow:
    Output schema is null!
Answer #1, by 6tdlim6h

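When loading a delimited text file such as a CSV, you have to declare the field names in the LOAD statement. A load without an AS clause gives the relation no schema, which is consistent with the "Output schema is null!" error. Once the names are declared, AvroStorage maps them to Avro field names automatically when writing.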

textfile = load 'user/hduser/pig_input/abc.dat' using PigStorage('|') as (ID, TRANS_O, CARD_O, SEQ_O, DATE_O);
STORE textfile INTO '/user/hduser/pig_output/' USING org.apache.pig.piggybank.storage.avro.AvroStorage();
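
A variant of the same idea with explicit field types is sketched below. The `chararray` types and the `DESCRIBE` check are assumptions added for illustration and are not part of the original answer; the REGISTER lines and paths are copied from the question. To my understanding, fields declared without a type default to `bytearray`, which AvroStorage would write as Avro bytes rather than strings, so declaring `chararray` should keep the output closer to the string fields in the .avsc. This has not been run against the asker's cluster.

```pig
-- Jars as registered in the question's script.
REGISTER /app/cloudera/parcels/CDH/lib/pig/piggybank.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/avro-1.3.7.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-core-asl.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/jackson-mapper-asl.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/json-simple.jar
REGISTER /app/cloudera/parcels/CDH/lib/pig/lib/snappy-java.jar

-- Declare names AND types so the relation carries a full schema.
-- chararray is an assumption chosen to match the "string" fields in the .avsc.
textfile = LOAD 'user/hduser/pig_input/abc.dat' USING PigStorage('|')
    AS (ID:chararray, TRANS_O:chararray, CARD_O:chararray,
        SEQ_O:chararray, DATE_O:chararray);

-- Optional check: print the schema Pig has for the relation before storing.
DESCRIBE textfile;

-- Same STORE call as in the answer; the Avro schema is derived from the Pig schema.
STORE textfile INTO '/user/hduser/pig_output/'
    USING org.apache.pig.piggybank.storage.avro.AvroStorage();
```

If `DESCRIBE` reports that the schema is unknown, the AS clause is missing or was not applied, and the STORE can be expected to fail the same way as in the question.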
