Pig: writing the record types in a single file to multiple outputs

72qzrwbm posted on 2021-05-29 in Hadoop

I have the following data in a single file:

"HD",003498,"20160913:17:04:10","D3ZYE",1
"EH","XXX-1985977-1",1,"01","20151215","20151215","20151229","20151215","2304",,,"36-126481000",1340.74,61808.00,1126.62,0.00,214.12,0.00,0.00,0.00,"30","20151229","00653845",,,"PARTS","001","ABI","20151215","Y","Y","N","36-126481000",

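I want to use Pig to read this file and then split it into separate files based on the first column. I am also looking for a way to first treat each record as the following structure:
rectypcd,recorddata
and then treat recorddata as a CSV record.
That way, once the records are stored in separate files by record type, I can simply load each file into its own Hive external table using the CSV SerDe.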

zysjyyx4 #1

For your case, you can use SPLIT in Pig, e.g.
SPLIT line INTO hd1 IF rectypecd == 'HD', hd2 IF ...;
STORE hd1 INTO 'op1'; STORE hd2 INTO 'op2';
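
A fuller sketch of the SPLIT approach might look like the following. It is not a tested solution: the input path `input.txt`, the output directories under `out/`, and the relation and field names (`lines`, `typed`, `rec`) are assumptions made for illustration, and it assumes the record type is always the first double-quoted field, as in the sample data. Each line is loaded as a single string so that the rest of the record is left untouched for the CSV SerDe.

```pig
-- Load every line as one chararray so the CSV payload is not parsed by Pig.
lines = LOAD 'input.txt' USING TextLoader() AS (rec:chararray);

-- Pull the record type out of the first quoted field; keep the full line.
typed = FOREACH lines GENERATE
    REGEX_EXTRACT(rec, '^"([^"]*)",(.*)$', 1) AS rectypecd,
    rec;

-- Route each record to a relation according to its type.
-- Records that match none of the conditions are dropped.
SPLIT typed INTO hd IF rectypecd == 'HD',
                 eh IF rectypecd == 'EH';

-- Write only the original line, one output directory per record type.
hd_out = FOREACH hd GENERATE rec;
eh_out = FOREACH eh GENERATE rec;
STORE hd_out INTO 'out/hd';
STORE eh_out INTO 'out/eh';
```

To drop the record type from the output and keep only recorddata, the second capture group of the same pattern could be generated instead of `rec`. Note that string comparison in Pig is case-sensitive, so the literals need to match the case used in the data.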
