I have the following data in a file:
"HD",003498,"20160913:17:04:10","D3ZYE",1
"EH","XXX-1985977-1",1,"01","20151215","20151215","20151229","20151215","2304",,,"36-126481000",1340.74,61808.00,1126.62,0.00,214.12,0.00,0.00,0.00,"30","20151229","00653845",,,"PARTS","001","ABI","20151215","Y","Y","N","36-126481000",
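I want to use Pig to read this file and split it into separate files based on the first column. Specifically, I'm looking for a way to first treat each record as the following structure:
rectypcd, recorddata
and then treat recorddata as a CSV record.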
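To make the two-step view concrete, here is a minimal Python sketch (plain Python, not Pig). The function name `parse_record` is hypothetical. It assumes the record type is always the first comma-separated field and never contains a comma itself, so the first comma ends it. Python's standard `csv` module then parses the remainder as one CSV record, which handles the quoted fields and empty values in the sample lines.

```python
import csv
from typing import List, Tuple


def parse_record(line: str) -> Tuple[str, List[str]]:
    """Split a line into (rectypcd, parsed recorddata fields).

    Hypothetical helper; assumes the record type field contains no comma.
    """
    # Step 1: treat the line as "rectypcd,recorddata" by cutting at the first comma.
    rectypcd, _, recorddata = line.rstrip("\r\n").partition(",")
    rectypcd = rectypcd.strip('"')  # drop the surrounding CSV quotes
    # Step 2: parse recorddata as an ordinary CSV record.
    fields = next(csv.reader([recorddata]))
    return rectypcd, fields


if __name__ == "__main__":
    rectypcd, fields = parse_record('"HD",003498,"20160913:17:04:10","D3ZYE",1')
    print(rectypcd, fields)
```

A trailing comma, as in the EH line, yields a final empty-string field. That matches how a CSV SerDe would see an empty last column.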
Once the records are stored in separate files, one per record type, I can simply load each file into its own external Hive table using a CSV SerDe.
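The per-record-type split can be illustrated locally with the following Python sketch. It is not a substitute for Pig on HDFS. The helper name `split_by_record_type` and the `<out_dir>/<rectypcd>.csv` naming scheme are hypothetical. The helper assumes the record type is the first field, optionally quoted, with no embedded comma. It writes only the recorddata part, because every row in a given file shares the same type. The demo input uses the HD line and a shortened EH line.

```python
import os
import tempfile
from collections import defaultdict
from typing import Dict, Iterable, List


def split_by_record_type(lines: Iterable[str], out_dir: str) -> Dict[str, str]:
    """Write each line's recorddata to <out_dir>/<rectypcd>.csv (hypothetical helper).

    Returns a mapping from record type to the file written for it.
    """
    buckets: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        rectypcd, _, recorddata = line.partition(",")
        buckets[rectypcd.strip('"')].append(recorddata)

    os.makedirs(out_dir, exist_ok=True)
    paths: Dict[str, str] = {}
    for rectypcd, rows in buckets.items():
        path = os.path.join(out_dir, f"{rectypcd}.csv")
        # One output file per record type; the rows are kept as raw CSV text.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write("\n".join(rows) + "\n")
        paths[rectypcd] = path
    return paths


if __name__ == "__main__":
    sample = [
        '"HD",003498,"20160913:17:04:10","D3ZYE",1',
        '"EH","XXX-1985977-1",1,"01","20151215"',
    ]
    with tempfile.TemporaryDirectory() as out:
        for rectypcd, path in split_by_record_type(sample, out).items():
            with open(path, encoding="utf-8") as f:
                print(rectypcd, "->", f.read().strip())
```

Each resulting file would then be the LOCATION of one external Hive table.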
1 Answer
zysjyyx4 #1
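Based on your situation, you can use SPLIT in Pig, e.g.:
SPLIT line INTO hd1 IF rectypcd == 'HD', hd2 IF ...;
STORE hd1 INTO 'op1'; STORE hd2 INTO 'op2';
Note that the comparison is case-sensitive, so 'hd' will not match "HD". Also, if the file is loaded with PigStorage(','), the double quotes around the record type remain part of the field. In that case the condition must be rectypcd == '"HD"' unless the quotes are stripped first.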
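The routing that SPLIT performs can be modelled in a few lines of Python. This makes it easy to check which records land in which output. The function `pig_split` is hypothetical and only mimics Pig's behaviour:
* Each IF condition is evaluated independently, so a record can go to several relations or to none.
* A record that matches no condition is dropped unless an OTHERWISE relation is named.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (rectypcd, recorddata)


def pig_split(records: Iterable[Record],
              branches: Dict[str, Callable[[Record], bool]],
              otherwise: Optional[str] = None) -> Dict[str, List[Record]]:
    """Route records like SPLIT ... INTO a IF c1, b IF c2 [, rest OTHERWISE]."""
    out: Dict[str, List[Record]] = {name: [] for name in branches}
    if otherwise is not None:
        out[otherwise] = []
    for rec in records:
        matched = False
        for name, cond in branches.items():
            if cond(rec):  # every true condition receives the record
                out[name].append(rec)
                matched = True
        if not matched and otherwise is not None:
            out[otherwise].append(rec)  # catch-all for unmatched records
    return out


if __name__ == "__main__":
    recs = [("HD", "003498,..."), ("EH", "XXX-1985977-1,...")]
    print(pig_split(recs, {"hd1": lambda r: r[0] == "HD",
                           "hd2": lambda r: r[0] == "EH"}))
```

Swapping `"HD"` for `"hd"` in the sketch leaves `hd1` empty, which is the case-sensitivity pitfall in the original condition.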