Why is the header row automatically skipped in the output file?

izj3ouym  posted on 2021-05-29 in Hadoop

I want to store my data without losing the header row.
Here is my Pig script:

CRE_GM05 = LOAD '$input1' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:int,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,T32_001:chararray,TEC_013:chararray,TEC_014:chararray,DAT_001_X:chararray,DAT_002_X:chararray,TEC_001:chararray);
CRE_GM11 = LOAD '$input2' USING PigStorage(';') AS (MGM_COMPTEUR:chararray,CIA_CD_CRV_CIA:chararray,CIA_DA_EM_CRV:chararray,CIA_CD_CTRL_BLCE:chararray,CIA_IDC_EXTR_RDJ:chararray,CIA_VLR_IDT_CRV_LOQ:chararray,CIA_VLR_REF_CRV:chararray,CIA_NO_SEQ_CRV:chararray,CIA_VLR_LG_ZON_RTG:chararray,CIA_HEU_CIA:chararray,CIA_TM_STP_CRE:chararray,CIA_CD_SI:chararray,CIA_VLR_1:chararray,CIA_DA_ARR_FIC:chararray,CIA_TY_ENR:chararray,CIA_CD_BTE:chararray,CIA_CD_PER:chararray,CIA_CD_EFS:chararray,CIA_CD_ETA_VAL_CRV:chararray,CIA_CD_EVE_CPR:int,CIA_CD_APLI_TDU:chararray,CIA_CD_STE_RTG:chararray,CIA_DA_TT_RTG:chararray,CIA_NO_ENR_RTG:chararray,CIA_DA_VAL_EVE:chararray,DAT_001_X:chararray,DAT_002_X:chararray,D08_001:chararray,PSE_001:chararray,PSE_002:chararray,PSE_003:chararray,RUB_001:chararray,RUB_002:chararray,RUB_003:chararray,RUB_004:chararray,RUB_005:chararray,RUB_006:chararray,RUB_007:chararray,RUB_008:chararray,RUB_009:chararray,RUB_010:chararray,TEC_001:chararray,TEC_002:chararray,TEC_003:chararray,TX_001_VLR:chararray,TX_001_DCM:chararray,D08_004:chararray,D11_004:chararray,RUB_016:chararray,T03_001:chararray);

-- Join the two tables

JOINED_TABLES = JOIN CRE_GM05 BY TEC_001, CRE_GM11 BY TEC_001;

-- Generate the columns

DATA_GM05 = FOREACH JOINED_TABLES GENERATE 
        CRE_GM05::MGM_COMPTEUR  AS MGM_COMPTEUR,
        CRE_GM05::CIA_CD_CRV_CIA  AS CIA_CD_CRV_CIA,
        CRE_GM05::CIA_DA_EM_CRV   AS CIA_DA_EM_CRV,
        CRE_GM05::CIA_CD_CTRL_BLCE AS CIA_CD_CTRL_BLCE,
        CRE_GM05::CIA_IDC_EXTR_RDJ  AS CIA_IDC_EXTR_RDJ,
        CRE_GM05::CIA_VLR_IDT_CRV_LOQ AS CIA_VLR_IDT_CRV_LOQ,
        CRE_GM05::CIA_VLR_REF_CRV  AS CIA_VLR_REF_CRV,
        CRE_GM05::CIA_VLR_LG_ZON_RTG  AS CIA_VLR_LG_ZON_RTG,
        CRE_GM05::CIA_HEU_CIA AS CIA_HEU_CIA,
        CRE_GM05::CIA_TM_STP_CRE AS CIA_TM_STP_CRE,
        CRE_GM05::CIA_VLR_1 AS CIA_VLR_1,
        CRE_GM05::CIA_DA_ARR_FIC AS CIA_DA_ARR_FIC,
        CRE_GM05::CIA_TY_ENR AS CIA_TY_ENR,
        CRE_GM05::CIA_CD_BTE AS CIA_CD_BTE,
        CRE_GM05::CIA_CD_PER AS CIA_CD_PER,
        CRE_GM05::CIA_CD_EFS AS CIA_CD_EFS,
        CRE_GM05::CIA_CD_ETA_VAL_CRV AS CIA_CD_ETA_VAL_CRV,
        CRE_GM05::CIA_CD_EVE_CPR AS CIA_CD_EVE_CPR,
        CRE_GM05::CIA_CD_APLI_TDU AS CIA_CD_APLI_TDU,
        CRE_GM05::CIA_CD_STE_RTG AS CIA_CD_STE_RTG,
        CRE_GM05::CIA_DA_TT_RTG AS CIA_DA_TT_RTG,
        CRE_GM05::CIA_NO_ENR_RTG AS CIA_NO_ENR_RTG,
        CRE_GM05::CIA_DA_VAL_EVE AS CIA_DA_VAL_EVE,
        CRE_GM05::T32_001 AS T32_001,
        CRE_GM05::TEC_013 AS TEC_013,
        CRE_GM05::TEC_014 AS TEC_014,
        CRE_GM05::DAT_001_X AS DAT_001_X,
        CRE_GM05::DAT_002_X AS DAT_002_X,
        CRE_GM05::TEC_001 AS TEC_001;

STORE DATA_GM05 INTO '$OUTPUT_FILE' USING PigStorage(';');

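It returns the data, but I lose the first row, the header!
Note that the $input1 and $input2 variables are CSV files.
I tried using CSVLoader, but that did not work either.
I need the output to be stored with the header, please.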

nkkqxpd9 #1

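By default, Pig does not write a header to its final output. Adding one would also make little sense, because the order of rows in Pig output is not fixed.
If you want a header in the final output, either merge all the part files into a single file on the local file system, where you can add the header explicitly, or store the output of this Pig script in a Hive table. HCatalog provides a storer for Pig (HCatStorer) that can be used for this.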
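The first option can be sketched in Python. This is only a minimal illustration, assuming the part files written by the STORE statement have already been copied from HDFS to a local directory (for example with `hadoop fs -get`). The function name `merge_with_header` and its parameters are hypothetical helpers, not part of Pig or Hadoop.

```python
import glob
import os
import shutil


def merge_with_header(parts_dir, out_path, columns, delimiter=";"):
    """Write a header line to out_path, then append every part file.

    parts_dir: local directory holding the copied Pig output (assumption).
    columns:   column names to write as the header row.
    Returns the number of part files that were merged.
    """
    # Pig output files are typically named part-m-00000, part-r-00000, ...
    # Sorting only makes the merge order repeatable; Pig itself gives no
    # guarantee about row order across part files.
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*")))

    with open(out_path, "w", encoding="utf-8", newline="") as out:
        # The header uses the same delimiter as PigStorage(';').
        out.write(delimiter.join(columns) + "\n")
        for part in part_files:
            with open(part, "r", encoding="utf-8", newline="") as src:
                shutil.copyfileobj(src, out)

    return len(part_files)
```

For the script in the question, `columns` would be the aliases listed in the `FOREACH ... GENERATE` statement (`MGM_COMPTEUR`, `CIA_CD_CRV_CIA`, and so on), in the same order. A call might look like `merge_with_header("output_dir", "DATA_GM05.csv", columns)`, where both paths are placeholders.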
