Apache Pig: problem with UNION

wwtsj6pe posted on 2021-05-29 in Hadoop

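Apache Pig: I have two data sets, x and y, with the same schema. x has 222 records and y has 70. I need to combine them, that is, append one data set below the other. When I use UNION, the output has more records than expected: z = UNION x, y; produces 888 records. Can anyone offer some advice?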
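For reference, Pig's UNION appends the tuples of its inputs without removing duplicates (like SQL UNION ALL), so on its own it should yield 222 + 70 = 292 rows. The Python sketch below models only that behavior on plain lists; `pig_union` and the dummy records are hypothetical stand-ins, not Pig APIs.

```python
from typing import List, Tuple

Row = Tuple[str, int]


def pig_union(*relations: List[Row]) -> List[Row]:
    """Model of Pig's UNION: append the tuples of every input relation.

    Like SQL UNION ALL, nothing is deduplicated and nothing is multiplied,
    so the output size is the sum of the input sizes. (Real Pig makes no
    promise about output order; this model simply keeps input order.)
    """
    out: List[Row] = []
    for rel in relations:
        out.extend(rel)
    return out


# Hypothetical stand-ins for x (222 records) and y (70 records).
x = [("x", i) for i in range(222)]
y = [("y", i) for i in range(70)]

z = pig_union(x, y)
print(len(z))                # 292
print(len(pig_union(x, x)))  # 444: duplicates are kept
```

Under this model a result larger than 292 cannot come from UNION itself, which suggests that at least one of the inputs already holds more rows than the record counts above.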
Sample code:

-- Records at the UI_INFO step and at the CREATE_ITINERARY step
UI_INFO = FILTER LOGS BY hotel_book.step == 'UI_INFO';
LOGS_ITINERARY = FILTER LOGS BY hotel_book.step == 'CREATE_ITINERARY';
-- Keeps every UI_INFO record; one output row per CREATE_ITINERARY record with the same itinerary_id
LOGS_JOINED = JOIN UI_INFO BY header.itinerary_id LEFT OUTER, LOGS_ITINERARY BY header.itinerary_id;
-- FLATTEN emits one row per element of the currency bag ('unknown' when the bag is empty)
LOGS_BOOK_COL1 = FOREACH LOGS_JOINED {
    CURRENCY = LOGS_ITINERARY::hotel_book.itinerary.hotel.rooms.currency;
    GENERATE UI_INFO::header.date_time AS date_time,
             LOGS_ITINERARY::hotel_book.pay_at_hotel AS pay_at_hotel,
             UI_INFO::header.referrer AS referrer,
             UI_INFO::hotel_book.step AS stage,
             FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
};
-- CREATE_ITINERARY and PROVISIONAL_BOOK records, projected to the same schema
REMAINING_LOGS = FILTER LOGS BY (hotel_book.step == 'CREATE_ITINERARY' OR hotel_book.step == 'PROVISIONAL_BOOK');
LOGS_BOOK_COL2 = FOREACH REMAINING_LOGS {
    CURRENCY = hotel_book.itinerary.hotel.rooms.currency;
    GENERATE header.date_time AS date_time,
             hotel_book.pay_at_hotel AS pay_at_hotel,
             header.referrer AS referrer,
             hotel_book.step AS stage,
             FLATTEN( (IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY) ) AS currency;
};
-- Appends LOGS_BOOK_COL2 to LOGS_BOOK_COL1 (duplicates are kept)
LOGS_BOOK_COL = UNION LOGS_BOOK_COL1, LOGS_BOOK_COL2;
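Two steps in this script can emit more than one row per input record: the LEFT OUTER JOIN (one row per matching pair) and the FLATTEN over the room-currency bag (one row per bag element). The Python sketch below is a hypothetical model of those two steps, not Pig code: `left_outer_join` and `flatten_currency` are illustrative stand-ins, the sample records are invented, and it assumes every UI_INFO record finds a match so that the null-padded branch is never flattened.

```python
from typing import Callable, Dict, List, Optional, Tuple


def left_outer_join(left: List[tuple], right: List[tuple],
                    lkey: Callable[[tuple], object],
                    rkey: Callable[[tuple], object]) -> List[Tuple[tuple, Optional[tuple]]]:
    """Model of Pig's LEFT OUTER JOIN: one output row per matching pair.

    A left row whose key matches k right rows yields k rows; a left row with
    no match (or a null key, which never matches) yields one row padded with None.
    """
    index: Dict[object, List[tuple]] = {}
    for r in right:
        k = rkey(r)
        if k is not None:  # null keys never join
            index.setdefault(k, []).append(r)
    out: List[Tuple[tuple, Optional[tuple]]] = []
    for l in left:
        k = lkey(l)
        matches = index.get(k, []) if k is not None else []
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, None))
    return out


def flatten_currency(rows: List[tuple],
                     currencies: Callable[[tuple], List[str]]) -> List[tuple]:
    """Model of FLATTEN((IsEmpty(CURRENCY) ? TOBAG('unknown') : CURRENCY)).

    Emits one row per bag element; an empty bag is replaced by a one-element
    bag so the row is kept instead of being dropped.
    """
    out: List[tuple] = []
    for row in rows:
        bag = currencies(row) or ["unknown"]
        out.extend(row + (cur,) for cur in bag)
    return out


# Hypothetical records: (record_id, itinerary_id) and
# (record_id, itinerary_id, room currencies).
ui_info = [("u1", "it1"), ("u2", "it2")]
itinerary = [("c1", "it1", ["EUR", "EUR"]),
             ("c2", "it1", ["USD"]),
             ("c3", "it2", [])]

joined = left_outer_join(ui_info, itinerary,
                         lkey=lambda u: u[1], rkey=lambda c: c[1])
print(len(joined))  # 3: u1 matches two itinerary records

# Assumes every UI_INFO row found a match, so row[1] is never None here.
col1 = flatten_currency(joined, currencies=lambda row: row[1][2])
print(len(col1))    # 4: one row per room currency, 'unknown' for the empty bag
```

In this model two UI_INFO records become four rows before any UNION happens. LOGS_BOOK_COL2 applies the same FLATTEN, so counting LOGS_BOOK_COL1 and LOGS_BOOK_COL2 separately (for example with `GROUP ... ALL` followed by `COUNT_STAR`) is one way to see which input is larger than the 222 and 70 records expected.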
