SQL, Hive, Pig, MapReduce

oewdyzsn  posted on 2021-05-30  in  Hadoop

Each row has 5 columns, separated by commas:

1st column is name
2nd column is date_of_purchase
3rd column is product
4th column is mode of payment
5th column is total_amount

The following sample should make clear what the data contains:

surender,2014-03-09,TV,OFFLINE,20000
surender,2014-01-01,Mobile,ONLINE,18000
Raja,2014-09-21,Laptop,ONLINE,30000
Surender,2014-10-12,Laptop,ONLINE,40000
Raja,2014-FEB-11,MusicSystem,ONLINE,2000
Kumar,2014-07-09,Ipod,OFFLINE,4000
Kumar,2014-06-08,TV,ONLINE,20000
Raja,2014-11-07,SPeakers,OFFLINE,8000
Kumar,2014-10-18,Laptop,ONLINE,30000

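I want to see how much each person has spent through the ONLINE mode and through the OFFLINE mode.
Basically, the reducer output should look like this: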

surender   OFFLINE   20000
surender   ONLINE    58000
Raja       OFFLINE   8000
Raja       ONLINE    32000
Kumar      OFFLINE   4000
Kumar      ONLINE    50000

The final output should look like this:

surender 20000  58000
Raja     8000   32000
Kumar    4000   50000

A Hive query, a Pig query, or a MapReduce program would all be fine.

gtlvzcf8  #1

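-- Load the comma-separated records; 'file_name' is the input path.
A = LOAD 'file_name' USING PigStorage(',') AS (name:chararray, date:chararray, product:chararray, mode:chararray, total:long);
-- Group by person and payment mode, then sum the amounts in each group.
B = GROUP A BY (name, mode);
C = FOREACH B GENERATE group.name AS name, group.mode AS mode, SUM(A.total) AS total;
-- Regroup by person so that each person's per-mode totals end up in one row.
D = GROUP C BY name;
E = FOREACH D GENERATE group, C.total;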

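If the data contains differently capitalized spellings of the same name, as in the sample you provided (surender and Surender), you need to convert the names to upper case before grouping.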
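
The case normalization described above could be sketched roughly as follows. This is only one possible way to do it, not necessarily what the answerer had in mind. It assumes the same 'file_name' placeholder and schema as the script above; the relation names (raw, norm, by_name, result) and the output field names offline_total and online_total are made up for illustration. It also uses a nested FOREACH with FILTER so that the OFFLINE and ONLINE sums land in fixed columns, since a plain bag of totals does not say which value belongs to which mode.

```pig
-- Load the comma-separated purchase records ('file_name' is a placeholder path).
raw = LOAD 'file_name' USING PigStorage(',')
      AS (name:chararray, date:chararray, product:chararray, mode:chararray, total:long);

-- Normalize case so that 'surender' and 'Surender' fall into the same group.
norm = FOREACH raw GENERATE UPPER(name) AS name, UPPER(mode) AS mode, total;

-- One group per person; inside the nested block, split that person's rows by mode.
by_name = GROUP norm BY name;
result = FOREACH by_name {
    offline = FILTER norm BY mode == 'OFFLINE';
    online  = FILTER norm BY mode == 'ONLINE';
    GENERATE group AS name,
             SUM(offline.total) AS offline_total,
             SUM(online.total)  AS online_total;
};

DUMP result;
```

With this approach the names come out upper-cased (SURENDER rather than surender), and a person with no purchases in one mode gets a null in that column rather than 0. The order of the output rows is not guaranteed unless an ORDER BY is added.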
