Pig: need to find the MAX

mmvthczy  posted on 2021-07-15 in Pig

I'm new to Pig and working on a problem where I need to find the player with the maximum weight in this dataset. Here is a sample of the data:

id, weight, id, year, triples
(bayja01,210,bayja01,2005,6)
(crawfca02,225,crawfca02,2005,15)
(damonjo01,205,damonjo01,2005,6)
(dejesda01,190,dejesda01,2005,6)
(eckstda01,170,eckstda01,2005,7)

Here is my Pig script:

batters = LOAD 'hdfs:/user/maria_dev/pigtest/Batting.csv' using PigStorage(',');
realbatters = FILTER batters BY $1==2005;
triphitters = FILTER realbatters BY $9>5;
tripids =  FOREACH triphitters GENERATE $0 AS id,$1 AS YEAR, $9 AS Trips;
names = LOAD 'hdfs:/user/maria_dev/pigtest/Master.csv' 
using PigStorage(',');
weights = FOREACH names GENERATE $0 AS id, $16 AS weight;
get_ids = JOIN  weights BY (id), tripids BY(id);
wts  = FOREACH get_ids GENERATE MAX(get_ids.weight)as wgt;
DUMP wts;

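Of course, the second-to-last line doesn't work. It tells me I have to use an explicit cast. I've figured out the filtering and so on, but I can't figure out how to get the final answer.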

xlpyo6sf #1

The MAX function in Pig takes a bag of values and returns the highest value in the bag. To create the bag, you must first GROUP your data:

get_ids = JOIN weights BY id, tripids BY id;

-- Drop columns we no longer need and rename for ease

just_ids_weights = FOREACH get_ids GENERATE
    weights::id AS id,
    weights::weight AS weight;

-- Group the data by id value

gp_by_ids = GROUP just_ids_weights BY id;

-- Find maximum weight by id

wts = FOREACH gp_by_ids GENERATE 
   group AS id,
   MAX(just_ids_weights.weight) AS wgt;
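
On the "explicit cast" message from the question: fields loaded without a schema default to bytearray, so it may help to cast them when they are first projected. A minimal sketch, assuming column $16 of Master.csv really holds an integer weight as the question's script implies:

```pig
-- Assumption: $0 is the player id and $16 is an integer weight, as in the question.
names = LOAD 'hdfs:/user/maria_dev/pigtest/Master.csv' USING PigStorage(',');

-- Cast at projection time so later steps see chararray and int instead of bytearray.
weights = FOREACH names GENERATE (chararray)$0 AS id, (int)$16 AS weight;
```

With typed fields, MAX returns an int rather than treating the values as doubles. Any value that cannot be cast (a header row, for example) becomes null, and MAX ignores nulls.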

If you want the maximum weight across all of the data, you can use GROUP ALL:

gp_all = GROUP just_ids_weights ALL;

wts = FOREACH gp_all GENERATE
    MAX(just_ids_weights.weight) AS wgt;
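
GROUP ALL yields only the maximum weight itself. Since the question asks for the player who has it, one possible approach (a sketch, not the only way) is to sort by weight and keep the top row. The aliases typed, players, by_weight and heaviest are illustrative names, and the cast assumes the weights are integers:

```pig
-- Cast so that ORDER compares numbers rather than raw bytes.
typed = FOREACH just_ids_weights GENERATE id, (int)weight AS weight;

-- A player can appear more than once after the join; keep one row per (id, weight).
players = DISTINCT typed;

-- Heaviest first, then keep only the top row.
by_weight = ORDER players BY weight DESC;
heaviest = LIMIT by_weight 1;

DUMP heaviest;
```

Note that LIMIT 1 returns a single row even if several players share the maximum weight.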
