Error when ranking in a Pig script

vs91vp4v  posted on 2021-06-25 in Pig

I have the following data in an HDFS file:

(1,rohit1,CM,2014-12-31 13:10:23,2014-12-31 13:10:23,2015-02-02 23:23:45,9999-12-31 00:00:00)
(2,rohit2,GM,2014-12-28 14:20:23,2014-12-28 14:20:23,2015-02-02 23:23:45,9999-12-31 00:00:00)
(3,rohit3,CM,2014-12-27 17:40:53,2014-12-27 17:40:53,2015-02-02 23:23:45,9999-12-31 00:00:00)
(4,rohit4,CM,2015-01-20 16:30:26,2015-01-20 16:30:26,2015-02-02 23:23:45,9999-12-31 00:00:00)
(5,rohit5,CM,2015-01-22 14:20:25,2015-01-22 14:20:25,2015-02-02 23:23:45,9999-12-31 00:00:00)
(6,rohit6,GM,2015-01-24 14:20:34,2015-01-24 14:20:34,2015-02-02 23:23:45,9999-12-31 00:00:00)
(7,rohit7,CM,2015-01-25 11:50:58,2015-01-25 11:50:58,2015-02-02 23:23:45,9999-12-31 00:00:00)
(1,rohit1,KM,2014-12-21 13:10:23,2014-12-21 13:10:23,2015-02-01 13:23:45,9999-12-31 00:00:00)
(2,rohit9,GM,2014-12-21 14:20:23,2014-12-21 14:20:23,2015-02-01 13:23:45,9999-12-31 00:00:00)

I need to rank the records, partitioned by id and ordered by updated in descending order. To do that, I grouped the data by id as follows:

-- load the file data into A

final_data_group = group A by id;

ranking_data = RANK final_data_group by updated desc;

But it gives the following error:

2015-02-03 16:21:37,555 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025:
<line 10, column 46> Invalid field projection. Projected field [updated] does not exist in schema: group:int,filter_high_date_records:bag{:tuple(id:int,name:chararray,margin_value:chararray,created:chararray,updated:chararray,start_date:chararray,end_date:chararray)}.

Can anyone help me solve this?

zzoitvuj1#

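What you are looking for is ranking within groups, which is hard to do with Pig's built-in features.
RANK does not support ranking on a field projected from inside a bag; it can only rank on the top-level fields of the tuple. In this case you can rank by the group field, and that will work.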
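To see why the projection fails, it can help to inspect the schema that GROUP produces. The following is a minimal sketch, assuming relation A is already loaded with the fields listed in the error message; the alias ranking_by_group is introduced here only for illustration.

```pig
-- Assumes A already has the schema
-- (id, name, margin_value, created, updated, start_date, end_date).
final_data_group = GROUP A BY id;

-- The grouped relation has only two top-level fields: `group` and the bag `A`.
-- `updated` lives inside the bag, so RANK cannot project it.
DESCRIBE final_data_group;

-- Ranking on the top-level group key is accepted.
-- Note: this ranks the groups themselves, not the records inside each group.
ranking_by_group = RANK final_data_group BY group DESC;
```

This removes the ERROR 1025, but it does not produce a per-id ranking by updated, which is what the question asks for.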
Rank operation on groups in Pig
The link above explains how to do ranking on groups using the DataFu library.
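One common way to do this with DataFu is to sort each bag inside a nested FOREACH and then number its tuples with the Enumerate UDF. This is only a sketch under stated assumptions, not necessarily what the linked post does: the jar path is hypothetical, and A is assumed to be already loaded with the schema from the error message.

```pig
-- Hypothetical jar path; adjust to wherever the DataFu jar lives in your environment.
REGISTER /path/to/datafu.jar;

-- Enumerate appends an index to every tuple in a bag; '1' makes the index start at 1.
DEFINE Enumerate datafu.pig.bags.Enumerate('1');

final_data_group = GROUP A BY id;

ranking_data = FOREACH final_data_group {
    -- Sort each id's records by `updated`, newest first.
    sorted = ORDER A BY updated DESC;
    -- Flatten the enumerated bag so each record carries its per-id position as the last field.
    GENERATE FLATTEN(Enumerate(sorted));
};
```

Because updated is a chararray in yyyy-MM-dd HH:mm:ss form, string ordering matches chronological ordering here. Enumerate assigns consecutive positions, so records with equal updated values get different numbers rather than a shared rank.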
