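I have the following records:
1000, 1001, 1002 to 1999, 2000, 2001, 2002 to 2999, 3000, 3001, 3002 to 3999
I want to process this record set with Hive so that reducer-1 handles the values 1000 to 1999, reducer-2 handles 2000 to 2999, and reducer-3 handles 3000 to 3999. How can I achieve this?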
iecba09b1#
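Use DISTRIBUTE BY: the mapper output is grouped by the expression in the DISTRIBUTE BY clause, and all rows with the same value of that expression are sent to the same reducer for processing: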
select ... from ... distribute by case when col between 1000 and 1999 then 1 when col between 2000 and 2999 then 2 when col between 3000 and 3999 then 3 end
Or simply:
distribute by floor(col/1000)
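A fuller query might look like the sketch below. The table names `records` and `records_out` are hypothetical, and the column name `col` is taken from the snippets above. It also assumes the MapReduce execution engine and that Hive assigns rows to reducers by hashing the DISTRIBUTE BY expression modulo the number of reducers, so the reducer count is set to 3 explicitly. Under that assumption the keys 1, 2 and 3 should land on three different reducers, but which physical reducer receives which range is up to Hive.

```sql
-- Hypothetical tables: records(col INT) as input, records_out(col INT) as output.
-- Request exactly three reducers (older property name; newer Hadoop
-- versions call it mapreduce.job.reduces).
SET mapred.reduce.tasks=3;

INSERT OVERWRITE TABLE records_out
SELECT col
FROM records
-- floor(col / 1000) is 1 for 1000-1999, 2 for 2000-2999, 3 for 3000-3999,
-- so each range forms one distribution group.
DISTRIBUTE BY floor(col / 1000)
-- Optional: order the rows within each reducer.
SORT BY col;
```

If the data ever contains values outside 1000 to 3999, the number of distinct keys grows and several ranges may share a reducer, so the reducer count would need to be adjusted accordingly.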