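Pig does not treat the null in the input above as a real null. The second field is just a chararray holding the literal text 'null;': PigStorage('|') splits only on '|', so the trailing ';' stays in the field. As a result, the built-in null checks (is null, is not null) do not work in this case. Instead, group all the records, filter out the 'null;' strings, and then take the counts. Can you try the script below?
Input: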
20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;
Pig script:
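-- Load the pipe-delimited file; f2 keeps the trailing ';' (e.g. 'null;')
A = LOAD 'input' USING PigStorage('|') AS (f1:int,f2:chararray);
-- Put every record into one bag so the counts cover the whole file
B = GROUP A ALL;
C = FOREACH B {
    -- The "null" values are literal strings, so compare against the text itself
    filterNull = FILTER A BY (f2!='null;');
    GENERATE COUNT(A.f1) AS fieldA, COUNT(filterNull.f2) AS fieldB;
}
DUMP C;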
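To check what the script above should print without a Pig cluster, here is a plain-Python model of the same logic (a sketch, not Pig itself). It assumes, as PigStorage('|') does, that each line is split only on '|', so the second field keeps its trailing ';'. The names to_int and count_fields are hypothetical helpers for illustration.

```python
# Plain-Python model of the first Pig script (illustrative only, not Pig).
# Assumption: each line is split only on '|', so the second field keeps its
# trailing ';' exactly as PigStorage('|') leaves it.

SAMPLE = """20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;"""


def to_int(text):
    """Mimic Pig's int cast: text that cannot be parsed becomes None (null)."""
    try:
        return int(text)
    except ValueError:
        return None


def count_fields(text):
    """Return (fieldA, fieldB) as the GROUP ALL / FILTER / COUNT script computes them."""
    # Each line must contain a '|'; split once so f2 keeps everything after it
    rows = [line.split("|", 1) for line in text.splitlines() if line]
    # COUNT(A.f1): rows whose first field is a non-null int
    field_a = sum(1 for f1, _ in rows if to_int(f1) is not None)
    # COUNT(filterNull.f2): rows whose second field is not the literal text 'null;'
    field_b = sum(1 for _, f2 in rows if f2 != "null;")
    return field_a, field_b


if __name__ == "__main__":
    print(count_fields(SAMPLE))  # (6, 4)
```

Because the filter is an exact string match, the ';' matters: in this model a value of plain null (no semicolon) would still be counted in fieldB.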
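Alternative script (strips the trailing ';' before filtering):
-- Load the same pipe-delimited file
fieldcount = load '/user/examples/stackoverflow/count.txt' using PigStorage('|') as (a:int, b:chararray);
-- Remove the trailing ';' so the literal value becomes 'null'
fieldcount1 = FOREACH fieldcount GENERATE a, REPLACE(b,';','') as b;
fieldcount2 = GROUP fieldcount1 ALL;
fieldcount3 = FOREACH fieldcount2 {
    a_cnt = FILTER fieldcount1 BY a is not null;
    -- Exclude both real nulls and the literal string 'null'
    b_cnt = FILTER fieldcount1 BY b is not null and b != 'null';
    -- COUNT skips tuples whose first field (a) is null; COUNT_STAR counts every tuple in b_cnt
    GENERATE COUNT(a_cnt) as a_count, COUNT_STAR(b_cnt) as b_count;
}
DUMP fieldcount3;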
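In the script above, b_cnt keeps a as its first field. Pig's COUNT ignores any tuple whose first field is null, whereas COUNT_STAR counts every tuple, so a row with a null a but a valid b would be dropped by COUNT(b_cnt). The plain-Python model below sketches that rule; pig_count, pig_count_star, filter_b and the sample rows are hypothetical, not Pig's API or data from the question.

```python
# Plain-Python model of Pig's COUNT vs COUNT_STAR (illustrative only, not Pig's API).
# None stands in for a Pig null; each tuple is (a, b) as in fieldcount1.

ROWS = [(20, "ABC"), (25, "null"), (99, "WER"), (None, "XYZ")]  # hypothetical rows


def pig_count(bag):
    """COUNT: a tuple is skipped when its FIRST field is null."""
    return sum(1 for t in bag if t[0] is not None)


def pig_count_star(bag):
    """COUNT_STAR: every tuple is counted, nulls included."""
    return len(bag)


def filter_b(rows):
    """FILTER fieldcount1 BY b is not null and b != 'null'."""
    return [(a, b) for a, b in rows if b is not None and b != "null"]


if __name__ == "__main__":
    b_cnt = filter_b(ROWS)
    # The (None, "XYZ") row passes the filter but is skipped by COUNT.
    print(pig_count(b_cnt), pig_count_star(b_cnt))  # 2 3
```

With the six-row input shown earlier, a is never null, so COUNT(b_cnt) and COUNT_STAR(b_cnt) agree; the difference only appears on rows like the hypothetical (None, "XYZ").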
My sample data is:
003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
011 Banglore
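All fields are separated by spaces; the last row (011 Banglore) has only two fields, so its remaining fields load as null. The code is as follows: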
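-- Space-delimited employee records
A = load '/edata' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);
-- Single group holding every record
s = group A ALL ;
-- COUNT ignores nulls, so each value is the number of non-null entries in that column
result = foreach s generate COUNT(A.eid),COUNT(A.name),COUNT(A.country),COUNT(A.salary);
dump result ;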
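A plain-Python model of this last script (a sketch, not Pig) can be used to sanity-check the per-column counts. The assumptions are mine, not the answer's: fields are split on single spaces, missing trailing fields become None (null), and eid and salary are cast to int with failed casts becoming None. The names load and count_columns are hypothetical.

```python
# Plain-Python model of the space-delimited COUNT script (illustrative only, not Pig).
# Assumptions: fields are split on single spaces, missing trailing fields become
# None (null), and eid/salary are cast to int (None if the cast fails).

SCHEMA = ["eid", "name", "city", "country", "salary"]
INT_FIELDS = {"eid", "salary"}

DATA = """003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
011 Banglore"""


def load(text):
    """Parse each line into a dict keyed by SCHEMA, padding short rows with None."""
    rows = []
    for line in text.splitlines():
        parts = line.split(" ")
        parts += [None] * (len(SCHEMA) - len(parts))  # pad missing trailing fields
        row = {}
        for name, value in zip(SCHEMA, parts):
            if value is not None and name in INT_FIELDS:
                try:
                    value = int(value)  # "003" -> 3
                except ValueError:
                    value = None
            row[name] = value
        rows.append(row)
    return rows


def count_columns(rows, columns):
    """COUNT(A.col) for each column: the number of non-null values."""
    return tuple(sum(1 for r in rows if r[c] is not None) for c in columns)


if __name__ == "__main__":
    rows = load(DATA)
    print(count_columns(rows, ["eid", "name", "country", "salary"]))  # (10, 10, 9, 9)
```

Under these assumptions only the short 011 Banglore row lowers the country and salary counts.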