Pig programming

qqrboqgw  posted on 2021-06-25  in Pig

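How do I write a Pig query that counts the values present in each field?
For example:
Field a | Field b
20 | ABC;
21 | XYZ;
25 | null;
99 | WER;
45 | null;
89 | FOY;
Desired output: field a count = 6, field b count = 4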

oxf4rvwz #1

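Pig does not treat the values in the input above as null. The text "null" is loaded as an ordinary chararray, so the built-in tests (is null, is not null) do not help here. You need to group all the records, filter out the "null" strings, and then take the counts. Can you try the script below?
Input: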

20|ABC;
21|XYZ;
25|null;
99|WER;
45|null;
89|FOY;

Pig script:

A = LOAD 'input' USING PigStorage('|') AS (f1:int,f2:chararray);
B = GROUP A ALL;
C = FOREACH B {
                filterNull =  FILTER A BY (f2!='null;');
                GENERATE COUNT(A.f1) AS fieldA, COUNT(filterNull.f2) AS fieldB;
              }
DUMP C;

Output:

(6,4)
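
To see why the result is (6,4), here is a minimal plain-Python model of the script's logic. It is not Pig code. It assumes, as described above, that COUNT skips only real nulls, so the text `null;` has to be filtered out explicitly. The helper name `count_non_null` is made up for this sketch.

```python
# Plain-Python model of the Pig script above; not Pig code.
rows = [
    (20, "ABC;"),
    (21, "XYZ;"),
    (25, "null;"),
    (99, "WER;"),
    (45, "null;"),
    (89, "FOY;"),
]


def count_non_null(values):
    """Hypothetical stand-in for COUNT: skip real nulls (None) only."""
    return sum(1 for v in values if v is not None)


# COUNT(A.f1): every f1 is a real integer, so all six rows count.
field_a = count_non_null(f1 for f1, _ in rows)

# Without the filter, the text 'null;' is an ordinary string and is counted.
unfiltered_b = count_non_null(f2 for _, f2 in rows)

# FILTER A BY (f2 != 'null;') removes the two placeholder rows first.
filter_null = [(f1, f2) for f1, f2 in rows if f2 != "null;"]
field_b = count_non_null(f2 for _, f2 in filter_null)

print((field_a, field_b))  # (6, 4)
print(unfiltered_b)        # 6
```

The second print shows what happens if the filter is left out: the placeholder strings are counted like any other value.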

qlvxas9a #2

Input:

20|ABC
21|XYZ
25|null
99|WER
45|null
89|FOY

Script:

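inputData = LOAD 'input' using PigStorage('|');
grouped_input = GROUP inputData ALL;
-- After GROUP ... ALL, $0 is the group key and $1 is the bag of rows, so project the columns from the bag.
-- COUNT skips only real nulls; the literal text 'null' is still counted.
counts = FOREACH grouped_input GENERATE COUNT(inputData.$0), COUNT(inputData.$1);
dump counts;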

n3h0vuf2 #3

Here are the steps to get the output:

fieldcount =  load '/user/examples/stackoverflow/count.txt' using PigStorage('|') as (a:int, b:chararray); 

fieldcount1 = FOREACH fieldcount GENERATE a, REPLACE(b,';','') as b;

fieldcount2 = GROUP fieldcount1 ALL;

fieldcount3 = FOREACH fieldcount2 {
    a_cnt = FILTER fieldcount1 BY a is not null;
    b_cnt = FILTER fieldcount1 BY b is not null and b != 'null' ;
    GENERATE COUNT(a_cnt) as a_count, COUNT(b_cnt) as b_count;
}

1dkrff03 #4

Here is an answer. My sample data is:

003 Amit Delhi India 12000
004 Anil Delhi India 15000
005 Deepak Delhi India 34000
006 Fahed Agra India 45000
007 Ravi Patna India 98777
008 Avinash Punjab India 120000
009 Saajan Punjab India 54000
001 Harit Delhi India 20000
002 Hardy Agra India 20000
011  Banglore

The fields are separated by spaces.
The code is as follows:

A = load '/edata' using PigStorage(' ') as (eid:int,name:chararray,city:chararray,country:chararray,salary:int);

s = group A ALL ;

result = foreach s generate COUNT(A.eid),COUNT(A.name),COUNT(A.country),COUNT(A.salary);

dump result ;

You will get the following result:

(10,9,9,9)
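
The counts differ because of the last row, which has an empty name and no country or salary. Below is a minimal plain-Python model of that behaviour. It is not Pig code. It assumes that PigStorage(' ') splits on every single space, that an empty or missing field is loaded as null, and that COUNT skips nulls. The names `parse` and `count_non_null` are made up for this sketch.

```python
# Plain-Python model of PigStorage(' ') plus COUNT; not Pig code.
lines = [
    "003 Amit Delhi India 12000",
    "004 Anil Delhi India 15000",
    "005 Deepak Delhi India 34000",
    "006 Fahed Agra India 45000",
    "007 Ravi Patna India 98777",
    "008 Avinash Punjab India 120000",
    "009 Saajan Punjab India 54000",
    "001 Harit Delhi India 20000",
    "002 Hardy Agra India 20000",
    "011  Banglore",  # two spaces: the name field is empty
]

FIELDS = ["eid", "name", "city", "country", "salary"]


def parse(line):
    """Split on single spaces; empty or missing fields become None (null)."""
    parts = line.split(" ")
    parts += [""] * (len(FIELDS) - len(parts))  # pad short rows
    return {name: (value if value != "" else None)
            for name, value in zip(FIELDS, parts)}


rows = [parse(line) for line in lines]


def count_non_null(column):
    """Hypothetical stand-in for COUNT over one column of the bag."""
    return sum(1 for row in rows if row[column] is not None)


result = tuple(count_non_null(c) for c in ["eid", "name", "country", "salary"])
print(result)  # (10, 9, 9, 9)
```

Only eid is present in all ten rows. In the row for 011, "Banglore" lands in the city column, so name, country and salary are each missing once.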
