Complex data type question in Pig

vsdwdz23  posted on 2021-06-25  in  Pig

I have an input.txt that looks like this:

{"charId":1111,"encounters":[{"alias":"A","guid":192,"data1":0,"data2":0,"temporary":1},{"alias":"B","guid":952,"data1":0,"data2":0,"temporary":1}]}
{"charId":2222,"encounters":[{"alias":"C","guid":544,"data1":0,"data2":0,"temporary":1}]}
{"charId":3333,"encounters":[]}

My question is how to get the following output:

(1111, A, 192, 0, 0, 1)
(1111, B, 952, 0, 0, 1)
(2222, C, 544, 0, 0, 1)
(3333,  ,    ,  ,  ,  )

P.S. Here is my script, but it only outputs the first three lines.

raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

a = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), FLATTEN(json#'encounters') AS (encounters:map[]);

b = FOREACH a GENERATE charId, encounters#'alias' AS alias, encounters#'guid' AS guid, encounters#'data1' AS data1, encounters#'data2' AS data2, encounters#'temporary' AS temporary;

Thank you very much for your help. I really appreciate it.


ngynwnxp  #1

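The reason is that the FLATTEN operator always discards a row whose bag is empty, so the record with charId 3333 (whose encounters array is empty) never reaches the final output. One option is to work around it as shown below. I won't say this is the best solution, but it at least solves your problem.
Pig script: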

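raw_data = LOAD 'input.txt' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
-- a keeps encounters unflattened so that records without encounters can be detected
a = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), json#'encounters' AS (encounters:map[]);
-- b flattens encounters; records with an empty encounters list are dropped at this step
b = FOREACH raw_data GENERATE json#'charId' AS (charId:chararray), FLATTEN(json#'encounters') AS (encounters:map[]);
-- c and d select the records without encounters and pad the missing fields with nulls
c = FILTER a BY IsEmpty(encounters);
d = FOREACH c GENERATE charId, null AS alias, null AS guid, null AS data1, null AS data2, null AS temporary;
-- e extracts the individual fields from each flattened encounter
e = FOREACH b GENERATE charId, encounters#'alias' AS alias, encounters#'guid' AS guid, encounters#'data1' AS data1, encounters#'data2' AS data2, encounters#'temporary' AS temporary;
-- f combines the flattened rows with the null-padded rows
f = UNION e, d;
DUMP f;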

Output:

(1111,A,192,0,0,1)
(1111,B,952,0,0,1)
(2222,C,544,0,0,1)
(3333,,,,,)
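
To look at the FLATTEN behavior in isolation, without the JSON loader, here is a minimal sketch. It assumes a hypothetical tab-separated file named encounters.tsv and a simplified one-field bag schema; the file, the relation names, and the schema are illustrative and not part of the original question. It has not been run against the asker's data.

```pig
-- Hypothetical input file 'encounters.tsv', two fields separated by a tab:
--   1111    {(A),(B)}
--   3333    {}
chars = LOAD 'encounters.tsv' AS (charId:chararray, aliases:bag{t:tuple(alias:chararray)});

-- FLATTEN emits one row per tuple in the bag, so an empty bag emits no row at all.
flat_rows = FOREACH chars GENERATE charId, FLATTEN(aliases) AS alias;

-- Same workaround as in the answer: pick the empty-bag records and pad them with a null.
no_alias = FILTER chars BY IsEmpty(aliases);
padded = FOREACH no_alias GENERATE charId, (chararray)null AS alias;

result = UNION flat_rows, padded;
DUMP result;
```

Under these assumptions, charId 3333 should be missing from flat_rows and should reappear in result with a null alias, which mirrors the (3333,,,,,) row in the output above.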
