How to do a selective join in Pig

bvpmtnay  posted on 2021-06-25  in  Pig

I have two datasets. main_data.txt:

{"id":"foo", "some_field":12354, "score":0}
{"id":"foobar", "some_field":12354, "score":0}

score_data.txt:

{"id":"foo", "score":1}
{"id":"foobar","score":20}

....
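So in the main data, the score is initialized to 0. The main data and the score data also share some ids.
For the shared ids, I want to replace the "score" in main_data with the score from score_data.
If an id has no entry in score_data, I want its score to stay 0.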

pb3s4cty  #1

Why initialize "score" to 0 at all? You can skip that and do a left outer join of main_data with score_data. This should work whether or not you skip the initialization:

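-- Paths and loaders are placeholders; JsonLoader is one option for the JSON lines in the question.
main_data = LOAD 'main_data.txt' USING JsonLoader('id:chararray, some_field:long, score:int'); -- assume we have id as a column
score_data = LOAD 'score_data.txt' USING JsonLoader('id:chararray, score:int'); -- assume we have id, score as columns
-- The left outer join keeps every main_data row; unmatched rows get NULL in the score_data fields.
joined_data = JOIN main_data BY id LEFT OUTER, score_data BY id;
results = FOREACH joined_data GENERATE main_data::id AS id, (score_data::score IS NULL ? 0 : score_data::score) AS score;
STORE results INTO 'results' USING JsonStorage();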
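
To sanity-check what the left outer join plus the null check above should produce, here is a small Python sketch of the same logic run on the sample records from the question. It only illustrates the semantics and is not Pig code. The function name `merge_scores` and the extra id `baz` (an id with no score record) are made up for the example, and it assumes each id appears at most once in the score data.

```python
def merge_scores(main_rows, score_rows):
    """Left-outer-join main_rows with score_rows on "id"; a missing score becomes 0."""
    # Index the score records by id (assumes ids are unique in score_rows).
    score_by_id = {row["id"]: row["score"] for row in score_rows}
    merged = []
    for row in main_rows:
        score = score_by_id.get(row["id"])
        if score is None:
            # No matching score record (or a null score): fall back to 0.
            score = 0
        # Copy the main record and overwrite its score field.
        merged.append({**row, "score": score})
    return merged


main_data = [
    {"id": "foo", "some_field": 12354, "score": 0},
    {"id": "foobar", "some_field": 12354, "score": 0},
    {"id": "baz", "some_field": 12354, "score": 0},  # hypothetical id with no score record
]
score_data = [
    {"id": "foo", "score": 1},
    {"id": "foobar", "score": 20},
]

for row in merge_scores(main_data, score_data):
    print(row["id"], row["score"])
# foo 1
# foobar 20
# baz 0
```

Unlike the Pig script, this sketch keeps every field of the main record. In Pig you would get the same effect by adding the other main_data fields to the GENERATE clause.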
