Store an RDD as partitioned sequence files?

5f0d552i posted on 2021-06-02 in Hadoop

I want to store a JavaRDD as sequence files partitioned by hour. Is there a way to do this?
For example:
I have records of the following form:

time,a1,a2,a3,a4,a5,a6,a7,a8

I want the key to be a2,a3,a4, and the value to be all the records for that key, including the partition column.
In HDFS it would be stored as:
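A minimal sketch of the per-record step, in plain Java with no Spark dependency. `RecordParts`, `parse` and `hourlyPartition` are hypothetical names, not part of any Spark or Hadoop API. The key follows the sample output in this question, which uses the four fields after `time` (for example `1,45,66,77`), and the value is the whole original line. The sample partitions by the raw `time` value. For true hourly directories, `hourlyPartition` truncates epoch seconds to the start of the hour, which would place both sample timestamps (1486203462 and 1486203461) under `time=1486202400`.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not a Spark or Hadoop API): splits one record of the form
 * time,a1,a2,a3,a4,a5,a6,a7,a8 into partition directory, key and value.
 */
final class RecordParts {
    final String partition; // partition directory name, e.g. "time=1486203462"
    final String key;       // a1..a4 joined with commas, as in the sample output
    final String value;     // the whole original record, time column included

    private RecordParts(String partition, String key, String value) {
        this.partition = partition;
        this.key = key;
        this.value = value;
    }

    static RecordParts parse(String line) {
        // limit -1 keeps trailing empty fields so the field-count check is reliable
        String[] f = line.split(",", -1);
        if (f.length != 9) {
            throw new IllegalArgumentException("expected 9 fields, got " + f.length + ": " + line);
        }
        // f[0] is time; f[1..4] are a1..a4
        String key = String.join(",", Arrays.copyOfRange(f, 1, 5));
        return new RecordParts("time=" + f[0], key, line);
    }

    // Optional: truncate epoch seconds to the start of the hour for hourly directories.
    static String hourlyPartition(long epochSeconds) {
        return "time=" + (epochSeconds - epochSeconds % 3600);
    }
}
```

In a Spark job, a function like `parse` would typically run inside a `mapToPair` over the input lines.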

output/time=12345/sequence_file_of_key_and_values

Sample input:
1486203462,1,45,66,77,ansh,72,976,58
1486203461,1,452,66,77,ansh5,456,8754,09865
1486203462,1,45,66,77,ansh9,772,976,5890
1486203461,1,452,66,77,ansh156,742,96,5951

The output should look like this:

output/time=1486203462/: a sequence file with key (1,45,66,77) and corresponding values ((1486203462,1,45,66,77,ansh,72,976,58),(1486203462,1,45,66,77,ansh9,772,976,5890))

output/time=1486203461/: a sequence file with key (1,452,66,77) and corresponding values ((1486203461,1,452,66,77,ansh5,456,8754,09865),(1486203461,1,452,66,77,ansh156,742,96,5951))
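The expected layout can be checked locally before involving Spark. Below is a plain-Java sketch (`PartitionLayout` and `group` are hypothetical names) that groups the sample input into output directory, then key, then the list of full records. This mirrors what grouping by (time, key) would produce in Spark. Like the sample output, it keys on the four fields after `time`. The Spark wiring is not shown. One commonly used route is `JavaPairRDD.saveAsHadoopFile` with a subclass of Hadoop's old-API `org.apache.hadoop.mapred.lib.MultipleSequenceFileOutputFormat` that overrides `generateFileNameForKeyValue` to return a `time=...` subdirectory. Treat that as an assumption to verify against your Spark and Hadoop versions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical local simulation (no Spark): groups raw records into
 * output directory -> key -> list of full records, matching the layout
 * output/time=<time>/<sequence file of key and values>.
 */
final class PartitionLayout {
    static Map<String, Map<String, List<String>>> group(List<String> lines) {
        // TreeMaps keep directories and keys in a deterministic (sorted) order
        Map<String, Map<String, List<String>>> out = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.split(",", -1);
            if (f.length < 5) {
                throw new IllegalArgumentException("too few fields: " + line);
            }
            // f[0] is the time column; f[1..4] form the key used in the sample output
            String dir = "output/time=" + f[0];
            String key = String.join(",", f[1], f[2], f[3], f[4]);
            out.computeIfAbsent(dir, d -> new TreeMap<>())
               .computeIfAbsent(key, k -> new ArrayList<>())
               .add(line); // the value is the whole record, in input order
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "1486203462,1,45,66,77,ansh,72,976,58",
            "1486203461,1,452,66,77,ansh5,456,8754,09865",
            "1486203462,1,45,66,77,ansh9,772,976,5890",
            "1486203461,1,452,66,77,ansh156,742,96,5951");
        // Print one line per (directory, key) pair with its grouped records
        group(sample).forEach((dir, byKey) ->
            byKey.forEach((key, values) ->
                System.out.println(dir + " key=(" + key + ") values=" + values)));
    }
}
```

Running `main` prints one line per directory and key, with each directory holding the records that share its `time` value.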
