Input DataFrame:
{
"F1" : "A",
"F2" : "B",
"F3" : [
{
"name" : "N1",
"sf1" : "val_1",
"sf2" : "val_2"
},
{
"name" : "N2",
"sf1" : "val_3",
"sf2" : "val_4"
}
],
"F4" : {
"SF1" : "val_5",
"SF2" : "val_6",
"SF3" : "val_7"
}
}
Expected output:
[
{
"F1" : "A",
"F2" : "B",
"F3_name" : "N1",
"F3_sf1" : "val_1",
"F3_sf2" : "val_2",
"F4_SF1" : "val_5",
"F4_SF2" : "val_6",
"F4_SF3" : "val_7"
},
{
"F1" : "A",
"F2" : "B",
"F3_name" : "N2",
"F3_sf1" : "val_3",
"F3_sf2" : "val_4",
"F4_SF1" : "val_5",
"F4_SF2" : "val_6",
"F4_SF3" : "val_7"
}
]
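`F3` is an array of structs. The new DataFrame should be flat, with this row converted into one or more rows (2 in this example) according to the number of items in `F3`.
I am new to Spark and Scala. Any ideas on how to achieve this transformation would be very helpful.
Thanks!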
2 answers
eulz3vhy1#
You can also use `explode` first. Then you can flatten the struct fields with a series of aliases (e.g., `$"F3.name" as "F3_name"`).
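The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the explode-then-alias approach it describes. The object name `ExplodeFlatten`, the case classes, and the local `SparkSession` are assumptions made for illustration; only `explode`, `col`, `as`, and the field names come from the thread or from the Spark API.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeFlatten {
  // Hypothetical case classes mirroring the schema shown in the question.
  case class F3Item(name: String, sf1: String, sf2: String)
  case class F4Struct(SF1: String, SF2: String, SF3: String)
  case class Record(F1: String, F2: String, F3: Seq[F3Item], F4: F4Struct)

  // Builds the single-row input DataFrame from the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Record("A", "B",
        Seq(F3Item("N1", "val_1", "val_2"), F3Item("N2", "val_3", "val_4")),
        F4Struct("val_5", "val_6", "val_7"))
    ).toDF()
  }

  def flatten(df: DataFrame): DataFrame =
    df
      // One output row per element of the F3 array; F3 becomes a single struct.
      .withColumn("F3", explode(col("F3")))
      // col("F3.name").as("F3_name") is the same as $"F3.name" as "F3_name"
      // when spark.implicits._ is in scope.
      .select(
        col("F1"),
        col("F2"),
        col("F3.name").as("F3_name"),
        col("F3.sf1").as("F3_sf1"),
        col("F3.sf2").as("F3_sf2"),
        col("F4.SF1").as("F4_SF1"),
        col("F4.SF2").as("F4_SF2"),
        col("F4.SF3").as("F4_SF3"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the sketch outside a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-flatten")
      .getOrCreate()
    flatten(sample(spark)).show(false)
    spark.stop()
  }
}
```

Note that `explode` drops rows whose `F3` is null or empty. If those rows should be kept (with null `F3_*` columns), `explode_outer` can be used in its place.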
ftf50wuq2#
You can use `inline` to expand `F3`, and then `*` (as in `F4.*`) to expand `F4`.
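The snippet that went with this answer is missing as well, so here is a hedged sketch of what an `inline` plus `F4.*` query could look like. The object name `InlineFlatten`, the case classes, and the final positional rename with `toDF` are assumptions added for illustration: `inline` and `F4.*` emit the original field names (`name`, `sf1`, `sf2`, `SF1`, `SF2`, `SF3`), so the rename is one way to obtain the prefixed names the question asks for.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object InlineFlatten {
  // Hypothetical case classes mirroring the schema shown in the question.
  case class F3Item(name: String, sf1: String, sf2: String)
  case class F4Struct(SF1: String, SF2: String, SF3: String)
  case class Record(F1: String, F2: String, F3: Seq[F3Item], F4: F4Struct)

  // Builds the single-row input DataFrame from the question.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      Record("A", "B",
        Seq(F3Item("N1", "val_1", "val_2"), F3Item("N2", "val_3", "val_4")),
        F4Struct("val_5", "val_6", "val_7"))
    ).toDF()
  }

  def flatten(df: DataFrame): DataFrame =
    df
      // inline(F3) turns the array of structs into one row per element and
      // one column per struct field; F4.* expands the struct into columns.
      .selectExpr("F1", "F2", "inline(F3)", "F4.*")
      // Rename by position, since sf1/SF1 and sf2/SF2 differ only by case.
      .toDF("F1", "F2", "F3_name", "F3_sf1", "F3_sf2", "F4_SF1", "F4_SF2", "F4_SF3")

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for trying the sketch outside a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("inline-flatten")
      .getOrCreate()
    flatten(sample(spark)).show(false)
    spark.stop()
  }
}
```

The rename is positional on purpose: with Spark's default case-insensitive column resolution, referring to `sf1` by name after the `selectExpr` would be ambiguous, because `inline(F3)` and `F4.*` both produce a column with that name up to case.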