将Dataframe转换为嵌套的json输出

soat7uwm  于 2021-07-14  发布在  Spark
关注(0)|答案(2)|浏览(330)

我有一个Dataframe,它是从数据中派生出来的,它给了我这样的东西
IDIdentifier实际成本发生时间:1ABC12324212021-04-16t19:07:002xyz98712342021-04-16t19:25:272xyz98792872021-04-16t19:32:43112337392021-04-16t19:26:30356787852021-04-16t19:13:00
我的要求是最终转储文件应该将整个Dataframe作为一个嵌套的json,如下所示

{
"hits": [
    {
        "id": 1,
        "identifier": "abc123",
        "cost": [
            {
                "actual_cost": 24,
                "cost_incurred": 21,
                "timestamp": "2021-04-16T19:07:00"
            },
            {
                "actual_cost": 37,
                "cost_incurred": 39,
                "timestamp": "2021-04-16T19:26:30"
            }
        ]
    },
    {
        "id": 2,
        "identifier": "xyz987",
        "cost": [
            {
                "actual_cost": 12,
                "cost_incurred": 34,
                "timestamp": "2021-04-16T19:25:27"
            },
            {
                "actual_cost": 37,
                "cost_incurred": 39,
                "timestamp": "2021-04-16T19:26:30"
            }
        ]
    },
    {
        "id": 3,
        "identifier": "abc567",
        "cost": [
            {
                "actual_cost": 87,
                "cost_incurred": 85,
                "timestamp": "2021-04-16T19:13:00"
            }
        ]
    }
]
}

我正在查看map函数,但找不出将结果分组的方法。任何线索或解决方案将不胜感激。

a2mppw5e

a2mppw5e1#

to_json 将成为您的朋友:)以及一些分组和聚合:

df.createOrReplaceTempView("df")

result = spark.sql("""
    select 
        to_json(struct(collect_list(item) hits)) result 
    from (
        select 
            struct(
                id, identifier, collect_list(struct(actual_cost, cost_incurred, timestamp)) cost
            ) item 
        from df 
        group by id, identifier
    )
""")

result.show()
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|result                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"hits":[{"id":"2","identifier":"xyz987","cost":[{"actual_cost":"12","cost_incurred":"34","timestamp":"2021-04-16T19:25:27"},{"actual_cost":"92","cost_incurred":"87","timestamp":"2021-04-16T19:32:43"}]},{"id":"1","identifier":"abc123","cost":[{"actual_cost":"24","cost_incurred":"21","timestamp":"2021-04-16T19:07:00"},{"actual_cost":"37","cost_incurred":"39","timestamp":"2021-04-16T19:26:30"}]},{"id":"3","identifier":"abc567","cost":[{"actual_cost":"87","cost_incurred":"85","timestamp":"2021-04-16T19:13:00"}]}]}|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
kq0g1dla

kq0g1dla2#

下面是如何使用groupby和tojson实现的

val resultDf = df.groupBy("id", "identifier")
  .agg(collect_list(struct("actual_cost", "cost_incurred", "timestamp")) as "cost")
  .toJSON
resultDf.show(false)

结果:

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                  |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":2,"identifier":"xyz987","cost":[{"actual_cost":12,"cost_incurred":34,"timestamp":"2021-04-16T19:25:27"},{"actual_cost":92,"cost_incurred":87,"timestamp":"2021-04-16T19:32:43"}]}|
|{"id":1,"identifier":"abc123","cost":[{"actual_cost":24,"cost_incurred":21,"timestamp":"2021-04-16T19:07:00"},{"actual_cost":37,"cost_incurred":39,"timestamp":"2021-04-16T19:26:30"}]}|
|{"id":3,"identifier":"abc567","cost":[{"actual_cost":87,"cost_incurred":85,"timestamp":"2021-04-16T19:13:00"}]}                                                                        |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

如果你想排成一排

result.agg(to_json(collect_list(struct(result.columns.map(col): _*))).as("hits"))
.show(false)

结果:

+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|hits                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[{"id":2,"identifier":"xyz987","cost":[{"actual_cost":12,"cost_incurred":34,"timestamp":"2021-04-16T19:25:27"},{"actual_cost":92,"cost_incurred":87,"timestamp":"2021-04-16T19:32:43"}]},{"id":1,"identifier":"abc123","cost":[{"actual_cost":24,"cost_incurred":21,"timestamp":"2021-04-16T19:07:00"},{"actual_cost":37,"cost_incurred":39,"timestamp":"2021-04-16T19:26:30"}]},{"id":3,"identifier":"abc567","cost":[{"actual_cost":87,"cost_incurred":85,"timestamp":"2021-04-16T19:13:00"}]}]|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

相关问题