Scala: how to get a field from each element of an array and concatenate the values into a string (Spark DataFrame)

zbq4xfa0 posted on 2021-05-24 in Spark

I have an array column with this schema:

|-- packages: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- packageId: string (nullable = true)
|    |    |-- triggers: map (nullable = true)
|    |    |    |-- key: string
|    |    |    |-- value: string (valueContainsNull = true)

How can I get a new column that contains all the packageId values?
Example of the column:

"packages": [
        {
            "packageId": "package1",
            "triggers": {
                "1": "2"
            }
        },
        {
            "packageId": "package2",
            "triggers": {
                "1": "2",
                "2": "2"
            }
        }
    ]

Desired value of the new column:

package1,package2

I am using Spark 2.4.5.

ajsxfq5m #1

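import org.apache.spark.sql.functions.{collect_set, concat_ws, explode}

// Explode the packageId values into one row each, then aggregate them back per group.
df.withColumn("packageList", explode(df.col("packages").getField("packageId")))
  .groupBy(..) // the columns that identify each original row
  .agg(concat_ws(",", collect_set("packageList")))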

This works for me.
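As an alternative to the explode and groupBy approach above, here is a minimal sketch that stays on a single row. It relies on two things: selecting packages.packageId on an array of structs yields an array of strings, and concat_ws accepts array<string> columns. The case classes Package and Record, the id column, the output column name packageIds, and the helper withPackageIds are hypothetical names introduced only for this illustration; they are not part of the question or the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws}

// Hypothetical case classes mirroring the schema shown in the question.
case class Package(packageId: String, triggers: Map[String, String])
case class Record(id: Int, packages: Seq[Package])

object PackageIdsExample {
  // Adds a "packageIds" column (assumed name) with the packageId values joined by commas.
  def withPackageIds(df: DataFrame): DataFrame =
    // packages.packageId is an array<string>; concat_ws joins its elements in array order.
    df.withColumn("packageIds", concat_ws(",", col("packages.packageId")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("package-ids-example")
      .getOrCreate()
    import spark.implicits._

    // Sample data equivalent to the JSON example in the question.
    val df = Seq(
      Record(1, Seq(
        Package("package1", Map("1" -> "2")),
        Package("package2", Map("1" -> "2", "2" -> "2"))))
    ).toDF()

    withPackageIds(df).select("id", "packageIds").show(false)
    spark.stop()
  }
}
```

For the sample row, the packageIds column should contain package1,package2. Unlike collect_set, this variant keeps duplicate IDs and preserves the array order, and it needs no grouping key, so which one fits depends on whether deduplication is wanted.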
