scala — reduce a JSON string column to key/value columns

cuxqih21 · posted 2021-05-27 in Spark

I have a DataFrame with the following structure:

| a        | b    |           c                                             |
-----------------------------------------------------------------------------
|01        |ABC   |    {"key1":"valueA","key2":"valueC"}                    |
|02        |ABC   |    {"key1":"valueA","key2":"valueC"}                    |
|11        |DEF   |    {"key1":"valueB","key2":"valueD", "key3":"valueE"}   |
|12        |DEF   |    {"key1":"valueB","key2":"valueD", "key3":"valueE"}   |

I would like to transform it into this:

| a        | b    |      key         |       value     |
--------------------------------------------------------
|01        |ABC   |    key1          |     valueA      |
|01        |ABC   |    key2          |     valueC      |
|02        |ABC   |    key1          |     valueA      |
|02        |ABC   |    key2          |     valueC      |
|11        |DEF   |    key1          |     valueB      |
|11        |DEF   |    key2          |     valueD      |
|11        |DEF   |    key3          |     valueE      |
|12        |DEF   |    key1          |     valueB      |
|12        |DEF   |    key2          |     valueD      |
|12        |DEF   |    key3          |     valueE      |

Ideally in an efficient way, since the dataset can be quite large.

lkaoscv7 #1

Try parsing the column with `from_json` and then using the `explode` function on the resulting map. Example:

```
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val df = Seq(("01", "ABC", """{"key1":"valueA","key2":"valueC"}""")).toDF("a", "b", "c")

// parse the JSON string into a MapType column, then explode the map into key/value rows
val schema = MapType(StringType, StringType)
df.withColumn("d", from_json(col("c"), schema))
  .selectExpr("a", "b", "explode(d)")
  .show(10, false)
//+---+---+----+------+
//|a  |b  |key |value |
//+---+---+----+------+
//|01 |ABC|key1|valueA|
//|01 |ABC|key2|valueC|
//+---+---+----+------+
```

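If some rows might hold malformed JSON, a variant of the same approach can keep them visible instead of silently dropping them. This is a sketch, not from the original answer: `from_json` returns null for unparseable input, and `explode_outer` (unlike `explode`) emits a row with null key/value for a null map. The `"not-json"` row here is an invented example value.

```scala
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

// Sample data from the question, plus one hypothetical row with invalid JSON.
val df = Seq(
  ("01", "ABC", """{"key1":"valueA","key2":"valueC"}"""),
  ("11", "DEF", """{"key1":"valueB","key2":"valueD","key3":"valueE"}"""),
  ("99", "XYZ", "not-json")
).toDF("a", "b", "c")

// from_json yields null for "not-json"; explode_outer keeps that row
// with null key/value so bad records can be inspected downstream.
df.withColumn("d", from_json(col("c"), MapType(StringType, StringType)))
  .select(col("a"), col("b"), explode_outer(col("d")))
  .show(false)
```

Since `from_json` never throws on bad input, this stays a single narrow transformation and should scale the same way as the accepted answer.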