Converting a DataFrame's format

t40tm48m  published 2021-05-17  in Spark

My DataFrame has the following schema:

 |-- id: string (nullable = true)
 |-- epoch: string (nullable = true)
 |-- data: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

and I want to convert it so that each map entry becomes its own row:

 |-- id: string (nullable = true)
 |-- epoch: string (nullable = true)
 |-- key: string (nullable = true)
 |-- value: string (nullable = true)

Example:
From:

1,12345, [pq -> r, ab -> c]

To:

1,12345, pq ,r
1,12345, ab ,c

I am trying this code, but it doesn't work:

val array2Df = array1Df.flatMap(line =>
  line.getMap[String, String](2).map(
    (line.getString(0), line.getString(1), _)
  ))

kr98yfug1#

Try the following:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{MapType, StringType, StructType}

val arrayData = Seq(
  Row("1","epoch_1",Map("epoch_1_key1"->"epoch_1_val1","epoch_1_key2"->"epoch_1_Val2")),
  Row("2","epoch_2",Map("epoch_2_key1"->"epoch_2_val1","epoch_2_key2"->"epoch_2_Val2"))
)

val arraySchema = new StructType()
  .add("Id", StringType)
  .add("epoch", StringType)
  .add("data", MapType(StringType, StringType))

val df = spark.createDataFrame(spark.sparkContext.parallelize(arrayData), arraySchema)
df.printSchema()
df.show(false)


After that, explode the data column; explode turns each map entry into a separate row with key and value columns. Don't forget to
import org.apache.spark.sql.functions.explode (and spark.implicits._ for the $ column syntax).

df.select($"Id", $"epoch", explode($"data")).show(false)
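As an aside, the flatMap attempt in the question fails mainly because the tuple is built with a placeholder instead of destructuring each map entry into key and value. A minimal sketch of the intended per-row flattening, written in plain Scala collections (no Spark session needed) with the sample values from the question:

```scala
// One input row shaped like the question's example: (id, epoch, data map).
val rows = Seq(
  ("1", "12345", Map("pq" -> "r", "ab" -> "c"))
)

// Flatten: each (id, epoch, map) row yields one (id, epoch, key, value)
// tuple per map entry -- the same shape the Spark flatMap should produce.
val flattened = rows.flatMap { case (id, epoch, data) =>
  data.map { case (key, value) => (id, epoch, key, value) }
}

flattened.foreach(println)
```

In Spark, the same `case (key, value)` destructuring inside `Dataset.flatMap` (with `spark.implicits._` in scope for the encoder) produces the desired four-column result, though the `explode` approach above is usually simpler.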
