Scala Spark: copying data from one DataFrame into another df with a nested schema and the same column names

sxpgvts3 · posted 2021-07-09 · in Spark

df1 - a flat DataFrame with data

+---------+--------+-------+                                                    
|FirstName|LastName| Device|
+---------+--------+-------+
|   Robert|Williams|android|
|    Maria|Sharpova| iphone|
+---------+--------+-------+

root
 |-- FirstName: string (nullable = true)
 |-- LastName: string (nullable = true)
 |-- Device: string (nullable = true)

df2 - an empty DataFrame with the same column names

+------+----+
|header|body|
+------+----+
+------+----+

root
 |-- header: struct (nullable = true)
 |    |-- FirstName: string (nullable = true)
 |    |-- LastName: string (nullable = true)
 |-- body: struct (nullable = true)
 |    |-- Device: string (nullable = true)

Schema code for df2:

import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("header", StructType(Array(
    StructField("FirstName", StringType),
    StructField("LastName", StringType)))),
  StructField("body", StructType(Array(
    StructField("Device", StringType))))
))
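For reference, here is a minimal sketch of how an empty df2 with this nested schema can be built from an empty RDD of `Row`s; the `SparkSession` setup shown is an assumption, not part of the original question:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-schema-demo")
  .getOrCreate()

val schema = StructType(Array(
  StructField("header", StructType(Array(
    StructField("FirstName", StringType),
    StructField("LastName", StringType)))),
  StructField("body", StructType(Array(
    StructField("Device", StringType))))
))

// An empty DataFrame carrying the nested schema but no rows
val df2 = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
df2.printSchema()
```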

df2 populated with the data from df1 would be the final output.
For complex schemas this needs to work across multiple columns and be configurable, and it must be done without case classes.
Approach #1 - use schema.fields.map to map df1 -> df2?
Approach #2 - create a new df, defining both the data and the schema?
Approach #3 - use zip and map transformations to build a "select col as col" query... not sure whether that works for nested (StructType) schemas.
How should I go about this?

Answer 1 (htzpubme):

import spark.implicits._
import org.apache.spark.sql.functions._

val sourceDF = Seq(
  ("Robert", "Williams", "android"),
  ("Maria", "Sharpova", "iphone")
).toDF("FirstName", "LastName", "Device")

val resDF = sourceDF
  .withColumn("header", struct(col("FirstName"), col("LastName")))
  .withColumn("body", struct(col("Device")))
  .select(col("header"), col("body"))

resDF.printSchema
//  root
//  |-- header: struct (nullable = false)
//  |    |-- FirstName: string (nullable = true)
//  |    |-- LastName: string (nullable = true)
//  |-- body: struct (nullable = false)
//  |    |-- Device: string (nullable = true)

resDF.show(false)
//  +------------------+---------+
//  |header            |body     |
//  +------------------+---------+
//  |[Robert, Williams]|[android]|
//  |[Maria, Sharpova] |[iphone] |
//  +------------------+---------+
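The question also asks for the nesting to be configurable across multiple columns. One way to generalize the answer above is to drive the `struct` calls from a mapping of target struct names to source columns; `mapping` and `nestColumns` below are illustrative names, a sketch rather than the answerer's code:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}
import spark.implicits._

val sourceDF = Seq(
  ("Robert", "Williams", "android"),
  ("Maria", "Sharpova", "iphone")
).toDF("FirstName", "LastName", "Device")

// Hypothetical config: each target struct column and the flat
// source columns that should be nested under it
val mapping: Seq[(String, Seq[String])] = Seq(
  "header" -> Seq("FirstName", "LastName"),
  "body"   -> Seq("Device")
)

// Build one struct column per mapping entry and select them all,
// so the nesting is driven entirely by the config
def nestColumns(df: DataFrame, mapping: Seq[(String, Seq[String])]): DataFrame =
  df.select(mapping.map { case (target, sources) =>
    struct(sources.map(col): _*).as(target)
  }: _*)

val configuredDF = nestColumns(sourceDF, mapping)
configuredDF.printSchema()
```

Because the column names live in plain data rather than case classes, the same function handles any number of struct columns by editing the mapping alone.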
