How to set a struct to null in Scala Spark when all values in the struct are null?

Posted by webghufk on 2021-05-17 in Spark

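I have a Spark Scala DataFrame in which one column is a struct. When all values in the struct are null, I want the column itself to be null instead of an object whose fields are all null.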

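import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
import spark.implicits._

val someDF = Seq(
  (8, null, null),
  (64, "mouse", "s"),
  (-27, "horse", "e")
).toDF("a", "b", "c")

// Wraps columns b and c into a struct named wks_<week>_jrny
def make_week_struct(week: String): Column =
  struct($"b", $"c").alias(s"wks_${week}_jrny")

val week1_summary = make_week_struct("1")

val dd = someDF.select($"a", week1_summary)

display(dd) // Databricks notebook helper; use dd.show(false) elsewhere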

Sample data

a       b        c       
8       null     null
64      mouse    s    
-27     horse    e

Current output

a   wks_1_jrny
8   object:{b:null, c:null}
64  object:{b:"mouse", c:"s"}
-27 object:{b:"horse", c:"e"}

Expected output

a   wks_1_jrny
8   null
64  object:{b:"mouse", c:"s"}
-27 object:{b:"horse", c:"e"}
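A minimal sketch of one way to get the expected output directly when the struct is built, assuming a local SparkSession (in spark-shell or Databricks the existing `spark` is reused by `getOrCreate`). `nullIfAllNull` is a hypothetical helper, not a Spark API: it wraps `struct(...)` in `when` and emits the struct only if at least one field is non-null.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, struct, when}

val spark = SparkSession.builder().master("local[*]").appName("null-struct").getOrCreate()
import spark.implicits._

val someDF = Seq(
  (8, null, null),
  (64, "mouse", "s"),
  (-27, "horse", "e")
).toDF("a", "b", "c")

// Hypothetical helper: builds struct(fields...) but yields null when every field is null.
// `when` without `otherwise` returns null for rows where the condition is false.
def nullIfAllNull(fields: Column*): Column = {
  val anyNotNull = fields.map(_.isNotNull).reduce(_ || _)
  when(anyNotNull, struct(fields: _*))
}

// Same name and alias as in the question, now built on the helper above
def make_week_struct(week: String): Column =
  nullIfAllNull(col("b"), col("c")).alias(s"wks_${week}_jrny")

val dd = someDF.select($"a", make_week_struct("1"))
dd.show(false)
```

Because there is no `otherwise`, the row with a = 8 gets a null struct while the other rows keep their objects; the struct schema (fields b and c) is unchanged.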
Answer #1 (u1ehiz5o)

This should also work:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.StructType
import spark.implicits._

val df = List(
  (None, None),
  (None, Some("abc")),
  (Some(1), Some("xyz"))
).toDF("id", "name")

val structCols = Seq("id", "name")
// The struct built from the data columns
val dataStruct = struct(structCols.map(col): _*)
// A struct of the same shape whose fields are all nulls cast to the matching types
val emptyStruct = struct(
  df.schema.fields
    .filter(f => structCols.contains(f.name))
    .map(f => lit(null).cast(f.dataType).as(f.name)): _*
)

df
  // Replace the struct with null when it equals the all-null struct
  .select(when(dataStruct.equalTo(emptyStruct), lit(null: StructType)).otherwise(dataStruct).as("col"))
  .show(false)
Answer #2 (dzhpxtsq)

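You can also use the to_json function and filter out the empty JSON object {}. By default to_json omits null fields, so a struct whose fields are all null serializes to "{}", and `when` without `otherwise` returns null for those rows.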

dd
  .withColumn("wks_1_jrny",
    when(
      to_json($"wks_1_jrny") =!= "{}", // Filter out empty JSON values
      $"wks_1_jrny"
    ) // no otherwise(): rows failing the condition become null
  )
  .show(false)

+---+----------+
|a  |wks_1_jrny|
+---+----------+
|8  |null      |
|64 |[mouse,s] |
|-27|[horse,e] |
+---+----------+
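The same null check can also be applied after the fact to an existing struct column without serializing to JSON. A minimal sketch, assuming the struct's field names contain no dots; `nullOutEmptyStruct` is a hypothetical helper that reads the field names from the schema and keeps the struct only when at least one top-level field is non-null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, when}
import org.apache.spark.sql.types.StructType

val spark = SparkSession.builder().master("local[*]").appName("null-struct-posthoc").getOrCreate()
import spark.implicits._

// Hypothetical helper: replaces the struct column `name` with null when all of its
// top-level fields are null. Assumes field names contain no dots (they are addressed as name.field).
def nullOutEmptyStruct(df: DataFrame, name: String): DataFrame = {
  val fields = df.schema(name).dataType.asInstanceOf[StructType].fieldNames
  val anyNotNull = fields.map(f => col(s"$name.$f").isNotNull).reduce(_ || _)
  // withColumn with an existing name replaces that column in place
  df.withColumn(name, when(anyNotNull, col(name)))
}

// Rebuild the question's DataFrame with the struct column
val dd = Seq(
  (8, null, null),
  (64, "mouse", "s"),
  (-27, "horse", "e")
).toDF("a", "b", "c")
  .select($"a", struct($"b", $"c").alias("wks_1_jrny"))

val cleaned = nullOutEmptyStruct(dd, "wks_1_jrny")
cleaned.show(false)
```

Unlike the to_json comparison, this does not rely on the JSON generator dropping null fields (in Spark 3.x governed by `spark.sql.jsonGenerator.ignoreNullFields`). Only top-level fields are checked, so a nested struct that is non-null but contains only nulls still counts as a value.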
