Spark: remove the first array from a nested array in Scala

dvtswwa3  posted on 2021-05-27  in Spark

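I have a DataFrame with two columns. I want to remove the first inner array from the nested array in every record. For example, given a DataFrame like this: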

+---+------------------------------------------+
|id |arrayField                                |
+---+------------------------------------------+
|1  |[[Akash,Kunal],[Sonu,Monu],[Ravi,Kishan]] |
|2  |[[Kunal, Mrinal],[Priya,Diya]]            |
|3  |[[Adi,Sadi]]                              |
+---+------------------------------------------+

I want my output to look like this:

+---+-----------------------------+
|id |arrayField                   |
+---+-----------------------------+
|1  |[[Sonu,Monu],[Ravi,Kishan]]  |
|2  |[[Priya,Diya]]               |
|3  |null                         |
+---+-----------------------------+

nzkunb0c  #1

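From Spark 2.4 onward you can use the built-in `slice` function. Example:

```
df.show(10, false)
/*
+------------------------+
|arrayField              |
+------------------------+
|[[A, k], [s, m], [R, k]]|
|[[k, M], [c, z]]        |
|[[A, b]]                |
+------------------------+
*/

import org.apache.spark.sql.functions._

// slice is 1-based: start at the second element and keep the rest.
// A single-element array yields an empty array, which is then mapped to null.
// Column comparisons need ===; a plain == would produce a Boolean, not a Column.
df.withColumn("sliced", expr("slice(arrayField, 2, size(arrayField))")).
  withColumn("arrayField", when(size(col("sliced")) === 0, lit(null)).otherwise(col("sliced"))).
  drop("sliced").
  show()
/*
+----------------+
|      arrayField|
+----------------+
|[[s, m], [R, k]]|
|        [[c, z]]|
|            null|
+----------------+
*/
```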
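
As a variation on the snippet above, the same result can probably be reached in a single expression, without the temporary `sliced` column. A SQL `CASE` expression keeps only rows that have more than one inner array and slices off the first one; every other row (a single inner array, an empty array, or a null array) falls through to null. This is a minimal sketch, assuming Spark 2.4 or later is on the classpath. The object name `DropFirstInnerArray`, the helper `dropFirst`, and the local `SparkSession` settings are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

object DropFirstInnerArray {

  // Hypothetical helper: replaces `column` with the same array minus its
  // first element. CASE without ELSE returns null, so rows with fewer than
  // two inner arrays (or a null array) end up as null.
  def dropFirst(df: DataFrame, column: String): DataFrame =
    df.withColumn(
      column,
      expr(s"CASE WHEN size($column) > 1 THEN slice($column, 2, size($column) - 1) END")
    )

  def main(args: Array[String]): Unit = {
    // Local session for demonstration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("drop-first-inner-array")
      .getOrCreate()
    import spark.implicits._

    // Sample data taken from the question.
    val df = Seq(
      (1, Seq(Seq("Akash", "Kunal"), Seq("Sonu", "Monu"), Seq("Ravi", "Kishan"))),
      (2, Seq(Seq("Kunal", "Mrinal"), Seq("Priya", "Diya"))),
      (3, Seq(Seq("Adi", "Sadi")))
    ).toDF("id", "arrayField")

    dropFirst(df, "arrayField").show(false)

    spark.stop()
  }
}
```

Note that this variant also turns an originally empty array into null, which matches the `when(size(...) === 0, ...)` behavior of the two-step version. If empty arrays should stay empty, the `CASE` condition would need adjusting.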
