在带有分隔值的Dataframe中求和

wd2eg0qa 于 2021-05-27 发布在 Spark

关注(0)|答案(1)|浏览(227)

我随身带着下面的数据框。

val df1=Seq(
("1_2_3","5_10"),
("4_5_6","15_20")
)toDF("c1","c2")

+-----+-----+
|   c1|   c2|
+-----+-----+
|1_2_3| 5_10|
|4_5_6|15_20|
+-----+-----+

如何根据条件在单独的列中求和-
-省略第一列中分隔符“-”后的第三个值。
-将每列的第一个值相加，即在1_2_3和4_5_6中省略“_3”和“_6”，然后加1,5和2,10。同时加上15+4和20+5。
预期产量-

+-----+-----+-----+
|   c1|   c2|  res|
+-----+-----+-----+
|1_2_3| 5_10| 6_12|
|4_5_6|15_20|19_25|
+-----+-----+-----+

scala apache-spark

来源：https://stackoverflow.com/questions/63264973/finding-sum-in-a-spark-scala-dataframe-with-delimited-values

1条答案

按热度按时间

a14dhokn1#

试试这个- zip_with + split ```
val df1=Seq(
("1_2_3","5_10"),
("4_5_6","15_20")
)toDF("c1","c2")
df1.show(false)

df1.withColumn("res",
  expr("concat_ws('_', zip_with(split(c1, '_'), split(c2, '_'), (x, y) -> cast(x+y as int)))"))
  .show(false)

/**
  * +-----+-----+-----+
  * |c1   |c2   |res  |
  * +-----+-----+-----+
  * |1_2_3|5_10 |6_12 |
  * |4_5_6|15_20|19_25|
  * +-----+-----+-----+
  */

`update dynamically for 50 columns`
val end = 51 // 50 cols
val df = spark.sql("select '1_2_3' as c1")
val new_df = Range(2, end).foldLeft(df){(df, i) => df.withColumn(s"c$i", $"c1")}
new_df.show(false)
/**
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
* |c1 |c2 |c3 |c4 |c5 |c6 |c7 |c8 |c9 |c10 |c11 |c12 |c13 |c14 |c15 |c16 |c17 |c18 |c19 |c20 |c21 |c22 |c23 |c24 |c25 |c26 |c27 |c28 |c29 |c30 |c31 |c32 |c33 |c34 |c35 |c36 |c37 |c38 |c39 |c40 |c41 |c42 |c43 |c44 |c45 |c46 |c47 |c48 |c49 |c50 |
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
* |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
/
val res = new_df.withColumn("res", $"c1")
Range(2, end).foldLeft(res){(df4, i) =>
df4.withColumn("res",
expr(s"concat_ws('', zip_with(split(res, ''), split(${s"c$i"}, '_'), (x, y) -> cast(x+y as int)))"))
}
.show(false)
/*
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
* |c1 |c2 |c3 |c4 |c5 |c6 |c7 |c8 |c9 |c10 |c11 |c12 |c13 |c14 |c15 |c16 |c17 |c18 |c19 |c20 |c21 |c22 |c23 |c24 |c25 |c26 |c27 |c28 |c29 |c30 |c31 |c32 |c33 |c34 |c35 |c36 |c37 |c38 |c39 |c40 |c41 |c42 |c43 |c44 |c45 |c46 |c47 |c48 |c49 |c50 |res |
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
* |1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|1_2_3|50_100_150|
* +-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+----------+
*/

赞(0）回复(0）举报 2021-05-27

我来回答

在带有分隔值的Dataframe中求和

1条答案

相关问题

热门标签

最新问答