Scala Spark: incrementing a column based on another column in a DataFrame without a for loop

eyh26e7m posted on 2021-05-27 in Spark


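I have a DataFrame like the one below. I want a new column named cutofftype that, instead of the current monotonically increasing number, resets to 1 every time the value in the ID column changes. The current code is: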
import org.apache.spark.sql.functions.monotonically_increasing_id
val result = df.orderBy("id", "date").withColumn("cutofftype", monotonically_increasing_id() + 1)

+------+---------------+----------+
|   ID |    date       |cutofftype|
+------+---------------+----------+
| 54441|     2016-06-20|         1|
| 54441|     2016-06-27|         2|
| 54441|     2016-07-04|         3|
| 54441|     2016-07-11|         4|
| 54500|     2016-05-02|         5|
| 54500|     2016-05-09|         6|
| 54500|     2016-05-16|         7|
| 54500|     2016-05-23|         8|
| 54500|     2016-06-06|         9|
| 54500|     2016-06-13|        10|
+------+---------------+----------+

The desired output is:

+------+---------------+----------+
|   ID |    date       |cutofftype|
+------+---------------+----------+
| 54441|     2016-06-20|         1|
| 54441|     2016-06-27|         2|
| 54441|     2016-07-04|         3|
| 54441|     2016-07-11|         4|
| 54500|     2016-05-02|         1|
| 54500|     2016-05-09|         2|
| 54500|     2016-05-16|         3|
| 54500|     2016-05-23|         4|
| 54500|     2016-06-06|         5|
| 54500|     2016-06-13|         6|
+------+---------------+----------+

I know this can be done with a for loop, but I want to do it without one. Is there a way?
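The numbering rule can be pinned down on plain Scala collections, with no Spark and no explicit for loop: group the rows by ID, sort each group by date, and number the positions starting from 1. This is only an in-memory illustration of the target semantics; the names `CutoffTypeSketch`, `Numbered`, and `numberPerId` are hypothetical, and it assumes dates are ISO `yyyy-MM-dd` strings, which sort chronologically as plain text. It also shows why `monotonically_increasing_id()` cannot produce this column: that function only guarantees unique, increasing values across the whole DataFrame (not even consecutive ones), and it never restarts per ID.

```scala
// In-memory sketch (plain Scala, no Spark) of the target numbering:
// group rows by ID, sort each group by date, and number positions from 1.
object CutoffTypeSketch {
  // Hypothetical output record for one numbered row.
  final case class Numbered(id: Int, date: String, cutofftype: Int)

  // Sample (ID, date) pairs from the question.
  val sample: Seq[(Int, String)] = Seq(
    (54441, "2016-06-20"), (54441, "2016-06-27"), (54441, "2016-07-04"), (54441, "2016-07-11"),
    (54500, "2016-05-02"), (54500, "2016-05-09"), (54500, "2016-05-16"),
    (54500, "2016-05-23"), (54500, "2016-06-06"), (54500, "2016-06-13")
  )

  def numberPerId(rows: Seq[(Int, String)]): Seq[Numbered] =
    rows
      .groupBy(_._1)    // one group per ID, analogous to partitionBy("ID")
      .toSeq
      .sortBy(_._1)     // deterministic group order for output
      .flatMap { case (id, group) =>
        group
          .sortBy(_._2) // ISO dates sort chronologically as strings, analogous to orderBy("date")
          .zipWithIndex // 0-based position inside the group
          .map { case ((_, date), i) => Numbered(id, date, i + 1) } // restarts at 1 for each ID
      }

  def main(args: Array[String]): Unit =
    numberPerId(sample).foreach(r => println(f"${r.id}%d | ${r.date} | ${r.cutofftype}%d"))
}
```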

scala apache-spark

Source: https://stackoverflow.com/questions/63792289/scala-spark-incrementing-a-column-based-on-another-column-in-dataframe-without-f

1 answer


Answer #1 by u5i3ibmn

This is a simple partitioning problem. You should use a window function.

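import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Partition the rows by ID and order each partition by date
val w = Window.partitionBy("ID").orderBy("date")

// row_number() starts at 1 in every partition, so the count resets for each ID
df.withColumn("cutofftype", row_number().over(w)).show()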

+-----+----------+----------+
|   ID|      date|cutofftype|
+-----+----------+----------+
|54500|2016-05-02|         1|
|54500|2016-05-09|         2|
|54500|2016-05-16|         3|
|54500|2016-05-23|         4|
|54500|2016-06-06|         5|
|54500|2016-06-13|         6|
|54441|2016-06-20|         1|
|54441|2016-06-27|         2|
|54441|2016-07-04|         3|
|54441|2016-07-11|         4|
+-----+----------+----------+
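Note that the output above is grouped by ID but not sorted by ID: a window function does not guarantee any particular output row order. A self-contained version that builds the question's sample data, applies the same window, and sorts explicitly for display might look like the following. It is a sketch that assumes a local `SparkSession` with `spark-sql` on the classpath; `CutoffTypeExample` and `addCutoffType` are hypothetical names, and dates are kept as ISO strings (which sort chronologically) rather than converted to a date type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object CutoffTypeExample {
  // Hypothetical helper: numbers rows 1, 2, 3, ... within each ID, in date order.
  def addCutoffType(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("ID").orderBy("date")
    df.withColumn("cutofftype", row_number().over(w))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; assumes spark-sql is on the classpath.
    val spark = SparkSession.builder()
      .appName("cutofftype-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question.
    val df = Seq(
      (54441, "2016-06-20"), (54441, "2016-06-27"), (54441, "2016-07-04"), (54441, "2016-07-11"),
      (54500, "2016-05-02"), (54500, "2016-05-09"), (54500, "2016-05-16"),
      (54500, "2016-05-23"), (54500, "2016-06-06"), (54500, "2016-06-13")
    ).toDF("ID", "date")

    // The window does not fix the output row order, so sort explicitly for display.
    addCutoffType(df).orderBy("ID", "date").show()

    spark.stop()
  }
}
```

If two rows could share the same ID and date, `row_number()` breaks the tie in an unspecified order; `rank()` or `dense_rank()` over the same window would give tied rows the same number instead.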

Answered 2021-05-27
