Passing a DataFrame column name as a parameter to a function in Scala?

gpnt7bae  posted on 2021-07-12  in  Spark
I have a function that adds 2 columns:

def sum_num (num1: Int, num2: Int): Int = {
    return num1 + num2
}

I have a DataFrame df with the following values:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |5   |
|7   |4   |4   |
+----+----+----+

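I want to add a column and pass column names to the function, but the code below does not work. It fails with a type mismatch error: found Column, required Int.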

val newdf = df.withColumn("sum_of_cols1", sum_num($"col1", $"col2"))
              .withColumn("sum_of_cols2", sum_num($"col1", $"col3"))

vh0rcniy1#

Change the code to:

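import org.apache.spark.sql.Column
import spark.implicits._ // enables the $"colName" syntax

// Takes Column expressions rather than Int values, and returns a Column
def sum_num(num1: Column, num2: Column): Column = {
  num1 + num2
}

val newdf = df.withColumn("sum_of_cols1", sum_num($"col1", $"col2"))
  .withColumn("sum_of_cols2", sum_num($"col1", $"col3"))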

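withColumn expects a Column expression, so the function has to operate on Spark SQL Column objects rather than on Int values. Columns support arithmetic directly; see the Column API documentation for the operators you can use.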
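
Since the title asks about passing column names, here is a minimal sketch of a variant that takes the names as plain strings and resolves them with `col` from `org.apache.spark.sql.functions`. The helper name `sumCols`, the object `SumColsExample`, and the local SparkSession are assumptions made for illustration; they are not part of the original question or answer.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

object SumColsExample {
  // Hypothetical helper: accepts column *names* and builds a Column expression.
  def sumCols(name1: String, name2: String): Column = col(name1) + col(name2)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("sum-cols-example")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the question.
    val df = Seq((1, 2, 5), (7, 4, 4)).toDF("col1", "col2", "col3")

    val newdf = df
      .withColumn("sum_of_cols1", sumCols("col1", "col2"))
      .withColumn("sum_of_cols2", sumCols("col1", "col3"))

    newdf.show()
    spark.stop()
  }
}
```

This keeps the call site free of the `$"..."` syntax, which can be convenient when the column names come from configuration or user input.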
