import pyspark.sql.functions as f
df.select(*[f.sum(cols).alias(cols) for cols in df.columns]).show()
+----+---+---+
|val1| x| y|
+----+---+---+
| 36| 29|159|
+----+---+---+
import pyspark.sql.functions as F
from pyspark.sql.functions import udf
from pyspark.sql.types import *
tst = sqlContext.createDataFrame([(10,7,14),(5,1,4),(9,8,10),(2,6,90),(7,2,30),(3,5,11)], schema=['val1','x','y'])
tst_sum = tst.withColumn("sum_col", sum([tst[coln] for coln in tst.columns]))
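The `withColumn` line above works because Python's builtin `sum` simply folds `+` over the list, and a pyspark `Column` overloads `__add__`/`__radd__`. A plain-Python sketch (a toy `Col` class standing in for `Column`, no Spark needed) of that mechanism:

```python
# Toy stand-in for pyspark's Column to show why builtin sum() works:
# sum() folds `+` over the list, and Column overloads the operators.
class Col:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # combining two "columns" builds a new expression
        return Col(f"({self.name} + {other.name})")

    def __radd__(self, other):
        # handles the implicit 0 that sum() starts with
        return self if other == 0 else NotImplemented

cols = [Col("val1"), Col("x"), Col("y")]
print(sum(cols).name)  # ((val1 + x) + y)
```

In real pyspark the folded result is a single `Column` expression that Spark evaluates row by row, which is exactly what `withColumn("sum_col", ...)` needs.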
2 Answers
dfty9e191#
You can achieve this with the sum function:
hc8w905p2#
To sum all the columns into one new column, you can use Python's built-in sum over a list comprehension:
Result:
Note: if you import the sum function from pyspark's functions module,
from pyspark.sql.functions import sum
then you have to rename it to something else, for example
from pyspark.sql.functions import sum as sum_pyspark
Otherwise it shadows Python's builtin sum, and the list-comprehension trick above stops working.
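A plain-Python sketch of that shadowing problem (no Spark needed; `fake_pyspark_sum` is a hypothetical stand-in for the pyspark aggregate):

```python
# Stand-in for pyspark.sql.functions.sum: a column aggregate,
# not a reducer over an iterable of numbers.
def fake_pyspark_sum(col):
    return f"SUM({col})"

# What a bare `from pyspark.sql.functions import sum` would do:
# rebind the name, hiding Python's builtin sum.
sum = fake_pyspark_sum
print(sum("val1"))  # SUM(val1)  -- takes one column, not a list

# The aliased import keeps both names usable:
sum_pyspark = fake_pyspark_sum
import builtins
print(builtins.sum([10, 5, 9, 2, 7, 3]))  # 36 -- builtin still reachable
```

With the alias, `sum_pyspark` handles the column aggregate while the builtin `sum` remains free for the row-wise list-comprehension trick.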