PySpark: undefined function collect_list

Posted by xlpyo6sf on 2021-05-27 in Spark


I am using Python 2.6.6 and Spark 1.6.0. I have a df like this:

id | name      |  number |
-------------------------- 
1  | joe       | 148590  |
2  | bob       | 148590  |
2  | steve     | 279109  |
3  | sue       | 382901  |
3  | linda     | 148590  |

Whenever I try something like df2 = df.groupBy('id','length','type').pivot('id').agg(collect_list('name')), I get the following error: pyspark.sql.utils.AnalysisException: u'undefined function collect_list;' Why does this happen?
I also tried hive_context = HiveContext(sc) followed by df2 = df.groupBy('id','length','type').pivot('id').agg(hive_context.collect_list('name')), and got the error: AttributeError: 'HiveContext' object has no attribute 'collect_list'

python DataFrame apache-spark pyspark

Source: https://stackoverflow.com/questions/62683121/pyspark-undefined-function-collect-list

1 answer


azpvetkf1#

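collect_list is not a user-defined function, and it is not a method of HiveContext, which is why the second attempt raises AttributeError. It is a built-in aggregate function in pyspark.sql.functions, available since Spark 1.6, alongside sum, count and the others.
In Spark 1.6, however, collect_list is backed by a Hive UDAF, so it only resolves when the DataFrame was created through a HiveContext (for example with hive_context.createDataFrame(...)). On a DataFrame created through a plain SQLContext, the analyzer reports "undefined function collect_list". From Spark 2.0 onward it works without Hive support. To import the collect_list function, add the following line at the top of your script: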

from pyspark.sql import functions as F

Then change your code to:

df.groupBy('id','length','type').pivot('id').agg(F.collect_list('name'))

If the function is already available, you can also try the dictionary form of agg below.

df.groupBy('id','length','type').pivot('id').agg({'name':'collect_list'})
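
To make the intended result concrete, here is a minimal plain-Python sketch of what grouping by id and collecting the name values into a list yields on the sample data from the question. It does not use Spark at all; the helper name collect_list_by_key is hypothetical and only illustrates the semantics of the aggregation.

```python
from collections import defaultdict

# Sample rows from the question: (id, name, number)
rows = [
    (1, "joe", 148590),
    (2, "bob", 148590),
    (2, "steve", 279109),
    (3, "sue", 382901),
    (3, "linda", 148590),
]


def collect_list_by_key(rows, key_index, value_index):
    """Group rows by one column and gather another column's values into lists.

    Hypothetical helper for illustration; this is not part of the PySpark API.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key_index]].append(row[value_index])
    return dict(grouped)


# Group by id (column 0) and collect name (column 1)
names_by_id = collect_list_by_key(rows, key_index=0, value_index=1)
print(names_by_id)
```

In this sketch the names keep their input order. In Spark, the order of elements returned by collect_list is not guaranteed, because it depends on how the rows are distributed across partitions.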

Answered 2021-05-27
