The code below was written for a pandas DataFrame. Due to memory problems I have to move to PySpark, so I need to convert this code to run against a Spark DataFrame. I tried running it directly, but it raises an error. What is the PySpark alternative to the code below?
def units(x):
    if x <= 0:
        return 0
    if x >= 1:
        return 1

sets = df.applymap(units)
Here is the error I get:
AttributeError Traceback (most recent call last)
<ipython-input-20-7e54b4e7a7e7> in <module>()
----> 1 sets = pivoted.applymap(units)
/usr/lib/spark/python/pyspark/sql/dataframe.py in __getattr__(self, name)
1180 if name not in self.columns:
1181 raise AttributeError(
-> 1182 "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
1183 jc = self._jdf.apply(name)
1184 return Column(jc)
AttributeError: 'DataFrame' object has no attribute 'applymap'
1 Answer
You can wrap the units function as a UDF (user-defined function):