python - Trimming white space only in the string columns of a DataFrame

Posted by yquaqz18 on 2021-05-27 in Spark


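I am trying to strip leading and trailing white space from any given DataFrame, but only in its string columns, so that the DataFrame's schema does not change. An alternative would be to trim every column and then either infer the schema or reapply the original schema afterwards, but I don't know how to do that either. This is what I have so far: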

from pyspark.sql.functions import col

mmDF.printSchema()
columnList = [item[0] for item in mmDF.dtypes if item[1].startswith('string')]

mmDF = mmDF.withColumn(col, func.ltrim(func.rtrim(mmDF[col] for mmDF_col in columnList)))

mmDF.show()

mmDF.printSchema()

The line that does the trimming raises this error:

TypeError: Invalid argument, not a string or column: <generator object <genexpr> at 0x0000027D5C63E248> of type <class 'generator'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
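
The error seems to come from the generator expression `(mmDF[col] for mmDF_col in columnList)` being passed to `func.rtrim` as a single argument, where one column is expected. The loop variable `mmDF_col` is also never used, `col` refers to the imported function rather than a column name, and `func` is not imported in the snippet. One possible fix, as a minimal sketch: build one expression per column and apply them all in a single `select`. The helper names `string_columns` and `trim_string_columns` are made up for illustration, and the sketch assumes column names without dots or other characters that would need backtick quoting.

```python
def string_columns(dtypes):
    """Return the names of the string-typed columns.

    `dtypes` is the list of (name, type) pairs that DataFrame.dtypes returns.
    """
    return [name for name, dtype in dtypes if dtype.startswith("string")]


def trim_string_columns(df):
    """Trim leading and trailing spaces in string columns only."""
    # Imported here so string_columns() stays usable without Spark installed.
    from pyspark.sql import functions as F

    to_trim = set(string_columns(df.dtypes))
    # A single select keeps the column order; non-string columns pass through
    # unchanged, so their types in the schema are not affected.
    return df.select(
        [F.trim(F.col(c)).alias(c) if c in to_trim else F.col(c) for c in df.columns]
    )
```

With the DataFrame from the question this would be called as `mmDF = trim_string_columns(mmDF)`. `F.trim` removes spaces on both sides, so it stands in for the nested `ltrim(rtrim(...))`. A plain `for` loop calling `withColumn` once per string column should also work; the single `select` just avoids stacking one projection per column.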

python apache-spark pyspark

Source: https://stackoverflow.com/questions/63158269/trimming-white-space-in-only-string-columns-of-a-dataframe