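Given a table with two columns, deviceid and devicetype, how do I update the devicetype column for rows where the string in deviceid has a length of 5? My attempt so far: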
from pyspark.sql.functions import col, length
df.where(length(col("DEVICEID")) == 5).show()
1 Answer
Use a `when` + `otherwise` expression: check whether the length of `deviceid` equals 5 and, if it does, set the new value. Example:
```
from pyspark.sql.functions import col, length, lit, when

# Sample data: one 5-character DEVICEID and one shorter one
df = spark.createDataFrame([('abcde', 1), ('abc', 2)], ["DEVICEID", "DEVICETYPE"])

# new_col is "new_length" where DEVICEID has 5 characters, otherwise the DEVICEID itself
df.withColumn(
    "new_col",
    when(length(col("DEVICEID")) == 5, lit("new_length")).otherwise(col("DEVICEID"))
).show()
```

```
+--------+----------+----------+
|DEVICEID|DEVICETYPE|   new_col|
+--------+----------+----------+
|   abcde|         1|new_length|
|     abc|         2|       abc|
+--------+----------+----------+
```