I have two PySpark DataFrames, as shown below:
df1 = spark.createDataFrame(
["yes","no","yes23", "no3", "35yes", """41no["maybe"]"""],
"string"
).toDF("location")
df2 = spark.createDataFrame(
["yes","no"],
"string"
).toDF("location")
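I want to keep the rows of df1 whose location value starts with a location value from df2, or vice versa (a df2 value that starts with the df1 value).
Something like: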
df1.select("location").startsWith(df2.location)
This is the output I expect:
+-------------+
|     location|
+-------------+
|          yes|
|           no|
|        yes23|
|          no3|
+-------------+
1 Answer
In my opinion, this is easiest to do with Spark SQL:
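The answer's original query is not shown here, so what follows is only a minimal sketch of one way a Spark SQL approach could look, not necessarily the answerer's own. It assumes both DataFrames are registered as temp views (the view names `df1_view` and `df2_view` and the helper `prefix_matches` are made up for illustration) and uses `instr(str, substr) = 1` as the "starts with" test, combined with a `LEFT SEMI JOIN` so that each df1 row appears at most once.

```python
# Hypothetical temp-view names; any unused names would do.
LEFT_VIEW = "df1_view"
RIGHT_VIEW = "df2_view"

# instr(str, substr) returns the 1-based position of substr in str,
# so a result of 1 means that str starts with substr.
# LEFT SEMI JOIN keeps a df1 row if at least one df2 row satisfies
# the condition, without duplicating it for multiple matches.
PREFIX_QUERY = f"""
SELECT a.location
FROM {LEFT_VIEW} AS a
LEFT SEMI JOIN {RIGHT_VIEW} AS b
  ON instr(a.location, b.location) = 1
  OR instr(b.location, a.location) = 1
"""


def prefix_matches(spark, df1, df2):
    """Return the df1 rows whose location starts with some df2 location,
    or whose location is itself a prefix of some df2 location."""
    df1.createOrReplaceTempView(LEFT_VIEW)
    df2.createOrReplaceTempView(RIGHT_VIEW)
    return spark.sql(PREFIX_QUERY)
```

With the `spark` session and the `df1` and `df2` from the question, `prefix_matches(spark, df1, df2).show()` should keep yes, no, yes23 and no3, and drop 35yes and 41no["maybe"]. Because the join condition is not an equality, Spark has to compare every pair of rows, which is fine for a small df2 but may be slow if both sides are large.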