Check whether values in the first DataFrame start with any value in the second DataFrame

wwodge7n  posted on 2021-05-16  in Spark

I have two PySpark DataFrames, as shown below:

df1 = spark.createDataFrame(
    ["yes","no","yes23", "no3", "35yes", """41no["maybe"]"""],
    "string"
).toDF("location")

df2 = spark.createDataFrame(
    ["yes","no"],
    "string"
).toDF("location")

I want to check whether the values in the location column of df1 start with any of the values in the location column of df2, and vice versa.
Something like:

df1.select("location").startsWith(df2.location)

Here is the output I expect:

+-------------+
|     location|
+-------------+
|          yes|
|           no|
|        yes23|
|          no3|
+-------------+

zsohkypk #1

In my opinion, this is easiest with Spark SQL:

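# Register both DataFrames as temporary views so they can be queried with SQL.
df1.createOrReplaceTempView('df1')
df2.createOrReplaceTempView('df2')
# '^' || df2.location builds a regex anchored at the start of the string,
# so the join keeps df1 rows whose location begins with some df2 value.
# Note: df2 values are treated as regex patterns, not literal strings.
joined = spark.sql("""
    select df1.*
    from df1
    join df2
    on df1.location rlike '^' || df2.location
""")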
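
As an alternative to the regex join above, the same check can probably be written with the DataFrame API, since `Column.startswith` accepts another column as its argument. The sketch below is only an illustration: the helper name `rows_starting_with_any` and its `column` parameter are made up here and are not part of the original answer. It uses a `left_semi` join, which is a choice made for this sketch so that each df1 row appears at most once even if several df2 values match, and it compares literal prefixes rather than regex patterns.

```python
def rows_starting_with_any(df1, df2, column="location"):
    """Keep the rows of df1 whose `column` starts with any value of df2's `column`.

    Hypothetical helper for illustration; df1 and df2 are assumed to be
    PySpark DataFrames that both contain a string column named `column`.
    """
    # Column.startswith does a literal prefix comparison (no regex involved).
    prefix_matches = df1[column].startswith(df2[column])
    # A left semi join returns only df1's columns and never duplicates a df1 row.
    return df1.join(df2, prefix_matches, "left_semi")
```

With the DataFrames from the question, `rows_starting_with_any(df1, df2).show()` should keep yes, no, yes23 and no3 (row order is not guaranteed). For the "vice versa" direction, calling `rows_starting_with_any(df2, df1)` would keep the df2 rows that start with some df1 value.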
