Check whether values in the first DataFrame start with any of the values in the second DataFrame

Posted by wwodge7n on 2021-05-16 in Spark

I have two PySpark DataFrames, as shown below:

df1 = spark.createDataFrame(
    ["yes","no","yes23", "no3", "35yes", """41no["maybe"]"""],
    "string"
).toDF("location")

df2 = spark.createDataFrame(
    ["yes","no"],
    "string"
).toDF("location")

I want to check whether the values in the location column of df1 start with the values in the location column of df2, and vice versa.
Something like:

df1.select("location").startsWith(df2.location)

Here is the output I expect:

+-------------+
|     location|
+-------------+
|          yes|
|           no|
|        yes23|
|          no3|
+-------------+
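
To pin down the intended semantics, the expected output above can be reproduced on plain Python lists. This is only a small-data sketch of the "starts with any of the prefixes" rule, not a Spark solution; the names `df1_values`, `df2_values`, `prefixes`, and `matches` are illustrative and do not appear in the original post.

```python
# Values copied from df1 and df2 in the question.
df1_values = ["yes", "no", "yes23", "no3", "35yes", '41no["maybe"]']
df2_values = ["yes", "no"]

# str.startswith accepts a tuple of prefixes and returns True
# if the string starts with at least one of them.
prefixes = tuple(df2_values)
matches = [value for value in df1_values if value.startswith(prefixes)]

print(matches)
```

Only the four values that begin with "yes" or "no" are kept; "35yes" and the "41no..." value contain a prefix but do not start with one.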

DataFrame apache-spark pyspark apache-spark-sql

Source: https://stackoverflow.com/questions/65201174/check-first-dataframe-value-startswith-any-of-the-second-dataframe-value

1 Answer

zsohkypk1#

In my opinion, using Spark SQL is the simplest approach:

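# Register both DataFrames as temporary views so they can be queried with SQL.
df1.createOrReplaceTempView('df1')
df2.createOrReplaceTempView('df2')

# Keep the df1 rows whose location matches the regex '^' followed by a df2 location.
# Note: df2 values are interpreted as regular expressions here, and a df1 row
# is returned once for every df2 row it matches.
joined = spark.sql("""
    select df1.*
    from df1
    join df2
    on df1.location rlike '^' || df2.location
""")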

2021-05-17
