PySpark join with mixed conditions

shstlldc  posted on 2021-07-14  in Spark

I have two DataFrames, left_df and right_df, that share the join columns ['col_1', 'col_2']. I want to add one more join condition: right_df.col_3.between(left_df.col_4, left_df.col_5). Code:

from pyspark.sql import functions as F

join_condition = ['col_1', 
                  'col_2', 
                  right_df.col_3.between(left_df.col_4, left_df.col_5)]
df = left_df.join(right_df, on=join_condition, how='left')

df.write.parquet('/tmp/my_df')

But I get the following error:

TypeError: Column is not iterable

Why can't I combine these three conditions?

e0uiprwp1#

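You cannot mix strings and Columns in the on argument. It must be either a list of column-name strings or a list of Column expressions, not a mixture of both. Convert the first two items into Column expressions: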

from pyspark.sql import functions as F

join_condition = [left_df.col_1 == right_df.col_1, 
                  left_df.col_2 == right_df.col_2, 
                  right_df.col_3.between(left_df.col_4, left_df.col_5)]

df = left_df.join(right_df, on=join_condition, how='left')
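If there are many shared columns, the equality conditions can be generated from the names instead of being typed out. The following is a minimal sketch, not part of the original answer: equi_conditions and demo are hypothetical names, and the sample rows are made up for illustration. It also assumes you want to drop the right-hand copies of col_1 and col_2 afterwards, because a join on Column expressions keeps the key columns from both sides, and writing duplicate column names to Parquet would be expected to fail.

```python
def equi_conditions(left, right, cols):
    """Turn shared column names into explicit equality expressions.

    With PySpark DataFrames, left[c] == right[c] yields a Column,
    so the result can be mixed with other Column conditions.
    (Hypothetical helper, not a PySpark API.)
    """
    return [left[c] == right[c] for c in cols]


def demo():
    # Imported lazily so the helper above does not require a Spark session.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()

    # Made-up sample data using the column names from the question.
    left_df = spark.createDataFrame(
        [("a", "x", 1, 10), ("b", "y", 5, 6)],
        ["col_1", "col_2", "col_4", "col_5"],
    )
    right_df = spark.createDataFrame(
        [("a", "x", 7), ("b", "y", 9)],
        ["col_1", "col_2", "col_3"],
    )

    # Every element is now a Column; the list is combined with AND by join().
    join_condition = equi_conditions(left_df, right_df, ["col_1", "col_2"])
    join_condition.append(right_df.col_3.between(left_df.col_4, left_df.col_5))

    df = left_df.join(right_df, on=join_condition, how="left")

    # Assumption: keep only the left-hand key columns so names are unique.
    return df.drop(right_df["col_1"]).drop(right_df["col_2"])
```

Calling demo().show() should display the joined rows; the result can then be written with df.write.parquet(...) as in the question.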
