pyspark: passing a variable into a Spark SQL query

Asked by zysjyyx4 on 2023-01-25 in Spark
Follow (0) | Answers (3) | Views (229)

I have searched everywhere for an answer to this and tried everything, but nothing seems to work. I am trying to reference a Python variable assignment inside a spark.sql query, running Python 3 and Spark 2.3.1.

bkt = 1

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
                FROM pwrcrv_tmp\
                where EXTR_CURR_NUM_CYC_DLQ=$bkt\
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
                group by ((year(fdr_date))*100)+month(fdr_date)\
                order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

The error:

prime = spark.sql(s"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts                FROM pwrcrv_tmp         where EXTR_CURR_NUM_CYC_DLQ=$bkt                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')                group by ((year(fdr_date))*100)+month(fdr_date)                order by ((year(fdr_date))*100)+month(fdr_date)")
                                                                                                                                                                                                                                                                                                                                                                                         ^
SyntaxError: invalid syntax

vpfxa7rd1#

I found the correct syntax buried in this Databricks forum post:
https://forums.databricks.com/questions/115/how-do-i-pass-parameters-to-my-sql-statements.html
You prefix the query with a lowercase f and wrap the variable name in curly braces inside the query.

bkt = 1

prime = spark.sql(f"SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts\
            FROM pwrcrv_tmp\
            where EXTR_CURR_NUM_CYC_DLQ={bkt}\
            and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')\
            group by ((year(fdr_date))*100)+month(fdr_date)\
            order by ((year(fdr_date))*100)+month(fdr_date)")

prime.show(50)

6uxekuva2#

This should work:

p_filename ='some value'
z='some value'

query = "INSERT into default.loginfordetails (filename,logdesc) values ('{}','{}') ".format(p_filename,z)
 
spark.sql(query)
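If you want to check the rendered statement before sending it to Spark, you can build it in plain Python first. One caveat worth hedging: `.format()` splices raw strings into the SQL text, so a value containing a single quote will break the statement (and for user-supplied input this is a SQL-injection risk). A minimal sketch using the same table and column names as above, no Spark session required:

```python
# Build the INSERT statement as a plain string so the substitution can be
# inspected before it is handed to spark.sql(query).
p_filename = 'some value'
z = 'some value'

query = ("INSERT into default.loginfordetails (filename,logdesc) "
         "values ('{}','{}')").format(p_filename, z)

print(query)
# A value containing a single quote (e.g. z = "it's") would break this
# statement; escape such values or validate them before formatting.
```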

oogrdqng3#

Since your query spans multiple lines, good coding style helps here: use triple-quoted strings (""" """) and avoid trailing-backslash line continuations ("\"), which cause many parsing problems in Python and pyspark in general, and also when converting notebooks to scripts with nbconvert. Then put the variables inside braces {} in the query, or use .format(bkt).
Option: f-string with {}

bkt = 1
prime = spark.sql(f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""")

prime.show(50)
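Because Python resolves the f-string before spark.sql() ever sees it, the substitution can be sanity-checked without a Spark session. A small sketch in pure Python:

```python
bkt = 1
# The f-string is evaluated to a plain SQL string before Spark sees it,
# so {bkt} is already replaced by 1 in the text below.
sql_text = f"""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={bkt}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)"""

print(sql_text)  # the WHERE clause now reads EXTR_CURR_NUM_CYC_DLQ=1
```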

Option: .format()

bkt = 1
prime = spark.sql("""SELECT ((year(fdr_date))*100)+month(fdr_date) as fdr_year, count(*) as counts
                FROM pwrcrv_tmp
                where EXTR_CURR_NUM_CYC_DLQ={}
                and EXTR_ACCOUNT_TYPE in('PS','PT','PD','PC','HV','PA')
                group by ((year(fdr_date))*100)+month(fdr_date)
                order by ((year(fdr_date))*100)+month(fdr_date)""".format(bkt))

prime.show(50)
