Insert/update records in a Hive table based on 3 scenarios in Spark Scala

Posted by bvjxkvbb on 2021-05-27 in Spark

I have a source table, and I want to update/insert its data into an output table based on the scenarios below.
Source table:

Name|age|dept|sal |school|college|deg|blood_group
aaa |10 |ece |1000|svv   |sas    |be |0+
bbb |20 |it  |2000|svv   |sas    |be |A+

Scenario 1: If the name, age, dept values do not exist in the output table, create a new record.
Scenario 2: If the name, age, dept values exist in the output table and school, college are unchanged, do nothing.
Scenario 3: If the name, age, dept values exist in the output table and school or college has changed, create a new record.

I want to insert data into the output table based on the scenarios above, using either Spark SQL or the Spark Scala DataFrame API.

Any suggestions would be appreciated.

apache-spark apache-spark-sql spark-streaming

Source: https://stackoverflow.com/questions/62713168/insert-update-records-in-hive-table-based-on-3-scenarios-in-spark-scala

1 Answer

3j86kqsm1#

I'm not entirely sure this will work.
Before writing, first read the Hive table in your code, create a table/DataFrame from it, and call it prior_df.

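from pyspark import SparkContext
from pyspark.sql import HiveContext

# Reuse the running SparkContext, or create one if none exists
sc = SparkContext.getOrCreate()
# HiveContext is deprecated since Spark 2.0 (SparkSession with enableHiveSupport() replaces it)
hive_context = HiveContext(sc)

# Load the existing Hive table as a DataFrame; this plays the role of prior_df
bank = hive_context.table("default.bank")
bank.show()

# Register it as a temporary table so it can also be queried with SQL
bank.registerTempTable("bank_temp")
hive_context.sql("select * from bank_temp").show()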

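Now join the source with prior_df. Since you already have the conditions, use withColumn with a when condition to create a new column that marks the "no action/filter" rows. The prior_df side of the join gives you the previous values to compare against.
Then write the new df to the Hive table location.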

Answered 2021-05-27
