I have the following two DataFrames.
DataFrame one:
+----------+-------+
| subject  | marks |
+----------+-------+
| Maths    | 89    |
| English  | 90    |
| Religion | 80    |
+----------+-------+
DataFrame two:
+-------+----------------------------+
| name  | subject                    |
+-------+----------------------------+
| Liza  | [Maths]                    |
| Inter | [Religion, English]        |
| Ovin  | [Maths, Religion, English] |
+-------+----------------------------+
Expected output:
+-------+--------------+
| name  | marks        |
+-------+--------------+
| Liza  | [89]         |
| Inter | [80, 90]     |
| Ovin  | [89, 80, 90] |
+-------+--------------+
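To get the output above, I need to join dataframeOne and dataframeTwo. However, the subject column in dataframeTwo holds arrays, while in dataframeOne it holds plain strings. I tried the code below, which fails with the error that follows: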
val newDataframe = dataframeTwo.withColumn("myMarks", struct('marks))
val studentMarksDataframe = dataframeOne.join(newDataframe, array_contains(subject, subject)).agg(collect_list('myMarks))
Error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Reference 'subject' is ambiguous, could be: subject, subject.
How can I fix this?
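The error arises because both inputs to the join contain a column named `subject`, so an unqualified `subject` cannot be resolved. A minimal sketch, assuming Spark 3.x with the Scala API and hypothetical DataFrame names `dfOne`/`dfTwo` built from the sample tables above, qualifies each column through the DataFrame it comes from:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_contains, collect_list}

object QualifiedJoin {
  // Builds the sample data from the question and joins on array membership.
  def run(spark: SparkSession): Map[String, List[Int]] = {
    import spark.implicits._

    // Hypothetical names for the question's DataFrame one and DataFrame two.
    val dfOne = Seq(("Maths", 89), ("English", 90), ("Religion", 80))
      .toDF("subject", "marks")
    val dfTwo = Seq(
      ("Liza", Seq("Maths")),
      ("Inter", Seq("Religion", "English")),
      ("Ovin", Seq("Maths", "Religion", "English"))
    ).toDF("name", "subject")

    // dfTwo("subject") is the array and dfOne("subject") the string,
    // so each reference is tied to one input and nothing is ambiguous.
    val joined = dfTwo.join(dfOne, array_contains(dfTwo("subject"), dfOne("subject")))

    joined
      .groupBy($"name")
      .agg(collect_list(dfOne("marks")).as("marks"))
      .collect()
      .map(r => r.getString(0) -> r.getSeq[Int](1).toList)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("qualified-join").getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Note that `collect_list` does not guarantee any ordering after the shuffle, so the marks may come back in a different order than the subjects appear in each array.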
1 Answer
You can try:
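One way to produce the expected output, sketched here under the same assumptions (Spark 3.x, hypothetical names `dfOne`/`dfTwo`) and not necessarily the answerer's original code, is to turn each array into rows with `posexplode`, do an ordinary equi-join on `subject`, and collect the marks back per student. `posexplode` also records each subject's position, which is used to restore the original array order:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, posexplode, sort_array, struct}

object ExplodeJoin {
  def run(spark: SparkSession): Map[String, List[Int]] = {
    import spark.implicits._

    // Hypothetical names for the question's DataFrame one and DataFrame two.
    val dfOne = Seq(("Maths", 89), ("English", 90), ("Religion", 80))
      .toDF("subject", "marks")
    val dfTwo = Seq(
      ("Liza", Seq("Maths")),
      ("Inter", Seq("Religion", "English")),
      ("Ovin", Seq("Maths", "Religion", "English"))
    ).toDF("name", "subject")

    // One row per (name, pos, subject); pos is the subject's index in the array.
    val exploded = dfTwo.select(col("name"), posexplode(col("subject")).as(Seq("pos", "subject")))

    // Plain equi-join; joining on Seq("subject") yields a single subject column.
    val joined = exploded.join(dfOne, Seq("subject"))

    // Collect (pos, marks) pairs, sort them by pos, then keep only the marks field.
    val result = joined
      .groupBy("name")
      .agg(sort_array(collect_list(struct(col("pos"), col("marks")))).as("pm"))
      .select(col("name"), col("pm.marks").as("marks"))

    result
      .collect()
      .map(r => r.getString(0) -> r.getSeq[Int](1).toList)
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-join").getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Because this is an inner join, a subject that does not appear in DataFrame one is silently dropped from that student's list.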