Outer join in Spark SQL fails with AnalysisException: unresolved operator on cluster (Spark 2.3.2)

j8yoct9x · posted 2021-05-17 in Spark

On Spark 2.3.2, I am trying to join two Datasets with the following code:

import static org.apache.spark.sql.functions.broadcast;

List<String> filterColList = Arrays.asList("a", "b", "c", "d");
// Dataset.join takes the join columns as a Scala Seq, so convert the Java list
scala.collection.Seq<String> filterColSeq = scala.collection.JavaConverters.asScalaIteratorConverter(filterColList.iterator()).asScala().toSeq();
joinedDataset = s3extractedData.join(broadcast(inputDieDataset), filterColSeq, "full_outer");

The join works locally when both Datasets are read from CSV files. On the Spark cluster, however, one Dataset is read from an Avro file and the other from a CSV file (roughly as in the sketch below), and the join fails with the following error.
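For context, this is roughly how the two inputs are loaded on the cluster. The paths are placeholders, and the Avro format name assumes the external Databricks spark-avro package that Spark 2.3.x requires; note that a CSV read without inferSchema yields string columns, which would explain the cast(... as int) calls in the plan below:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Avro side ("avro" only became a built-in format in Spark 2.4)
Dataset<Row> s3extractedData = spark.read()
        .format("com.databricks.spark.avro")
        .load("s3://bucket/path/extracted-data.avro");   // placeholder path

// CSV side: without .option("inferSchema", "true"), every column is a string
Dataset<Row> inputDieDataset = spark.read()
        .option("header", "true")
        .csv("s3://bucket/path/uploaded-data.csv");      // placeholder path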

Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved operator 'Project [coalesce(a#67, a#10) AS a#3770, coalesce(b#68, b#11) AS b#3771, coalesce(c#69, c#12) AS c#3772, coalesce(d#70, d#13) AS d#3773, ...];;
'Project [coalesce(fablot#76, fablot#10) AS fablot#3770, coalesce(wafer#77, wafer#11) AS wafer#3771, coalesce(diex#90, diex#12) AS diex#3772, coalesce(diey#91, diey#13) AS diey#3773, dsid#67, teststep#68, process#69, dslot#70, testerno#71, stageno#72, foupno#73, foupslot#74, backsideid#75, teststart#78, testend#79, testqty#80, wafersize#81, testerrecipe#82, testerpgm#83, proberrecipe#84, chucktemp#85, performanceboard#86, probecard#87, notch#88, ... 243 more fields]
+- Join FullOuter, ((((a#67 = a#10) && (b#68 = cast(b#11 as int))) && (c#69 = cast(c#12 as int))) && (d#70 = cast(d#13 as int)))
   :- SubqueryAlias extractedData
   :  +- Relation[a#67,b#68,c#69,d#70,... 240 more fields] avro
   +- ResolvedHint (broadcast)
      +- SubqueryAlias uploadedData
         +- Relation[a#10,b#11,c#12,d#13,custom_param1#14,custom_param2#15,custom_param3#16] csv

at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:356)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$3.apply(CheckAnalysis.scala:354)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:354)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:92)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:3301)
at org.apache.spark.sql.Dataset.join(Dataset.scala:916)
at com.wdc.ddp.dataextractor.S3DataExtractor.adaExecutor(S3DataExtractor.java:97)
at com.wdc.ddp.dataextractor.AdaDataExtractionDriver.startSparkDriver(AdaDataExtractionDriver.java:159)
at com.wdc.ddp.dataextractor.AdaDataExtractionDriver.main(AdaDataExtractionDriver.java:55)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I also tried "outer" as the join type, but I still get the same error; only an "inner" join works without any issue.
What could be the problem?
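One detail in the plan stands out: the join condition compares the Avro-side keys directly against cast(b#11 as int), cast(c#12 as int) and cast(d#13 as int) on the CSV side, so the two inputs disagree on the key column types, and the coalesce(...) expressions that a full outer join on using-columns generates mix those types. A workaround sketch I could try, casting the CSV keys explicitly before the join so both sides match (anonymized column names as above; the int target type is taken from the casts in the plan):

import static org.apache.spark.sql.functions.broadcast;
import static org.apache.spark.sql.functions.col;

// Hypothetical fix: give the CSV-side keys the same types as the Avro schema
Dataset<Row> typedDieDataset = inputDieDataset
        .withColumn("b", col("b").cast("int"))
        .withColumn("c", col("c").cast("int"))
        .withColumn("d", col("d").cast("int"));

joinedDataset = s3extractedData.join(broadcast(typedDieDataset), filterColSeq, "full_outer");

Alternatively, reading the CSV with .option("inferSchema", "true") might give the keys matching numeric types in the first place.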

