Running Spark SQL from spark-shell throws an exception [Caused by: java.lang.IllegalArgumentException: Field "id" does not exist]

Asked by 3gtaxfhh on 2021-06-27 in Hive

First, I create a Dataset with the following Spark SQL query:

spark.sql("select id ,a.userid,regexp_replace(b.tradeno,',','|') as TradeNo
,Amount ,TradeType ,TxTypeId
,regexp_replace(title,',','|') as title
,status ,tradetime ,TradeStatus
,regexp_replace(otherside,',','') as otherside
from
(
    select userid 
    from tableA
    where daykey='2018-10-30'
    group by userid
) a 
left join tableb b
on a.userid=b.userid 
where b.userid is not null")

The result is:

dataset: org.apache.spark.sql.DataFrame = [id: bigint, userid: int ... 9 more fields]

Then I export the Dataset to CSV with the following command:

dataset.coalesce(40).write.option("delimiter", ",").option("charset", "utf-8").csv("/binlog_test/mycsv.excel")

When the Spark job runs, the following error is thrown:
Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1430)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1417)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1417)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:797)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:797)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:797)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1645)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1600)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1589)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:623)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1930)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1943)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1963)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:127)
  ... 69 more
Caused by: java.lang.IllegalArgumentException: Field "id" does not exist.
  at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:290)
  at org.apache.spark.sql.types.StructType$$anonfun$fieldIndex$1.apply(StructType.scala:290)
  at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
  at scala.collection.AbstractMap.getOrElse(Map.scala:59)
  at org.apache.spark.sql.types.StructType.fieldIndex(StructType.scala:289)
  at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$6.apply(OrcFileFormat.scala:308)
  at org.apache.spark.sql.hive.orc.OrcRelation$$anonfun$6.apply(OrcFileFormat.scala:308)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at org.apache.spark.sql.types.StructType.foreach(StructType.scala:96)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at org.apache.spark.sql.types.StructType.map(StructType.scala:96)
  at org.apache.spark.sql.hive.orc.OrcRelation$.setRequiredColumns(OrcFileFormat.scala:308)
  at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:140)
  at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:129)
  at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:138)
  at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:122)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
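
From the trace, the exception is raised in OrcRelation.setRequiredColumns, i.e. while the Hive ORC reader resolves the requested columns against the schema of the underlying ORC data files. For reference, a minimal way to compare the physical ORC schema with the query schema (this assumes tableb is ORC-backed; the warehouse path below is only a placeholder):

// Hypothetical location of tableb's ORC files; substitute the real table location.
val physical = spark.read.orc("/user/hive/warehouse/tableb")
physical.printSchema()

// Logical schema of the query result, for comparison.
dataset.printSchema()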
However, when I run the join directly in Hive, create a new table from the join result, and then export that table with the same Spark SQL commands, everything works fine.
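
That working path looks roughly like the sketch below (joined_result is a placeholder table name and the output path is only an example):

-- run in Hive first: materialize the join into a new table
-- CREATE TABLE joined_result AS
-- SELECT ... FROM (SELECT userid FROM tableA WHERE daykey = '2018-10-30' GROUP BY userid) a
-- LEFT JOIN tableb b ON a.userid = b.userid WHERE b.userid IS NOT NULL;

// then in spark-shell, export the new table the same way as above
val fromHive = spark.sql("select * from joined_result")
fromHive.coalesce(40).write.option("delimiter", ",").option("charset", "utf-8").csv("/binlog_test/mycsv_from_hive")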
