Stuck on "Shuffle files lost for host" in EMR with multiple Parquet files and joins

cx6n0qe3  posted on 2021-05-16  in  Spark

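I am trying to run a query across multiple DataFrames. Each DataFrame is built from about 4 Parquet files, except for one that is built from about 1800 Parquet files.
The EMR instances are configured for auto scaling. When I run a query with more than 3 joins, execution gets stuck.
I have tried what I could: I increased the timeouts and enabled the shuffle service, dynamic allocation, and broadcast joins. Here is the Spark configuration: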

spark.network.timeout=4800
spark.executor.heartbeatInterval=4200
spark.sql.broadcastTimeout=3600
spark.sql.autoBroadcastJoinThreshold=209715200
spark.shuffle.service.enabled=true
spark.dynamicAllocation.enabled=true
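
One detail in these settings that may be worth checking: the Spark configuration documentation states that spark.executor.heartbeatInterval should be significantly less than spark.network.timeout, and the two values above are close to each other. The following is a minimal sketch in plain Python (no Spark required) that compares them. The helper names `to_millis` and `heartbeat_ratio` are hypothetical, and treating a bare number as seconds for both properties is an assumption; Spark may interpret unit-less values differently per property, so writing explicit suffixes such as `4800s` is the safer choice.

```python
import re

# Milliseconds per duration suffix (a subset of the suffixes Spark accepts).
_UNIT_MILLIS = {"ms": 1, "s": 1000, "m": 60000, "min": 60000, "h": 3600000, "d": 86400000}


def to_millis(value, default_unit="s"):
    """Convert a duration such as '4800', '120s' or '10000ms' to milliseconds.

    Assumption: a bare number is read using default_unit (seconds here).
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", str(value).lower())
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in _UNIT_MILLIS:
        raise ValueError(f"unknown time unit in: {value!r}")
    return int(number) * _UNIT_MILLIS[unit]


def heartbeat_ratio(conf):
    """Return heartbeatInterval divided by network.timeout."""
    heartbeat = to_millis(conf["spark.executor.heartbeatInterval"])
    timeout = to_millis(conf["spark.network.timeout"])
    return heartbeat / timeout


# Values copied from the configuration above.
conf = {
    "spark.network.timeout": "4800",
    "spark.executor.heartbeatInterval": "4200",
}
ratio = heartbeat_ratio(conf)
print(f"heartbeatInterval is {ratio:.1%} of network.timeout")
```

For comparison, the documented defaults (10s heartbeat, 120s network timeout) give a ratio of roughly 8%. Whether this is related to the stuck query is not established here; it is only a sanity check on the settings.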

This is the log output I get at the end, with no further errors or exceptions.

.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-152-20-116-20.eu-central-2.compute.internal (epoch 7)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-216-128.eu-central-1.compute.internal (epoch 8)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-219-85.eu-central-1.compute.internal (epoch 9)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-218-123.eu-central-1.compute.internal (epoch 4)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-216-84.eu-central-1.compute.internal (epoch 5)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-218-159.eu-central-1.compute.internal (epoch 6)
.Logging$class.logInfo(Logging.scala:54)ogger{39} : Shuffle files lost for host: ip-172-23-219-86.eu-central-1.compute.internal (epoch 7)
