Spark throws OOM when selecting 10 GB of data from MySQL

tct7dpnv  asked on 2021-05-29  in Spark

I am running my job on a standalone cluster with one master and one worker. My Spark cluster configuration is:

CPU cores - 4
RAM - 16GB

When submitting the job I pass --executor-cores 3 --executor-memory 10g. The job runs fine on 5 GB of data, and in the logs I can see a block manager size of 502 MB while the job is running.
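For reference, the same executor sizing expressed as PySpark session config (a sketch; "mysql-to-s3" is a placeholder app name, and on a standalone cluster these values are normally passed as spark-submit flags exactly as above):

from pyspark.sql import SparkSession

# Programmatic equivalent of --executor-cores 3 --executor-memory 10g.
spark = (
    SparkSession.builder
    .appName("mysql-to-s3")  # placeholder app name
    .config("spark.executor.cores", "3")
    .config("spark.executor.memory", "10g")
    .getOrCreate()
)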
Code structure:

df = spark.read.format('jdbc').options(driver='com.mysql.jdbc.Driver', url=jdbc_url, dbtable=query_str, numPartitions=12, partitionColumn="cord_uid", lowerBound=1, upperBound=12).load()
df.write.csv(s3_bucketname, mode="append")
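One thing worth knowing about these options: lowerBound and upperBound do not filter rows, they only control how the cord_uid range is split into per-partition WHERE clauses. A simplified sketch of the splitting logic (illustrative only, not Spark's exact code):

def jdbc_predicates(column, lower_bound, upper_bound, num_partitions):
    # Mimics how Spark strides over [lower_bound, upper_bound); rows outside
    # the range still get read -- they fall into the first or last partition.
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    preds = []
    current = lower_bound
    for i in range(num_partitions):
        low = f"{column} >= {current}" if i > 0 else None
        current += stride
        high = f"{column} < {current}" if i < num_partitions - 1 else None
        if low and high:
            preds.append(f"{low} AND {high}")
        elif low:
            preds.append(low)
        else:
            preds.append(f"{high} OR {column} IS NULL")
    return preds

print(jdbc_predicates("cord_uid", 1, 12, 12))
# The last predicate is "cord_uid >= 12": every row above the upper bound
# lands in a single partition.

So if cord_uid actually ranges far beyond 12, one task ends up fetching most of the 10 GB by itself, which is consistent with a single executor dying with a heap-space OOM.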

Error log:

2020-06-05 16:08:26,064 INFO scheduler.TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 476230 ms on <container_ip>(executor 0) (9/11)
2020-06-05 16:09:14,289 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20200605154949-0013/0 is now EXITED (Command exited with code 1)
2020-06-05 16:09:14,315 INFO cluster.StandaloneSchedulerBackend: Executor app-20200605154949-0013/0 removed: Command exited with code 1
2020-06-05 16:09:14,323 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20200605154949-0013/1 on worker-20200605114514-172.26.0.5-41969 (<container_ip>:41969) with 3 core(s)
2020-06-05 16:09:14,324 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20200605154949-0013/1 on hostPort <container_ip>:41969 with 3 core(s), 10.0 GB RAM
2020-06-05 16:09:14,324 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20200605154949-0013/1 is now RUNNING
2020-06-05 16:09:14,334 ERROR scheduler.TaskSchedulerImpl: Lost executor 0 on <container_ip>: Remote RPC client disassociated. Likely due to containers exceeding thresholds, or network issues. Check driver logs for WARN messages.
Caused by: java.lang.OutOfMemoryError: Java heap space

Do I need to set some property to clean up the block manager memory? I can see that it is not releasing memory even for completed tasks.

yuvru6vn

Try setting the config options below so that multiple executors fetch the data in parallel; that makes an OOM much less likely. A sketch follows the list.

partitionColumn - <numeric column to split the reads on>
lowerBound      - <lowest value of the partition column>
upperBound      - <highest value of the partition column>
numPartitions   - <number of partitions>
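A minimal sketch of applying this advice, reusing jdbc_url, query_str and s3_bucketname from the question ("my_table" in the bounds query is a placeholder for the table behind query_str). Deriving lowerBound/upperBound from the column's real MIN/MAX keeps the 12 partitions evenly sized instead of funneling out-of-range ids into one task:

# First fetch the real bounds of the partition column
# ("my_table" is a placeholder for the table behind query_str).
bounds = (
    spark.read.format("jdbc")
    .options(driver="com.mysql.jdbc.Driver", url=jdbc_url,
             dbtable="(SELECT MIN(cord_uid) lo, MAX(cord_uid) hi FROM my_table) b")
    .load()
    .first()
)

df = (
    spark.read.format("jdbc")
    .options(
        driver="com.mysql.jdbc.Driver",
        url=jdbc_url,
        dbtable=query_str,
        partitionColumn="cord_uid",
        lowerBound=str(bounds["lo"]),
        upperBound=str(bounds["hi"]),
        numPartitions=12,  # number of parallel JDBC reads
    )
    .load()
)
df.write.csv(s3_bucketname, mode="append")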
