Scala program runs very slowly

6kkfgxo0  posted 2021-05-29 in Spark
Follow (0) | Answers (1) | Views (372)

I am running a Scala Spark program in YARN mode. The program does not finish even after 30 minutes, so I have to cancel the job. Please let me know what tuning is needed to make it run faster; my knowledge of Scala is limited.
The same program written in PySpark completes within 10 minutes. The algorithm is not that complex and there are no joins. I learned Scala first and then started writing the code, but I am not as good at it as at Python.
What the program does:
Create a DataFrame from a file.
Load the lookup data into a list and broadcast it.
Pass 3 columns to a subprogram, which computes a quantity (no complex calculation), fetches a description from the lookup data, and returns a single value to the DataFrame.
Create a UDF from the subprogram.
Write the data to a CSV file.
Record count: 140 million; size: 30 GB; columns: 21; number of part files: 200
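The subprogram described above might look something like the following plain-Scala sketch. The column meanings, lookup contents, and quantity formula are illustrative assumptions, not the actual code:

```scala
// Hypothetical sketch of the subprogram: compute a quantity from the
// numeric columns and attach a description fetched from the lookup data.
object QtyLookup {
  // Lookup data loaded once; in the real job this map would be broadcast.
  val lookup: Map[String, String] = Map("A1" -> "WIDGET", "B2" -> "GADGET")

  // Takes 3 column values and returns a single value for the DataFrame.
  def describe(code: String, units: Int, perUnit: Double): String = {
    val qty  = units * perUnit                    // no complex calculation
    val desc = lookup.getOrElse(code, "UNKNOWN")  // description from lookup
    s"$desc:$qty"
  }
}
```

In the real job this function would be wrapped with `org.apache.spark.sql.functions.udf` and applied with `withColumn`; the allocation behaviour of the function body is the same either way.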

spark-submit --master yarn \
--driver-memory 5G \
--executor-cores=3 \
--executor-memory=4G \
--conf spark.driver.memoryOverhead=512 \
--conf spark.executor.memoryOverhead=512 \
--conf spark.dynamicAllocation.minExecutors=5 \
--conf spark.dynamicAllocation.maxExecutors=10 \
--conf spark.dynamicAllocation.initialExecutors=5 \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
--conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
--class org.spark.masking.MainProgram ./ScalaProgramming.jar

The logs below are from GC; let me know if any other logs are needed.
There are a whole bunch of JVM flags printed as well, which I cannot paste here.

--conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark" \
--conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark" \

2.213: [GC (CMS Initial Mark) [1 CMS-initial-mark: 0K(1398144K)] 536935K(2027264K), 0.1361808 secs] [Times: user=0.11 sys=0.04, real=0.13 secs] 
2.349: [CMS-concurrent-mark-start]
2.350: [CMS-concurrent-mark: 0.001/0.001 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
2.350: [CMS-concurrent-preclean-start]
2.350: [Preclean SoftReferences, 0.0000071 secs]2.350: [Preclean WeakReferences, 0.0000046 secs]2.350: [Preclean FinalReferences, 0.0000045 secs]2.350: [Preclean PhantomReferences, 0.0000048 secs]2.353: [CMS-concurrent-preclean: 0.003/0.003 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2.353: [CMS-concurrent-abortable-preclean-start]
2.467: [GC (Allocation Failure) 2.467: [ParNew2.480: [SoftReference, 0 refs, 0.0000342 secs]2.480: [WeakReference, 701 refs, 0.0000763 secs]2.480: [FinalReference, 7789 refs, 0.0201415 secs]2.500: [PhantomReference, 0 refs, 3 refs, 0.0000190 secs]2.500: [JNI Weak Reference, 0.0000281 secs]: 559232K->25830K(629120K), 0.0335811 secs] 559232K->25830K(2027264K), 0.0336755 secs] [Times: user=0.33 sys=0.04, real=0.04 secs] 
3.814: [CMS-concurrent-abortable-preclean: 1.065/1.462 secs] [Times: user=5.10 sys=0.25, real=1.46 secs] 
3.815: [GC (CMS Final Remark) [YG occupancy: 307047 K (629120 K)]3.815: [Rescan (parallel) , 0.0060711 secs]3.821: [weak refs processing3.821: [SoftReference, 0 refs, 0.0000093 secs]3.821: [WeakReference, 0 refs, 0.0000066 secs]3.821: [FinalReference, 0 refs, 0.0000063 secs]3.821: [PhantomReference, 0 refs, 0 refs, 0.0000079 secs]3.821: [JNI Weak Reference, 0.0000094 secs], 0.0000570 secs]3.821: [class unloading, 0.0062279 secs]3.827: [scrub symbol table, 0.0056485 secs]3.833: [scrub string table, 0.0005033 secs][1 CMS-remark: 0K(1398144K)] 307047K(2027264K), 0.0197652 secs] [Times: user=0.14 sys=0.00, real=0.02 secs] 
3.835: [CMS-concurrent-sweep-start]
3.835: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
3.835: [CMS-concurrent-reset-start]
3.864: [CMS-concurrent-reset: 0.029/0.029 secs] [Times: user=0.06 sys=0.03, real=0.03 secs] 
8.655: [GC (Allocation Failure) 8.655: [ParNew8.668: [SoftReference, 0 refs, 0.0000357 secs]8.668: [WeakReference, 1908 refs, 0.0001536 secs]8.668: [FinalReference, 8067 refs, 0.0204567 secs]8.689: [PhantomReference, 0 refs, 9 refs, 0.0000229 secs]8.689: [JNI Weak Reference, 0.0000224 secs]: 585062K->40277K(629120K), 0.0336923 secs] 585062K->40277K(2027264K), 0.0337921 secs] [Times: user=0.27 sys=0.02, real=0.03 secs] 
13.569: [GC (Allocation Failure) 13.569: [ParNew13.615: [SoftReference, 0 refs, 0.0000415 secs]13.615: [WeakReference, 841 refs, 0.0001029 secs]13.615: [FinalReference, 2145 refs, 0.0053077 secs]13.620: [PhantomReference, 0 refs, 13 refs, 0.0000211 secs]13.620: [JNI Weak Reference, 0.0000214 secs]: 599509K->69888K(629120K), 0.0506452 secs] 599509K->148597K(2027264K), 0.0507616 secs] [Times: user=0.27 sys=0.11, real=0.06 secs] 
15.596: [GC (Allocation Failure) 15.596: [ParNew17.013: [SoftReference, 0 refs, 0.0000447 secs]17.013: [WeakReference, 444 refs, 0.0000785 secs]17.013: [FinalReference, 8380 refs, 0.0125680 secs]17.026: [PhantomReference, 0 refs, 3 refs, 0.0000196 secs]17.026: [JNI Weak Reference, 0.0000143 secs]: 629120K->69888K(629120K), 1.4302181 secs] 707829K->260902K(2027264K), 1.4303383 secs] [Times: user=31.46 sys=0.10, real=1.43 secs] 
19.026: [GC (CMS Initial Mark) [1 CMS-initial-mark: 191014K(1398144K)] 574937K(2027264K), 0.0405526 secs] [Times: user=0.39 sys=0.01, real=0.04 secs] 
19.067: [CMS-concurrent-mark-start]
19.117: [CMS-concurrent-mark: 0.048/0.050 secs] [Times: user=0.88 sys=0.01, real=0.05 secs] 
19.117: [CMS-concurrent-preclean-start]
19.117: [Preclean SoftReferences, 0.0000081 secs]19.117: [Preclean WeakReferences, 0.0000339 secs]19.117: [Preclean FinalReferences, 0.0000053 secs]19.117: [Preclean PhantomReferences, 0.0000060 secs]19.122: [CMS-concurrent-preclean: 0.005/0.005 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 
19.122: [CMS-concurrent-abortable-preclean-start]
21.344: [GC (Allocation Failure) 21.345: [ParNew22.007: [SoftReference, 0 refs, 0.0000454 secs]22.007: [WeakReference, 4452 refs, 0.0004887 secs]22.008: [FinalReference, 4098 refs, 0.0080919 secs]22.016: [PhantomReference, 0 refs, 11 refs, 0.0000249 secs]22.016: [JNI Weak Reference, 0.0000233 secs]: 629120K->69724K(629120K), 0.6716374 secs] 820134K->384114K(2027264K), 0.6719494 secs] [Times: user=14.39 sys=0.21, real=0.67 secs] 
 CMS: abort preclean due to time 24.240: [CMS-concurrent-abortable-preclean: 3.341/5.119 secs] [Times: user=23.16 sys=0.40, real=5.12 secs] 
24.241: [GC (CMS Final Remark) [YG occupancy: 230617 K (629120 K)]24.241: [Rescan (parallel) , 0.0045982 secs]24.246: [weak refs processing24.246: [SoftReference, 0 refs, 0.0000095 secs]24.246: [WeakReference, 0 refs, 0.0000068 secs]24.246: [FinalReference, 0 refs, 0.0000064 secs]24.246: [PhantomReference, 0 refs, 0 refs, 0.0000083 secs]24.246: [JNI Weak Reference, 0.0000154 secs], 0.0000646 secs]24.246: [class unloading, 0.0243305 secs]24.270: [scrub symbol table, 0.0141372 secs]24.284: [scrub string table, 0.0009220 secs][1 CMS-remark: 314389K(1398144K)] 545007K(2027264K), 0.0463381 secs] [Times: user=0.11 sys=0.01, real=0.04 secs] 
24.288: [CMS-concurrent-sweep-start]
24.336: [CMS-concurrent-sweep: 0.049/0.049 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
24.336: [CMS-concurrent-reset-start]
24.391: [CMS-concurrent-reset: 0.055/0.055 secs] [Times: user=0.00 sys=0.05, real=0.06 secs] 
87.840: [GC (Allocation Failure) 87.840: [ParNew88.140: [SoftReference, 0 refs, 0.0000504 secs]88.140: [WeakReference, 1758 refs, 0.0002394 secs]88.140: [FinalReference, 5047 refs, 0.0074780 secs]88.148: [PhantomReference, 0 refs, 16 refs, 0.0000204 secs]88.148: [JNI Weak Reference, 0.0000115 secs]: 614858K->69888K(629120K), 0.3078355 secs] 913711K->452374K(2027264K), 0.3080004 secs] [Times: user=6.56 sys=0.32, real=0.31 secs] 
616.341: [GC (Allocation Failure) 616.341: [ParNew616.377: [SoftReference, 0 refs, 0.0000486 secs]616.377: [WeakReference, 1923 refs, 0.0001649 secs]616.377: [FinalReference, 467 refs, 0.0009059 secs]616.378: [PhantomReference, 0 refs, 24 refs, 0.0000197 secs]616.378: [JNI Weak Reference, 0.0000125 secs]: 629120K->35351K(629120K), 0.0365310 secs] 1011606K->445836K(2027264K), 0.0366766 secs] [Times: user=0.42 sys=0.12, real=0.03 secs] 
org.apache.spark.SparkException: Job 3 cancelled because SparkContext was shut down
Heap
 par new generation   total 629120K, used 293922K [0x0000000680000000, 0x00000006aaaa0000, 0x00000006eaaa0000)
  eden space 559232K,  46% used [0x0000000680000000, 0x000000068fc82d28, 0x00000006a2220000)
  from space 69888K,  50% used [0x00000006a6660000, 0x00000006a88e5e60, 0x00000006aaaa0000)
  to   space 69888K,   0% used [0x00000006a2220000, 0x00000006a2220000, 0x00000006a6660000)
 concurrent mark-sweep generation total 1398144K, used 410484K [0x00000006eaaa0000, 0x0000000740000000, 0x00000007c0000000)
 Metaspace       used 97258K, capacity 98404K, committed 98636K, reserved 1134592K
  class space    used 13754K, capacity 14029K, committed 14116K, reserved 1048576K
guicsvcw  1#

First, I would stop logging all of the garbage-collection output. It will not help you much, since garbage collection is part of any normally running program.
Second, I would look at a few things:
How large is your dataset?
What does your algorithm look like?
If the algorithm joins multiple RDDs and then filters them, try filtering one (or both) of the input RDDs first and joining afterwards. That way fewer records participate in the join.
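The filter-before-join advice can be illustrated with plain Scala collections (made-up data, not Spark APIs); the principle carries over to RDD joins:

```scala
// Two made-up datasets: orders and a region lookup keyed by the same id.
val orders  = Seq(("A", 10), ("B", 5), ("C", 99), ("D", 1))
val regions = Seq(("A", "EU"), ("B", "US"), ("C", "EU"))

// Join first, filter later: every matching pair is materialised.
val joinThenFilter = for {
  (id, qty)     <- orders
  (rid, region) <- regions if rid == id
  if qty > 4
} yield (id, qty, region)

// Filter first, then join: only qualifying orders reach the join.
val filtered = orders.filter(_._2 > 4)
val filterThenJoin = for {
  (id, qty)     <- filtered
  (rid, region) <- regions if rid == id
} yield (id, qty, region)
// Both produce the same result; the second examines fewer pairs.
```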
As for speed: it depends partly on the size of the dataset and partly on the algorithm.
If you have not looked into it yet, read up on what a shuffle is. If you can reduce the number of shuffles, and defer them towards the end of the processing, you will usually get a faster program.
--- Edit after more information was added ---
It looks like you are processing roughly 30 GB in 4 GB chunks. There is a good chance that quite a few intermediate objects are being created and discarded, causing heavy RAM churn within those 4 GB memory blocks. I would look through your code for any patterns that allocate objects heavily (for example, multiple string concatenations that are not on one line or that do not use a format method).
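As a sketch of the kind of pattern meant here (hypothetical helper names): repeated String concatenation in a loop allocates a new String on every iteration, while a StringBuilder reuses one buffer:

```scala
// Allocation-heavy pattern: each iteration builds a brand-new String.
def concatNaive(parts: Seq[String]): String = {
  var out = ""
  for (p <- parts) out = out + p + ","  // new String object every pass
  out
}

// Same result with one reusable buffer: far fewer intermediate objects.
def concatBuffered(parts: Seq[String]): String = {
  val sb = new StringBuilder
  for (p <- parts) sb.append(p).append(',')
  sb.toString
}
```

At 140 million records, per-row allocations like the first version translate directly into the GC pressure visible in the logs above.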
Also, you have tuned the JVM to use what is probably the slowest garbage collector. You will get roughly a 27% speed gain by switching to the parallel collector, and roughly 22% by using G1. Keep in mind that a difference in speed also comes with a difference in memory overhead, so monitor both RAM and speed as you change the GC tuning.
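One way to try that suggestion with the submit command above is to swap the CMS flag for G1 (a sketch only; which collector actually wins on this workload still has to be measured, and `-XX:+UseParallelGC` is the other candidate mentioned):

```shell
spark-submit --master yarn \
--driver-memory 5G \
--executor-cores=3 \
--executor-memory=4G \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf "spark.executor.extraJavaOptions=-XX:+UseG1GC" \
--conf "spark.driver.extraJavaOptions=-XX:+UseG1GC" \
--class org.spark.masking.MainProgram ./ScalaProgramming.jar
```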
Finally, if you are keeping a very large RDD in RAM, check whether your Spark job is spilling to disk. Without both sets of code (Python and Scala) it is hard to see what the job intends to do, or whether it is doing it in an inefficient way under Spark.
