PySpark stops with SocketTimeoutException

a9wyjsp7 posted on 2021-06-08 in Kafka

We are currently trying to run a Spark job on a Dataproc cluster using PySpark 2.2.0, but the Spark job stops after a seemingly random amount of time with the following error message:

17/07/25 00:52:48 ERROR org.apache.spark.api.python.PythonRDD: Error while sending iterator
java.net.SocketTimeoutException: Accept timed out
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:702)

Sometimes the error takes only a few minutes to appear; other times it takes 3 hours. From personal experience, the Spark job usually runs for about 30 minutes to 1 hour before hitting the error.
Once the Spark job hits the error, it just stops. No matter how long I wait, it outputs nothing. On the YARN ResourceManager the application status is still marked as RUNNING, and I have to Ctrl+C to terminate the program. At that point, the application is marked as FINISHED.
I submit the job from the master node's console with /path/to/spark/bin/spark-submit --jars /path/to/jar/spark-streaming-kafka-0-8-assembly_2.11-2.2.0.jar spark_job.py. The jar file is required because the Spark job streams messages from Kafka (which runs on the same cluster as the Spark job) and pushes some of the messages back to a different topic on the same Kafka.
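For context, this is roughly the shape of spark_job.py. It is a minimal sketch only: the host names, topic names, group id, and the kafka-python producer used here are placeholders and assumptions rather than our actual code.

# Minimal sketch; anything marked "placeholder" is an assumption, not our real value.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils   # provided by the 0-8 assembly jar above
from kafka import KafkaProducer                   # kafka-python, assumed to be installed on the workers

sc = SparkContext(appName="spark_job")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Receiver-based stream through ZooKeeper (what the spark-streaming-kafka-0-8 connector uses).
stream = KafkaUtils.createStream(
    ssc,
    zkQuorum="worker-0:2181",           # placeholder
    groupId="spark-consumer-group",     # placeholder
    topics={"input-topic": 1},          # placeholder
)

def push_back(partition):
    # Push some of the messages back to a different topic on the same Kafka cluster.
    producer = KafkaProducer(bootstrap_servers="worker-0:9092")  # placeholder
    for _key, value in partition:
        producer.send("output-topic", value.encode("utf-8"))     # placeholder topic
    producer.flush()

stream.foreachRDD(lambda rdd: rdd.foreachPartition(push_back))

ssc.start()
ssc.awaitTermination()
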
I have looked at some other answers on this site (mainly this one and this one), and they were somewhat helpful, but we haven't been able to find anything in the logs that points to what is killing the executors. So far I have monitored the nodes through the YARN ResourceManager while the job runs and looked through the logs located in the /var/logs/hadoop-yarn directory on every node. The only "clue" I could find in the logs is org.apache.spark.executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM, which is the only line written to the dead executors' logs.
As a last-ditch effort, we tried increasing the cluster's memory in the hope that the issue would just go away, but it didn't. Initially, the cluster ran as 1 master + 2 workers with 4 vCPUs and 15 GB of memory each. We created a new Dataproc cluster, this time with 1 master and 3 workers, each worker having 8 vCPUs and 52 GB of memory (the master has the same specs as before).
What we would like to know is:
1. Where/how can we see the exception that is causing the executors to be terminated?
2. Is this an issue with how Spark is configured?
3. The Dataproc image version is "preview". Could that be the cause of the error?
And ultimately,
4. How do we resolve this issue? What else can we try?
This Spark job needs to stream from Kafka continuously for an indefinite amount of time, so we would like the error to be fixed rather than just prolonging the time it takes for the error to occur.
Here are some screenshots from the YARN ResourceManager to show what we are seeing:
Cluster metrics

Executor summary

Both screenshots are from before the Spark job stopped with the error.
And this is the Spark configuration file, located at /path/to/spark/conf/spark-defaults.conf (Dataproc's default settings, unchanged):

spark.master yarn
spark.submit.deployMode client
spark.yarn.jars=local:/usr/lib/spark/jars/*
spark.eventLog.enabled true
spark.eventLog.dir hdfs://highmem-m/user/spark/eventlog

# Dynamic allocation on YARN
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.minExecutors 1
spark.executor.instances 10000
spark.dynamicAllocation.maxExecutors 10000
spark.shuffle.service.enabled true
spark.scheduler.minRegisteredResourcesRatio 0.0

spark.yarn.historyServer.address highmem-m:18080
spark.history.fs.logDirectory hdfs://highmem-m/user/spark/eventlog

spark.executor.cores 2
spark.executor.memory 4655m
spark.yarn.executor.memoryOverhead 465

# Overkill
spark.yarn.am.memory 4655m
spark.yarn.am.memoryOverhead 465

spark.driver.memory 3768m
spark.driver.maxResultSize 1884m
spark.rpc.message.maxSize 512

# Add ALPN for Bigtable
spark.driver.extraJavaOptions 
spark.executor.extraJavaOptions 

# Disable Parquet metadata caching as its URI re-encoding logic does
# not work for GCS URIs (b/28306549). The net effect of this is that
# Parquet metadata will be read both driver side and executor side.
spark.sql.parquet.cacheMetadata=false

# User-supplied properties.
# Mon Jul 24 23:12:12 UTC 2017
spark.executor.cores=4
spark.executor.memory=18619m
spark.driver.memory=3840m
spark.driver.maxResultSize=1920m
spark.yarn.am.memory=640m
spark.executorEnv.PYTHONHASHSEED=0

I'm not exactly sure where the User-supplied properties came from.
EDIT:
Some additional information about the cluster: I use the zookeeper, kafka, and jupyter initialization action scripts found at https://github.com/GoogleCloudPlatform/dataproc-initialization-actions, applied in the order zookeeper -> kafka -> jupyter (unfortunately I don't have enough reputation to post more than 2 links at the moment).
EDIT 2:
Prompted by @Dennis's insightful questions, we ran the Spark job again while paying particular attention to the executors that use more on-heap storage memory. What I noticed is that the executors from worker #0 always have higher storage memory usage compared to the other executors. The stdout file of worker #0's executors is always empty. These three lines are repeated many times in stderr:

17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:01 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:04 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:07 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:09 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:10 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:13 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:14 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:15 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to 
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Verifying properties
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Property group.id is overridden to 
17/07/27 16:32:18 INFO kafka.utils.VerifiableProperties: Property zookeeper.connect is overridden to

They seem to repeat every 1 to 3 seconds.
As for the stdout and stderr of the other executors on the other worker nodes, they are empty.
EDIT 3:
As mentioned in @Dennis's comments, we had kept the Kafka topic used by the Spark job with a replication factor of 1. I also found that I had forgotten to add worker #2 to zookeeper.connect in the Kafka config file, and had also forgotten to give the consumer that streams messages from Kafka in Spark a group id. I've fixed those places (and remade the topic with a replication factor of 3) and observed that the workload now concentrates mainly on worker #1. Following @Dennis's suggestion, I ran sudo jps after SSH-ing into worker #1 and got the following output: [Removed this section to save character space; it was only the error messages from a failed call to jmap so it didn't hold any useful information]
EDIT 4:
This is what I now see in the stdout files of worker #1's executors:

2017-07-27 22:16:24
Full thread dump OpenJDK 64-Bit Server VM (25.131-b11 mixed mode):
===Truncated===
Heap
 PSYoungGen      total 814592K, used 470009K [0x000000063c180000, 0x000000069e600000, 0x00000007c0000000)
  eden space 799744K, 56% used [0x000000063c180000,0x0000000657e53598,0x000000066ce80000)
  from space 14848K, 97% used [0x000000069d780000,0x000000069e5ab1b8,0x000000069e600000)
  to   space 51200K, 0% used [0x0000000698200000,0x0000000698200000,0x000000069b400000)
 ParOldGen       total 574464K, used 180616K [0x0000000334400000, 0x0000000357500000, 0x000000063c180000)
  object space 574464K, 31% used [0x0000000334400000,0x000000033f462240,0x0000000357500000)
 Metaspace       used 49078K, capacity 49874K, committed 50048K, reserved 1093632K
  class space    used 6054K, capacity 6263K, committed 6272K, reserved 1048576K

2017-07-27 22:06:44
Full thread dump OpenJDK 64-Bit Server VM (25.131-b11 mixed mode):
===Truncated===
Heap
 PSYoungGen      total 608768K, used 547401K [0x000000063c180000, 0x000000066a280000, 0x00000007c0000000)
  eden space 601088K, 89% used [0x000000063c180000,0x000000065d09c498,0x0000000660c80000)
  from space 7680K, 99% used [0x0000000669b00000,0x000000066a2762c8,0x000000066a280000)
  to   space 36864K, 0% used [0x0000000665a80000,0x0000000665a80000,0x0000000667e80000)
 ParOldGen       total 535552K, used 199304K [0x0000000334400000, 0x0000000354f00000, 0x000000063c180000)
  object space 535552K, 37% used [0x0000000334400000,0x00000003406a2340,0x0000000354f00000)
 Metaspace       used 48810K, capacity 49554K, committed 49792K, reserved 1093632K
  class space    used 6054K, capacity 6263K, committed 6272K, reserved 1048576K

When the error occurred, an executor from worker #2 received a SIGNAL TERM and was marked as dead. At that time, it was the only dead executor.
Strangely, the Spark job picked back up again after about 10 minutes. Looking at the Spark UI, only the executors from worker #1 are active and all the rest are dead. This is the first time that has happened.
EDIT 5:
Following @Dennis's advice again (thank you, @Dennis!), this time I ran sudo -u yarn jmap -histo <pid>. These are the top 10 most memory-hogging classes of the CoarseGrainedExecutorBackend after about 10 minutes:

num     #instances         #bytes  class name
----------------------------------------------
   1:        244824      358007944  [B
   2:        194242      221184584  [I
   3:       2062554      163729952  [C
   4:        746240       35435976  [Ljava.lang.Object;
   5:           738       24194592  [Lorg.apache.spark.unsafe.memory.MemoryBlock;
   6:        975513       23412312  java.lang.String
   7:        129645       13483080  java.io.ObjectStreamClass
   8:        451343       10832232  java.lang.StringBuilder
   9:         38880       10572504  [Z
  10:        120807        8698104  java.lang.reflect.Field

In addition, I ran into a new type of error that caused an executor to die. It produced some failed tasks highlighted in the Spark UI, and I found this in the executor's stderr:

17/07/28 00:44:03 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 6821.0 (TID 2585)
java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:156)
    at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
    at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
    at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
17/07/28 00:44:03 ERROR org.apache.spark.executor.Executor: Exception in task 0.1 in stage 6821.0 (TID 2586)
java.lang.AssertionError: assertion failed
    at scala.Predef$.assert(Predef.scala:156)
    at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84)
    at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:367)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:366)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:366)
    at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:361)
    at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:736)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:342)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
17/07/28 00:44:03 ERROR org.apache.spark.util.Utils: Uncaught exception in thread stdout writer for /opt/conda/bin/python
java.lang.AssertionError: assertion failed: Block rdd_5480_0 is not locked for reading
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720)
    at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516)
    at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
    at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
    at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)
17/07/28 00:44:03 ERROR org.apache.spark.util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /opt/conda/bin/python,5,main]
java.lang.AssertionError: assertion failed: Block rdd_5480_0 is not locked for reading
    at scala.Predef$.assert(Predef.scala:170)
    at org.apache.spark.storage.BlockInfoManager.unlock(BlockInfoManager.scala:299)
    at org.apache.spark.storage.BlockManager.releaseLock(BlockManager.scala:720)
    at org.apache.spark.storage.BlockManager$$anonfun$1.apply$mcV$sp(BlockManager.scala:516)
    at org.apache.spark.util.CompletionIterator$$anon$1.completion(CompletionIterator.scala:46)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:35)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:509)
    at org.apache.spark.api.python.PythonRunner$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:333)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1954)
    at org.apache.spark.api.python.PythonRunner$WriterThread.run(PythonRDD.scala:269)

EDIT 6:
This time, I took the jmap histogram after 40 minutes of running:

num     #instances         #bytes  class name
----------------------------------------------
   1:         23667      391136256  [B
   2:         25937       15932728  [I
   3:        159174       12750016  [C
   4:           334       10949856  [Lorg.apache.spark.unsafe.memory.MemoryBlock;
   5:         78437        5473992  [Ljava.lang.Object;
   6:        125322        3007728  java.lang.String
   7:         40931        2947032  java.lang.reflect.Field
   8:         63431        2029792  com.esotericsoftware.kryo.Registration
   9:         20897        1337408  com.esotericsoftware.kryo.serializers.UnsafeCacheFields$UnsafeObjectField
  10:         20323         975504  java.util.HashMap

And these are the results of ps ux:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
yarn       601  0.8  0.9 3008024 528812 ?      Sl   16:12   1:17 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Dproc_nodema
yarn      6086  6.3  0.0  96764 24340 ?        R    18:37   0:02 /opt/conda/bin/python -m pyspark.daemon
yarn      8036  8.2  0.0  96296 24136 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8173  9.4  0.0  97108 24444 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8240  9.0  0.0  96984 24576 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8329  7.6  0.0  96948 24720 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8420  8.5  0.0  96240 23788 ?        R    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8487  6.0  0.0  96864 24308 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8554  0.0  0.0  96292 23724 ?        S    18:37   0:00 /opt/conda/bin/python -m pyspark.daemon
yarn      8564  0.0  0.0  19100  2448 pts/0    R+   18:37   0:00 ps ux
yarn     31705  0.0  0.0  13260  2756 ?        S    17:56   0:00 bash /hadoop/yarn/nm-local-dir/usercache/<user_name>/app
yarn     31707  0.0  0.0  13272  2876 ?        Ss   17:56   0:00 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java 
yarn     31713  0.4  0.7 2419520 399072 ?      Sl   17:56   0:11 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx6
yarn     31771  0.0  0.0  13260  2740 ?        S    17:56   0:00 bash /hadoop/yarn/nm-local-dir/usercache/<user_name>/app
yarn     31774  0.0  0.0  13284  2800 ?        Ss   17:56   0:00 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java 
yarn     31780 11.1  1.4 21759016 752132 ?     Sl   17:56   4:31 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx1
yarn     31883  0.1  0.0  96292 27308 ?        S    17:56   0:02 /opt/conda/bin/python -m pyspark.daemon

The pid of the CoarseGrainedExecutorBackend is 31780 in this case.
EDIT 7:
Increasing the heartbeatInterval in the Spark settings did not change anything, which makes sense in hindsight.
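For reference, this is roughly how the interval was raised (the exact values below are assumptions, not our production settings); it made no difference to the timeouts:

# Sketch only; the values are assumptions.
from pyspark import SparkConf, SparkContext

conf = (
    SparkConf()
    .setAppName("spark_job")
    .set("spark.executor.heartbeatInterval", "60s")  # default is 10s
    .set("spark.network.timeout", "600s")            # kept well above the heartbeat interval
)
sc = SparkContext(conf=conf)
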
I created a short bash script that reads from Kafka with the console consumer for 5 seconds and writes the messages into a text file. The text file is uploaded to Hadoop, and Spark streams from there instead. We used this to test whether the timeouts had anything to do with Kafka.
Streaming from Hadoop and outputting from Spark to Kafka caused SocketTimeouts
Streaming from Kafka directly and not outputting from Spark to Kafka caused SocketTimeouts
Streaming from Hadoop and not outputting from Spark to Kafka caused SocketTimeouts
So we moved on with the assumption that Kafka has nothing to do with the timeouts.
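For reference, the "streaming from Hadoop" variants above simply swapped the Kafka input for a text-file stream over the directory the bash script uploads to, roughly like this sketch (the directory path is a placeholder):

# Sketch of the HDFS-input variant used for the test; the directory path is a placeholder.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="spark_job_hdfs_test")
ssc = StreamingContext(sc, 5)

# Every new file the bash helper uploads into this directory becomes one micro-batch.
lines = ssc.textFileStream("hdfs:///user/<user_name>/kafka_dump")  # placeholder path

lines.count().pprint()  # stand-in for the real processing; the SocketTimeout still occurred

ssc.start()
ssc.awaitTermination()
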
We installed Stackdriver Monitoring to look at memory usage as the timeouts occurred. Nothing really interesting in the metrics; memory usage looked relatively stable throughout (hovering around 10~15% at most for the busiest nodes).
We speculated that the communication between the worker nodes might be what is causing the issue. Right now our volume of data traffic is very low, so even a single worker can handle all of the workload with relative ease.
Running the Spark job on a single-node cluster while streaming from Kafka brokers on a different cluster seemed to stop the SocketTimeout... except that the AssertionError logged above now occurs frequently.
Per @Dennis's suggestion, I created a new cluster (also single-node), this time without the jupyter initialization script, which means Spark now runs on Python v2.7.9 (no Anaconda). On the first run, Spark hit the SocketTimeoutException within 15 seconds. The second run lasted just over 2 hours before failing with the same AssertionError. I'm starting to wonder if this is an internal problem with Spark. The third run ran for about 40 minutes and then hit a SocketTimeoutException.

Answer 1 (by bprjcwpo):

One of my clients has been seeing various production PySpark jobs (Spark version 2.2.1) fail intermittently on Google Cloud Dataproc with a stack trace very similar to yours:

ERROR org.apache.spark.api.python.PythonRDD: Error while sending iterator
    java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
        at java.net.ServerSocket.implAccept(ServerSocket.java:545)
        at java.net.ServerSocket.accept(ServerSocket.java:513)
        at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:711)

I found that disabling IPv6 on the Dataproc cluster VMs appears to fix the issue. One way to do that is to add these lines to a Dataproc init script so they are run at cluster creation time:

printf "\nnet.ipv6.conf.default.disable_ipv6 = 1\nnet.ipv6.conf.all.disable_ipv6=1\n" >> /etc/sysctl.conf
sysctl -p
