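Zipkin dependencies pitfalls, part 2: heartbeat timeouts and executor OOM
Last time, the throughput problem was solved by upgrading zipkin-dependencies to 2.3.0.
That did not last long. One day the job started failing with: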
Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10000 milliseconds]. This timeout is controlled by spark.executor.heartbeatInterval
...
19/09/18 08:33:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.OutOfMemoryError: Java heap space
...
Solution
The latest version (2.3.0) does not yet support passing extra Spark and elasticsearch-spark settings; a PR has been submitted to add this.
- Fix for the timeout: set these Spark properties:
  spark.executor.heartbeatInterval=600000
  spark.network.timeout=600000
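- Fix for the OOM: depending on your data volume, use es.input.max.docs.per.partition to control how the input is split across executors (it caps the number of documents per input partition). Also adjust the runtime memory and spark.executor.memory.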
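
To get a feel for how es.input.max.docs.per.partition changes the workload, here is a rough back-of-the-envelope sketch. It assumes each shard is split into about ceil(docs / limit) input partitions, which is an approximation and not necessarily the connector's exact algorithm. The function name `estimate_partitions` and the shard sizes are made up for illustration.

```python
import math


def estimate_partitions(docs_per_shard, max_docs_per_partition):
    """Roughly estimate the number of input partitions for a list of per-shard doc counts."""
    if max_docs_per_partition <= 0:
        raise ValueError("max_docs_per_partition must be positive")
    # Assumption: a shard is cut into about ceil(docs / limit) slices, and never fewer than one.
    return sum(
        max(1, math.ceil(docs / max_docs_per_partition))
        for docs in docs_per_shard
    )


if __name__ == "__main__":
    # Hypothetical index: 5 shards holding 2,000,000 span documents each.
    shards = [2_000_000] * 5
    for limit in (1_000_000, 250_000, 100_000):
        print(f"limit={limit}: ~{estimate_partitions(shards, limit)} partitions")
```

A smaller limit means more partitions, each holding fewer documents, so each task needs less heap at a time. If the job still runs out of memory after that, raising spark.executor.memory is the remaining knob.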