Zipkin Dependencies Pitfalls, Part 2: Heartbeat Timeouts and Executor OOM

Last time, to fix the throughput problem, I upgraded zipkin-dependencies to version 2.3.0.

The good times didn't last. One day the job started failing with the following errors:

Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10000 milliseconds]. This timeout is controlled by spark.executor.heartbeatInterval

...

19/09/18 08:33:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.OutOfMemoryError: Java heap space

...

Solutions

The latest release (2.3.0) does not yet support passing extra Spark or elasticsearch-spark configuration; I have already submitted a PR for this.

  1. Fix for the timeout: set the following Spark properties

    spark.executor.heartbeatInterval=600000
    spark.network.timeout=600000
    
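  2. Fix for the OOM: tune es.input.max.docs.per.partition to fit your data. It caps how many documents each input partition (and therefore each Spark task) reads, so a lower value produces more, smaller partitions. Also adjust the memory given to the running process as well as spark.executor.memory.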
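The two fixes can be sanity-checked with a small Java sketch (Java because zipkin-dependencies itself is a Java project). It makes two points explicit. First, Spark reads a bare number in spark.executor.heartbeatInterval as milliseconds but in spark.network.timeout as seconds, and Spark's documentation advises keeping the heartbeat interval well below the network timeout. Second, lowering es.input.max.docs.per.partition splits each shard into more partitions, so each task holds fewer documents. Note the assumptions: `toMillis` is a simplified, hypothetical parser rather than Spark's own; the per-shard formula `max(1, docs / maxDocsPerPartition)` is an assumption about how the connector slices shards, and its exact rounding may differ; the shard sizes are made up.

```java
import java.util.concurrent.TimeUnit;

// Sketch for sanity-checking the two fixes; not part of zipkin-dependencies.
final class SparkTuningSketch {

    // Simplified stand-in for Spark's time-string parsing (assumption, not Spark code):
    // a bare number is read in the property's default unit, otherwise a
    // "ms", "s", "m"/"min" or "h" suffix decides the unit.
    static long toMillis(String value, TimeUnit defaultUnit) {
        String v = value.trim().toLowerCase();
        int i = 0;
        while (i < v.length() && Character.isDigit(v.charAt(i))) {
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("Not a time value: " + value);
        }
        long amount = Long.parseLong(v.substring(0, i));
        switch (v.substring(i)) {
            case "":
                return defaultUnit.toMillis(amount);
            case "ms":
                return amount;
            case "s":
                return TimeUnit.SECONDS.toMillis(amount);
            case "m":
            case "min":
                return TimeUnit.MINUTES.toMillis(amount);
            case "h":
                return TimeUnit.HOURS.toMillis(amount);
            default:
                throw new IllegalArgumentException("Unsupported unit in: " + value);
        }
    }

    // Rough partition count, ASSUMING each shard is sliced into
    // max(1, docsInShard / maxDocsPerPartition) partitions (integer division).
    static long estimatePartitions(long[] docsPerShard, long maxDocsPerPartition) {
        if (maxDocsPerPartition <= 0) {
            throw new IllegalArgumentException("maxDocsPerPartition must be positive");
        }
        long total = 0;
        for (long docs : docsPerShard) {
            total += Math.max(1, docs / maxDocsPerPartition);
        }
        return total;
    }

    public static void main(String[] args) {
        // Values from the post: the heartbeat interval defaults to milliseconds,
        // spark.network.timeout defaults to seconds.
        long heartbeatMs = toMillis("600000", TimeUnit.MILLISECONDS);
        long networkTimeoutMs = toMillis("600000", TimeUnit.SECONDS);
        System.out.println("spark.executor.heartbeatInterval = " + heartbeatMs + " ms");
        System.out.println("spark.network.timeout            = " + networkTimeoutMs + " ms");
        System.out.println("heartbeat below timeout: " + (heartbeatMs < networkTimeoutMs));

        // Hypothetical span index with five shards of these document counts.
        long[] docsPerShard = {2_000_000L, 2_100_000L, 1_900_000L, 2_000_000L, 50_000L};
        for (long max : new long[] {1_000_000L, 100_000L}) {
            System.out.println("es.input.max.docs.per.partition=" + max
                    + " -> about " + estimatePartitions(docsPerShard, max) + " partitions");
        }
    }
}
```

With the post's values the heartbeat interval comes out at 600,000 ms while the network timeout, read in seconds, is 600,000,000 ms, so the ordering holds; writing explicit suffixes such as `ms` or `s` makes the intended units unambiguous. For the made-up shards, going from 1,000,000 to 100,000 documents per partition raises the estimate from 8 to 81 partitions.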
