In the previous post, to solve a throughput problem, I upgraded zipkin-dependencies to version 2.3.0.

The good times didn't last: one day the job started failing with errors:

Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10000 milliseconds]. This timeout is controlled by spark.executor.heartbeatInterval

...

19/09/18 08:33:20 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 4)
java.lang.OutOfMemoryError: Java heap space

...

Solution

The latest version (2.3.0) does not yet support passing extra Spark or elasticsearch-spark configuration; a PR has been submitted for this.

  1. Timeout fix: pass the following configuration to Spark:

    spark.executor.heartbeatInterval=600000
    spark.network.timeout=600000
  2. OOM fix: use es.input.max.docs.per.partition, sized to the actual data volume, to control how many partitions (and thus tasks) the Elasticsearch input is split into, and increase the job's memory via spark.executor.memory (see the sketch after this list).
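
To make both fixes concrete, below is a minimal Java sketch of applying these properties to a SparkConf, the way any Spark job reading from Elasticsearch through elasticsearch-hadoop would. The class name, the 50000 docs-per-partition value, and the 4g executor memory are illustrative placeholders, not values from zipkin-dependencies; the timeout values mirror the configuration above.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Illustrative sketch only; the class name is hypothetical and not part of zipkin-dependencies.
public class DependenciesConfSketch {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("zipkin-dependencies")
        // Timeout fix: raise the heartbeat and network timeouts (values mirror the ones above).
        .set("spark.executor.heartbeatInterval", "600000")
        .set("spark.network.timeout", "600000")
        // OOM fix: bound how many documents each Elasticsearch input partition reads,
        // so a single task never pulls more than it can hold in memory...
        .set("es.input.max.docs.per.partition", "50000") // placeholder, tune to your data volume
        // ...and give executors more heap.
        .set("spark.executor.memory", "4g");             // placeholder, tune to available memory

    try (JavaSparkContext sc = new JavaSparkContext(conf)) {
      // ... build the dependency-link aggregation on top of sc ...
    }
  }
}

Note that the heartbeat timeout is often just a symptom of executors stalling in long GC pauses under memory pressure, so the partition and memory tuning in the second item is usually the more fundamental fix; raising the timeouts mainly keeps the driver from killing executors while they recover.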