Envoy listener filter times out 问题

最近在看 openservicemesh 相关内容,这周更新了 main 分支的代码之后。发现原本 v0.5.0 时可以正常代理的 mysql 流量,在新的 commit 中无法代理了。

开启 envoy 的 filter debug 日志后发现出现了超时。

[2020-12-09 08:54:42.285][15][debug][filter] [source/extensions/filters/listener/original_dst/original_dst.cc:18] original_dst: New connection accepted
[2020-12-09 08:54:42.285][15][debug][filter] [source/extensions/filters/listener/http_inspector/http_inspector.cc:38] http inspector: new connection accepted
[2020-12-09 08:54:42.285][15][trace][filter] [source/extensions/filters/listener/http_inspector/http_inspector.cc:105] http inspector: recv: 0
[2020-12-09 08:54:57.286][15][debug][conn_handler] [source/server/connection_handler_impl.cc:273] listener filter times out after 15000 ms

贴一下 outbound listener 的配置片段:

{
    "@type": "type.googleapis.com/envoy.admin.v3.ListenersConfigDump",
    "version_info": "1",
    "dynamic_listeners": [{
        "name": "outbound_listener",
        "active_state": {
            "version_info": "1",
            "listener": {
                "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
                "name": "outbound_listener",
                "address": {
                    "socket_address": {
                        "address": "0.0.0.0",
                        "port_value": 15001
                    }
                },
                "filter_chains": [
                    //省略其他 filter chain
                    {
                    "filters": [{
                        "name": "envoy.filters.network.tcp_proxy",
                        "typed_config": {
                            "@type": "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy",
                            "stat_prefix": "passthrough-outbound",
                            "cluster": "passthrough-outbound"
                        }
                    }],
                    "name": "outbound-egress-filter-chain"
                }],
                "listener_filters": [{
                        "name": "envoy.filters.listener.original_dst"
                    },
                    {
                        "name": "envoy.filters.listener.http_inspector"
                    }
                ],
                "traffic_direction": "OUTBOUND"
            },
            "last_updated": "2020-12-08T10:07:41.191Z"
        }
    }]
}

分析

Envoy 那边看到了有个 issue,为此增加了 timeout 的设置,默认 15s。

  • listener_filters_timeout:默认 15s。等待 listener filter 完成的超时时间,如果设置为 0,则不超时。如果没有设置下面的配置为true,则会直接关闭连接。
  • continue_on_listener_filters_timeout:默认 false。超时时是否使用默认的 filter_chain 创建连接。

对比之前的版本,不同之处是在 filter 中多了 envoy.filters.listener.http_inspector。一个 http_inspector 为何 hold 住整个 follow。

正好昨天配置了 Clion 用于源码阅读,看下 http_inspector 的源码:

http_inspector.cconRead方法,在做MSG_PEEK是铁定读不到数据的,见上面的日志,同时查看 stats http_inspector.read_error: 0。可见,返回了ParseState::Continue

Api::IoError::IoErrorCode::Again的意思是:No data available right now, try again later. 对于ParseState::Continue的解释是:Parser expects more data.

screenshot 2020-12-09 at 16.55.37

由于onRead返回了ParseState::ContinueonAccept方法会返回 Network::FilterStatus::StopIteration。返回之前会注册个针对Event::FileReadyType::Read | Event::FileReadyType::Closed 的 callback,用于读取到数据或者关闭文件时进行回调处理。

Network::FilterStatus::StopIterationStop executing further filters.

screenshot 2020-12-09 at 16.52.20

source/server/connection_handler_impl.cccontinueFilterChain方法,看注释:Blocking at the filter but no error。等待上面的 callback 被调用。

screenshot 2020-12-09 at 17.13.28

由于前面直接从for loop 中跳出,迭代并没有执行到end,会启动一个超时的 timer。 screenshot 2020-12-09 at 17.20.17

参考:

comments powered by Disqus