Background

Nginx runs in Kubernetes and serves traffic by reverse-proxying a Service.

Kubernetes version: v1.9.1+a0ce1bc657.

Problem

The configuration is as follows:

location ^~ /info {
    proxy_pass http://serviceName:port;
}

When the Service is deleted and recreated, nginx reports the following error:

connect() failed (113: No route to host) … upstream: "xxxxx"

Analysis

A Google search showed that the cause is the way nginx handles DNS resolution.

The official nginx explanation:

  • If the domain name can’t be resolved, NGINX fails to start or reload its configuration.
  • NGINX caches the DNS records until the next restart or configuration reload, ignoring the records’ TTL values.
  • We can’t specify another load‑balancing algorithm, nor can we configure passive health checks or other features defined by parameters to the server directive, which we’ll describe in the next section.

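In other words, nginx resolves the domain name given to proxy_pass at startup and caches the resulting IP, ignoring the TTL. It resolves the name again only on restart or reload. A Service that is deleted and recreated usually receives a new ClusterIP, so nginx keeps sending requests to the stale address.

Solution

Use the name servers configured for the nginx pod as the resolver, and put the upstream in a variable. When proxy_pass contains a variable, nginx resolves the name at request time through the resolver and re-resolves it once the valid period expires: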

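# nginx conf
# NAME_SERVER is a placeholder for the pod's name servers.
resolver NAME_SERVER valid=30s ipv6=off;
set $service "http://serviceName:port";

location ^~ /info {
    # The variable forces nginx to resolve the name at request time.
    proxy_pass $service;
}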

Use a shell command to get the name servers used in the pod:

NAME_SERVER=$(awk '/^nameserver/ {print $2}' /etc/resolv.conf | tr '\n' ' ')
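The original does not show how the value ends up in the nginx configuration. One possible approach, sketched here under assumptions, is to keep the literal token NAME_SERVER in a template and substitute it when the container starts. The function names name_servers and render_conf and the template path in the usage comment are hypothetical, not part of the original setup.

```shell
#!/bin/sh
# Sketch only: name_servers and render_conf are hypothetical helper names.

# Print the name servers from a resolv.conf-style file, separated by spaces.
# Defaults to /etc/resolv.conf when no path is given.
name_servers() {
    awk '/^nameserver/ { printf "%s%s", sep, $2; sep = " " }' "${1:-/etc/resolv.conf}"
}

# Read a config template on stdin and replace the literal token
# NAME_SERVER with the value passed as the first argument.
render_conf() {
    sed "s/NAME_SERVER/$1/g"
}

# Possible usage at container start (paths are assumptions):
#   render_conf "$(name_servers)" < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf
#   nginx -g 'daemon off;'
```

This sketch assumes IPv4 name servers; an IPv6 address would likely need to be wrapped in square brackets before nginx accepts it in the resolver directive.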

References: Nginx proxy_pass with $remote_addr

Nginx with dynamic upstreams

Another problem

serviceName could not be resolved (3: Host not found)

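The short Service name cannot be resolved; use the fully qualified name serviceName.namespace.svc.clusterName instead, where clusterName is the cluster domain (commonly cluster.local). The nginx resolver queries the name server directly and does not apply the search domains from /etc/resolv.conf, which is why the short name that works elsewhere in the pod fails here.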
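As a small illustration of the naming scheme, the fully qualified name could be assembled as below. The helper name service_fqdn and the example Service name are hypothetical, and cluster.local is only the common default; the actual cluster domain may differ.

```shell
#!/bin/sh
# Sketch only: service_fqdn is a hypothetical helper name.

# Build serviceName.namespace.svc.clusterDomain.
#   $1 = Service name
#   $2 = namespace
#   $3 = cluster domain (optional; assumed to be cluster.local if omitted)
service_fqdn() {
    printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Possible usage inside a pod, reading the pod's own namespace from the
# service account mount:
#   ns=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
#   service_fqdn info-service "$ns"
```

The resulting name would then replace serviceName in the set $service line of the nginx configuration.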