Nginx returns 502 errors with Gunicorn/Daphne on bursts of connections

Posted by lmvvr0a8 4 months ago in Nginx


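The Kubernetes deployment contains 2 containers:

Django application with Daphne, bound to a Unix domain socket (UDS)
Nginx in front of it as a reverse proxy

The UDS is shared between the application and Nginx containers as a volume of type Memory.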
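The configuration works as expected, except when there are bursts of connections, e.g. when a pod restarts and all connections from the killed pod are spread across the remaining pods. While those connections are being terminated, we observe HTTP 502 errors reporting that the upstream server is not available on /var/run/daphne.sock. This lasts for some time. If we switch to a TCP port instead of the UDS, it works slightly better, but the 502 errors are still there.
Nginx configuration: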

worker_processes 1;

user nobody nogroup;
error_log /var/log/nginx/error.log warn;
pid /tmp/nginx.pid;

events {
  worker_connections 16384;
}

http {
  include mime.types;
  # fallback in case we can't determine a type
  default_type application/octet-stream;
  sendfile on;
  access_log off;

  upstream ws_server {
    # fail_timeout=0 means we always retry an upstream even if it failed
    # to return a good HTTP response
    # for UNIX domain socket setups
    server unix:/var/run/daphne.sock fail_timeout=0;
  }

  server {
    listen 8443 ssl reuseport;
    ssl_certificate /etc/nginx/ssl/TLS_CRT;
    ssl_certificate_key /etc/nginx/ssl/TLS_KEY;
    ssl_client_certificate /etc/nginx/ssl/CA_CRT;
    ssl_protocols TLSv1.2 TLSv1.3;
    client_max_body_size 5M;
    client_body_buffer_size 1M;
    keepalive_timeout 65;

    location / {
      access_log /dev/stdout;

      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header Host $http_host;

      proxy_http_version 1.1;
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection "upgrade";

      proxy_redirect off;
      proxy_pass http://ws_server;
    }

    location /nginx_status {
      stub_status on;
    }
  }
}

The application is started as follows:

daphne app.asgi:application \
            -u /var/run/daphne.sock \
            --websocket_timeout 86400 \
            --websocket_connect_timeout 30

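Let's assume we have 50000 WebSocket connections handled by 10 pods. An AWS ALB sits in front of Nginx and uses the least-connections routing algorithm on the target group for this deployment. Since these connections are long-lived, after some time we end up with about 5000 connections per pod. Scaling down or restarting one pod, and spreading those ~5000 connections across the 9 remaining pods (until the new pod is ready to receive traffic, i.e. it passes the liveness and readiness probes as well as the readiness gate), results in a large number of HTTP 502 errors. Note: sending a close when terminating WebSocket connections is not a solution. Pods can restart unexpectedly, e.g. due to OOM or a failing node...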
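To put a number on the burst described above, here is a rough back-of-the-envelope sketch in Python. It assumes connections are spread evenly before the restart and that reconnects are spread evenly afterwards, which the least-connections routing only approximates. The function name is made up for illustration.

```python
def extra_connections_per_pod(total_connections: int, pods: int, lost_pods: int = 1) -> float:
    """Average number of reconnects each surviving pod has to absorb.

    Assumes an even spread before and after the pods are lost.
    """
    if not 0 < lost_pods < pods:
        raise ValueError("lost_pods must be between 1 and pods - 1")
    per_pod = total_connections / pods   # steady-state connections per pod
    displaced = per_pod * lost_pods      # connections that have to reconnect
    return displaced / (pods - lost_pods)


# Numbers from the question: 50000 connections, 10 pods, 1 pod restarted.
burst = extra_connections_per_pod(50_000, 10)
print(round(burst))  # 556
```

Under these assumptions each of the 9 surviving pods sees roughly 556 extra connections arriving at about the same time, on top of the ~5000 it already holds.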
Any idea why this happens and how to mitigate it? It sounds silly that 9 pods cannot handle 5000 connections at once.


Source: https://stackoverflow.com/questions/77726173/nginx-returns-502-errors-with-gunicorn-daphne-with-bursts-connections

1 Answer


cl25kdpy1#

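This is a known issue with the aws-load-balancer-controller: the ALB may send requests to a pod after the ingress controller has deregistered it.
When a pod is Terminating, it receives a SIGTERM signal asking it to finish its work, after which Kubernetes proceeds with deleting the pod. At the same time as the pod starts terminating, the aws-load-balancer-controller receives the updated object, which forces it to start removing the pod from the target group and to initiate draining.
These two processes, the signal handling at the kubelet level and the removal of the pod IP from the target group, are decoupled from each other, and the SIGTERM may be handled before, or at the same time as, the target in the target group starts draining.
As a result, the pod may become unavailable before the target group has even started its own draining process. This can lead to dropped connections, because the LB is still trying to send requests to a pod that has already shut down properly. The LB will in turn reply with 5xx responses.
You can track the resolution progress in the GitHub issue.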
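The race described in this answer can be sketched as a small timing model in Python. This is only an illustration: the function and parameter names are hypothetical, and the timings below are made-up values, not measurements. The `pre_stop_delay` parameter models delaying the SIGTERM (for example with a Kubernetes preStop hook that sleeps), which is a commonly suggested mitigation but is an assumption here, not something the answer states.

```python
def misrouted_window(app_exit_delay: float,
                     target_removed_after: float,
                     pre_stop_delay: float = 0.0) -> float:
    """Seconds during which the LB may still route to a pod that has stopped serving.

    All times are measured from the moment the pod enters Terminating.
    app_exit_delay:       time the app needs to exit after receiving SIGTERM
    target_removed_after: time until the LB stops sending new requests to the target
    pre_stop_delay:       hypothetical delay before SIGTERM is delivered
    """
    stopped_at = pre_stop_delay + app_exit_delay
    return max(0.0, target_removed_after - stopped_at)


# Illustrative values only: the app exits 1 s after SIGTERM,
# the LB needs 10 s to stop routing to the target.
print(misrouted_window(app_exit_delay=1.0, target_removed_after=10.0))  # 9.0
print(misrouted_window(1.0, 10.0, pre_stop_delay=15.0))                 # 0.0
```

In this model the window closes once the pod keeps serving for at least as long as the LB needs to stop routing to it. Whether that holds in a real cluster depends on timings that have to be measured.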
