
xhttp + nginx works fine for a while, then drops connections; manually restarting either xray or nginx restores service; rolling back to the 2024.12.31 version works fine #4373

Closed
emiyalee1005 opened this issue Feb 8, 2025 · 32 comments

Comments

@emiyalee1005

emiyalee1005 commented Feb 8, 2025

Completeness requirements

  • I confirm that I have read the documentation and understand the meaning of every option I wrote in my configuration, rather than piling up seemingly useful options or default values.
  • I have provided the complete configuration files and logs, rather than only excerpts chosen at my own discretion.
  • I have searched the issues and found no similar ones already filed.
  • The problem can be reproduced on the latest Release version.

Description

As the title says. I have been experimenting with this for about half a month; the configuration is based on https://github.com/XTLS/Xray-examples/tree/main/VLESS-XHTTP3-Nginx
Network problems between the client and the server are 100% ruled out: switching back to the 2024.12.31 version keeps working indefinitely, while on the 2025.01.30 and 2025.1.1 versions the server log fills with messages along the lines of "version invalid" when the drop happens, yet simply restarting either xray or nginx restores service immediately.
My guess is that some change in the new versions broke the socket communication between nginx and xray?

Server: Oracle arm64
OS: Ubuntu 24
nginx: 1.26.3

How to reproduce

With version 2025.1.1 and later, connections drop after some period of use (in the shortest case after only ten-odd minutes); restarting either xray or nginx immediately restores service.

Client configuration


{
  "log": {
    "loglevel": "debug"
  },
  "inbounds": [
    {
      "listen": "127.0.0.1",
      "port": 10814,
      "protocol": "socks",
      "settings": {
        "udp": true
      },
      "sniffing": {
        "enabled": true,
        "destOverride": [
          "fakedns+others"
        ],
        "routeOnly": true
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "vless",
      "settings": {
        "vnext": [
          {
            "address": "**********",
            "port": 443,
            "users": [
              {
                "encryption": "none",
                "id": "********************"
              }
            ]
          }
        ]
      },
      "streamSettings": {
        "network": "xhttp",
        "xhttpSettings": {
          "path": "/abc",
          "mode": "stream-one"
        },
        "security": "tls",
        "tlsSettings": {
          "serverName": "***************",
          "alpn": [
            "h3",
            "h2",
            "h1"
          ]
        }
      }
    },
    {
      "protocol": "socks",
      "settings": {
        "servers": [
          {
            "address": "********************",
            "port": 34218,
            "users": [
              {
                "user": "**************",
                "pass": "********************"
              }
            ]
          }
        ]
      },
      "streamSettings": {
        "network": "tcp",
        "security": "none"
      },
      "tag": "gn"
    },
    {
      "protocol": "freedom",
      "tag": "Direct",
      "settings": {
        "domainStrategy": "UseIP"
      }
    },
    {
      "protocol": "dns",
      "tag": "Dns-Out",
      "settings": {
        "nonIPQuery": "skip"
      }
    }
  ],
  "dns": {
    "servers": [
      "https://8.8.8.8/dns-query",
      "https://1.1.1.1/dns-query",
      {
        "address": "https+local://223.6.6.6/dns-query",
        "domains": [
          "geosite:cn"
        ]
      },
      {
        "address": "https+local://120.53.53.53/dns-query",
        "domains": [
          "geosite:cn"
        ]
      },
      "localhost"
    ]
  },
  "routing": {
    "domainStrategy": "IPIfNonMatch",
    "rules": [
      {
        "ip": [
          "***************"
        ],
        "outboundTag": "gn"
      },
      {
        "domain": [
          "***************"
        ],
        "outboundTag": "gn"
      },
      {
        "domain": [
          "geosite:cn"
        ],
        "outboundTag": "Direct"
      },
      {
        "ip": [
          "geoip:cn",
          "geoip:private"
        ],
        "outboundTag": "Direct"
      },
      {
        "outboundTag": "Dns-Out",
        "network": "tcp,udp",
        "port": "53"
      }
    ]
  }
}

Server configuration


//nginx
server {
  listen [::]:443 ssl ipv6only=off reuseport;
  listen [::]:443 quic reuseport ipv6only=off;
  server_name *************;

  location / {
    proxy_pass http://localhost:8002/;
  }

  http2 on;
  ssl_certificate /home/ssl/fullchain.cer;
  ssl_certificate_key /home/ssl/private.key;
  ssl_protocols TLSv1.2 TLSv1.3;
  ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;

  client_header_timeout 5m;
  keepalive_timeout 5m;
  # put /your path/ after location
  location /abc/ {
    client_max_body_size 0;
    grpc_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    client_body_timeout 5m;
    grpc_read_timeout 315;
    grpc_send_timeout 5m;
    grpc_pass unix:/dev/shm/xrxh.socket;
  }
}

//xray
{
  "log": {
    "loglevel": "debug"
  },
  "inbounds": [
    {
      "listen": "/dev/shm/xrxh.socket,0666",
      "protocol": "vless",
      "settings": {
        "clients": [
          {
            "id": "" // I connect with this account
          },
          {
            "id": "", // unused
            "email": "abc@aaa.com"
          }
        ],
        "decryption": "none"
      },
      "streamSettings": {
        "network": "xhttp",
        "xhttpSettings": {
          "mode": "auto",
          "path": "/abc"
        }
      },
      "sniffing": {
        "enabled": true,
        "destOverride": [
          "fakedns+others"
        ],
        "routeOnly": true
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "freedom",
      "tag": "direct"
    },
    {
      "protocol": "blackhole",
      "tag": "block"
    },
    {
      "protocol": "wireguard",
      "settings": {
        "secretKey": "",
        "address": [
          "",
          "********"
        ],
        "peers": [
          {
            "publicKey": "",
            "endpoint": "***"
          }
        ]
      },
      "tag": "wireguard-1"
    }
  ],
  "routing": {
    "rules": [
      {
        "type": "field",
        "user": [
          "abc@aaa.com"
        ],
        "ip": [
          "geoip:private"
        ],
        "outboundTag": "block"
      },
      {
        "type": "field",
        "user": [
          "abc@aaa.com"
        ],
        "outboundTag": "wireguard-1"
      }
    ]
  }
}

Client log


2025/02/09 05:26:01.250493 [Info] [2271167988] transport/internet/splithttp: XHTTP is dialing to tcp:*.*.*.*:443, mode stream-one, HTTP version 2, host abc.com
2025/02/09 05:26:01.250604 [Info] [2271167988] proxy/vless/outbound: tunneling request to tcp:www.google.com:443 via *.*.*.*:443
2025/02/09 05:26:01.832433 [Info] [2219275785] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled
2025/02/09 05:26:01.838597 [Info] [2485557641] proxy/socks: TCP Connect request to tcp:www.google.com:443
2025/02/09 05:26:01.839443 [Debug] app/dns: domain www.google.com will use DNS in order: [DOH//8.8.8.8 DOH//1.1.1.1 DOHL//223.6.6.6 DOHL//120.53.53.53 localhost]
  2025/02/09 05:26:01.839504 [Debug] app/dns: DOH//8.8.8.8 cache HIT www.google.com -> [142.250.197.164 2404:6800:4005:823::2004]
2025/02/09 05:26:01.839515 [Info] [2485557641] app/dispatcher: default route for tcp:www.google.com:443
2025/02/09 05:26:01.839520 [Info] [2485557641] transport/internet/splithttp: XHTTP is dialing to tcp:*.*.*.*:443, mode stream-one, HTTP version 2, host abc.com
2025/02/09 05:26:01.839562 from tcp:127.0.0.1:64882 accepted tcp:www.google.com:443
2025/02/09 05:26:01.839670 [Info] [2485557641] proxy/vless/outbound: tunneling request to tcp:www.google.com:443 via *.*.*.*:443
2025/02/09 05:26:02.157268 [Info] [1080914123] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled
2025/02/09 05:26:02.165125 [Info] [2731262781] proxy/socks: TCP Connect request to tcp:www.google.com:443
2025/02/09 05:26:02.166138 [Debug] app/dns: domain www.google.com will use DNS in order: [DOH//8.8.8.8 DOH//1.1.1.1 DOHL//223.6.6.6 DOHL//120.53.53.53 localhost]
  2025/02/09 05:26:02.166177 [Debug] app/dns: DOH//8.8.8.8 cache HIT www.google.com -> [142.250.197.164 2404:6800:4005:823::2004]
2025/02/09 05:26:02.166190 [Info] [2731262781] app/dispatcher: default route for tcp:www.google.com:443
2025/02/09 05:26:02.166224 from tcp:127.0.0.1:64883 accepted tcp:www.google.com:443
2025/02/09 05:26:02.166320 [Info] [2731262781] transport/internet/splithttp: XHTTP is dialing to tcp:*.*.*.*:443, mode stream-one, HTTP version 2, host abc.com
2025/02/09 05:26:02.166447 [Info] [2731262781] proxy/vless/outbound: tunneling request to tcp:www.google.com:443 via *.*.*.*:443
2025/02/09 05:26:02.430282 [Info] [3350235458] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled
2025/02/09 05:26:02.470432 [Info] [2271167988] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled
2025/02/09 05:26:03.060507 [Info] [2485557641] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled
2025/02/09 05:26:03.386826 [Info] [2731262781] app/proxyman/inbound: connection ends > proxy/socks: connection ends > context canceled

Server log


//xray
Feb 08 21:19:08 instance-20250106-0509 xray[931]: 2025/02/08 21:19:08.199679 [Info] [4115563481] app/proxyman/inbound: connection ends > proxy/vless/inbound: connection ends > proxy/vless/inbound: failed to transfer request payload > body>
Feb 08 21:19:11 instance-20250106-0509 xray[931]: 2025/02/08 21:19:11.981683 [Info] [1288055270] app/proxyman/inbound: connection ends > proxy/vless/inbound: connection ends > proxy/vless/inbound: failed to transfer request payload > body>
Feb 08 21:19:14 instance-20250106-0509 xray[931]: 2025/02/08 21:19:14.150042 [Info] [868563740] proxy/vless/inbound: firstLen = 0
Feb 08 21:19:14 instance-20250106-0509 xray[931]: 2025/02/08 21:19:14.150523 [Info] [868563740] app/proxyman/inbound: connection ends > proxy/vless/encoding: failed to read request version > EOF
Feb 08 21:19:19 instance-20250106-0509 xray[931]: 2025/02/08 21:19:19.033295 [Info] [1637337060] proxy/vless/inbound: firstLen = 0
Feb 08 21:19:19 instance-20250106-0509 xray[931]: 2025/02/08 21:19:19.033651 [Info] [1637337060] app/proxyman/inbound: connection ends > proxy/vless/encoding: failed to read request version > EOF
Feb 08 21:19:41 instance-20250106-0509 xray[931]: 2025/02/08 21:19:41.991675 [Info] [3950893679] proxy/vless/inbound: firstLen = 0
Feb 08 21:19:41 instance-20250106-0509 xray[931]: 2025/02/08 21:19:41.992538 [Info] [3950893679] app/proxyman/inbound: connection ends > proxy/vless/encoding: failed to read request version > EOF
Feb 08 21:19:50 instance-20250106-0509 xray[931]: 2025/02/08 21:19:50.826198 [Info] [959533932] proxy/vless/inbound: firstLen = 0
Feb 08 21:19:50 instance-20250106-0509 xray[931]: 2025/02/08 21:19:50.826604 [Info] [959533932] app/proxyman/inbound: connection ends > proxy/vless/encoding: failed to read request version > EOF

//nginx
2025/02/08 21:31:35 [crit] 994#994: accept4() failed (24: Too many open files)
2025/02/08 21:31:35 [alert] 994#994: 2866 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "GET /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:35 [alert] 994#994: 2867 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "POST /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [crit] 994#994: accept4() failed (24: Too many open files)
2025/02/08 21:31:36 [alert] 994#994: 2866 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "GET /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [alert] 994#994: 2867 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "POST /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [crit] 994#994: accept4() failed (24: Too many open files)
2025/02/08 21:31:36 [alert] 994#994: 1714 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "GET /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [alert] 994#994: 1717 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "POST /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [alert] 994#994: 2444 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "GET /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:36 [alert] 994#994: 2445 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "POST /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:37 [crit] 994#994: accept4() failed (24: Too many open files)
2025/02/08 21:31:37 [alert] 994#994: 2446 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "GET /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
2025/02/08 21:31:37 [alert] 994#994: 2447 socket() failed (24: Too many open files) while connecting to upstream, client: ::ffff:..., server: abc.com, request: "POST /abc/************************ HTTP/2.0", upstream: "grpc://unix:/dev/shm/xrxh.socket:", host: "abc.com", referrer: "https://abc.com/abc/*************************?x_padding=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

@emiyalee1005
Author

Logs have been added.

@emiyalee1005 emiyalee1005 changed the title from "… rolling back to the 2024.12.30 version works fine" to "… rolling back to the 2024.12.31 version works fine" Feb 8, 2025
@RPRX
Member

RPRX commented Feb 9, 2025

  1. v25.1.1 has almost no code-level changes; the most relevant one is the quic-go upgrade.
  2. But your client lists all three ALPN values at once, and in that case h2 is used rather than h3.
  3. I don't quite understand how the server-side domain socket can still hit "too many open files"; maybe a huge number of connections were opened and never closed? Or memory or disk is full?
  4. This is usually seen with TCP; it's the first time I've seen it with a DS. Search around for it.
  5. Restarting server-side Xray / Nginx releases all the sockets, which is why things recover.

@RPRX RPRX closed this as not planned Feb 9, 2025
@emiyalee1005
Author

emiyalee1005 commented Feb 9, 2025 via email

@RPRX
Member

RPRX commented Feb 9, 2025

My view is that a lot of people are using the Nginx + XHTTP combination; if the new version had a problem, many people would have reported it long ago.

It's probably something odd in your configuration. You should post your Nginx config so others can look at it, and ask about it in the group.

@emiyalee1005
Author

emiyalee1005 commented Feb 9, 2025 via email

@xxxsen

xxxsen commented Feb 9, 2025

I ran into a similar problem on version 2025.01.30, but with mode set to auto. I later found that nginx's connection limit was being exhausted (4 worker processes, 2048 connections per worker), and nginx's error log was flooded with "worker_connections are not enough" errors. Under normal use it should be impossible to exhaust the connection limit (previous versions were always fine). After switching the mode to packet-up the problem stopped appearing.

====

With auto, after restarting nginx I noticed the connection count grows slowly (it occasionally drops, but overall it keeps growing). I suspect that in some abnormal cases connections are not being closed properly after use.

@wtfr-dot

wtfr-dot commented Feb 9, 2025

After upgrading to 2025.1.30 I also hit this problem: [crit] 741#741: accept4() failed (24: Too many open files). Restarting xray fixes it, then after a while it happens again, and restarting fixes it again.

@RPRX
Member

RPRX commented Feb 9, 2025

My guess is that the new server-side behavior of stream-up (keep-alive) is keeping connections from being closed? Although in theory the POST should be closed in sync with the GET.

The information from @emiyalee1005 may be inaccurate, because the v25.1.1 server does not have this behavior yet; please state which versions your Xray server and client are running.

With both ends on v25.1.30, please try:

  1. packet-up only (should be fine)
  2. stream-one only (should also be fine)
  3. stream-up only (should be problematic)
  4. set "scStreamUpServerSecs": -1 on the server to disable the new behavior, then use stream-up only (see the config sketch below)

Since there may be a bug here and v25.2.9 is about to go nightly, I'll sort this out before the official release; it may also be that Nginx needs some special settings.
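
For step 4, a minimal server-side sketch of where that option lives (a sketch only; the field placement follows the inbound configs posted elsewhere in this thread, and the path is a placeholder):

"streamSettings": {
  "network": "xhttp",
  "xhttpSettings": {
    "path": "/abc",
    "scStreamUpServerSecs": -1
  }
}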

@RPRX RPRX reopened this Feb 9, 2025
@xxxsen

xxxsen commented Feb 9, 2025

4. scStreamUpServerSecs

Both ends on v25.1.30

I tested stream-up, running one curl per second. After a while, many connections in ESTABLISHED state appear on the nginx side (client->nginx and nginx->server), and they are still held open a long time later. After restarting the server, the nginx->server connections are closed immediately, but the client->nginx connections only close after a further delay.

I then added "scStreamUpServerSecs": -1 on the server and repeated the steps above: after a while the nginx side still accumulates many connections that won't close, and only after some time are they closed one by one.

Related configuration

Client configuration:

{
  "outbounds": [
    {
      "protocol": "vless",
      "settings": {
        "vnext": [
          {
            ...
          }
        ]
      },
      "streamSettings": {
        "network": "xhttp",
        "xhttpSettings": {
          "mode": "stream-up",
          "host": "abc.test.com",
          "path": "/path"
        },
        "security": "tls",
        "tlsSettings": {
          "serverName": "abc.test.com",
          "fingerprint": "chrome"
        }
      },
      "tag": "out-default"
    }
  ]
}

Server configuration:

{
  "log": {
    "loglevel": "debug",
    "access": "/tmp/xray-server/xray_access.log",
    "error": "/tmp/xray-server/xray_error.log"
  },
  "inbounds": [
    {
      "listen": "0.0.0.0",
      "port": 10086,
      "protocol": "vless",
      "settings": {
        "clients": [
          {
               ...
          }
        ],
        "decryption": "none"
      },
      "streamSettings": {
        "network": "xhttp",
        "xhttpSettings": {
          "path": "/path",
          "scStreamUpServerSecs": -1
        }
      }
    }
  ],
  "outbounds": [
    {
      "protocol": "freedom",
      "tag": "direct"
    }
  ],
  "routing": {}
}

nginx configuration:

server
{
    include nginx_svr_bind;
    server_name  abc.test.com;

    client_header_timeout 2m;
    keepalive_timeout 2m;

    location /path {
        set $upstream "grpc://xray-server:10086";
        client_max_body_size 0;
        client_body_timeout 2m;
        grpc_pass $upstream;
        grpc_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        grpc_read_timeout 10m;
        grpc_send_timeout 10m;
    }
}

@RPRX
Member

RPRX commented Feb 9, 2025

@xxxsen What about packet-up and stream-one? And with both ends on v25.1.1 and v24.12.31?

@xxxsen

xxxsen commented Feb 9, 2025

@xxxsen What about packet-up and stream-one? And with both ends on v25.1.1 and v24.12.31?

Testing stream-one and packet-up on v25.1.30, both close connections automatically after a while (without "scStreamUpServerSecs": -1 enabled on the server side).

On v24.12.31 and v25.1.1, stream-up is fine: no connection growth, and connections close normally some time after traffic stops.

@RPRX

@emiyalee1005
Author

My guess is that the new server-side behavior of stream-up (keep-alive) is keeping connections from being closed? Although in theory the POST should be closed in sync with the GET.

The information from @emiyalee1005 may be inaccurate, because the v25.1.1 server does not have this behavior yet; please state which versions your Xray server and client are running.

With both ends on v25.1.30, please try:

  1. packet-up only (should be fine)
  2. stream-one only (should also be fine)
  3. stream-up only (should be problematic)
  4. set "scStreamUpServerSecs": -1 on the server to disable the new behavior, then use stream-up only

Since there may be a bug here and v25.2.9 is about to go nightly, I'll sort this out before the official release; it may also be that Nginx needs some special settings.

Strictly speaking, my connection path is:
client (25.1.30/xhttp/stream-one) -> proxy server A (25.1.30/xhttp/auto(stream-up)) -> Cloudflare -> proxy server B (25.1.30/xhttp, and this is where the problem occurs)

Server B dies if it runs 1.30. As for version 25.1.1, I misremembered; I never installed it because it was a pre-release. I have tried this three times, and I'm certain that switching back to 12.31 makes the problem go away and upgrading triggers it immediately, with nothing else changed in between.

After raising nginx's maximum open-file and connection limits on server B, 1.30 has now run for a whole day without dying.

One more thing worth mentioning: server B also hosts a website (not a proxy, just ordinary static pages) with roughly 2-3k daily active users, which may also be part of why nginx has a relatively high connection count.

@emiyalee1005
Author

Actually I've already changed my nginx configuration and things are back to normal; I was about to close the issue, but since I see other users reporting similar problems I'll leave it open for now, or @RPRX you can decide whether to close it.

@emiyalee1005
Author

emiyalee1005 commented Feb 10, 2025

  1. v25.1.1 has almost no code-level changes; the most relevant one is the quic-go upgrade.
  2. But your client lists all three ALPN values at once, and in that case h2 is used rather than h3.
  3. I don't quite understand how the server-side domain socket can still hit "too many open files"; maybe a huge number of connections were opened and never closed? Or memory or disk is full?
  4. This is usually seen with TCP; it's the first time I've seen it with a DS. Search around for it.
  5. Restarting server-side Xray / Nginx releases all the sockets, which is why things recover.

Regarding point 2: h3 doesn't work for me here, so I can only use h2. I listed all of them only so the software could adapt automatically (in case h3 becomes usable again when I switch to another network). But will h2 take priority over h3? Is there something like an auto mode that adapts on its own?
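
For reference, a minimal tlsSettings sketch based on the client config posted above, pinning ALPN to the single protocol that actually works so no negotiation ambiguity remains (whether Xray prefers h2 over h3 when both are listed is exactly the open question here):

"streamSettings": {
  "network": "xhttp",
  "xhttpSettings": {
    "path": "/abc",
    "mode": "stream-one"
  },
  "security": "tls",
  "tlsSettings": {
    "serverName": "***************",
    "alpn": [
      "h2"
    ]
  }
}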

@RPRX
Member

RPRX commented Feb 10, 2025

@emiyalee1005 Does Nginx with its default configuration still have the problem? If so, it should be fixed.

According to @xxxsen, only the new version's stream-up has this problem, and it is unrelated to "scStreamUpServerSecs": -1, but in that case it shouldn't be only the new version that breaks.

I'm checking whether the client fails to close stream-up and stream-down in sync; the server relies on the client closing them in sync, and I'll see whether that dependency can be removed.

Regarding point 2: h3 doesn't work for me here, so I can only use h2. I listed all of them only so the software could adapt automatically (in case h3 becomes usable again when I switch to another network). But will h2 take priority over h3? Is there something like an auto mode that adapts on its own?

To avoid introducing unknown bugs, #4320 has not been merged yet.

@emiyalee1005
Author

@emiyalee1005 Does Nginx with its default configuration still have the problem? If so, it should be fixed.

According to @xxxsen, only the new version's stream-up has this problem, and it is unrelated to "scStreamUpServerSecs": -1, but in that case it shouldn't be only the new version that breaks.

I'm checking whether the client fails to close stream-up and stream-down in sync; the server relies on the client closing them in sync, and I'll see whether that dependency can be removed.

Regarding point 2: h3 doesn't work for me here, so I can only use h2. I listed all of them only so the software could adapt automatically (in case h3 becomes usable again when I switch to another network). But will h2 take priority over h3? Is there something like an auto mode that adapts on its own?

To avoid introducing unknown bugs, #4320 has not been merged yet.

Yes, the nginx config in that template does break, but I'm not sure whether it's related to my decoy website itself having fairly heavy traffic (I set it up years ago purely as camouflage, and it unexpectedly took off).

@emiyalee1005
Author

My guess is that the new server-side behavior of stream-up (keep-alive) is keeping connections from being closed? Although in theory the POST should be closed in sync with the GET.
The information from @emiyalee1005 may be inaccurate, because the v25.1.1 server does not have this behavior yet; please state which versions your Xray server and client are running.
With both ends on v25.1.30, please try:

  1. packet-up only (should be fine)
  2. stream-one only (should also be fine)
  3. stream-up only (should be problematic)
  4. set "scStreamUpServerSecs": -1 on the server to disable the new behavior, then use stream-up only

Since there may be a bug here and v25.2.9 is about to go nightly, I'll sort this out before the official release; it may also be that Nginx needs some special settings.

Strictly speaking, my connection path is: client (25.1.30/xhttp/stream-one) -> proxy server A (25.1.30/xhttp/auto(stream-up)) -> Cloudflare -> proxy server B (25.1.30/xhttp, and this is where the problem occurs)

Server B dies if it runs 1.30. As for version 25.1.1, I misremembered; I never installed it because it was a pre-release. I have tried this three times, and I'm certain that switching back to 12.31 makes the problem go away and upgrading triggers it immediately, with nothing else changed in between.

After raising nginx's maximum open-file and connection limits on server B, 1.30 has now run for a whole day without dying.

One more thing worth mentioning: server B also hosts a website (not a proxy, just ordinary static pages) with roughly 2-3k daily active users, which may also be part of why nginx has a relatively high connection count.

Three additional points I'm not sure about:

  1. During Zoom meetings the connection drops intermittently and then immediately recovers on its own.
  2. H3 mode apparently can't get through Cloudflare (A and B are both overseas VPSes, so the GFW is not involved).
  3. My cable-network ISP (China Broadcasting Network) apparently doesn't support H3 either? Hysteria still works for me, though UDP gets intermittently blocked after 1-2 hours, whereas Xray's h3 never connects at all.

@RPRX
Member

RPRX commented Feb 10, 2025

https://github.com/XTLS/Xray-examples/blob/main/VLESS-XHTTP3-Nginx/nginx.conf, this template? It doesn't set a maximum open-file count or connection count; what I mean is whether Nginx's default values are what causes the error.

Also, I had a look and the client should be closing the stream-up uplink and downlink together, but to be safe I'm going to add the same to the server side as well; test it in a bit.

@emiyalee1005
Author

https://github.com/XTLS/Xray-examples/blob/main/VLESS-XHTTP3-Nginx/nginx.conf, this template? It doesn't set a maximum open-file count or connection count; what I mean is whether Nginx's default values are what causes the error.

Also, I had a look and the client should be closing the stream-up uplink and downlink together, but to be safe I'm going to add the same to the server side as well; test it in a bit.

Add this to nginx.service; Ubuntu's default open-file limit is only 1024:
LimitNPROC=10000
LimitNOFILE=1000000

Then in nginx.conf:
user nginx;
worker_processes 4;
worker_rlimit_nofile 1000000;

error_log /var/log/nginx/error.log notice;
pid /var/run/nginx.pid;

events {
  worker_connections 2048;
}
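
If you prefer not to edit the packaged unit file directly, the same limits can also go into a systemd drop-in (a sketch; the override path is the standard systemd convention, not something from this thread):

# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=1000000
LimitNPROC=10000

After adding it, run systemctl daemon-reload and restart nginx so the new limits take effect.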

@RPRX
Member

RPRX commented Feb 10, 2025

I looked at it again: the server does close them in sync, but if stream-up never gets its matching stream-down it may wait forever, and the new server-side behavior also keeps it alive.

What puzzles me is that even on v25.1.30, adding "scStreamUpServerSecs": -1 on the server makes the behavior exactly the same as before, so in theory there shouldn't be a problem.

@emiyalee1005 Please restore your previous Nginx configuration and test again. Also, your initially saying that v25.1.1 has the problem too was really misleading; it wasted brain cells for nothing.

@RPRX
Member

RPRX commented Feb 10, 2025

I dug into it yet again: what the stream-up server closes in sync is only request.Body, which should not cause ServeHTTP() to return; I'm fixing that now.

It isn't a problem newly introduced in v25.1.30; the new stream-up server behavior merely amplified it, even though the client code does close the POST and GET in sync.

So it's still unclear why this problem gets triggered, and also unclear why @xxxsen sees it even with "scStreamUpServerSecs": -1.

In any case, this part of the fix is necessary.
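
To illustrate the mechanism with a minimal, self-contained net/http sketch (generic Go, not Xray's actual handler code): reaching EOF on, or closing, the request body does not make ServeHTTP return; the handler holds the stream open until it returns on its own or the request context is cancelled.

package main

import (
	"io"
	"log"
	"net/http"
)

// uploadHandler mimics a stream-up style handler: it drains the request
// body and then waits for something else (here, context cancellation).
// Hitting EOF on r.Body does NOT end the exchange; only returning from
// this function (or cancelling the request context) does.
func uploadHandler(w http.ResponseWriter, r *http.Request) {
	_, _ = io.Copy(io.Discard, r.Body) // body fully read: handler still running

	// If the code here waits for a paired download stream that never
	// arrives, the connection stays ESTABLISHED indefinitely from the
	// reverse proxy's point of view.
	<-r.Context().Done()
}

func main() {
	http.HandleFunc("/abc", uploadHandler)
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", nil))
}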

@RPRX RPRX closed this as completed in dcd7e92 Feb 10, 2025
@RPRX
Member

RPRX commented Feb 10, 2025

@emiyalee1005 @xxxsen Please test dcd7e92 with Nginx in its default configuration.

@RPRX
Member

RPRX commented Feb 10, 2025

@KobeArthurScofield It looks like GitHub's automated builds are all broken? They compile, but the subsequent steps in the workflow fail.

@KobeArthurScofield
Contributor

It looks like GitHub's automated builds are all broken? They compile, but the subsequent steps in the workflow fail.

The Actions cache got cleared for some unknown reason; the later steps that copy other files then can't find their paths and error out directly.

@KobeArthurScofield
Contributor

I've opened PR #4378. Once it's merged, the geodata and other assets can be updated manually to refresh the cache; after that, as long as the cache isn't cleared again, the actions that depend on these assets will run normally.
Otherwise we'd have to wait for the scheduled refresh.

@emiyalee1005
Author

@emiyalee1005 @xxxsen Please test dcd7e92 with Nginx in its default configuration.

The frequency has gone down; it happened once in half a day.

@RPRX
Member

RPRX commented Feb 11, 2025

@emiyalee1005 On top of that, try "scStreamUpServerSecs": -1 in the server config and see whether it makes any difference.

@emiyalee1005
Author

@emiyalee1005 On top of that, try "scStreamUpServerSecs": -1 in the server config and see whether it makes any difference.

I'll give it a try another day.

@RPRX RPRX mentioned this issue Feb 18, 2025
@RPRX
Member

RPRX commented Feb 20, 2025

I found that after the previous change there is still one remaining case: if the stream-up server only ever receives the POST, then because maybeReapSession doesn't call s.uploadQueue.Close() there, and because Read was never called so h.reader is still nil and can't be closed either, the check in Push is likewise ineffective. In other words, in that situation a lone POST can keep connecting to the stream-up server and be kept alive. I think this was the crux of the problem; the new commit fixes all of it.

Incidentally, I noticed that the XHTTP server mechanism allows earlier packets to arrive as packet-up and later ones as stream-up, and the sources can even differ. What kind of Switch is this.

RPRX added a commit that referenced this issue Feb 20, 2025
RPRX added a commit that referenced this issue Feb 20, 2025: …verConn instead of recover() (#4373 (comment), #4406 (comment))
RPRX added a commit that referenced this issue Feb 20, 2025: …verConn instead of recover() (#4373 (comment), #4406 (comment))
RPRX added a commit that referenced this issue Feb 20, 2025: …verConn instead of recover() (#4373 (comment), #4406 (comment))
@RPRX
Member

RPRX commented Feb 20, 2025

@emiyalee1005 Please test b786a50.

@emiyalee1005
Author

@emiyalee1005 Please test b786a50.

I tested for half a day and everything works fine.

@RPRX
Member

RPRX commented Feb 23, 2025

@emiyalee1005 Great!
