Description
Issue:
On our live production server, the max coroutine limit of 6,000 was reached within a single day. We then raised the limit to 60,000, and it was reached again after two days, even though we are still in a pilot phase in which only one tester is using the Swoole-based WebSocket (broadcasting) server.
It looks like coroutines are leaking.
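As a rough cross-check against the Swoole Timer intervals used in the custom processes, the figures above imply an average leak interval. This is a back-of-the-envelope sketch that assumes the coroutine count grew roughly linearly and that the limit was hit exactly at the end of each period. Because the limits were reached *within* those periods, the results are upper bounds on the average time between leaked coroutines. The function name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Average seconds between leaked coroutines, assuming linear growth and that
 * the limit was hit exactly at the end of the observed period.
 */
function secondsPerLeakedCoroutine(int $limit, float $days): float
{
    return $days * 86400 / $limit;
}

printf("6,000 in 1 day:   one every %.2f s\n", secondsPerLeakedCoroutine(6000, 1));  // 14.40 s
printf("60,000 in 2 days: one every %.2f s\n", secondsPerLeakedCoroutine(60000, 2)); // 2.88 s
```

A timer whose tick interval, divided by the number of `go()` calls it makes per tick, lands near these figures would be a natural first suspect.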
1. What did you do? If possible, provide a simple script for reproducing the error.
- We have a custom process (call it the Main Custom Process, MCP) attached to the WebSocket server.
- From inside the MCP, we create additional custom processes (to fetch third-party data) with the coroutine-context parameter set to `true`.
- To fetch third-party data continuously at regular intervals, we use Swoole Timer.
- Inside the Swoole Timer callbacks, we use `go()` to interact asynchronously with external sources such as the database and third-party APIs.
- In one custom process, we also use `Swoole\Coroutine\Http\Client`.
- For code reload, we kill the MCP's child custom processes and then the MCP itself. The Swoole Manager process then re-creates the MCP, which in turn re-creates its child processes (this is how we reload the custom processes).
- We also handle the `SIGCHLD` signal with `Process::wait()` as below, assuming it will also clear the timers, event loops and coroutines created inside the child processes:
```php
use Swoole\Process;
use Swoole\Timer;

Process::signal(SIGCHLD, static function ($sig) {
    while ($ret = Process::wait(true)) {
        /* clean up then event loop will exit */
        Timer::clearAll();
    }
});
```
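For comparison, the reaping loop in the Swoole process documentation passes `false` to `Process::wait()` inside a `SIGCHLD` handler, so the loop returns as soon as no exited child is pending instead of blocking on children that are still running. A minimal sketch of that non-blocking pattern, factored into a hypothetical `reapExitedChildren()` helper, is below. Timers and coroutines live in the memory of the process that created them and end when that process exits; this loop only collects exit statuses so finished children do not linger as zombies.

```php
<?php
declare(strict_types=1);

use Swoole\Process;

/**
 * Hypothetical helper: collect the exit status of every child that has
 * already exited, without blocking. Process::wait(false) returns false as
 * soon as no exited child is waiting to be reaped (or there are no children).
 */
function reapExitedChildren(): array
{
    $reaped = [];
    while (($ret = Process::wait(false)) !== false) {
        // $ret has the keys 'pid', 'code' and 'signal'
        $reaped[$ret['pid']] = $ret;
    }
    return $reaped;
}
```

In the MCP it would be registered with `Process::signal(SIGCHLD, static fn () => reapExitedChildren());`, keeping any timer cleanup as a separate, deliberate decision.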
In the `onBeforeReload()` event, we send `SIGTERM` to the MCP's child processes and then to the MCP itself, as below:
```php
use Swoole\Process;

$pidFiles = glob(__DIR__ . '/process_pids/*.pid') ?: [];
$mainProcessData = null;

foreach ($pidFiles as $processPidFile) {
    // Read the PID from the file
    $pid = (int) trim((string) file_get_contents($processPidFile));

    // We kill the Main Process manually at the end
    if (strpos($processPidFile, 'MainProcess') !== false) {
        $mainProcessData = [
            'pidFile' => $processPidFile,
            'pid' => $pid,
        ];
        continue;
    }

    // Processes that do not have a timer or loop exit automatically after completing their tasks,
    // so some of them may already have terminated before reaching this point.
    // Check first whether the process is still running by passing signal_no = 0, as per the documentation.
    // Doc: https://wiki.swoole.com/en/#/process/process?id=kill
    // The $pid > 0 guard matters: kill(0, ...) would signal the whole process group.
    if ($pid > 0 && Process::kill($pid, 0)) {
        output('-- Killing Process -----> ' . $processPidFile); // output(): our own logging helper
        Process::kill($pid, SIGTERM);
    }

    // Delete the PID file
    unlink($processPidFile);
}

// Kill the (custom) MainProcess last, if its PID file was found
if ($mainProcessData !== null) {
    if ($mainProcessData['pid'] > 0 && Process::kill($mainProcessData['pid'], 0)) {
        output('Killing Main Process');
        Process::kill($mainProcessData['pid'], SIGTERM);
    }
    unlink($mainProcessData['pidFile']);
}
```
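The ordering rule in this reload code (children first, `MainProcess` last) can be pulled into a pure function so it can be tested without sending any signals. The sketch below assumes the same `process_pids/*.pid` layout; `orderedPidEntries()` is a hypothetical name. It matches `MainProcess` against the file name only and skips empty or unreadable PID files, since an empty file would yield PID 0 and `kill(0, …)` signals the caller's entire process group.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a list of PID files into ['pidFile', 'pid'] entries,
 * children first and the MainProcess entry (matched on the file name) last.
 * Unreadable or empty files are skipped, so PID 0 is never returned.
 */
function orderedPidEntries(array $pidFiles, string $mainMarker = 'MainProcess'): array
{
    $children = [];
    $main = [];
    foreach ($pidFiles as $file) {
        $raw = @file_get_contents($file);
        if ($raw === false) {
            continue; // file already removed or unreadable
        }
        $pid = (int) trim($raw);
        if ($pid <= 0) {
            continue; // empty or malformed PID file
        }
        $entry = ['pidFile' => $file, 'pid' => $pid];
        if (str_contains(basename($file), $mainMarker)) {
            $main[] = $entry;
        } else {
            $children[] = $entry;
        }
    }
    return array_merge($children, $main);
}
```

The reload handler would then iterate over `orderedPidEntries(glob(__DIR__ . '/process_pids/*.pid') ?: [])`, checking each PID with `Process::kill($pid, 0)` before sending `SIGTERM` and deleting the file.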
Here is our Repo
2. What did you expect to see?
No server crash from exceeding the max coroutine limit, given the almost nonexistent traffic in our production environment.
3. What did you see instead?
```
PHP Warning: Swoole\Process::start(): exceed max number of coroutine 60000 in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104
PHP Warning: Swoole\Process::start(): Swoole\Timer->onTimeout handler error in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104
```
Here, `MainProcess.php` is the file that contains the code for creating the MCP's child processes.
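To see which call site is accumulating coroutines before the limit is hit, a diagnostic like the one below could be called periodically from inside each custom process, for example from the existing Swoole Timer callbacks. It groups the live coroutines of the current process by the file and line at the top of each coroutine's backtrace (for a suspended coroutine, usually the blocking call it is waiting in), so a location whose count keeps growing between samples is the likely leak. This is a sketch assuming Swoole 5.x, built on `Coroutine::listCoroutines()` and `Coroutine::getBackTrace()`; `coroutineHotspots()` is a hypothetical name.

```php
<?php
declare(strict_types=1);

use Swoole\Coroutine;

/**
 * Hypothetical diagnostic: count the live coroutines in the current process,
 * grouped by the file:line of the top frame of each coroutine's backtrace.
 * Largest groups first.
 */
function coroutineHotspots(int $frames = 1): array
{
    $hotspots = [];
    foreach (Coroutine::listCoroutines() as $cid) {
        $trace = Coroutine::getBackTrace($cid, DEBUG_BACKTRACE_IGNORE_ARGS, $frames);
        $top = is_array($trace) ? ($trace[0] ?? []) : [];
        $where = ($top['file'] ?? '?') . ':' . ($top['line'] ?? '?');
        $hotspots[$where] = ($hotspots[$where] ?? 0) + 1;
    }
    arsort($hotspots);
    return $hotspots;
}
```

Logging `Coroutine::stats()['coroutine_num']` together with the first few entries of `coroutineHotspots()` and `getmypid()` would show both which process is leaking and where its coroutines are stuck.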
4. What version of Swoole are you using (show your `php --ri swoole`)?
```
swoole
Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.1.5
Built => Nov 14 2024 13:43:57
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
sockets => enabled
openssl => OpenSSL 3.0.13 30 Jan 2024
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
pcre => enabled
c-ares => 1.27.0
zlib => 1.3
brotli => E16781312/D16781312
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
mysqlnd => enabled
async_redis => enabled
coroutine_pgsql => enabled
Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => On => On
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608
```
5. What is your machine environment used (show your `uname -a` & `php -v` & `gcc -v`)?
```
$ uname -a
Linux 6.8.0-1016-oracle #17-Ubuntu SMP Wed Nov 6 23:01:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

$ php -v
PHP 8.3.13 (cli) (built: Oct 30 2024 11:28:41) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.13, Copyright (c), by Zend Technologies

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.3.0-6ubuntu2~24.04' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04)
```