Coroutines are leaking #5618

Open
@mohsin-devdksa

Issue:

On our live production server, the maximum coroutine limit of 6,000 was reached within a single day. We then raised the limit to 60,000, and it was reached again after two days, even though we are still in a pilot phase in which only one tester is using the Swoole-based WebSocket (broadcasting) server.

It looks like coroutines are leaking.

1. What did you do? If possible, provide a simple script for reproducing the error.

  • We have a custom process (call it the Main Custom Process, MCP) attached to the WebSocket server.
  • From inside the MCP, we create additional custom processes (to fetch third-party data) with the coroutine-context parameter (enable_coroutine) set to true.
  • To fetch third-party data continuously at a fixed interval, we use a Swoole Timer.
  • Inside the timer callback, we use go() to interact asynchronously with external sources such as the database and third-party APIs.
  • In one custom process, we also use Swoole\Coroutine\Http\Client.
  • For code reloads, we kill the MCP's child custom processes and then the MCP itself. The Swoole manager process then re-creates the MCP, which in turn creates new child processes (this is how we reload the custom processes).
  • We also handle SIGCHLD with Process::wait(), as shown below, assuming this also clears the timers, event loops, and coroutines created inside the child processes.
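Process::signal(SIGCHLD, static function ($sig) {
    // Reap every exited child; wait(false) is non-blocking, so it returns
    // false (ending the loop) once no more exited children are pending
    while ($ret = Process::wait(false)) {
        /* clean up, then the event loop will exit */
        // Note: Timer::clearAll() clears the timers of the calling process only
        Timer::clearAll();
    }
});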
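Because every timer tick in these processes starts a new coroutine with go(), it may help to log a per-process snapshot of live coroutines and see where the lingering ones are suspended. The sketch below assumes ext-swoole 5.x and uses Swoole\Coroutine::stats(), Swoole\Coroutine::list() and Swoole\Coroutine::getBackTrace(); the helper name coroutineSnapshot() is hypothetical and not part of our code.

```php
<?php
declare(strict_types=1);

use Swoole\Coroutine;

/**
 * Hypothetical diagnostic helper (not part of our code base): returns the
 * number of live coroutines in the calling process and, for each one, the
 * file:line of the innermost frame of its backtrace (where it is suspended).
 * Requires ext-swoole.
 *
 * @return array{coroutine_num: int, live: array<int, string>}
 */
function coroutineSnapshot(): array
{
    $stats = Coroutine::stats();
    $live = [];

    foreach (Coroutine::list() as $cid) {
        // Limit each backtrace to a few frames to keep the log small
        $trace = Coroutine::getBackTrace($cid, DEBUG_BACKTRACE_IGNORE_ARGS, 3);
        $frame = is_array($trace) && isset($trace[0]) ? $trace[0] : [];
        $live[$cid] = ($frame['file'] ?? '?') . ':' . ($frame['line'] ?? '?');
    }

    return ['coroutine_num' => $stats['coroutine_num'], 'live' => $live];
}
```

Calling this from a Timer::tick() callback in the MCP and in each child process (say, once a minute) and logging the result would show whether coroutine_num grows steadily, in which process, and at which call sites the extra coroutines are parked.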

In the onBeforeReload() event, we send SIGTERM to the MCP's child processes and then to the MCP itself:

$pidFiles = glob(__DIR__ . '/process_pids/*.pid');

$mainProcessData = null;

foreach ($pidFiles as $processPidFile) {
    $pid = (int) trim((string) file_get_contents($processPidFile));

    // Kill the main process manually, at the end
    if (strpos($processPidFile, 'MainProcess') !== false) {
        $mainProcessData = [
            'pidFile' => $processPidFile,
            'pid' => $pid,
        ];

        continue;
    }

    // Processes that have no timer or event loop exit on their own once their tasks complete,
    // so some of them may already have terminated by this point.
    // Check first whether the process is still running by passing 0 as signal_no, as per the documentation.
    // Doc: https://wiki.swoole.com/en/#/process/process?id=kill
    if (Process::kill($pid, 0)) {
        output('-- Killing Process -----> ' . $processPidFile);
        Process::kill($pid, SIGTERM);
    }

    // Delete the PID file
    unlink($processPidFile);
}

// Kill the (custom) MainProcess last, if its PID file was found
if ($mainProcessData !== null) {
    if (Process::kill($mainProcessData['pid'], 0)) {
        output('Killing Main Process');
        Process::kill($mainProcessData['pid'], SIGTERM);
    }

    unlink($mainProcessData['pidFile']);
}
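The kill order used in onBeforeReload() (child processes first, MainProcess last) can also be expressed as a small pure-PHP helper that reads each PID with file_get_contents(). This is only a sketch: the function name pidFilesInKillOrder() is hypothetical, and it matches "MainProcess" against the file's basename rather than the full path.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the PID files found in $dir in kill order,
 * child processes first and any *MainProcess* file last.
 *
 * @return array<int, array{pidFile: string, pid: int}>
 */
function pidFilesInKillOrder(string $dir): array
{
    $children = [];
    $main = [];

    foreach (glob($dir . '/*.pid') ?: [] as $pidFile) {
        $contents = file_get_contents($pidFile);
        if ($contents === false) {
            continue; // unreadable or already removed; skip it
        }

        $entry = ['pidFile' => $pidFile, 'pid' => (int) trim($contents)];

        if (str_contains(basename($pidFile), 'MainProcess')) {
            $main[] = $entry;
        } else {
            $children[] = $entry;
        }
    }

    // Children first, so the main process is signalled only after them
    return array_merge($children, $main);
}
```

Each returned entry could then be checked with Process::kill($entry['pid'], 0) and signalled with Process::kill($entry['pid'], SIGTERM) in order, as in the script above.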

Here is our Repo

2. What did you expect to see?

No server crash caused by exceeding the maximum coroutine limit, given that there is almost no traffic in production.

3. What did you see instead?

PHP Warning: Swoole\Process::start(): exceed max number of coroutine 60000 in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104
PHP Warning: Swoole\Process::start(): Swoole\Timer->onTimeout handler error in .../swoole-serv/app/Core/Processes/MainProcess.php on line 104

MainProcess.php contains the code that creates the MCP's child processes.

4. What version of Swoole are you using (show your php --ri swoole)?

swoole

Swoole => enabled
Author => Swoole Team <team@swoole.com>
Version => 5.1.5
Built => Nov 14 2024 13:43:57
coroutine => enabled with boost asm context
epoll => enabled
eventfd => enabled
signalfd => enabled
cpu_affinity => enabled
spinlock => enabled
rwlock => enabled
sockets => enabled
openssl => OpenSSL 3.0.13 30 Jan 2024
dtls => enabled
http2 => enabled
json => enabled
curl-native => enabled
pcre => enabled
c-ares => 1.27.0
zlib => 1.3
brotli => E16781312/D16781312
mutex_timedlock => enabled
pthread_barrier => enabled
futex => enabled
mysqlnd => enabled
async_redis => enabled
coroutine_pgsql => enabled

Directive => Local Value => Master Value
swoole.enable_coroutine => On => On
swoole.enable_library => On => On
swoole.enable_fiber_mock => Off => Off
swoole.enable_preemptive_scheduler => On => On
swoole.display_errors => On => On
swoole.use_shortname => On => On
swoole.unixsock_buffer_size => 8388608 => 8388608

5. What is your machine environment used (show your uname -a & php -v & gcc -v) ?

uname -a

Linux 6.8.0-1016-oracle #17-Ubuntu SMP Wed Nov  6 23:01:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

php -v

PHP 8.3.13 (cli) (built: Oct 30 2024 11:28:41) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.3.13, Copyright (c) Zend Technologies
    with Zend OPcache v8.3.13, Copyright (c), by Zend Technologies

gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-linux-gnu/13/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 13.3.0-6ubuntu2~24.04' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-libstdcxx-backtrace --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-13-fG75Ri/gcc-13-13.3.0/debian/tmp-gcn/usr --enable-offload-defaulted --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04) 
