
Add shard connection backoff policy #473


Open · dkropachev wants to merge 8 commits into master from dk/add-connection-pool-delay

Conversation


@dkropachev dkropachev commented May 30, 2025

Introduce ShardReconnectionPolicy and its implementations:

  • NoDelayShardConnectionBackoffPolicy: no delay or concurrency limit; ensures at most one pending connection per (host, shard) pair.
  • LimitedConcurrencyShardConnectionBackoffPolicy: limits pending concurrent connections to max_concurrent per host, with a backoff between shard connection attempts.

The idea of this PR is to shift responsibility for scheduling HostConnection._open_connection_to_missing_shard from HostConnection to ShardConnectionBackoffPolicy, which gives ShardConnectionBackoffPolicy control over the process of opening connections.

This feature enables finer control over the process of creating per-shard connections, helping to prevent connection storms.
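
For illustration, a hedged usage sketch of wiring the new policy into a cluster. The exact constructor arguments of LimitedConcurrencyShardConnectionBackoffPolicy and the shard_connection_backoff_policy keyword of Cluster are assumptions based on this PR's description, not the final API:

from cassandra.cluster import Cluster
from cassandra.policies import (
    ExponentialReconnectionPolicy,
    LimitedConcurrencyShardConnectionBackoffPolicy,
)

# Allow at most one pending shard connection per host and back off between
# attempts using an exponential schedule (which also provides jitter).
# Argument names below are illustrative.
policy = LimitedConcurrencyShardConnectionBackoffPolicy(
    backoff_policy=ExponentialReconnectionPolicy(base_delay=0.05, max_delay=1.0),
    max_concurrent=1,
)

cluster = Cluster(
    contact_points=["127.0.0.1"],
    shard_connection_backoff_policy=policy,  # keyword name assumed
)
session = cluster.connect()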

Fixes: #483

Solutions tested and rejected

Naive delay

Description

The policy would introduce a delay instead of executing a connection creation request right away.
It would remember when connection creation was last scheduled and, when scheduling the next request, ensure that the time between the old and the new request execution is equal to or greater than the delay it is configured with.
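
A minimal sketch of this rejected approach (hypothetical names, not code from this PR), just to make the mechanics concrete:

import time
import threading

class NaiveDelayScheduler:
    """Rejected idea: space connection attempts by a fixed delay, without
    tracking how many attempts are still in flight."""

    def __init__(self, scheduler, delay):
        self.scheduler = scheduler        # the driver's internal _Scheduler
        self.delay = delay
        self.last_run_at = 0.0
        self.lock = threading.Lock()

    def schedule(self, method):
        with self.lock:
            # Run no earlier than `delay` after the previously scheduled attempt.
            run_at = max(time.time(), self.last_run_at + self.delay)
            self.last_run_at = run_at
            self.scheduler.schedule(run_at - time.time(), method)

Nothing here caps the number of attempts in flight, which is exactly what breaks down below.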

Results

It worked fine when the cluster operated normally.

However, during testing with artificial delays, it became clear that this approach breaks down when the time to establish a
connection exceeds the configured delay.
In such cases, connections begin to pile up - the greater the connection initialization time relative to the delay, the faster they accumulate.

This becomes especially problematic during connection storms.
As the cluster becomes overloaded and connection initialization slows down, the delay-based throttling loses its effectiveness. In other words, the more the cluster suffers, the less effective the policy becomes.

Solution

The solution was to give the policy direct control over the connection initialization process.
This allows the policy to track how many connections are currently pending and apply delays after connections are created, rather than before.
That change ensures the policy remains effective even under heavy load.

This behavior is exactly what has been implemented in this PR.
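
A simplified sketch of that idea (hypothetical names; the actual implementation in this PR is LimitedConcurrencyShardConnectionBackoffPolicy and its scheduler):

import threading

class PendingAwareScheduler:
    """Accepted idea: cap pending connection attempts per host and apply the
    backoff after an attempt completes, not before it starts."""

    def __init__(self, scheduler, delay, max_concurrent):
        self.scheduler = scheduler        # the driver's internal _Scheduler
        self.delay = delay
        self.max_concurrent = max_concurrent
        self.pending = 0
        self.queue = []
        self.lock = threading.Lock()

    def schedule(self, method):
        with self.lock:
            if self.pending < self.max_concurrent:
                self.pending += 1
                self.scheduler.schedule(0, self._run, method)
            else:
                self.queue.append(method)

    def _run(self, method):
        try:
            method()                      # open the shard connection
        finally:
            with self.lock:
                if self.queue:
                    # Reuse this slot: the next attempt starts `delay` after the
                    # previous one *finished*, so slow connections cannot pile up.
                    self.scheduler.schedule(self.delay, self._run, self.queue.pop(0))
                else:
                    self.pending -= 1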

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass tests.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 4 times, most recently from 0b80886 to f62dfa3 Compare June 3, 2025 03:42
@dkropachev dkropachev changed the title from "1" to "Add shard-aware reconnection policies with support for scheduling constraints" Jun 3, 2025
@dkropachev dkropachev requested a review from Lorak-mmk June 3, 2025 03:45
@dkropachev dkropachev marked this pull request as ready for review June 3, 2025 03:45
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 2 times, most recently from dbb3ad1 to cbb4719 Compare June 4, 2025 17:53
@mykaul

mykaul commented Jun 5, 2025

Shouldn't we have some warning / info level log when backoff is taking place?

@dkropachev

dkropachev commented Jun 5, 2025

Shouldn't we have some warning / info level log when backoff is taking place?

I would rather not do it; it is not useful and can potentially pollute the log.

@Lorak-mmk

Do you know what caused the test failure?

  =================================== FAILURES ===================================
  ___________________________ TypeTests.test_datetype ____________________________
  
  self = <tests.unit.test_types.TypeTests testMethod=test_datetype>
  
      def test_datetype(self):
          now_time_seconds = time.time()
          now_datetime = datetime.datetime.fromtimestamp(now_time_seconds, tz=datetime.timezone.utc)
      
          # Cassandra timestamps in millis
          now_timestamp = now_time_seconds * 1e3
      
          # same results serialized
  >       self.assertEqual(DateType.serialize(now_datetime, 0), DateType.serialize(now_timestamp, 0))
  E       AssertionError: b'\x00\x00\x01\x97<\x17\xda\xf9' != b'\x00\x00\x01\x97<\x17\xda\xf8'

it is a unit test that at first glance should be fully deterministic. Failure is unexpected.
From the assertion it looks like some off-by-one error.

@dkropachev

Do you know what caused the test failure?

It is a known issue; the conversion goes wrong somewhere.
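
For context, a minimal illustration of how such an off-by-one can appear when the same instant goes once through datetime and once through a float of milliseconds; this is only a plausible mechanism, not a confirmed diagnosis of the failure above:

import datetime, time

now = time.time()
dt = datetime.datetime.fromtimestamp(now, tz=datetime.timezone.utc)

# Milliseconds reconstructed from the datetime object vs. taken directly
# from the float seconds; truncation/rounding of the two float paths can
# occasionally differ by one, matching the ...\xf9 vs ...\xf8 mismatch.
ms_from_dt = int(dt.timestamp() * 1e3)
ms_from_float = int(now * 1e3)
print(ms_from_dt, ms_from_float, ms_from_dt - ms_from_float)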

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 4 times, most recently from a43ccd1 to b0fd069 Compare June 7, 2025 04:47
@dkropachev dkropachev requested a review from Lorak-mmk June 7, 2025 04:48
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 2 times, most recently from f47313f to 9dfd9ec Compare June 13, 2025 06:20

@Lorak-mmk Lorak-mmk left a comment


General comment: integration tests for new policies are definitely needed here.

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 2 times, most recently from aebc540 to 61668de Compare June 13, 2025 17:58
@dkropachev dkropachev requested a review from Lorak-mmk June 13, 2025 18:02
@dkropachev dkropachev self-assigned this Jun 13, 2025
@mykaul

mykaul commented Jun 15, 2025

The patchset lacks documentation, which would have helped to understand the feature and when/how to use it. Is documentation a separate repo / commit?

@dkropachev dkropachev changed the title from "Add shard-aware reconnection policies with support for scheduling constraints" to "Add shard connection backoff policy" Jun 17, 2025
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 806aba9 to 2584555 Compare June 17, 2025 15:51
@dkropachev

The patchset lacks documentation, which would have helped to understand the feature and when/how to use it. Is documentation a separate repo / commit?

I have added documentation to all classes.
The way it is done in this repo: small features are documented in the class docstrings, and big ones get an .rst file in docs/.
I personally think this is a small one, so it has no separate file in docs/; let me know if you want to see one.

@mykaul

mykaul commented Jun 18, 2025


I don't think it's such a small feature, and I think details might be missing. I did skim briefly over the code - so I might have missed it - where is the random jitter discussed, for when multiple clients do a concurrent backoff? (again - may have missed it!)

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 2584555 to 8f3670e Compare June 18, 2025 10:31
@dkropachev


OK, I will add it; the jitter comes from ExponentialReconnectionPolicy or from ConstantShardConnectionBackoffSchedule.

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 2 times, most recently from 37465f4 to 40dc7b6 Compare July 3, 2025 06:05
@dkropachev dkropachev requested a review from Lorak-mmk July 3, 2025 06:05
@dkropachev

@Lorak-mmk, done, all comments addressed, please take a look.

@dkropachev dkropachev requested a review from mykaul July 3, 2025 06:06
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 40dc7b6 to 3d97ecd Compare July 3, 2025 06:38

@Lorak-mmk Lorak-mmk left a comment


It looks much better now, especially documentation-wise!
It would be good to describe this new policy in docs/ if we want people to use it.
Before merging it would be great to run some real-world scenario and see if new policy can help with cluster overload. Is that something that could be done with SCT?

Note: I did not yet read "LimitedConcurrencyShardConnectionBackoffPolicy". I'll have a few more comments there.

Comment on lines 4458 to 4479
def schedule(self, delay, fn, *args, **kwargs):
    if self.is_shutdown:
        return
    if delay:
        self._insert_task(delay, (fn, args, tuple(kwargs.items())))


Commit: "Make _Scheduler that has been shutdown ignore schedule requests "

Maybe it would be better behavior to throw in such a case, to let the calling code know about the issue and, e.g., perform some graceful shutdown?


@dkropachev dkropachev Jul 3, 2025


Shutdown is done gracefully; only new requests that could be scheduled during shutdown are ignored.
I don't see any reason to allow it to schedule more: when the scheduler is shut down, it means that the cluster has been shut down.
So, the user is done.

Besides that, we already have an elusive issue that we can't catch: #209
Why contribute to this problem even more?

Comment on lines +948 to +1000
class _NoDelayShardConnectionBackoffScheduler(ShardConnectionScheduler):
    """
    A scheduler for ``cassandra.policies.NoDelayShardConnectionBackoffPolicy``.

    A shard connection backoff policy with no delay between attempts.
    Ensures at most one pending connection per (host, shard) pair.
    If a connection attempt for the same (host, shard) is already pending, a new request is silently dropped.
    """

    scheduler: _Scheduler
    already_scheduled: set[tuple[str, int]]
    lock: Lock
    is_shutdown: bool = False

    def __init__(self, scheduler: _Scheduler):
        self.scheduler = scheduler
        self.already_scheduled = set()
        self.lock = Lock()

    def _execute(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> None:
        if self.is_shutdown:
            return
        try:
            method()
        finally:
            # Allow future attempts for this (host, shard) once this one is done.
            with self.lock:
                self.already_scheduled.remove((host_id, shard_id))

    def schedule(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> bool:
        with self.lock:
            if self.is_shutdown or (host_id, shard_id) in self.already_scheduled:
                return False
            self.already_scheduled.add((host_id, shard_id))

        self.scheduler.schedule(0, self._execute, host_id, shard_id, method)
        return True

    def shutdown(self):
        with self.lock:
            self.is_shutdown = True


In further commits you modify the scheduler to do nothing on shutdown.
Here you also perform those checks. Do we need this redundancy?

It still looks to me like the better solution is to handle shutdown in the scheduler, by throwing when trying to schedule a new task (and erroring out already scheduled tasks, if possible).


@dkropachev dkropachev Jul 3, 2025


It is pretty consistent with the rest of the code: Cluster, Session, HostConnection - everything that holds live data does the same.
I don't see any problem doing it here too.

_Scheduler is defined at the cluster level, while ShardConnectionScheduler at the session level, so when ShardConnectionScheduler is being shut down, _Scheduler can still be operational.
I don't see any reason to throw an exception here; a missed attempt to create a connection on cluster shutdown is not a big deal.

Comment on lines +972 to +993
    ) -> None:
        if self.is_shutdown:
            return
        try:
            method()
        finally:
            with self.lock:
                self.already_scheduled.remove((host_id, shard_id))

    def schedule(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> bool:
        with self.lock:
            if self.is_shutdown or (host_id, shard_id) in self.already_scheduled:
                return False
            self.already_scheduled.add((host_id, shard_id))


I don't know the exact semantics of method with regard to e.g. error handling, but I suspect there may be a race possible.
Let's assume that method handles errors, and thus schedules a re-connection in case the connection fails.

  • schedule is called for some shard
  • After some time, _execute is called
  • method is called as part of it. The connection fails, and thus method calls schedule again internally.
  • schedule rejects the request, because it is already pending.

Is that an issue?

@dkropachev (Collaborator Author)

It is not an issue: first, it is not happening; second, HostConnection is responsible for detecting whether it needs a connection and scheduling a request to open it. If the request fails, the same logic will be triggered again.

Comment on lines 60 to 87
def test_shard_aware_reconnection_policy_no_delay(self):
    # with NoDelayReconnectionPolicy all the connections should be created right away
    self._test_shard_aware_reconnection_policy(4, NoDelayShardConnectionBackoffPolicy(), 4)

def _test_shard_aware_reconnection_policy(self, shard_count, shard_connection_backoff_policy, expected_connections):
    """
    Test that, given a `shard_aware_port` on the OPTIONS message (ShardInfo class),
    the next connections are opened using this port.
    It checks that:
    1. Next connections are opened using this port
    2. Connection creation pace matches `shard_connection_backoff_policy`


Commit: "Introduce NoDelayShardConnectionBackoffPolicy "

You introduced changes to a unit test. It now verifies that the connection policy is used.
Is that the right commit for those changes? The connection policy is only integrated into the driver a few commits later, so I expect this unit test to fail in this commit. But I may be misunderstanding something, of course! If that is the case, let me know.

@dkropachev (Collaborator Author)

You are correct, it fails; moved it to the integration commit.

Comment on lines +878 to +884
@abstractmethod
def schedule(
    self,
    host_id: str,
    shard_id: int,
    method: Callable[[], None],
) -> bool:


What will shard_id be for C* clusters? Will it be set to 0, or will it be (contrary to the type hint) a None?
Could you point me to the place in the code responsible for this?


@dkropachev dkropachev Jul 3, 2025


This API works only for Scylla, when sharding information is present; in the rest of the cases it is not used.

@dkropachev (Collaborator Author)

List of places where it is called:

self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))

self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))

self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, connection.features.shard_id, partial(self._open_connection_to_missing_shard, connection.features.shard_id))

self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))

Comment on lines 30 to 34

def setup_module():
    os.environ['SCYLLA_EXT_OPTS'] = "--smp 4"
    use_cluster('test_cluster', [4])


This is exactly why CCM taking such parameters through env is an absolutely abysmal choice. It gives us absolutely no hope of ever running tests concurrently.

@dkropachev (Collaborator Author)

True, let's address it in CCM and later fix it in our CI/CD.

Comment on lines 126 to 141

# Since scheduled calls are executed in a separate thread, we need to give them some time to complete
time.sleep(0.2)


:(

Can we get rid of it too?
I see you are using a mocked scheduler - perhaps we can use it to "move time forward" and run things instantly when we want?

@dkropachev (Collaborator Author)

I can't remove the sleep completely, but now it is conditional.

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 3d97ecd to 06f19e3 Compare July 3, 2025 23:06
Commit introduces two abstract classes:
1. `ShardConnectionBackoffPolicy` - a base class for policies that control
   the pace of shard connection creation
2. Auxiliary `ShardConnectionScheduler` - a scheduler that is instantiated
   by `ShardConnectionBackoffPolicy` at session initialization
This policy is an implementation of ShardConnectionBackoffPolicy.
It implements the same behavior the driver currently has:
1. No delay between creating shard connections
2. It avoids creating multiple connections to the same (host_id, shard_id)
This is required by the upcoming LimitedConcurrencyShardConnectionBackoffPolicy.
There is no reason to accept schedule requests when the cluster is shutting
down.
Add code that integrates ShardConnectionBackoffPolicy into:
1. Cluster
2. Session
3. HostConnection

The main idea is to put ShardConnectionBackoffPolicy in control of the
shard connection creation process, removing duplicate logic from
HostConnection that tracks pending connection creation requests.
This policy is an implementation of `ShardConnectionBackoffPolicy`.
Its primary purpose is to prevent connection storms by imposing restrictions
on the number of concurrent pending connections per host and backoff
time between each connection attempt.
Tests cover:
1. LimitedConcurrencyShardConnectionBackoffPolicy
2. NoDelayShardConnectionBackoffPolicy

For both the Scylla and Cassandra backends.
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 06f19e3 to f71e7c9 Compare July 3, 2025 23:53
@dkropachev dkropachev requested a review from Lorak-mmk July 3, 2025 23:53
@dkropachev

It looks much better now, especially documentation-wise! It would be good to describe this new policy in docs/ if we want people to use it.

Done, added section to docs/faq.rst

Before merging it would be great to run some real-world scenario and see if new policy can help with cluster overload. Is that something that could be done with SCT?

There is no Python loader there, but we can emulate this issue locally; no need to run it in the cloud. The only difference is that to overload a real cluster you need way more connections.

@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch 3 times, most recently from fa82cdd to 41b5ea8 Compare July 4, 2025 13:56
The sole goal of `ShardConnectionBackoffPolicy`'s existence is to fight
connection storms.
So, this commit adds a connection storms section to `docs/faq.rst`.
@dkropachev dkropachev force-pushed the dk/add-connection-pool-delay branch from 41b5ea8 to 088053b Compare July 4, 2025 14:08
Successfully merging this pull request may close these issues.

Delay for per-shard reconnection