Add shard connection backoff policy #473
Conversation
Force-pushed from 0b80886 to f62dfa3
Force-pushed from dbb3ad1 to cbb4719
Shouldn't we have some warning / info level log when backoff is taking place?

I would rather not do it; it is not useful and can potentially pollute the log.

Do you know what caused the test failure? It is a unit test that at first glance should be fully deterministic, so the failure is unexpected.

It is a known issue; the conversion goes wrong somewhere.
Force-pushed from a43ccd1 to b0fd069
Force-pushed from f47313f to 9dfd9ec
General comment: integration tests for new policies are definitely needed here.
Force-pushed from aebc540 to 61668de
The patchset lacks documentation, which would have helped to understand the feature and when/how to use it. Is the documentation in a separate repo / commit?
Force-pushed from 806aba9 to 2584555
I have added documentation to all classes.

I don't think it's such a small feature, and I think details might be missing. I did skim briefly over the code - so I might have missed it - where's the random jitter discussed, so that multiple clients backing off concurrently don't retry at the same time? (again - may have missed it!)
Force-pushed from 2584555 to 8f3670e
OK, I will add it; the jitter comes from
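(Editor's sketch, for illustration only: one common way random jitter is layered onto an exponential backoff delay so that many clients backing off concurrently don't retry in lockstep. The function name and numbers are hypothetical, not the PR's implementation.)

```python
import random

# Hypothetical sketch: "full jitter" over exponential backoff.
# base and cap are illustrative defaults, not the driver's values.
def backoff_with_jitter(attempt: int, base: float = 0.1, cap: float = 30.0) -> float:
    delay = min(cap, base * (2 ** attempt))   # exponential growth, capped
    return random.uniform(0, delay)           # randomize to spread clients out
```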
Force-pushed from 37465f4 to 40dc7b6
@Lorak-mmk, done, all comments addressed; please take a look.
Force-pushed from 40dc7b6 to 3d97ecd
It looks much better now, especially documentation-wise!
It would be good to describe this new policy in docs/ if we want people to use it.
Before merging it would be great to run some real-world scenario and see if new policy can help with cluster overload. Is that something that could be done with SCT?
Note: I did not yet read "LimitedConcurrencyShardConnectionBackoffPolicy". I'll have a few more comments there.
cassandra/cluster.py (outdated)
```python
def schedule(self, delay, fn, *args, **kwargs):
    if self.is_shutdown:
        return
    if delay:
        self._insert_task(delay, (fn, args, tuple(kwargs.items())))
```
Commit: "Make _Scheduler that has been shutdown ignore schedule requests "
Maybe it would be a better behavior to throw in such case, to let the calling code know about the issue, and e.g. perform some graceful shutdown?
Shutdown is done gracefully; just the new requests that could be scheduled during shutdown are ignored.
I don't see any reason to allow it to schedule more: when the scheduler is shut down it means the cluster has been shut down, so the user is done.
Besides that, we already have an elusive issue that we can't catch: #209.
Why contribute to this problem even more?
```python
class _NoDelayShardConnectionBackoffScheduler(ShardConnectionScheduler):
    """
    A scheduler for ``cassandra.policies.NoDelayShardConnectionBackoffPolicy``.

    A shard connection backoff policy with no delay between attempts.
    Ensures at most one pending connection per (host, shard) pair.
    If a connection attempt for the same (host, shard) is already pending,
    the new request is silently dropped.
    """

    scheduler: _Scheduler
    already_scheduled: set[tuple[str, int]]
    lock: Lock
    is_shutdown: bool = False

    def __init__(self, scheduler: _Scheduler):
        self.scheduler = scheduler
        self.already_scheduled = set()
        self.lock = Lock()

    def _execute(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> None:
        if self.is_shutdown:
            return
        try:
            method()
        finally:
            with self.lock:
                self.already_scheduled.remove((host_id, shard_id))

    def schedule(
        self,
        host_id: str,
        shard_id: int,
        method: Callable[[], None],
    ) -> bool:
        with self.lock:
            if self.is_shutdown or (host_id, shard_id) in self.already_scheduled:
                return False
            self.already_scheduled.add((host_id, shard_id))

        self.scheduler.schedule(0, self._execute, host_id, shard_id, method)
        return True

    def shutdown(self):
        with self.lock:
            self.is_shutdown = True
```
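(Editor's sketch: the dedup behavior of the class above can be observed in isolation by driving it with a toy stand-in for `_Scheduler` that queues tasks instead of running them on a timer. `QueueingScheduler` is hypothetical, not part of the driver.)

```python
# Hypothetical stand-in for cassandra.cluster._Scheduler that queues tasks,
# so the pending-state dedup is visible before anything executes.
class QueueingScheduler:
    def __init__(self):
        self.tasks = []

    def schedule(self, delay, fn, *args, **kwargs):
        self.tasks.append((fn, args))

    def run_all(self):
        for fn, args in self.tasks:
            fn(*args)
        self.tasks.clear()

sched = _NoDelayShardConnectionBackoffScheduler(QueueingScheduler())
print(sched.schedule("host-1", 0, lambda: None))  # True: first request accepted
print(sched.schedule("host-1", 0, lambda: None))  # False: already pending, dropped
sched.scheduler.run_all()                         # runs _execute, clears pending state
print(sched.schedule("host-1", 0, lambda: None))  # True again after completion
```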
In further commits you modify the scheduler to do nothing on shutdown.
Here you also perform those checks. Do we need this redundancy?
It still looks to me like the better solution is to handle shutdown in the scheduler, by throwing when trying to schedule a new task (and erroring out already-scheduled tasks if possible).
It is pretty consistent with the rest of the code: `Cluster`, `Session`, `HostConnection`; everything that holds live data does the same. I don't see any problem doing it here too.
`_Scheduler` is defined at the cluster level, while `ShardConnectionScheduler` is at the session level, so when `ShardConnectionScheduler` is getting shut down, `_Scheduler` can still be operational.
I don't see any reason to throw an exception here; a missed attempt to create a connection on cluster shutdown is not a big deal.
```python
) -> None:
    if self.is_shutdown:
        return
    try:
        method()
    finally:
        with self.lock:
            self.already_scheduled.remove((host_id, shard_id))

def schedule(
    self,
    host_id: str,
    shard_id: int,
    method: Callable[[], None],
) -> bool:
    with self.lock:
        if self.is_shutdown or (host_id, shard_id) in self.already_scheduled:
            return False
        self.already_scheduled.add((host_id, shard_id))
```
I don't know the exact semantics of `method` with regards to e.g. error handling, but I suspect a race may be possible.
Let's assume that `method` handles errors, and thus schedules a re-connection in case the connection fails:

1. `schedule` is called for some shard.
2. After some time, `_execute` is called; `method` is called as part of it.
3. The connection fails, and thus `method` calls `schedule` again internally.
4. `schedule` rejects the request, because it is already pending.

Is that an issue?
It is not an issue. First, it is not happening; second, `HostConnection` is responsible for figuring out whether it needs a connection and scheduling a request to open it. If that request fails, the same logic will be triggered again.
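(Editor's sketch: the rejection described in the scenario above can be reproduced with a self-contained toy that mirrors the scheduler's pending-set logic; all names here are hypothetical.)

```python
from threading import Lock

# Toy mirror of the scheduler's pending-set logic, executing synchronously:
# while `method` runs, the key is still in `scheduled`, so a re-entrant
# schedule() for the same (host, shard) is rejected.
scheduled = set()
lock = Lock()

def schedule(key, method):
    with lock:
        if key in scheduled:
            return False
        scheduled.add(key)
    try:
        method()                       # runs inline for the sake of the demo
    finally:
        with lock:
            scheduled.discard(key)
    return True

def reconnect():
    # a failure handler that tries to re-schedule itself mid-execution
    print("re-schedule accepted?", schedule(("host-1", 0), lambda: None))

print("first schedule accepted?", schedule(("host-1", 0), reconnect))
# Output:
#   re-schedule accepted? False
#   first schedule accepted? True
```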
tests/unit/test_shard_aware.py (outdated)
```python
def test_shard_aware_reconnection_policy_no_delay(self):
    # with NoDelayReconnectionPolicy all the connections should be created right away
    self._test_shard_aware_reconnection_policy(4, NoDelayShardConnectionBackoffPolicy(), 4)

def _test_shard_aware_reconnection_policy(self, shard_count, shard_connection_backoff_policy, expected_connections):
    """
    Test that, given a `shard_aware_port` on the OPTIONS message (ShardInfo class),
    the next connections would be opened using this port.
    It checks that:
    1. Next connections are opened using this port
    2. Connection creation pace matches `shard_connection_backoff_policy`
    """
```
Commit: "Introduce NoDelayShardConnectionBackoffPolicy "
You introduced changes to a unit test. It now verifies that connection policy is used.
Is that that right commits for those changes? Connection policy is only integrated into driver few commits later, so I expect this unit test to fail in this commit. But I may be misunderstanding something of course! If that is the case, let me know.
You are correct, it fails; I moved it to the integration commit.
```python
@abstractmethod
def schedule(
    self,
    host_id: str,
    shard_id: int,
    method: Callable[[], None],
) -> bool:
```
What will `shard_id` be for C* clusters? Will it be set to 0, or will it be (contrary to the type hint) None?
Could you point me to the place in the code responsible for this?
This API works only for Scylla, when sharding information is present; in the rest of the cases it is not used.
List of places where it is called:

python-driver/cassandra/pool.py, lines 488 to 489 in a83038c:
```python
self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
```

python-driver/cassandra/pool.py, lines 499 to 500 in a83038c:
```python
self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
```

python-driver/cassandra/pool.py, lines 610 to 611 in a83038c:
```python
self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, connection.features.shard_id, partial(self._open_connection_to_missing_shard, connection.features.shard_id))
```

python-driver/cassandra/pool.py, lines 853 to 854 in a83038c:
```python
self._session.shard_connection_backoff_scheduler.schedule(
    self.host.host_id, shard_id, partial(self._open_connection_to_missing_shard, shard_id))
```
```python
def setup_module():
    os.environ['SCYLLA_EXT_OPTS'] = "--smp 4"
    use_cluster('test_cluster', [4])
```
This is exactly why CCM taking such parameters through env is an absolutely abysmal choice. It gives us absolutely no hope of ever running tests concurrently.
True, let's address it in CCM and later fix it in our CI/CD.
```python
# Since scheduled calls are executed in a separate thread, we need to give them some time to complete
time.sleep(0.2)
```
:(
Can we get rid of it too?
I see you are using a mocked scheduler - perhaps we can use it to "move time forward" and run things instantly when we want?
I can't remove the sleep completely, but now it is conditional.
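(Editor's sketch of the "move time forward" idea: a hypothetical mock scheduler that executes due tasks when a test advances virtual time, instead of real sleeps. This is not the PR's mock, just an illustration.)

```python
import heapq

# Hypothetical test double: tasks run when the test advances virtual time,
# so no real time.sleep() is needed.
class MockScheduler:
    def __init__(self):
        self.now = 0.0
        self._queue = []   # (run_at, seq, fn, args, kwargs)
        self._seq = 0      # tie-breaker keeps FIFO order for equal run_at

    def schedule(self, delay, fn, *args, **kwargs):
        heapq.heappush(self._queue, (self.now + delay, self._seq, fn, args, kwargs))
        self._seq += 1

    def advance(self, seconds):
        self.now += seconds
        while self._queue and self._queue[0][0] <= self.now:
            _, _, fn, args, kwargs = heapq.heappop(self._queue)
            fn(*args, **kwargs)

scheduler = MockScheduler()
scheduler.schedule(0.2, print, "runs only once virtual time reaches 0.2s")
scheduler.advance(0.1)  # nothing happens yet
scheduler.advance(0.1)  # task fires deterministically, no sleeping
```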
Force-pushed from 3d97ecd to 06f19e3
Commit introduces two abstract classes:
1. `ShardConnectionBackoffPolicy` - a base class for a policy that controls the pace of shard connection creation
2. Auxiliary `ShardConnectionScheduler` - a scheduler that is instantiated by `ShardConnectionBackoffPolicy` at session initialization

This policy is an implementation of `ShardConnectionBackoffPolicy`. It implements the same behavior the driver currently has:
1. No delay between creating shard connections
2. It avoids creating multiple connections to the same (host_id, shard_id)

This is required by the upcoming `LimitedConcurrencyShardConnectionBackoffPolicy`.

There is no reason to accept schedule requests when the cluster is shutting down.

Add code that integrates `ShardConnectionBackoffPolicy` into:
1. Cluster
2. Session
3. HostConnection
The main idea is to put `ShardConnectionBackoffPolicy` in control of the shard connection creation process, removing the duplicate logic from `HostConnection` that tracks pending connection creation requests.

This policy is an implementation of `ShardConnectionBackoffPolicy`. Its primary purpose is to prevent connection storms by imposing restrictions on the number of concurrent pending connections per host and the backoff time between connection attempts.

Tests cover:
1. LimitedConcurrencyShardConnectionBackoffPolicy
2. NoDelayShardConnectionBackoffPolicy
For both the Scylla and Cassandra backends.
Force-pushed from 06f19e3 to f71e7c9
Done, added a section to
There is no Python loader there, but we can emulate this issue locally; no need to run it in the cloud. The only difference is that to overload a real cluster you need way more connections.
Force-pushed from fa82cdd to 41b5ea8
The sole goal of `ShardConnectionBackoffPolicy`'s existence is to fight connection storms, so this commit adds a connection storms section to `docs/faq.rst`.
Force-pushed from 41b5ea8 to 088053b
Introduce `ShardConnectionBackoffPolicy` and its implementations:

- `NoDelayShardConnectionBackoffPolicy`: no delay or concurrency limit; ensures at most one pending connection per host+shard.
- `LimitedConcurrencyShardConnectionBackoffPolicy`: limits pending concurrent connections to `max_concurrent` per host, with backoff between shard connections.

The idea of this PR is to shift the responsibility for scheduling `HostConnection._open_connection_to_missing_shard` from `HostConnection` to `ShardConnectionBackoffPolicy`, which gives `ShardConnectionBackoffPolicy` control over the process of opening connections.

This feature enables finer control over the process of creating per-shard connections, helping to prevent connection storms (see the configuration sketch below).

Fixes: #483
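(Editor's sketch: one possible way to wire the new policy into a cluster. The `Cluster` keyword name and the exact constructor signature are assumptions; `max_concurrent` is the per-host pending-connection limit described above, and other backoff parameters are omitted.)

```python
from cassandra.cluster import Cluster
from cassandra.policies import LimitedConcurrencyShardConnectionBackoffPolicy

# Assumed keyword and constructor arguments - check the class docs for
# the exact names.
cluster = Cluster(
    contact_points=["127.0.0.1"],
    shard_connection_backoff_policy=LimitedConcurrencyShardConnectionBackoffPolicy(
        max_concurrent=1,
    ),
)
session = cluster.connect()
```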
Solutions tested and rejected

Naive delay

Description

The policy would introduce a delay instead of executing the connection creation request right away. It would remember the last time a connection creation was scheduled, and when it tries to schedule the next request it would make sure that the time between the old and the new request's execution is equal to or greater than the `delay` it is configured with.

Results
It worked fine when the cluster operates normally.
However, during testing with artificial delays, it became clear that this approach breaks down when the time to establish a connection exceeds the configured delay.
In such cases, connections begin to pile up: the greater the connection initialization time relative to the delay, the faster they accumulate.
This becomes especially problematic during connection storms.
As the cluster becomes overloaded and connection initialization slows down, the delay-based throttling loses its effectiveness. In other words, the more the cluster suffers, the less effective the policy becomes; the short sketch below illustrates the pile-up.
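(Editor's sketch with made-up numbers: with a fixed inter-start delay, the number of in-flight connections at steady state is roughly connect_time / delay, so slower connections mean more concurrency, not less.)

```python
# Hypothetical numbers: attempts are *started* every `delay` seconds, but each
# takes `connect_time` seconds to finish, so pending attempts accumulate
# whenever connect_time > delay.
delay = 0.1          # seconds between scheduled connection attempts
connect_time = 1.0   # time one connection takes to establish under load
pending = connect_time / delay
print(f"steady-state pending connections per host: {pending:.0f}")  # -> 10
```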
Solution
The solution was to give the policy direct control over the connection initialization process.
This allows the policy to track how many connections are currently pending and apply delays after connections are created, rather than before.
That change ensures the policy remains effective even under heavy load.
This behavior is exactly what has been implemented in this PR.
Pre-review checklist

- Documentation added in `./docs/source/`.
- `Fixes:` annotations added to the PR description.