Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ack invalid messages #38

Merged
merged 1 commit into from
Sep 7, 2024
Merged

ack invalid messages #38

merged 1 commit into from
Sep 7, 2024

Conversation

russell
Copy link
Contributor

@russell russell commented Sep 3, 2024

Messages that are left in a pending state will eventually be deleted. Processing these messages triggers an error, and so they are never discarded, even though they have no content anymore. We are running AWS Elasticache Redis 6.2.6

When the xclaimed messages are fetched with XGET, I can see that the error value is redis: nil

error -- redis: nil
xp -- {
        Consumer: dispatcher-7f7684b54f-etecce
        ID: 1722875908017-0
        Idle: 1020707000000
        RetryCount: 1427
      }

What is happening is that the message exists in the PEL but has no content because it was deleted. Acking the message removes it from the PEL so it won't be claimed again in the same consumer group.

I have tried to create a test that gets the queue in this state, but didn't succeed. I tested this patch against our staging systems, and it did succeed at draining the queue.

After a while messages are being lost from the queue because of
constraints in the queue size.

```
error -- redis: nil
xp -- {
        Consumer: dispatcher-7f7484b54f-njdpq
        ID: 1722875908017-0
        Idle: 1020707000000
        RetryCount: 1427
      }
```

What is happening is that the message exists in the PEL but has no
content beacues it was deleted. Acking the message removes it from the
PEL so it won't be claimed again in the same consumer group.
@minghsu0107
Copy link
Collaborator

minghsu0107 commented Sep 7, 2024

Hi @russell

Thank you for your PR. This actually addresses the weird behavior that the message id in PEL is not removed even when the message is deleted from the stream, reported in this issue. You must have deleted some entries inside PEL in your application to encounter this error.

This PR fixed the issue so that X[AUTO]CLAIM will never return "nil" instead of an entry.

It seems that the fix was merged to Redis 7.0.0 (reference: https://github.com/redis/redis/releases/tag/7.0-rc2). But I believe your PR is still necessary for Redis version earlier than 7.0.0.

@minghsu0107 minghsu0107 merged commit fae93cb into ThreeDotsLabs:main Sep 7, 2024
4 checks passed
@russell
Copy link
Contributor Author

russell commented Sep 11, 2024

You must have deleted some entries inside PEL in your application to encounter this error.

I didn't do anything explicitly to my redis stream. XDEL is the same behaviour as XTRIM or XADD with a maxLen argument. Using approx makes it hard to reproduce, but i think the reason I hit this is that my stream has quite a high throughput (thousands a minute) and the length is quite short because my message sizes are very large, and losing a few isn't a problem. Once one message is lost, then the queue becomes stuck.

But i'm just speculating i honestly don't know how i got the unacked message, but once there was one, i soon ended up with a few hundred, and none were being processed.

I have a shell script i was using to create this same state, i could probably write a test case using direct redis commands if you think that would be worth while, but seeing as it will be fixed in the next redis version it might make maintaining the test hard, as it would require the old redis version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants