Prevent slack Update policy from posting new messages #298

Open
wants to merge 3 commits into base: master



@AndreiPetrusMihai AndreiPetrusMihai commented May 20, 2024

The issue:

At the moment, a message with a policy of Update can post a new message on Slack. This happens when the message is sent for a groupingKey that has no recorded timestamp, that is, when no previous message was posted for the respective grouping key.

This is not correct and it is quite misleading since one would expect a policy of Update to never be able to actually post new messages. It also means that there is no practical difference between the PostAndUpdate and Update policies when it comes to sending a message for a groupingKey which didn't have any previous message recorded.

It's important to know that just because a timestamp wasn't recorded, it doesn't mean that a message with a certain groupingKey wasn't previously sent. The dictionary of groupingKey: timestamp is kept in-memory, so upon a complete engine restart, these records would get lost.
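To make the restart caveat concrete, here is a minimal sketch of an in-memory groupingKey-to-timestamp map. The names `timestampStore`, `Record`, and `Lookup` are hypothetical and not taken from the library; the sketch only illustrates why a freshly started process has no record of messages that were sent earlier.

```go
package main

import "sync"

// timestampStore is a hypothetical stand-in for the in-memory
// groupingKey -> Slack message timestamp dictionary described above.
type timestampStore struct {
	mu sync.RWMutex
	ts map[string]string
}

// newTimestampStore returns an empty store, as a fresh process would have.
func newTimestampStore() *timestampStore {
	return &timestampStore{ts: make(map[string]string)}
}

// Record remembers the timestamp of the message posted for groupingKey.
func (s *timestampStore) Record(groupingKey, ts string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.ts[groupingKey] = ts
}

// Lookup returns the recorded timestamp, or "" when none is known.
func (s *timestampStore) Lookup(groupingKey string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.ts[groupingKey]
}
```

A store created after a restart returns "" for every key, even if a message for that groupingKey is still visible in the channel.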

This could be considered a breaking change if someone relied on the Update policy to post new messages. It could also be considered a fix if the correct behavior of Update is to never post a new message.


The use-case/scenario with which this behavior was found:

We have multiple argo apps and we want to receive notifications when an error occurs. This would mean notifications for failed syncs, maybe degraded apps, etc.

At the moment this is doable, but it is hard to keep track of which apps were fixed and which weren't, since the error messages are static. Even after the error for an app is fixed, the error notification stays in the Slack channel, unchanged.

As a way to improve this experience, we want to do the following:

  • Send error/unhealthy app messages with a policy of PostAndUpdate. Of course, this would have a groupingKey which is related to a certain revision.
  • Send messages for successful syncs/healthy apps with a policy of Update. They would have the same groupingKey as the error message.

Having these 2 notifications would basically mean that errors get posted to the channel and, once fixed, the error messages are updated to reflect that the issue has been resolved. This makes it much easier to follow and keep track of errors that still need fixing.

At the moment this doesn't work correctly. The successful sync messages do update existing error messages, but they also get posted when there is no corresponding error message for them.
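The intended end-to-end behavior for this scenario can be sketched with a fake channel. Everything here (`fakeChannel`, `send`, the sample grouping keys) is hypothetical and only models the behavior this PR aims for; it is not the library's Slack code, and it assumes the timestamp map is intact (no restart).

```go
package main

import "strconv"

// Policy covers the two delivery policies used in this scenario.
type Policy string

const (
	PostAndUpdate Policy = "PostAndUpdate"
	Update        Policy = "Update"
)

// fakeChannel is a hypothetical in-memory stand-in for a Slack channel.
type fakeChannel struct {
	messages map[string]string // message timestamp -> current text
	byKey    map[string]string // groupingKey -> message timestamp
	nextID   int
}

func newFakeChannel() *fakeChannel {
	return &fakeChannel{messages: map[string]string{}, byKey: map[string]string{}}
}

// send models the proposed behavior and reports whether anything was sent.
func (c *fakeChannel) send(groupingKey, text string, policy Policy) bool {
	if ts, known := c.byKey[groupingKey]; known {
		// Both policies edit the existing message in place.
		c.messages[ts] = text
		return true
	}
	if policy == Update {
		// Nothing to update, and Update must never create a message.
		return false
	}
	// PostAndUpdate with no earlier message: post and remember the timestamp.
	c.nextID++
	ts := strconv.Itoa(c.nextID)
	c.messages[ts] = text
	c.byKey[groupingKey] = ts
	return true
}
```

With this model, a "sync succeeded" message sent with Update for an app that never failed is dropped, while the same message for an app with an earlier error replaces the error text.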

Signed-off-by: Andrei Petrus <andreipetrus2000@gmail.com>

codecov bot commented May 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.27%. Comparing base (f485671) to head (def4988).
Report is 3 commits behind head on master.

Current head def4988 differs from pull request most recent head 15a938c

Please upload reports for the commit 15a938c to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #298      +/-   ##
==========================================
- Coverage   55.35%   55.27%   -0.08%     
==========================================
  Files          35       35              
  Lines        3438     3439       +1     
==========================================
- Hits         1903     1901       -2     
- Misses       1256     1258       +2     
- Partials      279      280       +1     


if ts == "" || policy == Post || policy == PostAndUpdate {
newTs, channelID, err := SendMessageRateLimited(
// Updating an existing message
if ts != "" && (policy == Update || policy == PostAndUpdate) {
Author


Updating is the first condition so it always takes priority. This way, when a message has a policy of PostAndUpdate, it always updates the existing message instead of posting a new one in a thread.

This might also fix a race condition in which PostAndUpdate would post a message in a thread instead of updating the initial message. I didn't encounter this behavior a lot so I'm not 100% sure. Either way, it makes sense for updates to take priority in this case.
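As a rough illustration of why the ordering matters, the sketch below compares two decision functions. `updateFirst` follows the update condition quoted above; `postFirst` is a hypothetical contrast, not a quote of the previous code. The `Action` type and its values are also assumptions made for the example.

```go
package main

// Policy mirrors the three delivery policies named in this PR.
type Policy string

const (
	Post          Policy = "Post"
	PostAndUpdate Policy = "PostAndUpdate"
	Update        Policy = "Update"
)

// Action is a hypothetical label for what the sender ends up doing.
type Action string

const (
	ActionUpdate Action = "update"
	ActionPost   Action = "post"
	ActionNone   Action = "none"
)

// postFirst checks the posting condition before the update condition
// (hypothetical ordering, shown only for contrast).
func postFirst(policy Policy, ts string) Action {
	if policy == Post || policy == PostAndUpdate {
		return ActionPost
	}
	if ts != "" && policy == Update {
		return ActionUpdate
	}
	return ActionNone
}

// updateFirst checks the update condition first, so an existing message
// is always edited when the policy allows updates.
func updateFirst(policy Policy, ts string) Action {
	if ts != "" && (policy == Update || policy == PostAndUpdate) {
		return ActionUpdate
	}
	if policy == Post || policy == PostAndUpdate {
		return ActionPost
	}
	// Update with no recorded timestamp: nothing is sent.
	return ActionNone
}
```

For PostAndUpdate with a recorded timestamp, `postFirst` yields a new post while `updateFirst` yields an update, which is the case the comment above describes.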

@AndreiPetrusMihai AndreiPetrusMihai marked this pull request as ready for review May 20, 2024 09:31
@AndreiPetrusMihai
Author

Hey @pasha-codefresh, could you maybe take a look at this PR when you have some spare time? Not sure who else to ping. Thanks!

@AndreiPetrusMihai
Author

Hey @pasha-codefresh. Any chance someone could look over this PR?
