
Celery/SQS micro-optimizations for performance improvement #1290

Open · wants to merge 5 commits into main
Conversation

@QuanNguyen-CDS (Contributor) commented May 6, 2021

Summary | Résumé

Trello
feat: moving email to another Celery worker instance
note: MFA emails still go through the normal Celery worker
fix: updated Celery polling of SQS to 0.5 seconds

Test instructions | Instructions pour tester la modification

Testing setup:

  • created a simple REST endpoint (Flask) able to handle a modest 800 rps, tested with Apache Bench (ab); see the sketch after this list
  • modified the AWS SES client to send requests to this REST endpoint
  • created a Python script to push emails to notification-api, and ran 3 instances to simulate load
  • tested different settings locally while monitoring the SQS queues on staging/AWS
  • tested with --pool=gevent, prefetch-multiplier [10, 100, 124], polling_interval [1, 0.5, 0.3], and acks_late
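
For context, a minimal sketch of the kind of Flask stub that could serve as the stand-in REST endpoint described above; the route, port, and response body are illustrative assumptions, not the actual endpoint used in this test.

```python
from flask import Flask, jsonify

app = Flask(__name__)


# Stand-in for the email provider API: accept the request and return
# immediately so the endpoint can sustain a high request rate while
# Apache Bench (ab) measures throughput.
@app.route("/fake-ses/send-email", methods=["POST"])
def fake_send_email():
    return jsonify({"MessageId": "fake-message-id"}), 200


if __name__ == "__main__":
    app.run(port=5050, threaded=True)
```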

Locally, with one instance of worker_email, I was able to do 10-15 rps to the REST endpoint. With 10 pods, this theoretically equates to 100-150 emails per second, or 6,000-9,000 emails per minute.
The main observation is that the bottleneck in this system is the SQS signalling used to trigger the Celery workers. As future work, we might want to consider moving to a more performant broker such as RabbitMQ or Redis, and/or batching notification IDs together so a worker can handle more notifications in one invocation.
I tried larger prefetch values (up to 124), but this did not have much impact on throughput.

Unresolved questions / Out of scope | Questions non résolues ou hors sujet

Note: if we agree to move forward with this PR, it will not be merged until the notification-manifest PR is created and approved (to be tested on staging).

@AntoineAugusti (Contributor) left a comment

Could you share more details why you think it's worth tweaking polling_interval, acks_late and concurrency?

@QuanNguyen-CDS (Contributor, Author)

Could you share more details why you think it's worth tweaking polling_interval, acks_late and concurrency?

Yes, of course.

Poll Interval:
While setting up the REST server, I was also able to set up a manual requests-per-second metric. Running the email task locally and queuing hundreds of requests in SQS, I observed fluctuations in the processing of tasks; I'd describe it as "bursty". I hypothesized that the Celery tasks were sitting idle until the next poll cycle.
When I shortened the poll interval to 0.5 seconds, the "burstiness" went away.
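
As a rough sketch (not the project's actual config module), the setting in question is the SQS transport's polling_interval, passed through Celery's broker_transport_options; the app name and broker URL below are illustrative assumptions.

```python
from celery import Celery

app = Celery("notify_worker", broker="sqs://")

app.conf.broker_transport_options = {
    # Seconds the SQS transport sleeps between polls (kombu's default is 1s).
    # Polling every 0.5s shrinks the idle window between poll cycles that
    # produced the "bursty" processing described above.
    "polling_interval": 0.5,
}
```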

Concurrency:
The sending of emails is highly specialized: bursts of thousands of emails in a short period of time.
Additionally, the task itself is I/O bound:

  • fetching the notification from the database
  • fetching the template from the database
  • sending the data for attachments
  • sending the email itself using the provider client
  • and then another database update on the notification

The purpose of increasing concurrency is to take these I/O-bound tasks and spread them out. If the tasks were CPU bound, we would want to limit concurrency to the number of CPUs allocated; here, however, we want to account for tasks waiting on I/O actions to complete. A sketch of the settings is included below.
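
A minimal sketch of the pool and concurrency settings this reasoning points to, assuming the gevent pool mentioned in the test setup; the app name and concurrency value are illustrative, not the values ultimately chosen.

```python
from celery import Celery

app = Celery("notify_worker", broker="sqs://")

# I/O-bound email tasks spend most of their time waiting on the database and
# the provider client, so a green-thread pool with a concurrency well above
# the CPU count keeps the worker busy; CPU-bound work would instead be capped
# at roughly one process per allocated CPU.
app.conf.worker_pool = "gevent"
app.conf.worker_concurrency = 100
```

The same can be expressed on the command line with `celery -A <your_app> worker --pool=gevent --concurrency=100`.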

Ack Late:
Watching this video: https://www.youtube.com/watch?v=ceJ-vy7fvus @ 13:52 gave me the idea to play around with -Ofair, prefetch_multiplier and acks_late. They seem to have arrived at those settings while running into problems with a mix of slow and fast Celery tasks. I played around with the prefetch multiplier, but it didn't have any real effect on the performance of the system. Reading the description for acks_late, it seems to make sense to enable it:

https://docs.celeryproject.org/en/stable/userguide/tasks.html
https://docs.celeryproject.org/en/stable/faq.html#faq-acks-late-vs-retry

From the documentation:
If your task is idempotent you can set the acks_late option to have the worker acknowledge the message after the task returns instead.

The acks_late setting would be used when you need the task to be executed again if the worker (for some reason) crashes mid-execution. It’s important to note that the worker isn’t known to crash, and if it does it’s usually an unrecoverable error that requires human intervention (bug in the worker, or task code).
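
A minimal sketch of what enabling late acknowledgement would look like in Celery configuration, per the documentation quoted above; this assumes the email tasks are idempotent, and the app name and prefetch value shown are illustrative (the prefetch value is just one of those tried in this PR).

```python
from celery import Celery

app = Celery("notify_worker", broker="sqs://")

# Acknowledge (delete) the SQS message only after the task returns, so a
# worker crash mid-execution leaves the message to be redelivered rather
# than lost; this is only safe if the task is idempotent.
app.conf.task_acks_late = True

# Prefetch multiplier experimented with alongside acks_late; values from 10
# to 124 were tried in this PR without a measurable effect on throughput.
app.conf.worker_prefetch_multiplier = 10
```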

@sastels (Collaborator) commented Aug 3, 2021

@jimleroyer @Moro-Code we've been doing a lot of work around this lately - is this still relevant?

@jimleroyer (Member) commented Aug 5, 2021

@sastels Yes, I'd say we should keep this. It seems like the tweaking can squeeze out more performance. We could test it thoroughly with our new load tests (which we didn't have when this PR was created).

Base automatically changed from master to main February 17, 2022 18:17
@jzbahrai (Collaborator)

@jimleroyer can we close this?

@jimleroyer (Member)

Let's keep this please; Quan made good micro-optimizations here that could help us, but we never merged them in.

@jimleroyer jimleroyer changed the title notification emails to be pushed to another celery worker Celery/SQS micro-optimizations for performance improvement May 27, 2022
@jimleroyer jimleroyer changed the title Celery/SQS micro-optimizations for performance improvement Prototype: Celery/SQS micro-optimizations for performance improvement May 27, 2022
@sastels sastels added the Prototype label Jun 1, 2022
@sastels sastels changed the title Prototype: Celery/SQS micro-optimizations for performance improvement Celery/SQS micro-optimizations for performance improvement Jun 1, 2022