Convert ScheduledTask to a struct to reduce allocations for scheduling #2010

Merged

FranzBusch
Member

Motivation:

In my previous PR #2009, I added baseline performance and allocation tests around scheduleTask and execute. After analysing the various allocations that happen when scheduling a task, I found only a few that could potentially be optimized away.

Modifications:

This PR converts the ScheduledTask class to a struct, which reduces the number of allocations for scheduling a task by one. The only thing that needs to be worked around when converting to a struct is giving it an identity so that we can implement Equatable conformance properly. I explored two options: first, passing an ObjectIdentifier to the init; second, using an atomic counter per EventLoop. I went with the latter, since the former requires an additional allocation when calling execute.
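
A minimal sketch of the identity idea, not the code from this PR: a value type has no object identity, so equality is defined by an ID handed out by a counter. `TaskIDGenerator`, `readyTime`, and the lock are assumptions made for illustration; the lock merely stands in for the atomic counter so the sketch needs nothing beyond Foundation.

```swift
import Foundation

// Hypothetical stand-in for the atomic per-loop counter; a lock is used
// here only to keep the sketch free of external dependencies.
final class TaskIDGenerator {
    private let lock = NSLock()
    private var next: UInt64 = 0

    func makeID() -> UInt64 {
        lock.lock()
        defer { lock.unlock() }
        let id = next
        next += 1
        return id
    }
}

// As a struct, the task is stored inline and copied by value,
// so equality has to come from the ID rather than from object identity.
struct ScheduledTask: Equatable {
    let id: UInt64
    let readyTime: UInt64      // hypothetical stand-in for a deadline, in nanoseconds
    let task: () -> Void

    static func == (lhs: ScheduledTask, rhs: ScheduledTask) -> Bool {
        lhs.id == rhs.id
    }
}

let ids = TaskIDGenerator()
let first = ScheduledTask(id: ids.makeID(), readyTime: 10, task: {})
let second = ScheduledTask(id: ids.makeID(), readyTime: 10, task: {})
let copyOfFirst = first

print(first == copyOfFirst)  // same ID, so the copy compares equal
print(first == second)       // same deadline but different IDs, so not equal
```

Comparing by ID keeps two tasks with the same deadline distinguishable, which is what removing one specific task from a queue relies on.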

Result:

scheduleTask and execute each require one fewer allocation.

@FranzBusch added the semver/patch (No public API change.) and area/performance (Improvements to performance.) labels on Dec 13, 2021
@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch from 075586f to 8f0219b Compare December 14, 2021 09:03
@FranzBusch
Member Author

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 88

timestamp: Tue Dec 14 09:07:41 UTC 2021

results

name min max mean std
write_http_headers 0.00414562 0.004170369 0.0041584414 8.018787525416683e-06
http_headers_canonical_form 0.087742463 0.088283076 0.087967703 0.0002508194710229285
http_headers_canonical_form_trimming_whitespace 0.166966501 0.16789972 0.16751832039999998 0.0002995299266840957
http_headers_canonical_form_trimming_whitespace_from_short_string 0.152550671 0.153712555 0.15312232909999998 0.0002961416548453532
http_headers_canonical_form_trimming_whitespace_from_long_string 0.236444196 0.237103021 0.23669879759999998 0.00026267026775374144
bytebuffer_write_12MB_short_string_literals 0.537611691 0.542972231 0.5387932365 0.00158506946802259
bytebuffer_write_12MB_short_calculated_strings 0.538646812 0.540606477 0.5395693218 0.0005787273217277582
bytebuffer_write_12MB_medium_string_literals 0.180068208 0.181553359 0.18059913919999998 0.0003748533101950369
bytebuffer_write_12MB_medium_calculated_strings 0.227821003 0.231044606 0.22957030569999998 0.0012749121690409453
bytebuffer_write_12MB_large_calculated_strings 0.14665349 0.147763596 0.1475107632 0.00032684030631174545
bytebuffer_lots_of_rw 0.447008943 0.463126089 0.4503129862 0.004876279935427778
bytebuffer_write_http_response_ascii_only_as_string 0.041612008 0.042209483 0.041824030899999996 0.00020810271464802651
bytebuffer_write_http_response_ascii_only_as_staticstring 0.0315276 0.032115683 0.0317026366 0.00021893668531701113
bytebuffer_write_http_response_some_nonascii_as_string 0.041650639 0.042219819 0.0418002432 0.00021590348720450665
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031557406 0.032188876 0.0317449763 0.0002268380732333815
no-net_http1_10k_reqs_1_conn 0.111157714 0.11233259 0.1117950186 0.0004308926203025017
http1_10k_reqs_1_conn 0.604358335 0.611410898 0.6080412305 0.002048241046956931
http1_10k_reqs_100_conns 0.596351046 0.599218912 0.5979238572 0.0010553153948575263
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.073654859 0.075574107 0.0742523995 0.0005735306620419509
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.074791767 0.08178293 0.0757958808 0.002134402288224912
future_whenallsucceed_100k_deferred_off_loop 0.234591336 0.237554839 0.23562371259999998 0.0010143755890970733
future_whenallsucceed_100k_deferred_on_loop 0.127191517 0.129672888 0.12850151780000002 0.0009504499205471953
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.031295856 0.031910069 0.03152248 0.00021275082802241324
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.031155497 0.031679574 0.031384751100000004 0.00018632577939097867
future_whenallcomplete_100k_deferred_off_loop 0.162526734 0.166180411 0.16385470870000002 0.0012274460621670234
future_whenallcomplete_100k_deferred_on_loop 0.063995642 0.068065638 0.0648761518 0.0011634569531845643
future_reduce_10k_futures 0.03733747 0.037991428 0.0376282677 0.0002029491513677751
future_reduce_into_10k_futures 0.037015316 0.043428784 0.0385078817 0.0025771645687397552
channel_pipeline_1m_events 0.097144963 0.097293249 0.097212782 5.922684643338868e-05
websocket_encode_50b_space_at_front_1m_frames_cow 0.503200385 0.507200185 0.5038247551999999 0.0011984005962653985
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.066381626 0.06694858 0.0665445324 0.00020336462315271068
websocket_encode_1kb_space_at_front_100k_frames_cow 0.052755723 0.0532076 0.05286622009999999 0.0001797189935541038
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.500103287 0.50065094 0.5003721264000001 0.00025701673337361194
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.054178451 0.054651007 0.0543335447 0.0002108864583883153
websocket_encode_50b_space_at_front_10k_frames 0.006546483 0.006577544 0.0065556167 8.946188164675497e-06
websocket_encode_50b_space_at_front_10k_frames_masking 0.082151078 0.083078638 0.0826351934 0.0003354404280637609
websocket_encode_1kb_space_at_front_1k_frames 0.000761485 0.00077072 0.0007675274 2.940965381941418e-06
websocket_encode_50b_no_space_at_front_10k_frames 0.006540266 0.00656259 0.0065480859 7.368225392408487e-06
websocket_encode_1kb_no_space_at_front_1k_frames 0.000700424 0.000710788 0.000704938 2.888092526988098e-06
websocket_decode_125b_100k_frames 0.118384619 0.118951616 0.11873000730000001 0.00021552578065952985
websocket_decode_125b_with_a_masking_key_100k_frames 0.121118064 0.121694449 0.12144852310000001 0.00024368365343911078
websocket_decode_64kb_100k_frames 0.121148649 0.121959815 0.1215142088 0.0002905360228580575
websocket_decode_64kb_with_a_masking_key_100k_frames 0.124120867 0.124728311 0.1244782972 0.00023383644156014673
websocket_decode_64kb_+1_100k_frames 0.121106266 0.121664253 0.12139651790000001 0.00023152917282923084
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.124249877 0.124983955 0.12461501509999999 0.0002749658126525111
circular_buffer_into_byte_buffer_1kb 0.041223548 0.041686204 0.0413296688 0.0001787447633997823
circular_buffer_into_byte_buffer_1mb 0.082238839 0.082733285 0.0824486062 0.00022004600608953364
byte_buffer_view_iterator_1mb 0.020482201 0.020914296 0.0205452582 0.00013136769899941124
byte_to_message_decoder_decode_many_small 0.174208851 0.174743758 0.1744917968 0.0001309510004007429
generate_10k_random_request_keys 0.091033351 0.091316782 0.0911974049 0.0001087023401285976
bytebuffer_rw_10_uint32s 0.305135195 0.311577304 0.307461056 0.002188570596817882
bytebuffer_multi_rw_10_uint32s 0.056522444 0.058397674 0.0573392173 0.0005715695432386249
lock_1_thread_10M_ops 0.159187216 0.159703384 0.1593455218 0.00014157307187205196
lock_2_threads_10M_ops 0.867226819 0.949137449 0.9047931014999999 0.02694147852297731
lock_4_threads_10M_ops 0.947351939 0.976586619 0.9645746566 0.008434589480023822
lock_8_threads_10M_ops 0.960922712 0.988427938 0.9759528174 0.007270053169529688
schedule_10000_tasks 0.007192372 0.010283915 0.008210964599999999 0.0008356672738828
schedule_and_run_10000_tasks 0.023700744 0.024994938 0.024533386400000003 0.00037400854407774667
execute_10000 0.008811711 0.009204068 0.0088636962 0.00012175882497498415

comparison

name current previous winner diff
write_http_headers 0.00414562 0.004145224 previous 0%
http_headers_canonical_form 0.087742463 0.08911033 current -1%
http_headers_canonical_form_trimming_whitespace 0.166966501 0.165635222 previous 0%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.152550671 0.152098971 previous 0%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.236444196 0.234277035 previous 0%
bytebuffer_write_12MB_short_string_literals 0.537611691 0.514491549 previous 4%
bytebuffer_write_12MB_short_calculated_strings 0.538646812 0.513110549 previous 4%
bytebuffer_write_12MB_medium_string_literals 0.180068208 0.174893218 previous 2%
bytebuffer_write_12MB_medium_calculated_strings 0.227821003 0.228342126 current 0%
bytebuffer_write_12MB_large_calculated_strings 0.14665349 0.146318418 previous 0%
bytebuffer_lots_of_rw 0.447008943 0.443669225 previous 0%
bytebuffer_write_http_response_ascii_only_as_string 0.041612008 0.041542031 previous 0%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.0315276 0.032454044 current -2%
bytebuffer_write_http_response_some_nonascii_as_string 0.041650639 0.040967112 previous 1%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031557406 0.031451031 previous 0%
no-net_http1_10k_reqs_1_conn 0.111157714 0.110983425 previous 0%
http1_10k_reqs_1_conn 0.604358335 0.607672033 current 0%
http1_10k_reqs_100_conns 0.596351046 0.599901185 current 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.073654859 0.072675502 previous 1%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.074791767 0.072882312 previous 2%
future_whenallsucceed_100k_deferred_off_loop 0.234591336 0.279962873 current -16%
future_whenallsucceed_100k_deferred_on_loop 0.127191517 0.127752681 current 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.031295856 0.030952421 previous 1%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.031155497 0.030927637 previous 0%
future_whenallcomplete_100k_deferred_off_loop 0.162526734 0.204988133 current -20%
future_whenallcomplete_100k_deferred_on_loop 0.063995642 0.0647077 current -1%
future_reduce_10k_futures 0.03733747 0.037298608 previous 0%
future_reduce_into_10k_futures 0.037015316 0.036684117 previous 0%
channel_pipeline_1m_events 0.097144963 0.106349368 current -8%
websocket_encode_50b_space_at_front_1m_frames_cow 0.503200385 0.497735186 previous 1%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.066381626 0.065854414 previous 0%
websocket_encode_1kb_space_at_front_100k_frames_cow 0.052755723 0.052458733 previous 0%
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.500103287 0.497994446 previous 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.054178451 0.052489603 previous 3%
websocket_encode_50b_space_at_front_10k_frames 0.006546483 0.006549271 current 0%
websocket_encode_50b_space_at_front_10k_frames_masking 0.082151078 0.081237611 previous 1%
websocket_encode_1kb_space_at_front_1k_frames 0.000761485 0.000767356 current 0%
websocket_encode_50b_no_space_at_front_10k_frames 0.006540266 0.006497964 previous 0%
websocket_encode_1kb_no_space_at_front_1k_frames 0.000700424 0.000709299 current -1%
websocket_decode_125b_100k_frames 0.118384619 0.121491762 current -2%
websocket_decode_125b_with_a_masking_key_100k_frames 0.121118064 0.124183576 current -2%
websocket_decode_64kb_100k_frames 0.121148649 0.12434102 current -2%
websocket_decode_64kb_with_a_masking_key_100k_frames 0.124120867 0.128138445 current -3%
websocket_decode_64kb_+1_100k_frames 0.121106266 0.124199858 current -2%
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.124249877 0.126870633 current -2%
circular_buffer_into_byte_buffer_1kb 0.041223548 0.041222961 previous 0%
circular_buffer_into_byte_buffer_1mb 0.082238839 0.082247862 current 0%
byte_buffer_view_iterator_1mb 0.020482201 0.020486184 current 0%
byte_to_message_decoder_decode_many_small 0.174208851 0.17678933 current -1%
generate_10k_random_request_keys 0.091033351 0.090747235 previous 0%
bytebuffer_rw_10_uint32s 0.305135195 0.308369049 current -1%
bytebuffer_multi_rw_10_uint32s 0.056522444 0.05707658 current 0%
lock_1_thread_10M_ops 0.159187216 0.159143504 previous 0%
lock_2_threads_10M_ops 0.867226819 0.901086754 current -3%
lock_4_threads_10M_ops 0.947351939 0.889878295 previous 6%
lock_8_threads_10M_ops 0.960922712 0.977127124 current -1%
schedule_10000_tasks 0.007192372 0.007754252 current -7%
schedule_and_run_10000_tasks 0.023700744 0.02520874 current -5%
execute_10000 0.008811711 0.012096945 current -27%

significant differences found
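
The bot's formula is not shown in this thread. As a rough sketch, the comparison columns are consistent with taking each benchmark's `min` as `current`, computing the relative change against `previous`, and truncating the percentage toward zero, with the lower time winning. The function name and return shape below are assumptions for illustration.

```swift
// Hypothetical reconstruction of the "winner" and "diff" columns.
func comparison(current: Double, previous: Double) -> (winner: String, diffPercent: Int) {
    let relative = (current - previous) / previous * 100
    // Int(_:) truncates toward zero, which matches rows such as -20.7% shown as -20%.
    let diff = Int(relative)
    // Lower time wins (assumption based on the rows in the table).
    return (current < previous ? "current" : "previous", diff)
}

// Row execute_10000 from the table above.
let execute = comparison(current: 0.008811711, previous: 0.012096945)
print(execute.winner, execute.diffPercent)
```

Under that reading, a negative diff means the current build was faster than the previous one.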

@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch 2 times, most recently from 8b6d27f to 5b75a75 Compare December 14, 2021 10:13
FranzBusch added a commit to FranzBusch/swift-nio that referenced this pull request Dec 14, 2021
### Motivation:

In my previous PR apple#2010, I decreased the allocations for both `scheduleTask` and `execute` by one. There are now no allocations left to remove from `execute`; however, `scheduleTask` still performs a couple of allocations that we can try to get rid of.

### Modifications:

This PR removes two allocations inside `Scheduled`, where we were using the passed-in `EventLoopPromise` to call the `cancellationTask` once the `EventLoopFuture` of the promise fails. This requires two allocations, inside `whenFailure` and inside `_whenComplete`. However, since we pass the `cancellationTask` to `Scheduled` anyway, and `Scheduled` is also what fails the promise from its `cancel()` method, we can store the `cancellationTask` inside `Scheduled` and call it from `cancel()` directly instead of going through the future.

Importantly, the `cancellationTask` must not retain the `ScheduledTask.task`; otherwise we would change the semantics and retain the `ScheduledTask.task` longer than necessary. My previous PR apple#2010 already did the work to remove that retain from the `cancellationTask` closure, so we can go ahead and store the `cancellationTask` inside `Scheduled` now.

### Result:

`scheduleTask` requires two fewer allocations
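
A toy sketch of the approach described in this commit message, assuming nothing about the real types beyond what the message says: `cancel()` fails the promise and then calls a stored `cancellationTask` directly, so no failure callback has to be registered on the future. `ToyPromise`, `Cancelled`, and `Flag` are hypothetical names used only here.

```swift
// Hypothetical error type for the sketch.
struct Cancelled: Error {}

// Minimal stand-in for a promise; it only records the first failure.
final class ToyPromise<Value> {
    private(set) var result: Result<Value, Error>?

    func fail(_ error: Error) {
        if result == nil { result = .failure(error) }
    }
}

struct Scheduled<Value> {
    let promise: ToyPromise<Value>
    // Stored directly instead of being attached to the future as a failure callback.
    let cancellationTask: () -> Void

    func cancel() {
        promise.fail(Cancelled())
        cancellationTask()
    }
}

// Small mutable box so the closure can report that it ran.
final class Flag { var value = false }

let removedFromQueue = Flag()
let scheduled = Scheduled(
    promise: ToyPromise<Int>(),
    cancellationTask: { removedFromQueue.value = true }
)
scheduled.cancel()
print(removedFromQueue.value)
```

Note that the closure here captures only the flag, mirroring the requirement that the cancellation closure must not keep the scheduled work itself alive.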
@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch 3 times, most recently from 7e9b916 to b77f283 Compare December 14, 2021 12:46
@@ -63,6 +67,7 @@ public final class EmbeddedEventLoop: EventLoop {
/// The current "time" for this event loop. This is an amount in nanoseconds.
/* private but tests */ internal var _now: NIODeadline = .uptimeNanoseconds(0)

private var scheduledTaskCounter = NIOAtomic.makeAtomic(value: UInt64(0))
Contributor

It's a shame that this now regresses a bunch of allocation counting tests that use EmbeddedEventLoop (and will no doubt regress allocation counting tests in other swift-nio-* packages). I wonder if we should make the counter for embedded loops global (or static) to reduce the noise of this one.

Contributor

Actually EmbeddedEventLoop isn't thread safe, can we just use a UInt64?

Contributor

We can, and probably should.
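
The suggestion in this thread can be sketched as follows, using a toy loop rather than the real `EmbeddedEventLoop`: when every access happens on a single thread, a plain `UInt64` can hand out task IDs without any atomic. `ToyEmbeddedLoop` and its members are assumptions for illustration.

```swift
// Toy single-threaded loop; not the real EmbeddedEventLoop.
final class ToyEmbeddedLoop {
    // Plain integer: safe only because every access happens on one thread.
    private var scheduledTaskCounter: UInt64 = 0
    private(set) var pendingTaskIDs: [UInt64] = []

    @discardableResult
    func scheduleTask() -> UInt64 {
        let id = scheduledTaskCounter
        scheduledTaskCounter += 1
        pendingTaskIDs.append(id)
        return id
    }

    func cancel(taskID: UInt64) {
        pendingTaskIDs.removeAll { $0 == taskID }
    }
}

let loop = ToyEmbeddedLoop()
let a = loop.scheduleTask()
let b = loop.scheduleTask()
loop.cancel(taskID: a)
print(loop.pendingTaskIDs == [b])
```

The same code would be a data race if `scheduleTask()` were called from several threads, which is why a thread-safe loop needs an atomic or a lock instead.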

@FranzBusch
Member Author

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 89

timestamp: Tue Dec 14 13:09:22 UTC 2021

results

name min max mean std
write_http_headers 0.004151522 0.004183727 0.0041620739 1.2041364286399366e-05
http_headers_canonical_form 0.088430253 0.089901379 0.0891307442 0.0004366230728185489
http_headers_canonical_form_trimming_whitespace 0.167547181 0.168943945 0.16836075960000002 0.0004936144436334926
http_headers_canonical_form_trimming_whitespace_from_short_string 0.153256064 0.155570973 0.1542473069 0.0006904524114223839
http_headers_canonical_form_trimming_whitespace_from_long_string 0.236537501 0.238182843 0.23765484199999998 0.0005199242473031766
bytebuffer_write_12MB_short_string_literals 0.53871326 0.544861723 0.5398345834 0.001792766522633744
bytebuffer_write_12MB_short_calculated_strings 0.537398122 0.539197074 0.5384030387000001 0.0004938864378844973
bytebuffer_write_12MB_medium_string_literals 0.18036815 0.182815677 0.1811850212 0.0006934664284326004
bytebuffer_write_12MB_medium_calculated_strings 0.227687476 0.230903539 0.22843203509999999 0.0009207102813535293
bytebuffer_write_12MB_large_calculated_strings 0.145153582 0.146097304 0.1457087803 0.00029950583510622597
bytebuffer_lots_of_rw 0.444823524 0.461599314 0.4504535366 0.004721448807828372
bytebuffer_write_http_response_ascii_only_as_string 0.041847718 0.04245066 0.0420044528 0.00022665548398483534
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031690055 0.032244306 0.0317804414 0.00017363695855062396
bytebuffer_write_http_response_some_nonascii_as_string 0.041374321 0.041930402 0.0415125409 0.00021184051884138853
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031662732 0.032154421 0.0317516163 0.00015483711487240825
no-net_http1_10k_reqs_1_conn 0.107703688 0.108847996 0.10833147380000001 0.00040061290725692096
http1_10k_reqs_1_conn 0.60143013 0.605420767 0.6040527650999999 0.0011978514504024365
http1_10k_reqs_100_conns 0.595530645 0.599763302 0.5980088506 0.001196790948591595
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.075422323 0.076797785 0.0760079438 0.0005613757400653806
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.075904764 0.083652273 0.0770806349 0.0023294347408574583
future_whenallsucceed_100k_deferred_off_loop 0.236619435 0.238378638 0.237582446 0.0004994269717957895
future_whenallsucceed_100k_deferred_on_loop 0.128441917 0.131340894 0.1293606032 0.0008738091035788651
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.033356524 0.034981001 0.0343318886 0.0006332854324264855
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.032398081 0.033786268 0.0331543891 0.0003907790664387827
future_whenallcomplete_100k_deferred_off_loop 0.162560305 0.166081693 0.1644604277 0.0012419962684892083
future_whenallcomplete_100k_deferred_on_loop 0.064100589 0.068463006 0.0654826104 0.0013425638707560172
future_reduce_10k_futures 0.037746259 0.038828006 0.038112586399999995 0.0003752675242381385
future_reduce_into_10k_futures 0.037557169 0.038299607 0.0379542774 0.00024701019231701424
channel_pipeline_1m_events 0.097131324 0.097276606 0.09719819539999999 6.156095779018069e-05
websocket_encode_50b_space_at_front_1m_frames_cow 0.496059108 0.497152351 0.49638445769999995 0.0003588585709898785
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.065929138 0.066450517 0.0660998009 0.00022877699460656324
websocket_encode_1kb_space_at_front_100k_frames_cow 0.053373654 0.054409663 0.05397034739999999 0.0002968013269898453
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.496025982 0.496569452 0.49623663239999993 0.000216899089006447
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052971328 0.054417669 0.053904922300000005 0.00039371177614709526
websocket_encode_50b_space_at_front_10k_frames 0.006558222 0.006999774 0.0066094029 0.00013750854300365823
websocket_encode_50b_space_at_front_10k_frames_masking 0.081013162 0.081547103 0.0812336803 0.00025410743309301506
websocket_encode_1kb_space_at_front_1k_frames 0.000787929 0.000798157 0.0007929230000000001 3.7581814159983654e-06
websocket_encode_50b_no_space_at_front_10k_frames 0.006513769 0.006956659 0.0065641621 0.00013802836379080437
websocket_encode_1kb_no_space_at_front_1k_frames 0.000697652 0.000708102 0.000701608 3.1716712383922166e-06
websocket_decode_125b_100k_frames 0.118390293 0.118881447 0.1186925198 0.0002008525828262016
websocket_decode_125b_with_a_masking_key_100k_frames 0.12106642 0.121711364 0.1214091472 0.00022582451747014398
websocket_decode_64kb_100k_frames 0.121478064 0.12196782 0.1217863227 0.00019758312234435539
websocket_decode_64kb_with_a_masking_key_100k_frames 0.124029946 0.124654548 0.1243937584 0.00020611284360121343
websocket_decode_64kb_+1_100k_frames 0.121233658 0.187318269 0.1329091804 0.023918651470373257
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.124289538 0.124858212 0.12463643179999999 0.00020249188382626712
circular_buffer_into_byte_buffer_1kb 0.041224497 0.041686484 0.041336235 0.00017848021361110855
circular_buffer_into_byte_buffer_1mb 0.08226938 0.082708088 0.0824594048 0.00021066042729093646
byte_buffer_view_iterator_1mb 0.020482362 0.020907909 0.020606692 0.0001680268793490422
byte_to_message_decoder_decode_many_small 0.175910427 0.17650716 0.1763541424 0.00016317664642970023
generate_10k_random_request_keys 0.090992491 0.091148277 0.0910713021 5.88565908146439e-05
bytebuffer_rw_10_uint32s 0.304043536 0.306026601 0.3051114183 0.0007635843977086005
bytebuffer_multi_rw_10_uint32s 0.05580767 0.057469933 0.056581459599999995 0.0005693828706156243
lock_1_thread_10M_ops 0.159191882 0.159679862 0.1593529433 0.00013261999702914712
lock_2_threads_10M_ops 0.949855816 1.013004937 0.971460943 0.022373449804471115
lock_4_threads_10M_ops 0.924154611 0.973021515 0.9496549908 0.015007128755349148
lock_8_threads_10M_ops 1.000105751 1.027412619 1.0179209897 0.008612550126211243
schedule_10000_tasks 0.007107886 0.010208763 0.0081019604 0.0008648477579992135
schedule_and_run_10000_tasks 0.024080419 0.025390788 0.0246770319 0.00037963173067254716
execute_10000 0.008820912 0.00921915 0.008875854700000001 0.0001215012971709906

comparison

name current previous winner diff
write_http_headers 0.004151522 0.004145224 previous 0%
http_headers_canonical_form 0.088430253 0.08911033 current 0%
http_headers_canonical_form_trimming_whitespace 0.167547181 0.165635222 previous 1%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.153256064 0.152098971 previous 0%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.236537501 0.234277035 previous 0%
bytebuffer_write_12MB_short_string_literals 0.53871326 0.514491549 previous 4%
bytebuffer_write_12MB_short_calculated_strings 0.537398122 0.513110549 previous 4%
bytebuffer_write_12MB_medium_string_literals 0.18036815 0.174893218 previous 3%
bytebuffer_write_12MB_medium_calculated_strings 0.227687476 0.228342126 current 0%
bytebuffer_write_12MB_large_calculated_strings 0.145153582 0.146318418 current 0%
bytebuffer_lots_of_rw 0.444823524 0.443669225 previous 0%
bytebuffer_write_http_response_ascii_only_as_string 0.041847718 0.041542031 previous 0%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031690055 0.032454044 current -2%
bytebuffer_write_http_response_some_nonascii_as_string 0.041374321 0.040967112 previous 0%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031662732 0.031451031 previous 0%
no-net_http1_10k_reqs_1_conn 0.107703688 0.110983425 current -2%
http1_10k_reqs_1_conn 0.60143013 0.607672033 current -1%
http1_10k_reqs_100_conns 0.595530645 0.599901185 current 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.075422323 0.072675502 previous 3%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.075904764 0.072882312 previous 4%
future_whenallsucceed_100k_deferred_off_loop 0.236619435 0.279962873 current -15%
future_whenallsucceed_100k_deferred_on_loop 0.128441917 0.127752681 previous 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.033356524 0.030952421 previous 7%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.032398081 0.030927637 previous 4%
future_whenallcomplete_100k_deferred_off_loop 0.162560305 0.204988133 current -20%
future_whenallcomplete_100k_deferred_on_loop 0.064100589 0.0647077 current 0%
future_reduce_10k_futures 0.037746259 0.037298608 previous 1%
future_reduce_into_10k_futures 0.037557169 0.036684117 previous 2%
channel_pipeline_1m_events 0.097131324 0.106349368 current -8%
websocket_encode_50b_space_at_front_1m_frames_cow 0.496059108 0.497735186 current 0%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.065929138 0.065854414 previous 0%
websocket_encode_1kb_space_at_front_100k_frames_cow 0.053373654 0.052458733 previous 1%
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.496025982 0.497994446 current 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052971328 0.052489603 previous 0%
websocket_encode_50b_space_at_front_10k_frames 0.006558222 0.006549271 previous 0%
websocket_encode_50b_space_at_front_10k_frames_masking 0.081013162 0.081237611 current 0%
websocket_encode_1kb_space_at_front_1k_frames 0.000787929 0.000767356 previous 2%
websocket_encode_50b_no_space_at_front_10k_frames 0.006513769 0.006497964 previous 0%
websocket_encode_1kb_no_space_at_front_1k_frames 0.000697652 0.000709299 current -1%
websocket_decode_125b_100k_frames 0.118390293 0.121491762 current -2%
websocket_decode_125b_with_a_masking_key_100k_frames 0.12106642 0.124183576 current -2%
websocket_decode_64kb_100k_frames 0.121478064 0.12434102 current -2%
websocket_decode_64kb_with_a_masking_key_100k_frames 0.124029946 0.128138445 current -3%
websocket_decode_64kb_+1_100k_frames 0.121233658 0.124199858 current -2%
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.124289538 0.126870633 current -2%
circular_buffer_into_byte_buffer_1kb 0.041224497 0.041222961 previous 0%
circular_buffer_into_byte_buffer_1mb 0.08226938 0.082247862 previous 0%
byte_buffer_view_iterator_1mb 0.020482362 0.020486184 current 0%
byte_to_message_decoder_decode_many_small 0.175910427 0.17678933 current 0%
generate_10k_random_request_keys 0.090992491 0.090747235 previous 0%
bytebuffer_rw_10_uint32s 0.304043536 0.308369049 current -1%
bytebuffer_multi_rw_10_uint32s 0.05580767 0.05707658 current -2%
lock_1_thread_10M_ops 0.159191882 0.159143504 previous 0%
lock_2_threads_10M_ops 0.949855816 0.901086754 previous 5%
lock_4_threads_10M_ops 0.924154611 0.889878295 previous 3%
lock_8_threads_10M_ops 1.000105751 0.977127124 previous 2%
schedule_10000_tasks 0.007107886 0.007754252 current -8%
schedule_and_run_10000_tasks 0.024080419 0.02520874 current -4%
execute_10000 0.008820912 0.012096945 current -27%

significant differences found

@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch 2 times, most recently from 397dbdf to 419ced9 Compare December 14, 2021 14:47
@dnadoba
Member

dnadoba commented Dec 14, 2021

@swift-server-bot test this please

@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch from 419ced9 to 15b1bdd Compare December 14, 2021 16:09
Contributor

@glbrntt glbrntt left a comment

Awesome!

@FranzBusch FranzBusch force-pushed the feature/scheduling-task-allocations-part-1 branch from 15b1bdd to 584c6aa Compare December 14, 2021 17:06
Contributor

@Lukasa Lukasa left a comment

Really nice change, well done!

@Lukasa Lukasa merged commit f228e26 into apple:main Dec 14, 2021
@FranzBusch FranzBusch deleted the feature/scheduling-task-allocations-part-1 branch December 14, 2021 17:58
FranzBusch added a commit to FranzBusch/swift-nio that referenced this pull request Dec 14, 2021
### Motivation:

In my previous PR apple#2010, I was able to decrease the allocations for both `scheduleTask` and `execute` by 1 already. Gladly, there are no more allocations left to remove from `execute` now; however, `scheduleTask` still provides a couple of allocations that we can try to get rid of.

### Modifications:

This PR removes two allocations inside `Scheduled`, where we were using the passed-in `EventLoopPromise` to call the `cancellationTask` once the `EventLoopFuture` of the promise fails. This requires two allocations, inside `whenFailure` and inside `_whenComplete`. However, since we are passing the `cancellationTask` to `Scheduled` anyway, and `Scheduled` is also the one that fails the promise from the `cancel()` method, we can store the `cancellationTask` inside `Scheduled` and call it from the `cancel()` method directly instead of going through the future.

Importantly, the `cancellationTask` must not retain the `ScheduledTask.task`; otherwise we would change the semantics and retain the `ScheduledTask.task` longer than necessary. My previous PR apple#2010 already removed that retain from the `cancellationTask` closure, so we can go ahead and store the `cancellationTask` inside `Scheduled` now.
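The shape of this change can be sketched roughly as follows. This is a simplified model, not the NIO source: `PromiseSketch`, `CancelledSketch` and `ScheduledSketch` are hypothetical stand-ins, and the promise here is a bare result holder without any futures or callbacks.

```swift
// Hypothetical stand-in for a promise. It only records the first result.
final class PromiseSketch<Value> {
    private(set) var result: Result<Value, Error>?

    func fail(_ error: Error) {
        if result == nil { result = .failure(error) }
    }
}

// Hypothetical error used to mark a cancelled task.
struct CancelledSketch: Error {}

struct ScheduledSketch<Value> {
    private let promise: PromiseSketch<Value>
    // Stored directly instead of being attached to the future as a
    // failure callback, which is where the extra allocations came from.
    private let cancellationTask: () -> Void

    init(promise: PromiseSketch<Value>, cancellationTask: @escaping () -> Void) {
        self.promise = promise
        self.cancellationTask = cancellationTask
    }

    func cancel() {
        promise.fail(CancelledSketch())
        // Called directly rather than through a callback on the future.
        cancellationTask()
    }
}
```

In this sketch the caller would pass a `cancellationTask` that captures only a task id (for example to remove the entry from a queue), not the task closure itself, so storing it does not extend the lifetime of the scheduled work.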

### Result:

`scheduleTask` requires two fewer allocations.
FranzBusch added a commit that referenced this pull request Dec 15, 2021
FranzBusch added a commit to FranzBusch/swift-nio that referenced this pull request Dec 27, 2021
### Motivation:

When scheduling tasks with the same deadline, the current order of execution is undefined. Fixes apple#1541.

### Modifications:

In my previous PR apple#2010, I added an internal id to every `ScheduledTask` to give it an identity for cancellation purposes. This PR uses the same id to also ensure that tasks with the same deadline are executed in the order in which they were scheduled.
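A minimal sketch of such a tie-break, assuming ids are handed out in increasing order at scheduling time. `OrderedTaskSketch` and its `UInt64` fields are hypothetical and only model the comparison, not the real `ScheduledTask` or the heap it is stored in.

```swift
// Hypothetical model of a task that is ordered by deadline, then by id.
struct OrderedTaskSketch: Comparable {
    let id: UInt64        // assumed to increase with each scheduled task
    let deadline: UInt64  // stand-in for a real deadline type

    static func == (lhs: OrderedTaskSketch, rhs: OrderedTaskSketch) -> Bool {
        lhs.id == rhs.id
    }

    static func < (lhs: OrderedTaskSketch, rhs: OrderedTaskSketch) -> Bool {
        if lhs.deadline == rhs.deadline {
            // Same deadline: the task scheduled earlier (smaller id) runs first.
            return lhs.id < rhs.id
        }
        return lhs.deadline < rhs.deadline
    }
}
```

With this comparison, any priority queue keyed on the task returns tasks with equal deadlines in scheduling order, regardless of how the queue breaks ties internally.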

### Result:

`ScheduledTask`s are now executed in their scheduled order when they have the same deadline.
FranzBusch added a commit that referenced this pull request Jan 4, 2022
Labels
area/performance Improvements to performance. semver/patch No public API change.