Adding evm_chain_id, address and event_sig to data and topic indexes #12786

mateusz-sekara · 2024-04-11T12:17:54Z

Context

CCIP uses data_word_five to store sequence numbers, which are integers unique per OnRamp, and relies on it when fetching CCIPSendRequested events (there are two OnRamps per CCIP lane).

event CCIPSendRequested(Internal.EVM2EVMMessage message);
struct EVM2EVMMessage {
    uint64 sourceChainSelector; 
    address sender;
    address receiver; 
    uint64 sequenceNumber;
    ...
}

We rely on this heavily, because messages are fetched every OCR2 round during the Execution Phase with the following query:

SELECT * FROM evm.logs
    WHERE evm_chain_id = 1337
    AND address = :address
    AND event_sig = :event_sig
    AND substring(data from 32*4+1 for 32) >= :min
    AND substring(data from 32*4+1 for 32) <= :max
    AND block_number <= %s
    ORDER BY (block_number, log_index)

We have indexes for topics and data words, each on a single column only. Examples:

create index evm_logs_idx_data_word_three
    on evm.logs ("substring"(data, 65, 32));
create index evm_logs_idx_topic_two
    on evm.logs ((topics[2]));
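
As a small sketch of the offset arithmetic behind these index definitions: the index names count data words from one, while Postgres `substring` takes a 1-based byte position, so word three starts at byte 65 and word five at byte 129 (the offset that appears in the execution plans). The helper names below are hypothetical and only illustrate the mapping.

```python
WORD_SIZE = 32  # bytes per ABI-encoded data word


def data_word_start(word_number: int) -> int:
    """1-based byte position of the Nth data word (N counted from 1),
    i.e. the second argument of substring(data, start, 32)."""
    if word_number < 1:
        raise ValueError("word_number is 1-based")
    return WORD_SIZE * (word_number - 1) + 1


def data_word(data: bytes, word_number: int) -> bytes:
    """Python equivalent of substring(data, start, 32) on a bytea value."""
    start = data_word_start(word_number)
    return data[start - 1 : start - 1 + WORD_SIZE]


if __name__ == "__main__":
    for n in (3, 5):
        print(f"data word {n}: substring(data, {data_word_start(n)}, {WORD_SIZE})")
```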

Postgres uses the evm_logs_idx_data_word_five index to filter only by min and max, and then goes through the entire result set to filter out non-matching records. Unfortunately, indexes on topics and data words are inefficient when the evm.logs table contains many duplicated values for a data word or topic. This works well for a small number of chains and lanes. However, as the number of lanes and chains grows over time, LogPoller’s performance degrades severely, because too many records match within that index.

CCIP example: we use that index to fetch the matching OnRamp messages for the Commit Root that is being executed. A Commit Root has at most 256 leaves (containing up to 256 messages).

With 2 chains and 2 lanes, the index returns 512 tuples. Postgres fetches that dataset from the index and then has to scan it, applying the additional filters present in the query (address, event_sig, evm_chain_id), to finally return 256 rows. This is not a problem for a small number of lanes, but it degrades over time as more lanes and chains are added.

With 40 chains, 400 lanes, and a fully packed Commit Root (256 messages), the index returns 200k records that have to be scanned sequentially only to return 256 records. We tested this on 30 chains / 250 lanes with around 15 messages per Commit Root, and it caused major index degradation over time. Including evm_chain_id, address, and event_sig in the index makes the lookup almost O(1). Please see the execution plans below to better understand the problem.

This degradation is visible in the growing number of tuples / selects. In other words, the more logs are emitted, the heavier the query becomes.

Solution

Use compound indexes for topics and data words. All of LogPoller's queries filter by (evm_chain_id, address, event_sig), so including those columns in the index lets Postgres locate the exact words directly. Without this change, Postgres uses a single-column index and fetches an extremely large number of tuples that then have to be scanned sequentially.
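
The difference can be illustrated with a toy model. This is not how Postgres is implemented: sorted Python lists stand in for B-tree indexes, and the chain id, event signature, and OnRamp addresses are placeholder values. The sketch counts how many index entries a range scan visits with a word-only index versus a compound index, assuming every OnRamp has emitted the same range of sequence numbers.

```python
from bisect import bisect_left, bisect_right

CHAIN_ID = 1337            # illustrative chain id
EVENT_SIG = b"\xd0" * 32   # placeholder event signature, not a real topic hash


def word(n: int) -> bytes:
    """Encode an integer as a 32-byte big-endian data word."""
    return n.to_bytes(32, "big")


def build_logs(onramps, messages):
    """One log row per (OnRamp address, sequence number 1..messages)."""
    return [
        (CHAIN_ID, address, EVENT_SIG, word(seq))
        for address in onramps
        for seq in range(1, messages + 1)
    ]


def scan_single_column(logs, address, lo, hi):
    """Index on the data word only: range scan, then filter the other columns."""
    index = sorted((row[3], i) for i, row in enumerate(logs))
    start = bisect_left(index, (word(lo), -1))
    end = bisect_right(index, (word(hi), len(logs)))
    visited = index[start:end]
    matched = [i for _, i in visited if logs[i][:3] == (CHAIN_ID, address, EVENT_SIG)]
    return len(visited), len(matched)


def scan_compound(logs, address, lo, hi):
    """Index on (evm_chain_id, address, event_sig, word): the range is already exact."""
    index = sorted(row + (i,) for i, row in enumerate(logs))
    prefix = (CHAIN_ID, address, EVENT_SIG)
    start = bisect_left(index, prefix + (word(lo), -1))
    end = bisect_right(index, prefix + (word(hi), len(logs)))
    return end - start, end - start


if __name__ == "__main__":
    onramps = [b"\x01" * 20, b"\x02" * 20]
    logs = build_logs(onramps, messages=256)
    print("single-column (visited, matched):", scan_single_column(logs, onramps[0], 1, 256))
    print("compound      (visited, matched):", scan_compound(logs, onramps[0], 1, 256))
```

With two OnRamps the word-only scan visits 512 entries to return 256 rows, while the compound scan visits exactly 256. Under the same model, 800 OnRamps (an assumption: 400 lanes with two OnRamps each) would give 800 × 256 = 204,800 visited entries for the same 256 rows.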

I've also noticed that data_word indexes are used only by CCIP, so I've removed the ones that are no longer required for querying logs.

Previous chart but when running with new indexes (mind the zeros!)

Current state - lack of word_five index ~ 261ms

Execution plan

 Sort  (cost=2147.70..2147.71 rows=3 width=406) (actual time=261.701..261.716 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.061..0.062 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.060..0.061 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..2146.47 rows=3 width=406) (actual time=0.101..261.614 rows=97 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0))
         Filter: (("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Rows Removed by Filter: 228359
 Planning Time: 0.238 ms
 Execution Time: 261.781 ms
(13 rows)

Adding index on a single column ~ 96ms

CREATE INDEX evm_logs_idx_data_word_five ON evm.logs (substring(data from 129 for 32));

Execution plan

 Sort  (cost=1190.89..1190.89 rows=3 width=406) (actual time=91.086..91.096 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.068..0.069 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.067..0.067 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Bitmap Heap Scan on logs  (cost=1177.58..1189.65 rows=3 width=406) (actual time=88.277..90.893 rows=97 loops=1)
         Recheck Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0) AND ("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Rows Removed by Index Recheck: 6013
         Heap Blocks: exact=10 lossy=456
         ->  BitmapAnd  (cost=1177.58..1177.58 rows=3 width=0) (actual time=87.654..87.655 rows=0 loops=1)
               ->  Bitmap Index Scan on idx_evm_logs_ordered_by_block_and_created_at  (cost=0.00..53.23 rows=578 width=0) (actual time=82.674..82.674 rows=228456 loops=1)
                     Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND (block_number <= $0))
               ->  Bitmap Index Scan on evm_logs_idx_data_word_five  (cost=0.00..1124.10 rows=73554 width=0) (actual time=0.463..0.463 rows=5566 loops=1)
                     Index Cond: (("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
 Planning Time: 0.414 ms
 Execution Time: 92.380 ms
(18 rows)

Compound index ~ 1ms

create index evm_logs_idx_data_word_five
    on evm.logs (address, event_sig, evm_chain_id, "substring"(data, 129, 32));

Execution plan

 Sort  (cost=39.24..39.25 rows=3 width=406) (actual time=0.351..0.357 rows=97 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: quicksort  Memory: 123kB
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.21 rows=1 width=16) (actual time=0.057..0.058 rows=1 loops=1)
           ->  Index Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..77036.27 rows=98594 width=16) (actual time=0.056..0.056 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '11155111'::numeric)
   ->  Index Scan using evm_logs_idx_data_word_five on logs  (cost=0.56..38.01 rows=3 width=406) (actual time=0.111..0.264 rows=97 loops=1)
         Index Cond: ((evm_chain_id = '11155111'::numeric) AND (address = '\x81660dc846f0528a7ce003c1f7774d7c4135f344'::bytea) AND (event_sig = '\xd0c3c799bf9e2639de44391e7f524d229b2b55f5b1ea94b2bf7da42f7243dddd'::bytea) AND ("substring"(data, 129, 32) >= '\x0000000000000000000000000000000000000000000000000000000000000001'::bytea) AND ("substring"(data, 129, 32) <= '\x0000000000000000000000000000000000000000000000000000000000000061'::bytea))
         Filter: (block_number <= $0)
 Planning Time: 0.423 ms
 Execution Time: 0.410 ms
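
For a quick comparison of the three plans above, the reported numbers can be put side by side. This is only arithmetic on figures copied from the plans (execution times, rows returned, and "Rows Removed by Filter"); the function names are hypothetical, and a single run per variant is not a benchmark.

```python
# Execution times copied from the three plans above, in milliseconds.
EXECUTION_MS = {
    "no word_five index": 261.781,
    "single-column index": 92.380,
    "compound index": 0.410,
}

# Row counts from the first plan (no word_five index).
ROWS_RETURNED = 97
ROWS_REMOVED_BY_FILTER = 228_359


def speedup_over(baseline: str, candidate: str) -> float:
    """How many times faster `candidate` ran than `baseline`."""
    return EXECUTION_MS[baseline] / EXECUTION_MS[candidate]


def rows_visited_without_word_index() -> int:
    """Rows the first plan had to read: those returned plus those filtered out."""
    return ROWS_RETURNED + ROWS_REMOVED_BY_FILTER


if __name__ == "__main__":
    for name in ("no word_five index", "single-column index"):
        print(f"compound index vs {name}: {speedup_over(name, 'compound index'):.0f}x faster")
    visited = rows_visited_without_word_index()
    print(f"rows read without the word index: {visited} for {ROWS_RETURNED} results")
```

The row total matches the 228456 rows reported by the Bitmap Index Scan on idx_evm_logs_ordered_by_block_and_created_at in the second plan, which suggests both earlier plans read the same set of candidate rows for this OnRamp.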

github-actions · 2024-04-11T12:18:12Z

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset.

github-actions · 2024-04-12T13:54:00Z

I see you added a changeset file but it does not contain a tag. Please edit the text to include at least one of the following tags:

#nops : For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#added : For any new functionality added.
#changed : For any change to the existing functionality. 
#removed : For any functionality/config that is removed.
#updated : For any functionality that is updated.
#deprecation_notice : For any upcoming deprecation functionality.
#breaking_change : For any functionality that requires manual action for the node to boot.
#db_update : For any feature that introduces updates to database schema.
#wip : For any change that is not ready yet and external communication about it should be held off till it is feature complete.

reductionista · 2024-04-12T19:20:07Z

core/store/migrate/migrations/0232_log_poller_word_topic_indexes.sql

+    on evm.logs (evm_chain_id, address, event_sig, (topics[3]));
+
+create index evm_logs_idx_topic_four
+    on evm.logs (evm_chain_id, address, event_sig, (topics[4]));


What happened to data words one, two, and four?

mateusz-sekara · 2024-04-12

I've removed them because data word queries are used only by CCIP, and we rely only on words three and five.

mateusz-sekara · 2024-04-12

If you feel it's too risky, we can consider bringing them back. I just wanted to reduce the number of redundant indexes on that table.

reductionista · 2024-04-12T19:53:26Z

Good idea! Makes sense.

I don't think we should remove the indices for data words one, two, and four though. Even in CCIP, I see a place where it presently uses data word four:
https://github.com/smartcontractkit/ccip/blob/ccip-develop/core/services/ocr2/plugins/ccip/internal/ccipdata/v1_2_0/commit_store.go#L286-L293

But we want LogPoller to be a general core service, not something where we have to add a new index whenever a product needs it. I guess only indexing the first five is arbitrary, but at least it's easier to document than "we only support indexing on words three and five".

We're looking at ways to make this more dynamic in ChainReader, so that different products can specify which indices should be accelerated in the job spec. But we probably won't bother making that change for evm unless we have to... it will mostly just apply to non-evm chains where we expect indexing to be more complex.

mateusz-sekara · 2024-04-12T20:05:42Z

Totally agree. I missed those other indexes because they are probably not heavily used, and I relied on DB stats when working on this PR. Let's use 1-5 as you suggested. In the long term, it would be great to define indexes at the product level, as you mentioned. That way, every product would have a set of indexes that match its queries.

cl-sonarqube-production · 2024-04-16T11:13:53Z

Quality Gate failed

Failed conditions
8.89% Technical Debt Ratio on New Code (required ≤ 4%)
B Maintainability Rating on New Code (required ≥ A)

mateusz-sekara force-pushed the lp-smarter-indexes branch from 3def1d5 to 271f7a2 Compare April 11, 2024 12:22

mateusz-sekara marked this pull request as ready for review April 12, 2024 13:52

mateusz-sekara requested a review from a team as a code owner April 12, 2024 13:52

mateusz-sekara requested a review from reductionista April 12, 2024 13:53

Adding evm_chain_id, address and event_sig to data and topic indexes

9f60f80

mateusz-sekara force-pushed the lp-smarter-indexes branch from 424461b to 9f60f80 Compare April 12, 2024 14:16

reductionista reviewed Apr 12, 2024

Merge branch 'develop' into lp-smarter-indexes

cad644c

Post review fixes

80efcf1

mateusz-sekara requested a review from reductionista April 12, 2024 20:08

reductionista reviewed Apr 12, 2024

reductionista reviewed Apr 12, 2024

Post review fixes

fca3e84

Post review fixes

d325cd1

Merge branch 'develop' into lp-smarter-indexes

17a56f9

mateusz-sekara requested a review from reductionista April 15, 2024 11:06

mateusz-sekara requested a review from reductionista April 16, 2024 11:13

makramkd approved these changes Apr 16, 2024

mateusz-sekara added this pull request to the merge queue Apr 16, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 16, 2024

mateusz-sekara added this pull request to the merge queue Apr 16, 2024

Merged via the queue into develop with commit fbb705c Apr 16, 2024
104 of 105 checks passed

mateusz-sekara deleted the lp-smarter-indexes branch April 16, 2024 15:02

This was referenced Apr 16, 2024

[DO NOT MERGE] Release Preview - Changeset #12843

Closed

[DO NOT MERGE] Release Preview - Changeset #12850

Closed

[DO NOT MERGE] Release Preview - Changeset #12864

Closed

mateusz-sekara mentioned this pull request Apr 23, 2024

Cherry-picking performance fixes to the release branch smartcontractkit/ccip#744

Merged
