
fix: and optimising fetching membership events #706

Merged
merged 10 commits into master from rlnFetcher on Sep 7, 2023
Conversation

@harsh-98 (Contributor) commented Sep 4, 2023

Description

Bugs and other optimisations found in the membership-fetching logic while checking #701.

Changes

  • errCh could be closed multiple times, which would result in an error (a close-once sketch follows this list).
  • Some blocks might be missed between loadOldEvents and watchNewEvents.
  • Use a binary split of the work when getting events.
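
For illustration, here is a minimal sketch of the close-once guard idea behind the first item, using sync.Once; the names (eventLoader, fail) are hypothetical and not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// eventLoader is a hypothetical holder for the fetcher's error channel.
type eventLoader struct {
	errCh     chan error
	closeOnce sync.Once
}

// fail reports an error and closes errCh exactly once; later calls are
// no-ops instead of panicking on a double close.
func (l *eventLoader) fail(err error) {
	l.closeOnce.Do(func() {
		l.errCh <- err
		close(l.errCh)
	})
}

func main() {
	l := &eventLoader{errCh: make(chan error, 1)}
	l.fail(errors.New("rpc error"))
	l.fail(errors.New("second error")) // safe: ignored, no double close
	fmt.Println(<-l.errCh)
}
```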

Tests

@status-im-auto commented Sep 4, 2023

Jenkins Builds

Older builds (54):
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ e2a4f85 #1 2023-09-04 09:07:29 ~1 min linux 📦deb
✔️ e2a4f85 #1 2023-09-04 09:08:15 ~2 min nix-flake 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:09 ~3 min tests 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:13 ~3 min tests 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:58 ~3 min android 📦tgz
✔️ e2a4f85 #1 2023-09-04 09:10:29 ~4 min ios 📦tgz
✔️ f2ba54a #2 2023-09-04 10:37:38 ~1 min linux 📦deb
✔️ f2ba54a #2 2023-09-04 10:38:19 ~1 min nix-flake 📄log
✔️ f2ba54a #2 2023-09-04 10:39:18 ~2 min tests 📄log
✔️ f2ba54a #2 2023-09-04 10:39:30 ~3 min tests 📄log
✔️ f2ba54a #2 2023-09-04 10:39:35 ~3 min android 📦tgz
✔️ f2ba54a #2 2023-09-04 10:40:12 ~3 min ios 📦tgz
✖️ bbf38f2 #3 2023-09-06 13:56:18 ~21 sec tests 📄log
✔️ bbf38f2 #3 2023-09-06 13:57:22 ~1 min linux 📦deb
✔️ bbf38f2 #3 2023-09-06 13:57:48 ~1 min nix-flake 📄log
✖️ bbf38f2 #3 2023-09-06 13:58:46 ~2 min tests 📄log
✔️ bbf38f2 #3 2023-09-06 13:59:35 ~3 min android 📦tgz
✔️ bbf38f2 #3 2023-09-06 14:00:16 ~4 min ios 📦tgz
✔️ 9ad1355 #4 2023-09-06 15:25:12 ~1 min linux 📦deb
✖️ 9ad1355 #4 2023-09-06 15:25:21 ~1 min tests 📄log
✖️ 9ad1355 #4 2023-09-06 15:25:35 ~1 min tests 📄log
✔️ 9ad1355 #4 2023-09-06 15:25:52 ~1 min nix-flake 📄log
✔️ 9ad1355 #4 2023-09-06 15:27:10 ~3 min android 📦tgz
✔️ 9ad1355 #4 2023-09-06 15:28:00 ~4 min ios 📦tgz
✖️ b5ff218 #5 2023-09-06 15:47:24 ~21 sec tests 📄log
✖️ b5ff218 #5 2023-09-06 15:47:27 ~24 sec tests 📄log
✔️ b5ff218 #5 2023-09-06 15:47:39 ~36 sec linux 📦deb
✔️ b5ff218 #5 2023-09-06 15:48:57 ~1 min nix-flake 📄log
✔️ b5ff218 #5 2023-09-06 15:50:23 ~3 min android 📦tgz
✔️ b5ff218 #5 2023-09-06 15:50:52 ~3 min ios 📦tgz
✖️ 80345a4 #6 2023-09-06 15:49:56 ~22 sec tests 📄log
✖️ 80345a4 #6 2023-09-06 15:49:56 ~22 sec tests 📄log
✔️ 80345a4 #6 2023-09-06 15:50:10 ~36 sec linux 📦deb
✔️ 80345a4 #6 2023-09-06 15:51:28 ~1 min nix-flake 📄log
✔️ 80345a4 #6 2023-09-06 15:53:24 ~2 min android 📦tgz
✔️ 80345a4 #6 2023-09-06 15:53:38 ~4 min ios 📦tgz
✔️ 0e0af05 #7 2023-09-06 15:58:44 ~34 sec linux 📦deb
✖️ 0e0af05 #7 2023-09-06 15:59:15 ~1 min tests 📄log
✖️ 0e0af05 #7 2023-09-06 15:59:45 ~1 min tests 📄log
✔️ 0e0af05 #7 2023-09-06 16:00:04 ~1 min nix-flake 📄log
✔️ 0e0af05 #7 2023-09-06 16:01:24 ~3 min android 📦tgz
✔️ 0e0af05 #7 2023-09-06 16:01:27 ~3 min ios 📦tgz
✔️ 81c9919 #8 2023-09-06 16:41:06 ~31 sec linux 📦deb
✔️ 81c9919 #8 2023-09-06 16:42:32 ~1 min nix-flake 📄log
✔️ 81c9919 #8 2023-09-06 16:43:50 ~3 min android 📦tgz
✔️ 81c9919 #8 2023-09-06 16:45:54 ~5 min ios 📦tgz
✖️ 81c9919 #8 2023-09-06 16:46:16 ~5 min tests 📄log
✖️ 81c9919 #8 2023-09-06 16:46:16 ~5 min tests 📄log
✔️ e72018d #9 2023-09-06 17:07:06 ~30 sec linux 📦deb
✔️ e72018d #9 2023-09-06 17:07:33 ~54 sec tests 📄log
✔️ e72018d #9 2023-09-06 17:08:23 ~1 min tests 📄log
✔️ e72018d #9 2023-09-06 17:08:30 ~1 min nix-flake 📄log
✔️ e72018d #9 2023-09-06 17:09:55 ~3 min android 📦tgz
✔️ e72018d #9 2023-09-06 17:11:25 ~4 min ios 📦tgz
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 8ca4d4e #10 2023-09-07 07:13:20 ~36 sec linux 📦deb
✔️ 8ca4d4e #10 2023-09-07 07:13:46 ~59 sec tests 📄log
✔️ 8ca4d4e #10 2023-09-07 07:14:36 ~1 min tests 📄log
✔️ 8ca4d4e #10 2023-09-07 07:16:37 ~3 min nix-flake 📄log
✔️ 8ca4d4e #10 2023-09-07 07:17:38 ~4 min ios 📦tgz
✔️ 8ca4d4e #10 2023-09-07 07:18:06 ~5 min android 📦tgz
✔️ d00f053 #11 2023-09-07 16:14:18 ~3 min linux 📦deb
✔️ d00f053 #11 2023-09-07 16:14:29 ~3 min nix-flake 📄log
✔️ d00f053 #11 2023-09-07 16:15:27 ~4 min tests 📄log
✔️ d00f053 #11 2023-09-07 16:16:06 ~5 min tests 📄log
✔️ d00f053 #11 2023-09-07 16:16:16 ~5 min android 📦tgz
✔️ d00f053 #12 2023-09-07 16:23:35 ~4 min ios 📦tgz

@richard-ramos (Member) commented:

I like the changes related to:

- errCh can be closed multiple times, which results in an error.
- Some blocks might be missed between loadOldEvents and watchNewEvents.

However, I think the current mechanism for loading events (using chunks) is more appropriate than the proposed binary split, which was one of the initial approaches I had considered for loading the events.

While the binary-split approach works for Infura because it uses an off-chain index for event logs, it might overwhelm a node you're running locally. The approach we currently follow in go-waku is the same as eth2 nodes like Prysm, in which we fetch block ranges in chunks of 5000 blocks (Prysm goes even lower, at 1000 blocks per chunk), with some special handling for cases where we exceed the number of events: we use the multiplicativeDecreaseDivisor to reduce the range to a smaller number of blocks, and then slowly increase the range using the additiveFactorMultiplier until we get back to the maximum number of blocks (5000).

While I agree that using a binary split might be faster, it causes more load on a local node. Also, syncing the chain in 5k-block chunks is still a fast operation that takes only some tens of seconds, it's only executed once at the beginning, and since we already store the merkle tree in the RLN database, that full sync does not need to be repeated on every start.
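
For illustration, a minimal sketch of the chunked approach described above, assuming a 5000-block maximum, a halving multiplicativeDecreaseDivisor, and a 10% additiveFactorMultiplier; fetchEvents and its error threshold are stand-ins, not go-waku's actual code.

```go
package main

import "fmt"

const (
	maxChunkSize                  = uint64(5000)
	multiplicativeDecreaseDivisor = 2    // halve the chunk on error
	additiveFactorMultiplier      = 0.10 // grow ~10% of the max back on success
)

// fetchEvents stands in for the real eth_getLogs call; it pretends the
// provider rejects ranges that would return too many events.
func fetchEvents(from, to uint64) error {
	if to-from > 1200 { // hypothetical provider limit
		return fmt.Errorf("too much data")
	}
	fmt.Printf("fetched blocks %d-%d\n", from, to)
	return nil
}

func loadOldEvents(from, to uint64) {
	chunk := maxChunkSize
	for start := from; start <= to; {
		end := start + chunk - 1
		if end > to {
			end = to
		}
		if err := fetchEvents(start, end); err != nil {
			// multiplicative decrease: shrink the chunk and retry the same range
			chunk /= multiplicativeDecreaseDivisor
			if chunk == 0 {
				chunk = 1
			}
			continue
		}
		start = end + 1
		// additive increase: creep back toward the maximum chunk size
		if chunk < maxChunkSize {
			chunk += uint64(float64(maxChunkSize) * additiveFactorMultiplier)
			if chunk > maxChunkSize {
				chunk = maxChunkSize
			}
		}
	}
}

func main() {
	loadOldEvents(0, 20000)
}
```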

@harsh-98 (Contributor, Author) commented Sep 4, 2023

> I like the changes related to:
>
> - errCh can be closed multiple times, which results in an error.
> - Some blocks might be missed between loadOldEvents and watchNewEvents.
>
> However, I think the current mechanism for loading events (using chunks) is more appropriate than the proposed binary split, which was one of the initial approaches I had considered for loading the events.
>
> While the binary-split approach works for Infura because it uses an off-chain index for event logs, it might overwhelm a node you're running locally. The approach we currently follow in go-waku is the same as eth2 nodes like Prysm, in which we fetch block ranges in chunks of 5000 blocks (Prysm goes even lower, at 1000 blocks per chunk), with some special handling for cases where we exceed the number of events: we use the multiplicativeDecreaseDivisor to reduce the range to a smaller number of blocks, and then slowly increase the range using the additiveFactorMultiplier until we get back to the maximum number of blocks (5000).
>
> While I agree that using a binary split might be faster, it causes more load on a local node. Also, syncing the chain in 5k-block chunks is still a fast operation that takes only some tens of seconds, it's only executed once at the beginning, and since we already store the merkle tree in the RLN database, that full sync does not need to be repeated on every start.

In this PR, we are also syncing in batches of 5k blocks within loadOldEvents. It is just that, if the getEvents call for 5k blocks fails with an over-the-limit error, we split the work into two jobs of half the range inside getEvents.
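
For illustration, a minimal sketch of the binary-split idea described above; queryLogs and its block limit are hypothetical stand-ins for the real eth_getLogs call, not the PR's actual getEvents implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooMuchData = errors.New("query returned more results than the limit")

// queryLogs stands in for fetching logs over one block range; it pretends
// the provider rejects ranges larger than a hypothetical limit.
func queryLogs(from, to uint64) error {
	if to-from > 1500 {
		return errTooMuchData
	}
	fmt.Printf("fetched blocks %d-%d\n", from, to)
	return nil
}

// getEvents fetches a range, splitting it into two half-range jobs whenever
// the node reports that the range contains too many results.
func getEvents(from, to uint64) error {
	err := queryLogs(from, to)
	if errors.Is(err, errTooMuchData) && to > from {
		mid := from + (to-from)/2
		if err := getEvents(from, mid); err != nil {
			return err
		}
		return getEvents(mid+1, to)
	}
	return err
}

func main() {
	_ = getEvents(0, 4999) // one 5k batch handed over by loadOldEvents
}
```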

@richard-ramos (Member) commented:

Yes, but do notice that subsequent requests will still try to load 5000 blocks and split them into separate jobs. During a period of heavy usage of the smart contract, that means the tooMuchData error will appear frequently for some block ranges while loading the events, whereas the current approach will not use 5000 but the last valid chunk size + 10%, in an attempt to slowly get back to the original 5000.

Do you think the additive factor + chunk size could be kept with the approach you propose?

@harsh-98 (Contributor, Author) commented Sep 4, 2023

> Yes, but do notice that subsequent requests will still try to load 5000 blocks and split them into separate jobs. During a period of heavy usage of the smart contract, that means the tooMuchData error will appear frequently for some block ranges while loading the events, whereas the current approach will not use 5000 but the last valid chunk size + 10%, in an attempt to slowly get back to the original 5000.
>
> Do you think the additive factor + chunk size could be kept with the approach you propose?

The current approach already divides the range in half (via the multiplicative divisor) in case of error, so including an additive factor for increasing the batch size would result in the same sort of code. So I can revert that change.

But I also think that this error won't be frequent and will only happen during the start of the node. Plus, the waku nodes will probably have connections to different eth RPC providers. That's why we can retry with the same batch size instead of setting the batch size to a decreased value and increasing it over time.

Again, this is subjective; I am comfortable with both approaches. Let me know your verdict after this discussion.

@richard-ramos (Member) commented:

I see. I'm not sure either what decision to take; I tend to prefer the original approach, but I agree with the argument you present!
Let's ask @chaitanyaprem for feedback!

@chaitanyaprem (Collaborator) commented Sep 5, 2023

> I see. I'm not sure either what decision to take; I tend to prefer the original approach, but I agree with the argument you present! Let's ask @chaitanyaprem for feedback!

Had a quick discussion with @harsh-98 regarding both approaches.
The approach of halving only in case of error and reverting back to the default blockFetchSize (5000) seems simpler to me; it would make the code simpler, more maintainable, and less error-prone.

I would suggest we go ahead with the approach suggested by @harsh-98.

@harsh-98 (Contributor, Author) commented Sep 6, 2023

@richard-ramos @chaitanyaprem This PR has become quite big, as I had to rebase on master to account for web3Config. Because of that, I had to refactor a bit. Also, can you review this before there are more merge conflicts?

  • New changes are from the test: fetching membership logic commit.
  • rlnInstance and rootTrack were previously created while creating rlnRelay but were assigned to groupManager on Start of rlnRelay. This created an unnecessary dependency of passing them to the static and dynamic group managers.
  • Web3Config uses the EthClientI interface for the client, so that we can pass a mock client for testing MembershipFetcher (a sketch of this idea follows the list).
  • Added a test for checking the fetching of membershipRegister events.
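
For illustration, a minimal sketch of the mock-client idea from the last two items, assuming EthClientI covers the usual go-ethereum ethclient signatures; the real interface in the PR may expose a different method set, and ethClient/mockClient here are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/core/types"
)

// ethClient is an illustrative stand-in for EthClientI: only the calls the
// membership fetcher needs, so tests can swap in a fake implementation.
type ethClient interface {
	BlockNumber(ctx context.Context) (uint64, error)
	FilterLogs(ctx context.Context, q ethereum.FilterQuery) ([]types.Log, error)
}

// mockClient returns canned data so fetcher-style logic can be exercised
// without a real RPC endpoint.
type mockClient struct {
	head uint64
	logs []types.Log
}

func (m *mockClient) BlockNumber(ctx context.Context) (uint64, error) { return m.head, nil }

func (m *mockClient) FilterLogs(ctx context.Context, q ethereum.FilterQuery) ([]types.Log, error) {
	var out []types.Log
	for _, l := range m.logs {
		if l.BlockNumber >= q.FromBlock.Uint64() && l.BlockNumber <= q.ToBlock.Uint64() {
			out = append(out, l)
		}
	}
	return out, nil
}

func main() {
	var c ethClient = &mockClient{head: 100, logs: []types.Log{{BlockNumber: 42}}}
	head, _ := c.BlockNumber(context.Background())
	logs, _ := c.FilterLogs(context.Background(), ethereum.FilterQuery{
		FromBlock: big.NewInt(0),
		ToBlock:   big.NewInt(100),
	})
	fmt.Println("head:", head, "matching logs:", len(logs))
}
```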

@richard-ramos (Member) left a review comment:


Left some observations.
Sorry for the rebase issues! But having said that, these will keep happening as the RLN feature is under heavy development. Even though I'm one who always breaks this rule, let's try to keep the PRs small!

Review threads (outdated, resolved) were left on:
  • waku/v2/protocol/rln/waku_rln_relay.go
  • waku/v2/protocol/rln/web3/web3.go
  • waku/v2/protocol/rln/group_manager/group_manager.go
  • waku/v2/protocol/rln/onchain_test.go
  • waku/v2/protocol/rln/group_manager/group_manager.go
@harsh-98 (Contributor, Author) commented Sep 6, 2023

Commenting so that I remember how to test a PR locally.

go vet ./...
go test -timeout 300s ./waku/... -coverprofile=./c.out.tmp
go test -race -timeout 30s ./... -v
go test -v -count 1 -tags="gowaku_rln include_onchain_tests" github.com/waku-org/go-waku/waku/v2/protocol/rln
golangci-lint --exclude=SA1019 run ./... --deadline=5m

@chaitanyaprem (Collaborator) left a review comment:


Do address the comments.
Let's avoid combining such refactors with fixes and other changes; it makes review very hard.
LGTM

Do wait for an OK from @richard-ramos, as he is actively working on this.

rlnInstance and rootTrack were previously created while creating rlnRelay
but were assigned to groupManager on Start of rlnRelay. This created
an unnecessary dependency of passing them to the static and dynamic group
managers.
Web3Config uses the EthClientI interface for the client, so that we can pass
a mock client for testing MembershipFetcher.
@harsh-98 merged commit 08cabab into master Sep 7, 2023
10 of 11 checks passed
@harsh-98 deleted the rlnFetcher branch September 7, 2023 16:23