
fix: and optimising fetching membership events #706

Merged
merged 10 commits into master from rlnFetcher on Sep 7, 2023
Conversation

@harsh-98 (Contributor) commented Sep 4, 2023

Description

Bugs and other optimisations found in the membership-fetching logic while checking #701.

Changes

  • errCh could be closed multiple times, which would result in an error (a close-once sketch follows this list).
  • Some blocks might be missed between loadOldEvents and watchNewEvents.
  • Use a binary split of the work when getting events.
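
For illustration, here is a minimal sketch of the close-once guard idea behind the first item, using sync.Once; the names (eventLoader, fail) are hypothetical and not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// eventLoader is a hypothetical holder for the fetcher's error channel.
type eventLoader struct {
	errCh     chan error
	closeOnce sync.Once
}

// fail reports an error and closes errCh exactly once; later calls are
// no-ops instead of panicking on a double close.
func (l *eventLoader) fail(err error) {
	l.closeOnce.Do(func() {
		l.errCh <- err
		close(l.errCh)
	})
}

func main() {
	l := &eventLoader{errCh: make(chan error, 1)}
	l.fail(errors.New("rpc error"))
	l.fail(errors.New("second error")) // safe: ignored, no double close
	fmt.Println(<-l.errCh)
}
```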

Tests

@status-im-auto commented Sep 4, 2023

Jenkins Builds

Older builds (54):
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ e2a4f85 #1 2023-09-04 09:07:29 ~1 min linux 📦deb
✔️ e2a4f85 #1 2023-09-04 09:08:15 ~2 min nix-flake 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:09 ~3 min tests 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:13 ~3 min tests 📄log
✔️ e2a4f85 #1 2023-09-04 09:09:58 ~3 min android 📦tgz
✔️ e2a4f85 #1 2023-09-04 09:10:29 ~4 min ios 📦tgz
✔️ f2ba54a #2 2023-09-04 10:37:38 ~1 min linux 📦deb
✔️ f2ba54a #2 2023-09-04 10:38:19 ~1 min nix-flake 📄log
✔️ f2ba54a #2 2023-09-04 10:39:18 ~2 min tests 📄log
✔️ f2ba54a #2 2023-09-04 10:39:30 ~3 min tests 📄log
✔️ f2ba54a #2 2023-09-04 10:39:35 ~3 min android 📦tgz
✔️ f2ba54a #2 2023-09-04 10:40:12 ~3 min ios 📦tgz
✖️ bbf38f2 #3 2023-09-06 13:56:18 ~21 sec tests 📄log
✔️ bbf38f2 #3 2023-09-06 13:57:22 ~1 min linux 📦deb
✔️ bbf38f2 #3 2023-09-06 13:57:48 ~1 min nix-flake 📄log
✖️ bbf38f2 #3 2023-09-06 13:58:46 ~2 min tests 📄log
✔️ bbf38f2 #3 2023-09-06 13:59:35 ~3 min android 📦tgz
✔️ bbf38f2 #3 2023-09-06 14:00:16 ~4 min ios 📦tgz
✔️ 9ad1355 #4 2023-09-06 15:25:12 ~1 min linux 📦deb
✖️ 9ad1355 #4 2023-09-06 15:25:21 ~1 min tests 📄log
✖️ 9ad1355 #4 2023-09-06 15:25:35 ~1 min tests 📄log
✔️ 9ad1355 #4 2023-09-06 15:25:52 ~1 min nix-flake 📄log
✔️ 9ad1355 #4 2023-09-06 15:27:10 ~3 min android 📦tgz
✔️ 9ad1355 #4 2023-09-06 15:28:00 ~4 min ios 📦tgz
✖️ b5ff218 #5 2023-09-06 15:47:24 ~21 sec tests 📄log
✖️ b5ff218 #5 2023-09-06 15:47:27 ~24 sec tests 📄log
✔️ b5ff218 #5 2023-09-06 15:47:39 ~36 sec linux 📦deb
✔️ b5ff218 #5 2023-09-06 15:48:57 ~1 min nix-flake 📄log
✔️ b5ff218 #5 2023-09-06 15:50:23 ~3 min android 📦tgz
✔️ b5ff218 #5 2023-09-06 15:50:52 ~3 min ios 📦tgz
✖️ 80345a4 #6 2023-09-06 15:49:56 ~22 sec tests 📄log
✖️ 80345a4 #6 2023-09-06 15:49:56 ~22 sec tests 📄log
✔️ 80345a4 #6 2023-09-06 15:50:10 ~36 sec linux 📦deb
✔️ 80345a4 #6 2023-09-06 15:51:28 ~1 min nix-flake 📄log
✔️ 80345a4 #6 2023-09-06 15:53:24 ~2 min android 📦tgz
✔️ 80345a4 #6 2023-09-06 15:53:38 ~4 min ios 📦tgz
✔️ 0e0af05 #7 2023-09-06 15:58:44 ~34 sec linux 📦deb
✖️ 0e0af05 #7 2023-09-06 15:59:15 ~1 min tests 📄log
✖️ 0e0af05 #7 2023-09-06 15:59:45 ~1 min tests 📄log
✔️ 0e0af05 #7 2023-09-06 16:00:04 ~1 min nix-flake 📄log
✔️ 0e0af05 #7 2023-09-06 16:01:24 ~3 min android 📦tgz
✔️ 0e0af05 #7 2023-09-06 16:01:27 ~3 min ios 📦tgz
✔️ 81c9919 #8 2023-09-06 16:41:06 ~31 sec linux 📦deb
✔️ 81c9919 #8 2023-09-06 16:42:32 ~1 min nix-flake 📄log
✔️ 81c9919 #8 2023-09-06 16:43:50 ~3 min android 📦tgz
✔️ 81c9919 #8 2023-09-06 16:45:54 ~5 min ios 📦tgz
✖️ 81c9919 #8 2023-09-06 16:46:16 ~5 min tests 📄log
✖️ 81c9919 #8 2023-09-06 16:46:16 ~5 min tests 📄log
✔️ e72018d #9 2023-09-06 17:07:06 ~30 sec linux 📦deb
✔️ e72018d #9 2023-09-06 17:07:33 ~54 sec tests 📄log
✔️ e72018d #9 2023-09-06 17:08:23 ~1 min tests 📄log
✔️ e72018d #9 2023-09-06 17:08:30 ~1 min nix-flake 📄log
✔️ e72018d #9 2023-09-06 17:09:55 ~3 min android 📦tgz
✔️ e72018d #9 2023-09-06 17:11:25 ~4 min ios 📦tgz
Commit #️⃣ Finished (UTC) Duration Platform Result
✔️ 8ca4d4e #10 2023-09-07 07:13:20 ~36 sec linux 📦deb
✔️ 8ca4d4e #10 2023-09-07 07:13:46 ~59 sec tests 📄log
✔️ 8ca4d4e #10 2023-09-07 07:14:36 ~1 min tests 📄log
✔️ 8ca4d4e #10 2023-09-07 07:16:37 ~3 min nix-flake 📄log
✔️ 8ca4d4e #10 2023-09-07 07:17:38 ~4 min ios 📦tgz
✔️ 8ca4d4e #10 2023-09-07 07:18:06 ~5 min android 📦tgz
✔️ d00f053 #11 2023-09-07 16:14:18 ~3 min linux 📦deb
✔️ d00f053 #11 2023-09-07 16:14:29 ~3 min nix-flake 📄log
✔️ d00f053 #11 2023-09-07 16:15:27 ~4 min tests 📄log
✔️ d00f053 #11 2023-09-07 16:16:06 ~5 min tests 📄log
✔️ d00f053 #11 2023-09-07 16:16:16 ~5 min android 📦tgz
✔️ d00f053 #12 2023-09-07 16:23:35 ~4 min ios 📦tgz

@richard-ramos (Member) commented:

I like the changes related to:

- errCh can be closed multiple times, which results in an error.
- Some blocks might be missed between loadOldEvents and watchNewEvents.

However, I think the current mechanism for loading events (using chunks) is more appropriate than the proposed binary split, which was one of the initial approaches I had considered for loading the events.

While the binary-split approach works for Infura because it uses an off-chain index for event logs, it might overwhelm a node you're running locally. The approach we currently follow in go-waku is the same as eth2 nodes like Prysm, in which we fetch block ranges in chunks of 5000 blocks (Prysm goes even lower, at 1000 blocks per chunk), with some special handling for cases where we exceed the number of events: we use the multiplicativeDecreaseDivisor to reduce the range to a smaller number of blocks, and then slowly increase the range using the additiveFactorMultiplier until we get back to the maximum number of blocks (5000).

While I agree that using a binary split might be faster, it causes more load on a local node. Also, syncing the chain in 5k-block chunks is still a fast operation that takes only some tens of seconds, it's only executed once at the beginning, and since we already store the merkle tree in the RLN database, that full sync does not need to be repeated on every start.
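
For illustration, a minimal sketch of the chunked approach described above, assuming a 5000-block maximum, a halving multiplicativeDecreaseDivisor, and a 10% additiveFactorMultiplier; fetchEvents and its error threshold are stand-ins, not go-waku's actual code.

```go
package main

import "fmt"

const (
	maxChunkSize                  = uint64(5000)
	multiplicativeDecreaseDivisor = 2    // halve the chunk on error
	additiveFactorMultiplier      = 0.10 // grow ~10% of the max back on success
)

// fetchEvents stands in for the real eth_getLogs call; it pretends the
// provider rejects ranges that would return too many events.
func fetchEvents(from, to uint64) error {
	if to-from > 1200 { // hypothetical provider limit
		return fmt.Errorf("too much data")
	}
	fmt.Printf("fetched blocks %d-%d\n", from, to)
	return nil
}

func loadOldEvents(from, to uint64) {
	chunk := maxChunkSize
	for start := from; start <= to; {
		end := start + chunk - 1
		if end > to {
			end = to
		}
		if err := fetchEvents(start, end); err != nil {
			// multiplicative decrease: shrink the chunk and retry the same range
			chunk /= multiplicativeDecreaseDivisor
			if chunk == 0 {
				chunk = 1
			}
			continue
		}
		start = end + 1
		// additive increase: creep back toward the maximum chunk size
		if chunk < maxChunkSize {
			chunk += uint64(float64(maxChunkSize) * additiveFactorMultiplier)
			if chunk > maxChunkSize {
				chunk = maxChunkSize
			}
		}
	}
}

func main() {
	loadOldEvents(0, 20000)
}
```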

@harsh-98 (Contributor, Author) commented Sep 4, 2023

> I like the changes related to:
>
> - errCh can be closed multiple times, which results in an error.
> - Some blocks might be missed between loadOldEvents and watchNewEvents.
>
> However, I think the current mechanism for loading events (using chunks) is more appropriate than the proposed binary split, which was one of the initial approaches I had considered for loading the events.
>
> While the binary-split approach works for Infura because it uses an off-chain index for event logs, it might overwhelm a node you're running locally. The approach we currently follow in go-waku is the same as eth2 nodes like Prysm, in which we fetch block ranges in chunks of 5000 blocks (Prysm goes even lower, at 1000 blocks per chunk), with some special handling for cases where we exceed the number of events: we use the multiplicativeDecreaseDivisor to reduce the range to a smaller number of blocks, and then slowly increase the range using the additiveFactorMultiplier until we get back to the maximum number of blocks (5000).
>
> While I agree that using a binary split might be faster, it causes more load on a local node. Also, syncing the chain in 5k-block chunks is still a fast operation that takes only some tens of seconds, it's only executed once at the beginning, and since we already store the merkle tree in the RLN database, that full sync does not need to be repeated on every start.

In this PR, we are also syncing in batches of 5k blocks within loadOldEvents. It is just that, if the getEvents call for 5k blocks fails with an over-the-limit error, we split the work into two jobs of half the range inside getEvents.
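
For illustration, a minimal sketch of the binary-split idea described above; queryLogs and its block limit are hypothetical stand-ins for the real eth_getLogs call, not the PR's actual getEvents implementation.

```go
package main

import (
	"errors"
	"fmt"
)

var errTooMuchData = errors.New("query returned more results than the limit")

// queryLogs stands in for fetching logs over one block range; it pretends
// the provider rejects ranges larger than a hypothetical limit.
func queryLogs(from, to uint64) error {
	if to-from > 1500 {
		return errTooMuchData
	}
	fmt.Printf("fetched blocks %d-%d\n", from, to)
	return nil
}

// getEvents fetches a range, splitting it into two half-range jobs whenever
// the node reports that the range contains too many results.
func getEvents(from, to uint64) error {
	err := queryLogs(from, to)
	if errors.Is(err, errTooMuchData) && to > from {
		mid := from + (to-from)/2
		if err := getEvents(from, mid); err != nil {
			return err
		}
		return getEvents(mid+1, to)
	}
	return err
}

func main() {
	_ = getEvents(0, 4999) // one 5k batch handed over by loadOldEvents
}
```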

@richard-ramos (Member) commented:

Yes, but do notice that subsequent requests will still try to load 5000 blocks and split them into separate jobs. During a period of heavy usage of the smart contract, that means the tooMuchData error will appear frequently for some block ranges while loading the events, whereas the current approach will not use 5000 but the last valid chunk size + 10%, in an attempt to slowly get back to the original 5000.

Do you think the additive factor + chunk size could be kept with the approach you propose?

@harsh-98 (Contributor, Author) commented Sep 4, 2023

> Yes, but do notice that subsequent requests will still try to load 5000 blocks and split them into separate jobs. During a period of heavy usage of the smart contract, that means the tooMuchData error will appear frequently for some block ranges while loading the events, whereas the current approach will not use 5000 but the last valid chunk size + 10%, in an attempt to slowly get back to the original 5000.
>
> Do you think the additive factor + chunk size could be kept with the approach you propose?

The current approach already divides the range in half (via the multiplicative divisor) in case of error, so including an additive factor for increasing the batch size would result in the same sort of code. So I can revert that change.

But I also think that this error won't be frequent and will only happen during the start of the node. Plus, the waku nodes will probably have connections to different eth RPC providers. That's why we can retry with the same batch size instead of setting the batch size to a decreased value and increasing it over time.

Again, this is subjective; I am comfortable with both approaches. Let me know your verdict after this discussion.

@richard-ramos (Member) commented:

I see. I'm not sure either what decision to take; I tend to prefer the original approach, but I agree with the argument you present!
Let's ask @chaitanyaprem for feedback!

@chaitanyaprem (Collaborator) commented Sep 5, 2023

> I see. I'm not sure either what decision to take; I tend to prefer the original approach, but I agree with the argument you present! Let's ask @chaitanyaprem for feedback!

Had a quick discussion with @harsh-98 regarding both approaches.
The approach of halving only in case of error and reverting back to the default blockFetchSize (5000) seems simpler to me; it would make the code simpler, more maintainable, and less error-prone.

I would suggest we go ahead with the approach suggested by @harsh-98.

@harsh-98 (Contributor, Author) commented Sep 6, 2023

@richard-ramos @chaitanyaprem This PR has become quite big, as I had to rebase on master to account for web3Config. Because of that, I had to refactor a bit. Also, can you review this before there are more merge conflicts?

  • New changes are from the test: fetching membership logic commit.
  • rlnInstance and rootTrack were previously created while creating rlnRelay but were assigned to groupManager on Start of rlnRelay. This created an unnecessary dependency of passing them to the static and dynamic group managers.
  • Web3Config uses the EthClientI interface for the client, so that we can pass a mock client for testing MembershipFetcher (a sketch of this idea follows the list).
  • Added a test for checking the fetching of membershipRegister events.
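
For illustration, a minimal sketch of the mock-client idea from the last two items, assuming EthClientI covers the usual go-ethereum ethclient signatures; the real interface in the PR may expose a different method set, and ethClient/mockClient here are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"math/big"

	"github.com/ethereum/go-ethereum"
	"github.com/ethereum/go-ethereum/core/types"
)

// ethClient is an illustrative stand-in for EthClientI: only the calls the
// membership fetcher needs, so tests can swap in a fake implementation.
type ethClient interface {
	BlockNumber(ctx context.Context) (uint64, error)
	FilterLogs(ctx context.Context, q ethereum.FilterQuery) ([]types.Log, error)
}

// mockClient returns canned data so fetcher-style logic can be exercised
// without a real RPC endpoint.
type mockClient struct {
	head uint64
	logs []types.Log
}

func (m *mockClient) BlockNumber(ctx context.Context) (uint64, error) { return m.head, nil }

func (m *mockClient) FilterLogs(ctx context.Context, q ethereum.FilterQuery) ([]types.Log, error) {
	var out []types.Log
	for _, l := range m.logs {
		if l.BlockNumber >= q.FromBlock.Uint64() && l.BlockNumber <= q.ToBlock.Uint64() {
			out = append(out, l)
		}
	}
	return out, nil
}

func main() {
	var c ethClient = &mockClient{head: 100, logs: []types.Log{{BlockNumber: 42}}}
	head, _ := c.BlockNumber(context.Background())
	logs, _ := c.FilterLogs(context.Background(), ethereum.FilterQuery{
		FromBlock: big.NewInt(0),
		ToBlock:   big.NewInt(100),
	})
	fmt.Println("head:", head, "matching logs:", len(logs))
}
```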

@richard-ramos (Member) left a review comment:


Left some observations.
Sorry for the rebase issues! But having said that, these will keep happening as the RLN feature is under heavy development. Even though I'm one who always breaks this rule, let's try to keep the PRs small!

Review threads (outdated, resolved) were left on:
  • waku/v2/protocol/rln/waku_rln_relay.go
  • waku/v2/protocol/rln/web3/web3.go
  • waku/v2/protocol/rln/group_manager/group_manager.go
  • waku/v2/protocol/rln/onchain_test.go
  • waku/v2/protocol/rln/group_manager/group_manager.go
@harsh-98 (Contributor, Author) commented Sep 6, 2023

Commenting so that I remember how to test a PR locally.

go vet ./...
go test -timeout 300s ./waku/... -coverprofile=./c.out.tmp
go test -race -timeout 30s ./... -v
go test -v -count 1 -tags="gowaku_rln include_onchain_tests" github.com/waku-org/go-waku/waku/v2/protocol/rln
golangci-lint --exclude=SA1019 run ./... --deadline=5m

@chaitanyaprem (Collaborator) left a review comment:


Do address the comments.
Let's avoid combining such refactors with fixes and other changes; it makes review very hard.
LGTM

Do wait for an OK from @richard-ramos, as he is actively working on this.

rlnInstance and rootTrack were previously created while creating rlnRelay
but were assigned to groupManager on Start of rlnRelay. This created
an unnecessary dependency of passing them to the static and dynamic group
managers.
Web3Config uses the EthClientI interface for the client, so that we can pass
a mock client for testing MembershipFetcher.
@harsh-98 merged commit 08cabab into master Sep 7, 2023
10 of 11 checks passed
@harsh-98 deleted the rlnFetcher branch September 7, 2023 16:23