Tower 72 fix #930
/// Submit a backlog of blocks that may have been mined while network is offline.
/// Likely not more than 1.
// only 72 proofs are allowed per epoch
const MAX_PROOFS_PER_EPOCH: i64 = 72;
Can this be fetched from the move globals so that we don't have duplication?
My preference is to catch this from the error code that Move throws when the upper bound is reached: 130108.
https://github.com/OLSF/libra/blob/main/language/diem-framework/modules/0L/TowerState.move#L278-L281
More importantly, we're missing some error maps for 0L-specific errors that the client should identify. See the draft issue I opened: #933.
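A minimal sketch of what catching that abort code on the client side could look like. Only the value 130108 comes from this thread; the `TowerError` enum and the `classify_abort` function are hypothetical names for illustration, not existing tower code.

```rust
/// Abort code quoted in this thread for "upper bound of proofs per epoch reached".
const ABORT_EPOCH_PROOF_LIMIT: u64 = 130108;

/// Hypothetical client-side classification of a failed proof submission.
#[derive(Debug, PartialEq)]
pub enum TowerError {
    /// The account already submitted the maximum number of proofs this epoch.
    EpochProofLimitReached,
    /// Any other Move abort, carrying the raw code.
    OtherAbort(u64),
}

/// Map a raw Move abort code to a `TowerError`.
pub fn classify_abort(code: u64) -> TowerError {
    match code {
        ABORT_EPOCH_PROOF_LIMIT => TowerError::EpochProofLimitReached,
        other => TowerError::OtherAbort(other),
    }
}
```

A fuller error map for 0L-specific codes, as proposed in #933, would presumably add more variants to the same enum.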
I will fetch it from the global store.
Actually, after some thought, I don't think this approach works. With my change, tower backlog submission already aborts on any submission error, so whatever the error code is, we should still abort the submission. This solves problem #2 that I found in the tower.
And your suggestion to watch for a specific error doesn't solve problem #1 in the tower: we should not call backlog submission at all once we have reached the max number of proofs in the epoch. Otherwise each tower will keep sending at least 1 proof every 30 mins, causing a Move abort.
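To illustrate problem #1, here is a rough sketch of a guard that skips backlog submission once the limit is reached. The constant mirrors the one in this PR; `remaining_proofs` and `should_submit_backlog` are hypothetical helper names, and `proofs_in_epoch` is assumed to already be the count for the current epoch.

```rust
/// Client-side copy of the limit; it duplicates the value hardcoded in Move.
const MAX_PROOFS_PER_EPOCH: i64 = 72;

/// How many proofs may still be submitted in this epoch (never negative).
/// `proofs_in_epoch` is assumed to be the on-chain count for the current epoch.
pub fn remaining_proofs(proofs_in_epoch: i64) -> i64 {
    (MAX_PROOFS_PER_EPOCH - proofs_in_epoch).max(0)
}

/// Decide whether backlog submission should be called at all.
pub fn should_submit_backlog(proofs_in_epoch: i64) -> bool {
    remaining_proofs(proofs_in_epoch) > 0
}
```

With this check in front of the backlog call, a tower that has already reached the limit sends nothing until the epoch changes, instead of triggering a Move abort on every cycle.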
The thing to keep in mind is that MAX_PROOFS_PER_EPOCH will not always be 72. And users need the tower to continue making proofs if there's trouble connecting to the network or submitting proofs (the tower app should not exit).
How do I read the number 72 from the Move config? Is there a code snippet for that? When backlog returns (success or failure), the tower proceeds to mine the next proof, which it will attempt to submit in the next invocation of backlog, and so on.
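The flow described here can be sketched roughly as follows: a backlog run stops at the first failed submission and returns to the caller, which keeps mining either way. This is not the code in this PR; `submit_backlog` and the `submit` closure are hypothetical stand-ins for the real transaction submission.

```rust
/// Submit pending proofs in order, stopping at the first failure.
/// Returns the number of proofs submitted successfully.
/// `submit` is a hypothetical stand-in for the real transaction submission.
pub fn submit_backlog<F>(pending: &[u64], mut submit: F) -> usize
where
    F: FnMut(u64) -> Result<(), String>,
{
    let mut submitted = 0;
    for &height in pending {
        if submit(height).is_err() {
            // Abort this backlog run only. The caller keeps mining, and the
            // remaining proofs are retried on the next invocation.
            break;
        }
        submitted += 1;
    }
    submitted
}
```

Because the function returns a count instead of exiting, the tower app can log the failure and move on to mining the next proof.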
We have a function in Move that returns this hardcoded value, but we don't keep it stored in state. So we may not actually be able to get it dynamically through the usual APIs.
I think it's fine to merge this PR, even if the allowed proofs per epoch will then be configured in two places, given how complex it would be to make it perfect.
Motivation
Issue #1 was found by Barmalei, who also suggested the fix. While working on #1, I came across issue #2.
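There are a couple of problems with the tower backlog code:
1. Backlog submission is still called after the max number of proofs in the epoch has been reached, so each tower keeps sending at least 1 proof every 30 mins, causing a Move abort.
2. Backlog submission does not abort when a submission returns an error.
The fix is to
1. Skip backlog submission once the max number of proofs in the epoch has been reached.
2. Abort backlog submission on any submission error.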
One gotcha: count_proofs_in_epoch does not necessarily count proofs submitted in the current epoch, so an additional check is required before this field can be used.
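A minimal sketch of the kind of additional check meant here, under assumptions: only `count_proofs_in_epoch` is named in this PR, while the `TowerStateView` struct, its `latest_epoch_mining` field, and `proofs_in_current_epoch` are hypothetical names for illustration.

```rust
/// Hypothetical view of the on-chain tower state. Only `count_proofs_in_epoch`
/// is named in this PR; `latest_epoch_mining` is an assumption.
pub struct TowerStateView {
    pub latest_epoch_mining: u64,
    pub count_proofs_in_epoch: u64,
}

/// Proofs that count against the current epoch's limit.
/// If the last recorded mining epoch differs from the current epoch, the
/// stored counter is stale and is treated as zero.
pub fn proofs_in_current_epoch(state: &TowerStateView, current_epoch: u64) -> u64 {
    if state.latest_epoch_mining == current_epoch {
        state.count_proofs_in_epoch
    } else {
        0
    }
}
```

The idea is that the counter is only trusted when the state was last updated in the epoch being checked, so a miner that was idle for an epoch is not blocked by a stale count.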
Have you read the [Contributing Guidelines on pull requests]? Yes
Test Plan
I tested with a single miner with a few pre-computed proofs. The code works as expected.