Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R4R stake invariance bug #4389

Merged
merged 7 commits into from
May 25, 2019
Merged

R4R stake invariance bug #4389

merged 7 commits into from
May 25, 2019

Conversation

rigelrozanski
Copy link
Contributor

@rigelrozanski rigelrozanski commented May 21, 2019

closes #4383

  • also made onOperation not a global var but a flag --SimulateEveryOperation

test command: go test ./simapp/... -run TestFullAppSimulation -SimulationEnabled=true -SimulationNumBlocks=500 -SimulationGenesis= -SimulationVerbose=true -SimulationCommit=true -SimulationSeed=8090485 -SimulationPeriod=50 -v -timeout 24h

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.
  • Wrote tests
  • Updated relevant documentation (docs/)
  • Added a relevant changelog entry: clog add [section] [stanza] [message]
  • rereviewed Files changed in the github PR explorer

For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@codecov
Copy link

codecov bot commented May 21, 2019

Codecov Report

Merging #4389 into master will increase coverage by 0.88%.
The diff coverage is 50%.

@@            Coverage Diff             @@
##           master    #4389      +/-   ##
==========================================
+ Coverage    57.2%   58.09%   +0.88%     
==========================================
  Files         241      235       -6     
  Lines       15125    14881     -244     
==========================================
- Hits         8653     8645       -8     
+ Misses       5842     5606     -236     
  Partials      630      630

@npinto
Copy link
Contributor

npinto commented May 21, 2019

Merging #4389 into master will increase coverage by 0.87%.

Glad this single seed is drastically increasing coverage! That's the whole point of our AI-driven explorations ;-)

@rigelrozanski
Copy link
Contributor Author

So what's happening is that this is an edge case of the same bug in #4094

In this particular case the currentStakeRoundUp is actually the same as currentStake due to the rounding direction being the same

reference calculation:

(100503150419.972974482127094959 * 85331552243) / 104618427897.719998091992964075 = 
81974944596.112730277011189509916758442503168929034

I'm not really concerned about this difference, so I've simply expanded the definition of currentStakeRoundUp:

		currentStakeRoundUp := sdk.MaxDec(
			val.TokensFromSharesRoundUp(del.GetShares()),
			currentStake.Add(sdk.SmallestDec()),
		)

x/distribution/keeper/delegation.go Outdated Show resolved Hide resolved
simapp/sim_test.go Show resolved Hide resolved
@alexanderbez alexanderbez requested a review from cwgoes May 22, 2019 14:11
@alexanderbez
Copy link
Contributor

Failing CI @rigelrozanski

Copy link
Contributor

@cwgoes cwgoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerned about this approach

// currentStake. This amount of error is tolerated and corrected for,
// however any greater amount should be considered a breach in expected
// behaviour.
if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that our rounding error will never exceed sdk.SmallestDec()? Why?

IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?

Copy link
Contributor Author

@rigelrozanski rigelrozanski May 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the approach of always rounding in the same direction is much safer, is there a reason we can't do that?

We are always rounding in the same direction here. currentStake.Add(sdk.SmallestDec() is effectively rounding up by one smallest decimal point no matter what. The whole edge case of this but is that sometimes currentStake and currentStakeRoundUp were the same value, where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater. Note that the greatest value that currentStakeRoundUp could have ever achieved is currentStake.Add(sdk.SmallestDec()- now we're just capping it at that value

Are we sure that our rounding error will never exceed sdk.SmallestDec()? Why?

My current understanding of this discrepancy is that its probabilistic - based on specific uses of validator periods the rounding errors can compound to ultimately increase the lowest decimal place by greater than 1. I've not been able to show how this could be greater than one decimal point, however I don't have a proof that it contained within 1 decimal point. During all simulations however we have never seen greater than 1 decimal point increase, only here here we are seeing a 1 decimal point increase from currentStakeRoundUp which holds the same value as currentStake (see the reference calculation #4389 (comment))

If you have any specific alternative suggestions I'm open to them, however I honestly don't see a viable alternative given the current distribution mechanism design.


edit: we could potentially somehow temporarily increase the number of decimal places used within this calculation of calculateDelegationRewards however it would require significant modifications to decimal, I'm not prepared to take this on atm, however it would probably do the trick

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opened up an issue #4399 - I think that's the long term solution, however I don't have short-term capacity for that, and this is still a bug, so I suggest we merge this as is, until performing the changes to decimal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are always rounding in the same direction here.

I don't mean in the sanity check, I mean in all other instances contributing to stake

where due to the inconsistencies in calculating stake across multiple periods it would round up slightly greater

It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?

If you have any specific alternative suggestions I'm open to them, however I honestly don't see a viable alternative given the current distribution mechanism design.

Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?

My guess is that we ought to be rounding in some direction somewhere where we aren't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that we ought to be rounding in some direction somewhere where we aren't.

I don't mean in the sanity check, I mean in all other instances contributing to stake

Well without guessing, aka by reviewing the code for the Nth time, I can say with strong confidence that this guess is incorrect, in all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake. Here let me baby you through the important steps involved:


// event.Fraction comes from updatedFraction
// increasing event.Fraction decreases the multiplier, decreasing stake
stake = stake.MulTruncate(sdk.OneDec().Sub(event.Fraction)) 

// decreasing currentMultiplicand increases updatedFraction which decreases stake
// decreasing newMultiplicand increases updatedFraction which decreases stake
// the use of MulTruncate here, increases updatedFraction which decreases stake
updatedFraction := sdk.OneDec().Sub(currentMultiplicand.MulTruncate(newMultiplicand))

// current fraction comes from updatedFraction (recursive)
currentMultiplicand := sdk.OneDec().Sub(currentFraction) 

// fraction comes from effectiveFraction,
// increasing fraction causes decreasing newMultiplicand increases updatedFraction which decreases stake 
newMultiplicand := sdk.OneDec().Sub(fraction)
 
// here the use of QuoRoundUp increases effectiveFraction which (see above) desceases stake
effectiveFraction := tokensToBurn.ToDec().QuoRoundUp(validator.Tokens.ToDec()) // here tokensToBurn are actually applied to the tokens within staking

https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/delegation.go#L87 https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/distribution/keeper/validator.go#L111 https://github.com/cosmos/cosmos-sdk/blob/711ad3a7b528e7b3586cadedcbae8507a1e9c132/x/staking/keeper/slash.go#L109


Yes, in principle I don't see how we can't ensure that all operations round in a particular, predetermined direction if in fact they will lose precision. Is there a counterexample of that?

The counterexample is not of our concern as it reduces the value of stake which is a situation we ignore. However based on how the code is structured (see the code snippets ^ ) it's very clear that stake will very often be < currentStake based on our rounding direction.

It's these "inconsistencies" that I'm concerned about - shouldn't it be possible to ensure that they are all "inconsistent" in the same (safe) direction?

If we increase the decimal precision for use in this calculation we can ensure that the inconsistencies are tolerable and we can remove this special correction factor, aka:

if stake.Equal(currentStake.Add(sdk.SmallestDec())) {
	stake = currentStake

can be removed. See #4399 for more thoughts on that


tbh I think I'm just about through going over this code/explanations, I'm very confident this solution makes sense - Of course, I invite you to implement an alternative solution if you have strong justifications (or implement #4399, which I'll be taking on within the next few releases or so probably either way).

Copy link
Contributor

@cwgoes cwgoes May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rigelrozanski - sorry if I come across as obstinate, there's just a lot of room for error in multi-stage rounding (including possible bugs in code I previously wrote).

Is it possible to change the case which causes stake.GT(currentStake) - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eR100 - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless currentStake was also rounded down, right?

The old comment discusses stake being multiplied by slash fractions - https://github.com/cosmos/cosmos-sdk/pull/4389/files#diff-de71192adba700b3255b1eab62bef50eL101 - is that referring to "currentStake" being multiplied by a slash fraction, rounded down, and ending up less than stake? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1) - if that's safe, it might save us this check.

My current understanding of this discrepancy is that its probabilistic - based on specific uses of validator periods the rounding errors can compound to ultimately increase the lowest decimal place by greater than 1. I've not been able to show how this could be greater than one decimal point, however I don't have a proof that it contained within 1 decimal point. During all simulations however we have never seen greater than 1 decimal point increase, only here here we are seeing a 1 decimal point increase from currentStakeRoundUp which holds the same value as currentStake.

This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.

Does that make sense?

Copy link
Contributor Author

@rigelrozanski rigelrozanski May 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no worries on being thorough - I guess I was just looking for more thoughtful criticism's off the bat, but I do now understand what you're referring too.


Is it possible to change the case which causes stake.GT(currentStake) - to sometimes be true? If "all calculations which contribute to forming the value held in the stake object, we round in a direction which reduces the value of stake", then we should never hit this conditional, unless currentStake was also rounded down, right?

So stake.GT(currentStake) will often be true, when the calculations are working as expected. We hit this conditional in a non-erroring (expected) way now because we are kind of "splitting up" the whole act of slashing stake within the distribution mechanism by validator periods. Without ever rounding in a direction which would increase this stake, just the fact that we've split this up means that where the slashing is not split up per delegation as it is in distribution, the combined slashing can slash more which is inherent to slashing for all all delegations simultaneously instead of by period. When we break out slashing by period, even if we round down for each slash fraction, it's possible due to how much is being rounded that we slash less when slashing by period instead of for when we slash without periods. In other words, The full slash, and the slashing by period could both be rounding down but the slashing by period is simply rounding down less.

I don't know how else I can explain this short of creating a numerical example (which you should try for yourself!)... But I get it, like this is a really bizarre occurrence.

is that referring to "currentStake" being multiplied by a slash fraction, rounded down, and ending up less than stake? Have we considered instead rounding up when slashes happen (but continuing to round down when accounting in F1) - if that's safe, it might save us this check.

This wouldn't affect this scenario. For the same reasons as explained above, this bug doesn't have to do so much with the direction of rounding (which are all setup to avoid this panic condition) but with the magnitude of rounding which is changed because we are splitting up the slashing by periods.

This is my specific concern. I guess this concern doesn't need to block this PR, since whatever such bug, if present, will have previously existed regardless - this PR at minimum reduces the likelihood, I agree about that. My line of questioning is as to whether we can completely eliminate the possibility of a discrepancy here.
Does that make sense?

I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT

Copy link
Contributor

@cwgoes cwgoes May 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we break out slashing by period, even if we round down for each slash fraction, it's possible due to how much is being rounded that we slash less when slashing by period instead of for when we slash without periods. In other words, The full slash, and the slashing by period could both be rounding down but the slashing by period is simply rounding down less.

Ah now I see, thanks - yes, that would be possible, makes sense.

Can you copy your above explanation into the comment (it should have been there in the first place, which is probably on me)?

I understand your thinking, but for the bizarre reasons I've explained, until we increase the precision for this calculation, we're going to have to live with this solution AFAICT

Right, better precision would possibly enable a tighter bound. I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required), but if this never fails in simulation it does seem unlikely to happen in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this conversation has brought further clarity - I will improve the comment to incorporate thoughts from our conversation.

Right, better precision would possibly enable a tighter bound.

That's the thinking, the hope is that by the end of the calculation, when the high precision decimal is being reduced back to default decimal places, the in bound is eliminated.

I wonder if we should actually increase the bound (out of which we panic) given the possible accumulated rounding errors and the unlikelihood of this being exploitable (since slashes, which are economically costly, are required)

I would be comfortable increasing it up to 5 smallest points, this should entirely eliminate the possibility of this error. Maybe we include 3 right now?

@rigelrozanski rigelrozanski merged commit 38f49e4 into master May 25, 2019
@rigelrozanski rigelrozanski deleted the rigel/4383-stake-invariance branch May 25, 2019 01:23
larry0x pushed a commit to larry0x/cosmos-sdk that referenced this pull request May 22, 2023
* add trouble seed

* currentStakeRoundUp is now always atleast currentStake + smallest-decimal-precision

* remove unused code

* remove debugs

* @alexanderbez comment

* compile fix

* better comment, increase tolerance to 3 smallest decimal points
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random Fuzzer panic: calculated final stake greater than current stake (on f0b690ba6e2)
4 participants