Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deprecate f{32,64}::DIGITS #89238

Closed

Conversation

workingjubilee
Copy link
Member

@workingjubilee workingjubilee commented Sep 25, 2021

These constants are misleading: the number of significant digits
varies for each value that these floating point numbers may encode
but some programmers are taking them directly as an upper bound.
This is wrong and is leading to programmers creating applications
that directly mislead other users about their meaning, having
a negative ecosystem-wide impact on mathematical accuracy.

To contain the damage, deprecate them without replacement.
It is hoped this will force programmers to reevaluate their use.

Closes #89106.
This problem may seem trivial but instructions to alter valid float values being generated by an otherwise highly-regarded source (Clippy) is a bad result. While a PR is open against rust-clippy to fix this behavior, this pattern may have arisen in non-indexed code, even though it seems to be uncommon. As the constants don't seem to be greatly used in practice, while the damage they can do if misused is high, it seems reasonable to take this path.

@rust-highfive
Copy link
Collaborator

r? @m-ou-se

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 25, 2021
These constants are misleading: the number of significant digits
varies for each value that these floating point numbers may encode
but some programmers are taking them directly as an upper bound.
This is wrong and is leading to programmers creating applications
that directly mislead other users about their meaning, having
a negative ecosystem-wide impact on mathematical accuracy.

To contain the damage, deprecate them without replacement.
It is hoped this will force programmers to reevaluate their use.
#[rustc_deprecated(since = "TBD", reason = "replaced by the `DIGITS` associated constant on `f32`")]
#[rustc_deprecated(
since = "TBD",
reason = "this value is incorrect or misleading as it actually encompasses a range"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really tiny nit, but I feel like the word 'encompasses' is misleading here. I might read that and think that the deprecated value is the upper bound, and therefore safe.
Maybe change it to say "the actual number of significant digits varies over a range, and this value is the minimum number of significant digits" or something else that makes the problem clear.

@jyn514 jyn514 added A-floating-point Area: Floating point numbers and arithmetic T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 25, 2021
@CryZe
Copy link
Contributor

CryZe commented Sep 25, 2021

What about all the other constants? (I guess some of them would need replacements though)

@m-ou-se m-ou-se added the needs-fcp This change is insta-stable, so needs a completed FCP to proceed. label Sep 25, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Sep 25, 2021

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Sep 25, 2021

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 25, 2021
@workingjubilee
Copy link
Member Author

What about all the other constants? (I guess some of them would need replacements though)

Yeah, no need to handle them as a single blanket. These got entered as a group when they probably should have been evaluated on a more case-by-case basis in the first place, or even RFCed so let's not repeat that mistake.

@BurntSushi
Copy link
Member

I haven't read the linked issue, but I've read this issue as well as the deprecation message and updated docs, and I'm still not sure I understand what's going on here. In particular, I think my beef with the docs is that it's a tease. They say "usage is likely incorrect," but don't really elaborate. Maybe we can get some more details, and even better, an example?

@joshtriplett
Copy link
Member

Is there some particular reason these can't just be changed to the appropriate upper-bound values?

@workingjubilee
Copy link
Member Author

workingjubilee commented Sep 27, 2021

Of course, @BurntSushi.

An f32 can encode up to 9 significant decimal digits.
An f64 can encode up to 17 significant decimal digits.
Or rather: it takes 9 or 17 decimal digits to accurately encode the floating point binary value, because they can differ on the 9th digit. Both of these also have a lower bound for how many decimals are required: that is, for some binary values, the formatted decimal character string may only contain 6 or 15 significant decimal digits, and some decimal strings with 7-9 or 16-17 digits round to these binary values. These are what the constants f32::DIGITS and f64::DIGITS have for values. So there are many possible values for f{32,64}::DIGITS precisely because it is "approximate", and not due to rounding, but rather because it is inherently variable. It is, well, not really a constant but a range, const DIGITS: u32 declaration notwithstanding.

And the current name and value can lead to misleading conclusions. For example: Both numbers also have a RADIX constant for some reason, in spite of these numbers being defined in terms of the binary32 and binary64 which are both, well, binary. So one might imagine this other constant, named without qualifications as merely DIGITS, is the number of binary digits, perhaps in the significand, but no, that's MANTISSA_DIGITS, which is not the same as what MANTISSA_EXPLICIT_BITS (a currently non-extant constant) would be, which would be MANTISSA_DIGITS -1 (yes, this is confusing), even though both are about a notional quantity of binary data... floats hide an implicit 1 in some of their encodings, you see, so only 23 and 52 are explicitly encoded. So this constant specifies the number of decimal digits, which is only obvious if you double-check the documentation to begin with.

But then the documentation is not much help as it says it is "approximate", but doesn't specify how. DIGITS is the last accurately and precisely encoded number of decimal significant digits, where you can expect the parse-to-format to round trip using the same formatting rules that the decimal number was originally written with. But the documentation is already vague on where in the range of possible values this constant lies, or what the "standard deviation" may be. For many matters that concern floats, and especially how they are serialized, the preferred number to use 9 or 17 significant digits. This guarantees the binary value always "roundtrips" correctly under correct conditions, as described in the IEEE754 standard.

As far as a practical example of why it is bad:

fn main() {
    // A sequence of unique floats, printed by Rust with varying numbers of significant digits.
    let a: Vec<f32> = vec![
        0.1000734,
        0.100073405,
        0.10007341,
        0.10007342,
        0.10007343,
        0.100073434,
        0.10007344,
        0.10007345,
        0.10007346,
        0.100073464,
        0.10007347,
        0.10007348,
        0.10007349,
        0.100073494,
    ];

    for f in a {
        println!("{:.1$}", f, f32::DIGITS as usize); // All chopped down to the same string: 0.100073
    }
}

As this is a somewhat unusual formatting arg choice, you may wish to replace it with your choice of more custom string-length-evaluation code, et cetera, that uses f32::DIGITS for similar results. If you carefully use this constant with a lot of additional catches, this won't happen to you, but the documentation is of no assistance here.

@joshtriplett
There are a few plausible solutions. We could
a. Deprecate these constants, as this PR does. Plausibly disruptive, but washes our hands of this matter. May be appropriate if the standard library wants to minimize the amount of documentation on using floats correctly that it has to include.
b. Revalue the constants to the likely-more-useful value and explain the value thoroughly, as you propose. But this is silently disruptive to existing programs, so it's probably a bad idea, honestly. And the name remains confusing.
c. Add another associated constant to each type and describe the matter of the upper and lower bound for decimal digits. Least disruptive. Doesn't force possibly-incorrect or misleading code to change. Fairly appropriate.
d. Deprecate these constants and add another two associated constants, or a special function perhaps, that describe the matter of it being a range of significant digits. Forces people using this constant to rethink their code and be clearer on what they mean. Also appropriate, but slightly more disruptive.
e. Do something more imaginative that I haven't thought of yet.

Note that a. allows d. to be done later.

@yaahc
Copy link
Member

yaahc commented Sep 28, 2021

@rfcbot concern detailed explanation

+1 from me on this change but I'd also like to see more detail added to the deprecation message and documentation as mentioned by @BurntSushi.

Regarding the options you presented at the end of your last comment @workingjubilee I'd probably lean towards (c) or (d) through (a) as an initial step. IMO deprecate the constants now and then base any more resilient impl we add in the future on concrete usecases.

@tspiteri
Copy link
Contributor

The reason I can think of for these constants to be as they are is compatibility with the C standard library's {FLT,DBL}_DIG from #include <float.h>, and the C++ standard library's std::numeric_limits<{float,double}>::digits10 from #include <limits>. I don't think they're used that much in C and C++ either, however I figure it is worth being aware of the C and C++ constants when picking a way forward.

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 27, 2021
@camelid camelid added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Dec 10, 2021
@m-ou-se
Copy link
Member

m-ou-se commented Jan 19, 2022

@workingjubilee Any updates here?

@yaahc
Copy link
Member

yaahc commented Jan 26, 2022

Since this is only waiting on a small update to the docs we're going to let the fcp move forward and use the github UI to track that changes still need to be made prior to merging.

@rfcbot resolve detailed-explanation

Copy link
Member

@yaahc yaahc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just needs a minor clarification to the docs and then to set the since fields I believe.

/// Approximate number of significant digits in base 10.
/// Use [`f32::DIGITS`] instead.
/// A minimum number of significant digits in base 10.
/// This value is likely incorrect for usage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just need to elaborate on this and add a small example of how this can be incorrect.

@@ -381,8 +388,14 @@ impl f32 {
#[stable(feature = "assoc_int_consts", since = "1.43.0")]
pub const MANTISSA_DIGITS: u32 = 24;

/// Approximate number of significant digits in base 10.
/// A minimum number of significant digits in base 10.
/// This value is likely incorrect for usage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here as well

/// Approximate number of significant digits in base 10.
/// Use [`f64::DIGITS`] instead.
/// A minimum number of significant digits in base 10.
/// This value is likely incorrect for usage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too

/// Approximate number of significant digits in base 10.

/// A minimum number of significant digits in base 10.
/// This value is likely incorrect for usage,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

@workingjubilee
Copy link
Member Author

Sorry, trying to catch up with everything and the part of my brain that was responsible for this particular PR is currently scheduling tasks, hope to catch up with this relatively soon and address concerns.

@bors
Copy link
Contributor

bors commented May 9, 2022

☔ The latest upstream changes (presumably #95960) made this pull request unmergeable. Please resolve the merge conflicts.

@Dylan-DPC
Copy link
Member

@workingjubilee any updates on this?

Copy link
Member

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the docs changes, but +1 for the solution proposed!

@workingjubilee
Copy link
Member Author

Apologies. I began to question myself a bit reflectively in a normal fashion after initially opening this PR and enumerating the options more distinctly upon prompting, and then I became concerned about the quality of the current upgrade path ("none") following a few remarks, including @yaahc's. Then I became very busy.

@camelid
Copy link
Member

camelid commented Feb 11, 2023

triage: Hi @workingjubilee, I just wanted to check on the status of this PR.

@Dylan-DPC
Copy link
Member

Closing this as inactive. If you wish to continue you can open a new PR and we can move forward from there

@Dylan-DPC Dylan-DPC closed this Mar 6, 2023
@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Mar 6, 2023
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-floating-point Area: Floating point numbers and arithmetic needs-fcp This change is insta-stable, so needs a completed FCP to proceed. S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

{f32,f64}::DIGITS is misleading