P1949 negative impact on math heavy code #77

Closed
termi-official opened this issue Aug 8, 2022 · 4 comments

@termi-official

If I have understood everything correctly, SG16 is responsible for this document, so let me raise an open issue with it. If SG16 is the wrong group, please feel free to move the issue to the correct place.

P1949 has quite a few side effects on code as it is written by parts of the numerical community. While I appreciate the work to improve the standard, for us numerics people this one causes more harm than good. Let me elaborate. Before P1949 it was easier to write code that is readable in the sense that it stays close to the related theory, which takes away some mental overhead. As a quick example, take a simple time step controller, where theory states

$$ \Delta t_{n+1} = \varepsilon_{n+1}^{\beta_1/k} \cdot \varepsilon_{n}^{\beta_2/k} \cdot \varepsilon_{n-1}^{\beta_3/k} \cdot \Delta t_{n} $$

Before P1949, gcc and clang accepted the following code:

const auto Δtₙ₊₁ = std::pow(εₙ₊₁, β₁/k) * std::pow(εₙ, β₂/k) * std::pow(εₙ₋₁, β₃/k) * Δtₙ; 

This close correspondence takes away some mental overhead when reading such code, because we can relate the symbols directly back to the theory. Please also note that this is just a quick example; numerical code can get far more complicated than this piece. In our internal projects such constructs are quite common, and I am pretty sure that there is more code out there following such practices (see e.g. llvm/llvm-project#54732).
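For comparison, here is a minimal sketch of the same controller written with ASCII-only identifiers, which should compile regardless of how a compiler treats the super-/subscript characters. The struct `ControllerParams`, the function `next_time_step`, and the `_np1`/`_nm1` suffixes are hypothetical names chosen for illustration; they are not taken from any existing code base. Note how the mapping back to the formula now has to be carried by comments.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter bundle; names are ASCII stand-ins for the
// symbols in the formula above.
struct ControllerParams {
    double beta1;  // beta_1
    double beta2;  // beta_2
    double beta3;  // beta_3
    double k;      // k
};

// dt_{n+1} = eps_{n+1}^(beta_1/k) * eps_n^(beta_2/k)
//          * eps_{n-1}^(beta_3/k) * dt_n
inline double next_time_step(double eps_np1, double eps_n, double eps_nm1,
                             double dt_n, const ControllerParams& p) {
    assert(p.k != 0.0);  // the exponents divide by k
    return std::pow(eps_np1, p.beta1 / p.k)
         * std::pow(eps_n,   p.beta2 / p.k)
         * std::pow(eps_nm1, p.beta3 / p.k)
         * dt_n;
}
```

A call such as `next_time_step(eps_np1, eps_n, eps_nm1, dt_n, params)` then plays the role of the one-liner above.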

I appreciate the time spent working on the standard, but I think this proposal is the exact opposite of what we developers in the numerical community need. I also do not like the direction this is going, because if I see it correctly, some hard problems with identifiers are not addressed yet. Now back to P1949: I am probably missing something, but wasn't the original point of P1949 to remove invisible and control characters? I do not see how super- and subscripted numerical indices are related to that. Reading through the linked issue above, it also seems that removing them from identifiers was intentional rather than accidental, although I could not find detailed information. I also noticed that super-/subscript letters and some super-/subscript symbols still seem to be valid, which causes a weird inconsistency. Can you please elaborate?

The current direction also raises more questions from my side:

  1. Are there plans to restrict allowed characters further, especially the ones used in the computational sciences/basic math notation?
  2. Is there the possibility to bring back the numerical super-/subscripts?
  3. Related to this, and I know that the Unicode Consortium does not want to hear this, but since it is really useful for the numerical community: is there any possibility to at least allow the very basic standard letters (Greek and Latin) in super-/subscripts, either directly via Unicode or via some extra mechanism in the language/editors? Yes, I have read the opinions on this (and I am not a fan of them, much less do I agree), but from a user perspective, having only some of these characters available is really weird.

Quick link to the code above on godbolt: https://godbolt.org/z/nG7o5K141

Thank you for taking the time.

@jensmaurer
Collaborator

jensmaurer commented Aug 9, 2022 via email

@tahonermann
Member

Thank you for sharing your experience with us. Your experience is not unique; we have heard from a couple of other people that maintain projects that have been impacted, including from a member of the Unicode Consortium who is currently working on improvements to Unicode to support source code as text.

As P1949 explains, the previous character allowances resulted in surprising inconsistencies and lacked a principled approach for what characters were and were not allowed in identifiers. Jens' characterization of our goals is correct; we (SG16 and WG21) do not have the expertise or resources to audit the Unicode character set in order to make our own determination of what characters should and should not be allowed in identifiers. So, we chose to defer to the Unicode Consortium and UAX #31.

We don't use the GitHub issue tracker as a medium for discussion. I encourage you to resend your post to the SG16 mailing list. Doing so will reach more people, including some members of the Unicode Consortium. I'll respond there to put you in touch with people within the Unicode Consortium for further follow up.

@termi-official
Author

Thanks for the detailed elaboration, Tom and Jens. I posted to the linked mailing list: https://lists.isocpp.org/sg16/2022/08/3340.php. Since this might affect several groups in the numerical community (and probably will after they upgrade their compilers), should I leave this issue open for now, both for visibility and as a signal for when this has been dealt with?

@tahonermann
Member

Thank you for the post to the SG16 mailing list. I'm going to close this issue for now. Since the concerns you raise are not C++ specific (other languages that follow Unicode guidance and UAX #31 may have similar inconsistencies or allowance variations), my preference is that the Unicode Consortium address the question of whether the characters you reported (as well as other characters that are similarly questionable) should be allowed in identifiers. I'll put you in touch with the Unicode group working on this.
