RFC 3349 precursors #120329

Merged 7 commits on Jan 26, 2024

Commits on Jan 25, 2024

  1. Avoid useless checking in from_token_lit.

    The parser already does a check-only unescaping which catches all
    errors. So the checking done in `from_token_lit` never hits.
    
    But literals causing warnings can still occur in `from_token_lit`. So
    the commit changes `str-escape.rs` to use byte string literals and C
    string literals as well, to give better coverage and ensure the new
    assertions in `from_token_lit` are correct.
    nnethercote committed Jan 25, 2024
    Commit 314dbc7
  2. Fix copy/paste error.

    The `CString` handling code is erroneously identical to the `ByteString`
    handling code.
    nnethercote committed Jan 25, 2024
    Commit 4b4bdb5
  3. Use from instead of into in unescaping code.

    The `T` type in these functions took me some time to understand, and I
    find the explicit `T` in the use of `from` makes the code easier to
    read, as does the `u8` annotation in `scan_escape`.
    nnethercote committed Jan 25, 2024
    Commit ef1e222
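
The readability point in commit 3 can be illustrated with a minimal sketch. `widen_chars` is a hypothetical helper invented for this example and is not the real unescaping code; it only shows how `T::from` names the target type at the call site where `into()` would leave it implicit.

```rust
// Hypothetical helper, not the real unescaping code: it converts each
// char of `src` into a caller-chosen unit type `T`.
fn widen_chars<T: From<char>>(src: &str) -> Vec<T> {
    src.chars()
        // `T::from(c)` spells out the type being produced. `c.into()`
        // would also compile, but the reader would have to infer `T`
        // from the surrounding signature.
        .map(|c| T::from(c))
        .collect()
}
```

Calling it as `widen_chars::<u32>("a")` yields the code points as `u32` values, since the standard library provides `From<char> for u32`.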
  4. Rework CStrUnit.

    - Rename it as `MixedUnit`, because it will soon be used in more than
      just C string literals.
    - Change the `Byte` variant to `HighByte` and use it only for
      `\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
      could be encoded with either `Byte` or `Char`.
    - Add useful comments.
    - Remove `is_ascii`, in favour of `u8::is_ascii`.
    nnethercote committed Jan 25, 2024
    Commit a1c0721
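
A sketch of the shape commit 4 describes, reconstructed from the commit message alone. The variant names `Char` and `HighByte` come from the message; the `From` impls and the `push_bytes` helper are assumptions added for illustration and may differ from the actual code.

```rust
/// One unit of a "mixed" literal: either a Unicode char or a single
/// byte in the range 0x80..=0xff that came from a `\x` escape.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MixedUnit {
    /// Any Unicode char, including all ASCII chars.
    Char(char),
    /// A byte in 0x80..=0xff. ASCII bytes never use this variant,
    /// so each unit has exactly one representation.
    HighByte(u8),
}

impl From<char> for MixedUnit {
    fn from(c: char) -> Self {
        MixedUnit::Char(c)
    }
}

impl From<u8> for MixedUnit {
    fn from(n: u8) -> Self {
        // `u8::is_ascii` replaces a hand-written `is_ascii` helper.
        if n.is_ascii() {
            MixedUnit::Char(n as char)
        } else {
            MixedUnit::HighByte(n)
        }
    }
}

/// Hypothetical helper: append the unit's byte encoding to `buf`.
pub fn push_bytes(unit: MixedUnit, buf: &mut Vec<u8>) {
    match unit {
        MixedUnit::Char(c) => {
            let mut tmp = [0u8; 4];
            buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
        }
        MixedUnit::HighByte(b) => buf.push(b),
    }
}
```

Routing every ASCII byte through `Char` is what removes the old ambiguity: a consumer that matches on the enum never has to treat `Byte(b'a')` and `Char('a')` as equivalent.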
  5. Rename and invert sense of Mode predicates.

    I find it easier if they describe what's allowed, rather than what's
    forbidden. Also, consistent naming makes them easier to understand.
    nnethercote committed Jan 25, 2024
    Commit 5e5aa6d
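
To make the idea in commit 5 concrete, here is a small sketch contrasting a "what is forbidden" predicate with its "what is allowed" counterpart. The `Mode` enum is a simplified stand-in, and all three method names are illustrative assumptions rather than the names used in the commit.

```rust
/// Simplified stand-in for the literal kinds; the real enum has more
/// variants (for example the raw forms).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Char,
    Byte,
    Str,
    ByteStr,
    CStr,
}

impl Mode {
    /// "Forbidden" style (illustrative name): callers end up writing
    /// double negatives such as `!mode.is_unicode_escape_disallowed()`.
    pub fn is_unicode_escape_disallowed(self) -> bool {
        matches!(self, Mode::Byte | Mode::ByteStr)
    }

    /// "Allowed" style (illustrative name): are `\u{...}` escapes
    /// allowed? They are rejected in byte and byte string literals.
    pub fn allow_unicode_escapes(self) -> bool {
        !matches!(self, Mode::Byte | Mode::ByteStr)
    }

    /// "Allowed" style (illustrative name): are `\x80`..`\xff` escapes
    /// allowed? They are rejected in char and string literals.
    pub fn allow_high_bytes(self) -> bool {
        matches!(self, Mode::Byte | Mode::ByteStr | Mode::CStr)
    }
}
```

With the positive form, a check reads as `if mode.allow_unicode_escapes() { ... }`, and the predicates share one `allow_` prefix, which is the consistency the commit message asks for.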
  6. Rename the unescaping functions.

    `unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
    becomes `unescape_mixed`, because RFC 3349 will mean that C string
    literals are no longer the only mixed UTF-8 literals.
    nnethercote committed Jan 25, 2024
    Commit 86f371e
  7. Use unescape_unicode for raw C string literals.

    They can't contain `\x` escapes, which means they can't contain high
    bytes, which means we can use `unescape_unicode` instead of
    `unescape_mixed` to unescape them. This avoids unnecessary use of
    `MixedUnit`.
    nnethercote committed Jan 25, 2024
    Commit 6be2e56
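
The reasoning in commit 7 can be shown with a toy model. `unescape_raw_c_str` is a hypothetical function written for this example, not the compiler's code, and it checks only the NUL rule. Because a raw literal processes no escapes, every unit is already a plain `char`, so a char-only pass suffices and no high-byte case can arise.

```rust
/// Toy model of raw C string handling (hypothetical, simplified).
/// Returns the literal's bytes, or the byte offset of an interior NUL,
/// which C strings cannot contain.
pub fn unescape_raw_c_str(src: &str) -> Result<Vec<u8>, usize> {
    let mut buf = Vec::with_capacity(src.len());
    for (i, c) in src.char_indices() {
        if c == '\0' {
            return Err(i);
        }
        // No escape processing: a backslash is just a backslash, so
        // `\xff` in the source stays as the four chars `\`, `x`, `f`, `f`.
        let mut tmp = [0u8; 4];
        buf.extend_from_slice(c.encode_utf8(&mut tmp).as_bytes());
    }
    Ok(buf)
}
```

For the input `r"\xff"` this model returns the four ASCII bytes of the text `\xff`, not the single byte `0xff`. That is why a char-based unescaper covers raw C strings.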