gh-76535: Make `PyUnicode_ToLowerFull` and friends public #136176

lysnikolaou · 2025-07-01T13:30:09Z

Make _PyUnicode_ToLowerFull, _PyUnicode_ToUpperFull, _PyUnicode_ToTitleFull and _PyUnicode_ToFoldedFull public and rename them to PyUnicode_ToLower etc.

Issue: Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICODE_TOUPPER #76535

📚 Documentation preview 📚: https://cpython-previews--136176.org.readthedocs.build/

Make `PyUnicode_ToLowerFull`, `PyUnicode_ToUpperFull` and `PyUnicode_ToTitleFull` public and rename them to `PyUnicode_ToLower` etc.

Doc/c-api/unicode.rst

lysnikolaou · 2025-07-01T14:53:13Z

Thanks for taking a look @vstinner! Feedback addressed.

serhiy-storchaka

In #76535 (comment) @vstinner suggested providing a constant for the minimum buffer size.

If this is indeed a hard constant which will never be changed in future Unicode standards, then I prefer this way. It is too expensive to allocate the output buffer dynamically.

cc @ezio-melotti, our Unicode expert.

lysnikolaou · 2025-07-01T15:09:49Z

Another question I have is whether we want to expose something like the following to handle the Greek letter sigma edge case:

`int PyUnicode_ToLowerHandleSigma(Py_UCS4 *str, Py_UCS4 ch, Py_UCS4 *buffer, int size)`

vstinner

Thanks, I prefer this API, which is more future-proof: it doesn't depend on a specific Unicode version.

Objects/unicodeobject.c

Doc/c-api/unicode.rst

vstinner · 2025-07-01T15:14:26Z

> `PyUnicode_ToLowerHandleSigma`

Would you mind elaborating? I'm not aware of this special case.

vstinner · 2025-07-01T15:18:24Z

> If this is indeed a hard constant which will never be changed in future Unicode standards

Even if it's a constant which will never (!) change, IMO it's better to require a size argument so that the caller is responsible for checking the buffer size. APIs which accept a pointer with no size are a bad pattern, like the deprecated gets() function.
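To illustrate the pattern being argued for here, below is a minimal sketch of a size-checked case-mapping function. It is not the CPython implementation: `sketch_to_upper` is a hypothetical name, it uses `uint32_t` in place of `Py_UCS4`, and it only knows about ASCII letters and U+00DF (ß, which uppercases to "SS"). The point is only that the function refuses to write past the size the caller states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a full case mapping; NOT the CPython API.
   Writes the uppercase mapping of `ch` into `buffer` and returns the
   number of code points written, or -1 if `size` is too small. */
static int
sketch_to_upper(uint32_t ch, uint32_t *buffer, int size)
{
    assert(buffer != NULL);
    uint32_t mapped[3];
    int n;

    if (ch == 0x00DF) {                /* U+00DF expands to two code points */
        mapped[0] = 'S';
        mapped[1] = 'S';
        n = 2;
    }
    else if (ch >= 'a' && ch <= 'z') { /* ASCII lowercase letter */
        mapped[0] = ch - ('a' - 'A');
        n = 1;
    }
    else {                             /* everything else maps to itself here */
        mapped[0] = ch;
        n = 1;
    }

    if (n > size) {
        return -1;                     /* caller's buffer is too small */
    }
    for (int i = 0; i < n; i++) {
        buffer[i] = mapped[i];
    }
    return n;
}
```

With this shape, a caller that passes a one-element buffer for U+00DF gets an error return instead of a silent overflow, which is the property a `gets()`-style signature cannot offer.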

lysnikolaou · 2025-07-01T15:33:17Z

Further feedback addressed.

> Would you mind elaborating? I'm not aware of this special case.

There's one special case, the Greek letter sigma, where the result of lowercasing is context-specific. More specifically, Σ gets lowercased to ς if it's at the end of a word or to σ otherwise. This is handled in `lower_ucs4` right now.
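As a rough illustration of why this needs the surrounding string, here is a simplified sketch of the rule. It is an assumption-laden approximation, not the logic in `lower_ucs4`: the helper names are hypothetical, only the immediate neighbours are inspected, and "letter" means ASCII or basic Greek only. The real Unicode rule also skips over case-ignorable characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPITAL_SIGMA 0x03A3u  /* Σ */
#define SMALL_SIGMA   0x03C3u  /* σ */
#define FINAL_SIGMA   0x03C2u  /* ς */

/* Simplified letter test (assumption): ASCII letters plus U+0391..U+03C9,
   skipping the unassigned code point U+03A2. */
static int
sketch_is_letter(uint32_t ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
        || (ch >= 0x0391 && ch <= 0x03C9 && ch != 0x03A2);
}

/* Lowercase the Σ at index `i` of `str` (length `len`).
   It becomes ς only when a letter precedes it and no letter follows it. */
static uint32_t
sketch_lower_sigma(const uint32_t *str, size_t len, size_t i)
{
    assert(str != NULL && i < len && str[i] == CAPITAL_SIGMA);
    int letter_before = (i > 0) && sketch_is_letter(str[i - 1]);
    int letter_after = (i + 1 < len) && sketch_is_letter(str[i + 1]);
    return (letter_before && !letter_after) ? FINAL_SIGMA : SMALL_SIGMA;
}
```

Under these assumptions, the Σ ending ΟΔΟΣ maps to ς, while the Σ starting ΣΟΦΙΑ maps to σ.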

vstinner

Can you try to add tests to Modules/_testcapi/unicode.c and Lib/test/test_capi/test_unicode.py?

Doc/c-api/unicode.rst

vstinner · 2025-07-01T15:46:38Z

> There's one special case, the Greek letter sigma, where the result of lowercasing is context-specific. More specifically, Σ gets lowercased to ς if it's at the end of a word or to σ otherwise.

Oh, that's a tricky case. The proposed API takes a single character, so we don't know whether Σ is at the end of a word or not. I don't think it's worth handling this special case in the proposed API.
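The limitation can be seen in a small sketch that lowercases a string one code point at a time. The functions below are hypothetical and cover only ASCII and basic Greek capitals, not CPython's case tables; they show that a mapping with no access to context can only ever produce σ for Σ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context-free lowercase mapping (assumption: ASCII and
   basic Greek capitals only, both of which sit 0x20 below their
   lowercase forms). */
static uint32_t
sketch_lower_one(uint32_t ch)
{
    if (ch >= 'A' && ch <= 'Z') {
        return ch + 0x20;
    }
    if (ch >= 0x0391 && ch <= 0x03A9 && ch != 0x03A2) {
        return ch + 0x20;              /* Σ (U+03A3) always becomes σ (U+03C3) */
    }
    return ch;
}

/* Lowercase `len` code points from `src` into `dst`, one at a time,
   without looking at neighbouring code points. */
static void
sketch_lower_each(const uint32_t *src, uint32_t *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++) {
        dst[i] = sketch_lower_one(src[i]);
    }
}
```

Applied to ΟΔΟΣ, this loop ends the result with σ (U+03C3) rather than ς (U+03C2), so callers who need the final-sigma form would have to handle it themselves at the string level.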

lysnikolaou · 2025-07-01T16:32:41Z

> Can you try to add tests to Modules/_testcapi/unicode.c and Lib/test/test_capi/test_unicode.py?

Done.

serhiy-storchaka · 2025-07-01T16:53:18Z

If we add too many parameters and runtime checks, this will make the API slower and more difficult to use.

pythongh-76535: Make PyUnicode_ToLowerFull and friends public

431abba

Make `PyUnicode_ToLowerFull`, `PyUnicode_ToUpperFull` and `PyUnicode_ToTitleFull` public and rename them to `PyUnicode_ToLower` etc.

bedevere-app bot added the awaiting core review label Jul 1, 2025

bedevere-app bot mentioned this pull request Jul 1, 2025

Unclear intention of deprecating Py_UNICODE_TOLOWER / Py_UNICODE_TOUPPER #76535

Open

lysnikolaou mentioned this pull request Jul 1, 2025

gh-76535: Add C API functions for changing case of a single codepoint #117117

Closed

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst (outdated)

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst (outdated)

Address feedback; add size parameter and do PyUnicode_ToFolded as well

d604fc8

📜🤖 Added by blurb_it.

fbbf841

serhiy-storchaka reviewed Jul 1, 2025

vstinner reviewed Jul 1, 2025

Objects/unicodeobject.c (outdated)

Doc/c-api/unicode.rst (outdated)

Address more feedback; assert return value and raise ValueError

f17aa0c

vstinner reviewed Jul 1, 2025

Doc/c-api/unicode.rst

Add tests

4a70489

Document the maximum numbers of characters needed in the buffer

61afd9a
