Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create v0.1.3 Release #21

Merged
merged 9 commits into from
Apr 30, 2024
Merged

Create v0.1.3 Release #21

merged 9 commits into from
Apr 30, 2024

Conversation

GrammAcc
Copy link
Owner

No description provided.

I added a `CLDR_VERSION` variable to the __init__.py for the linearmoney
package.

In order to have the CLDR_VERSION update automatically with the version
of the cldr data that is parsed into the library, I added a few lines
to the process_cldr_data.py script to open the
cldr-core package of the cldr-json data and read the `version` string.

I then write the version string to a new package resource file
`cldr_version.json`, which is loaded at import-time via the `resources`
module in order to ensure the CLDR_VERSION gives the correct version
information for the cldr data the library is currently using, even if
the user manually runs the processing script with a different version
of the cldr-json data than what linearmoney is distributed with and
rebuilds the package.

This should allow proper automated reporting of the correct cldr data
version without coupling the cldr version to the library source code.

This is not the most ideal solution as it requires a file read at
import-time, but it only happens once, and it is a very simple and
flexible implementation. A more robust solution that doesn't require
a package resource file read is likely possible, but it doesn't
provide much practical benefit over the simple package resource
solution, and I don't have the time to put into it right now, so I am
leaving this as is for now. This is something that can be improved in
the future though.
I added a `tests/cldr.py` test script for regression testing the cldr
data processing script when making changes.

The test script simply checks the produced JSON files against files
produced by a known correct version of the script. Basically, it's just
a snapshot test.

This means that the reference JSON files in the tests directory need to be
updated when the version of the CLDR data is updated, but I have the
`cldr_version.json` file as part of the test data, so if the version is
accidentally updated, then the test suite will fail on the check of that
file, and we will know that we need to either roll back changes or
verify that the new version works as expected and update the reference
JSON.
When I wrapped the loading of locale data from the CLDR json files into
memory in a function, I forgot to call the function afterwards, so the
cldr data processing script was not creating any locale data.

This is fixed.

I also added a new `cldr` environment to Hatch for building and testing
the cldr data. The tests are working correctly. That's actually how I
noticed that the script was broken and where to look for the problem.
I added a section to the CONTRIBUTING.md file explaining the process for
updating the CLDR data used by the library.

I also changed the structure of the cldr test data to keep the test
script and all data in a dedicated `tests/cldr` directory instead of
having everything sitting in the top-level `tests` directory.
fix #11

I finished the primary refactoring the `process_cldr_data.py` script.

It's still not great, but it's broken up into more clearly delineated
function now, so it should be much easier to make changes to the way we
parse the data or to change the output format.

Through this refactoring process, I also found a couple bugs in the
script that were causing the incorrect data to be parsed, so this commit
also includes updated currencies and locales data.

Specifically, several currencies were being parsed as having
`cash_places = 2` when they should have been `cash_places = 0`. This was
caused by using `j` instead of `i` in the loop in the old
`_build_fractions_data` function. This caused the branch that read the
data for `cash_places` to get skipped for all currencies, so any that
had specific `cash_places` would instead get the value of the `DEFAULT`
currency.

The other problem was that the comparisons with `" "` weren't working
for locales that use some other unicode whitespace character for their
spaces in the formatting patterns. This caused several locales to have
incorrect currency symbol and positive/negative sign placement.

I added a helper function for normalizing whitespace with the
`unicodedata` module, and this is now fixed. The new data should have
the correct sign and symbol placement and spacing.
The updates to the locales data affected the fr_FR locale, which was
being parsed incorrectly before and not including the space between the
numeric portion and the currency symbol.

The changes to the cldr data processing script fix this error in the
parsed locales data, but I forgot to update the test case that checkes
for non-ascii characters in the `scalars.l10n` function. This was not
caught because the venv that hatch created did not update with the new
data files when running the test suite locally, so it was not caught
until CI ran on the PR for this.

I manually recreated the venv locally to confirm the changes fix the
failing test case and reflect the correct localization formatting.
This PR includes the refactorings in the `process_cldr_data.py` script,
which was unmaintainable before.

I broke the script up into named functions that should make it easier to
make changes to the way the data is parsed or the format of the output
without breaking anything going forward.

The actual parsing of the data has been made somewhat more intelligible,
but there's still a lot of if chains in this script, so it's still not
easy to understand everything that's going on. There's still room for
improvement here.

I also added parsing of the version string of the CLDR data itself to
the script, so we now have an automated way to keep track of the
specific version of CLDR data that is shipped with the library. This is
accessible to the user through the `resources` module, which could help
with bug reports. Mainly though, this just makes it a lot easier to keep
track of what CLDR data is included with specific versions of the
linearmoney library. I've documented that a change in CLDR data version
constitutes a minor version change in linearmoney since this doesn't
break any API compatibility, but it can change the output presented to
the user, which could have unexpected consequences.

This PR also adds a `cldr` hatch environment with commands `cldr:build`
and `cldr:test`, which parse and test the data respectively. The
`cldr:test` command simply does a snapshot test comparing the current
resource files in the source directory to a known good version of the
resources stored in the tests/cldr directory. This ensures that running
the data processing script or `cldr:build` does not break anything. It
also checks the CLDR version file. This is helpful because if a
contributor has cloned the cldr-json repo and have a newer version than
what is shipped with linearmoney, then running the processing script
will likely change the output of functions like
`linearmoney.scalar.l10n` even if the data is *correct*. We want to
avoid making changes to the cldr data version like this unless the
decision was explicit.

I also added some instructions related to updating the CLDR data to the
CONTRIBUTING.md document.
Bumped version to 0.1.3 to create a release since the cldr data included
in v0.1.2 on pypi is incorrect. Version 0.1.3 will include the corrected
cldr data that was added with the cldr data processing refactor. See #20
@GrammAcc GrammAcc merged commit 5bc3a0b into main Apr 30, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant