Fix: characters with special characters are secretly two characters #440

Open
nienna73 opened this issue Apr 5, 2023 · 0 comments

nienna73 commented Apr 5, 2023

When importing pretty much all of Jean's recordings, any character with a circumflex or a macron gets stored as two code points: the base letter plus a combining mark (e.g. e + ˆ) instead of the single precomposed character ê. There's a script in progress that's supposed to find the combining circumflex (U+0302, which is \xcc\x82 in UTF-8) and replace each pair with the correct single character.

This script isn't working.

The only way I've fixed this successfully is by correcting the characters manually.

The script is here: https://github.com/UAlbertaALTLab/recording-validation-interface/blob/production/validation/management/commands/lookforcombinedcharacters.py
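
For reference, here is a minimal sketch of one possible approach, assuming the stored text is ordinary decomposed Unicode (a base letter followed by a combining mark such as U+0302 or U+0304). Python's standard `unicodedata.normalize` with the `NFC` form composes such pairs into the precomposed character where one exists. The helper names `compose` and `has_combining_marks` are hypothetical and are not taken from the linked script.

```python
import unicodedata


def compose(text: str) -> str:
    """Return text in NFC form, merging a base letter with the combining
    mark that follows it whenever a precomposed character exists."""
    return unicodedata.normalize("NFC", text)


def has_combining_marks(text: str) -> bool:
    """Report whether any code point in text is a combining mark."""
    return any(unicodedata.combining(ch) for ch in text)


# "e" followed by U+0302 COMBINING CIRCUMFLEX ACCENT: two code points.
decomposed = "e\u0302"
print(len(decomposed), decomposed.encode("utf-8"))

# After NFC normalization the pair becomes the single code point U+00EA.
composed = compose(decomposed)
print(len(composed), has_combining_marks(composed))
```

If this assumption holds, the same call would also cover macrons (U+0304), so the script would not need a separate replacement rule per accent.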

@fbanados fbanados self-assigned this Jul 17, 2024