Spellcheck does not recognise words containing hyphens #184
Comments
It's not fixable in the tokenizer, but the SpellChecker class already has some special cases. It currently uses one character of leading and trailing context. Technically that could be extended to cover this, but I'm not really convinced it's worth it. I can't think of many examples in the languages I know, only some odd expressions like hanky-panky or topsy-turvy.
https://github.com/otsaloma/gaupol/blob/master/aeidon/spell.py#L61
I had to pick a name when separating that user-interface-independent module from the codebase. Gaupol doesn't mean anything either, so I continued in the same style. I also wanted the length to match, so that I could do a search-and-replace across the codebase without needing to manually fix hanging indents.
Here are some ugly one-liners that seem to show words that would not be recognised:
There are probably some false positives but it's still not negligible IMO. Do you think it would affect performance a lot to take those into account? If we do, we also need to take into account words like |
I can't run those greps: my Debian doesn't seem to have myspell, only hunspell files, and they probably have a different format. I think maybe we could make the function signature
def check(self, word, extended_word="", leading_context="", trailing_context=""):
And that |
Hi,
I'm not sure how to work around this issue but I see that the spell checker tokenizer splits words on hyphens:
gaupol/aeidon/spell.py
Line 255 in 30a2ed8
This can break spellchecking for the following French subtitle:
Although both words are present in the dictionary, they won't be recognised because neither twin nor talkie is listed:
I think in general splitting on hyphens is a good idea, but maybe we could do something to expand the selection to the full word when the checker returns a mistake. It doesn't seem very straightforward; do you suppose it's worth it?
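For illustration, the fallback idea could look roughly like the sketch below. It is not Gaupol's code: find_misspellings, is_known, RE_COMPOUND and KNOWN_WORDS are hypothetical names, and the dictionary is a plain set standing in for a real spell-check backend. The parts are checked first, and the full hyphenated word is only looked up when a part fails, so the extra lookup only happens for would-be errors.

```python
import re

# Hypothetical stand-in for a real dictionary backend.
KNOWN_WORDS = {"a", "world", "topsy-turvy"}


def is_known(word):
    """Return True if word is in the stand-in dictionary."""
    return word.lower() in KNOWN_WORDS


# One or more runs of letters joined by single hyphens.
RE_COMPOUND = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def find_misspellings(text, is_known):
    """Return (start, end, word) for each unrecognised token in text."""
    errors = []
    for match in RE_COMPOUND.finditer(text):
        compound = match.group()
        bad_parts = []
        offset = match.start()
        # First pass: check each hyphen-separated part on its own.
        for part in compound.split("-"):
            if not is_known(part):
                bad_parts.append((offset, offset + len(part), part))
            offset += len(part) + 1  # skip the part and its hyphen
        # Fallback: if a part failed, try the full hyphenated word.
        if bad_parts and "-" in compound and is_known(compound):
            continue
        errors.extend(bad_parts)
    return errors


print(find_misspellings("a topsy-turvy wurld", is_known))
# prints [(14, 19, 'wurld')]
```

Here topsy and turvy both fail on their own, but the compound is listed, so only wurld is reported. A compound that is not listed still reports just the offending part, so the selection stays as narrow as it is today.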
Off-topic question: does the name "aeidon" mean anything?