Segmentation faults with a small corpus #37
Comments
Does the query program work as expected? There are a lot of unaligned accesses by design; are you using x86_64?
Oh also if you specify a vocab id that's out of range, it reserves the right to segfault.
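Since an out-of-range vocab id may segfault, a caller can validate ids before handing them to the model. The following is only a minimal, self-contained sketch: `WordIndex` is a stand-in for KenLM's `lm::WordIndex` (assumed here to be a 32-bit unsigned integer), `IsValidVocabId` is a hypothetical helper that is not part of KenLM, and `vocab_bound` is assumed to be the number of entries in the loaded vocabulary.

```cpp
#include <cstdint>

// Stand-in for lm::WordIndex; assumed to be an unsigned 32-bit integer.
using WordIndex = std::uint32_t;

// Hypothetical helper (not part of KenLM): returns true only when raw_id
// lies in the half-open range [0, vocab_bound). The id is taken as a signed
// 64-bit value so that a negative id coming from a signed caller-side type
// is rejected instead of wrapping around to a large unsigned value.
inline bool IsValidVocabId(std::int64_t raw_id, WordIndex vocab_bound) {
  return raw_id >= 0 && raw_id < static_cast<std::int64_t>(vocab_bound);
}
```

Running a check of this kind at the boundary of a wrapper, before an id reaches the model, turns a potential crash into an error the caller can handle.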
Thanks! I fixed my issue with TrieModel. Calling As for ProbingModel, if Valgrind's invalid writes are expected, then it's fine. I guess JNA or the JVM is to blame for the segfaults. FYI, on the TrieModel, the JVM complains here:
I am checking the vocab id with I'm indeed using x86_64. The query program works well.
Actually, I got lucky that
Hi,
I can't get KenLM working on my corpus.
I've followed the usual steps:
./bin/lmplz -T /tmp/ --text corpus.txt --arpa myarpa.arpa
./bin/build_binary myarpa.arpa my_probing_model.mmap
Then I tried the snippet from here:
https://kheafield.com/code/kenlm/developers/
With a TrieModel, it always ends with a segfault, regardless of MAX_ORDER. The error occurs here:
With a ProbingModel, I get a segfault only for MAX_ORDER < 5:
For MAX_ORDER = 5, the C++ program runs only with a couple of Valgrind errors:
But a JNA wrapper around the same snippet raises a "malloc(): memory corruption" when loading the model.
I tried with and without pruning, with orders 2 and 3, using both the KenLM release from the download section and the one on GitHub. The corpus is about 1 GB.
One peculiarity of the vocabulary is that it contains A LOT of words that are substrings of other words in the vocabulary.
I'm aware that this is probably not enough information for proper debugging, but I would like to know whether the Valgrind errors are acceptable and whether you can suggest anything to help me find the problem.
My system is Mint 17. The compilation succeeded with no warning.