Hello guys,
I ran getvocab on a French text with the following command:
./fast getvocab marie_claire.txt > new_vocab
However, I have seen a bug (if it is a bug!): some tokens are duplicated, and the second copy is written with a line break. Here is an example (just an extract cut from the full vocab output):
You can see et and de in the example above. Furthermore, the vocab starts exactly as reported: a line break, a space and the frequency (2439). Is this still a bug?
Here is the French text:
wget -O marie_claire.txt http://www.gutenberg.org/cache/epub/58501/pg58501.txt
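For anyone trying to reproduce this, here is a minimal Python sketch (not part of the tool) for checking whether the duplicated tokens differ only by an invisible character, such as a carriage return left over from Windows-style line endings. That cause is only a guess on my side. The sketch assumes the tokenizer splits on spaces and newlines only, and the helper names count_tokens and find_hidden_duplicates are made up for illustration.

```python
import re
from collections import Counter, defaultdict


def count_tokens(text):
    # Assumption: only spaces and line feeds separate tokens, so any other
    # invisible character (e.g. a carriage return) stays glued to the token.
    return Counter(tok for tok in re.split(r"[ \n]+", text) if tok)


def find_hidden_duplicates(counts):
    # Group tokens that become identical once leading/trailing whitespace
    # and control characters are stripped.
    groups = defaultdict(list)
    for tok in counts:
        groups[tok.strip()].append(tok)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    # Tiny made-up sample with CRLF line endings, not the real text.
    sample = "et de\r\net\r\nde et"
    counts = count_tokens(sample)
    for key, variants in find_hidden_duplicates(counts).items():
        # repr() makes the hidden characters visible.
        print(key, [(repr(v), counts[v]) for v in variants])
```

To try it on the real file, read marie_claire.txt with open(path, newline="") so that Python does not translate the line endings before the check.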
Any ideas?
Thanks a lot for your help :)