Segfault when training doc2vec #2942

Closed
Paul-E opened this issue Sep 10, 2020 · 5 comments

Paul-E commented Sep 10, 2020

Problem description

When attempting to train doc2vec, gensim segfaults.

Steps/code/corpus to reproduce

I run the code:

import faulthandler
import gensim
faulthandler.enable()
model = gensim.models.doc2vec.Doc2Vec(corpus_file="yelp_tripadvisor_linesentence.txt", vector_size=250, min_count=10, epochs=40, workers=5)

I get the output:

Fatal Python error: Segmentation fault

Current thread 0x00007f2d9effd700 (most recent call first):
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 431 in _do_train_epoch
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 172 in _worker_loop_corpusfile
  File "/usr/lib/python3.8/threading.py", line 870 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007f2d9f7fe700 (most recent call first):
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 431 in _do_train_epoch
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 172 in _worker_loop_corpusfile
  File "/usr/lib/python3.8/threading.py", line 870 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007f2d9ffff700 (most recent call first):
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 431 in _do_train_epoch
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 172 in _worker_loop_corpusfile
  File "/usr/lib/python3.8/threading.py", line 870 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007f2da48df700 (most recent call first):
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 431 in _do_train_epoch
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 172 in _worker_loop_corpusfile
  File "/usr/lib/python3.8/threading.py", line 870 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007f2da50e0700 (most recent call first):
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 431 in _do_train_epoch
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 172 in _worker_loop_corpusfile
  File "/usr/lib/python3.8/threading.py", line 870 in run
  File "/usr/lib/python3.8/threading.py", line 932 in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 890 in _bootstrap

Thread 0x00007f3055bd1740 (most recent call first):
  File "/usr/lib/python3.8/threading.py", line 302 in wait
  File "/usr/lib/python3.8/queue.py", line 170 in get
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 345 in _log_epoch_progress
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 430 in _train_epoch_corpusfile
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 554 in train
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py", line 1063 in train
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 554 in train
  File "/home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py", line 360 in __init__
  File "reproduce_segfault.py", line 4 in <module>
Segmentation fault (core dumped)

When run in gdb I get:

Thread 36 "python3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffd450ca700 (LWP 112905)]
0x00007fffc9347737 in saxpy_kernel_16 ()
   from /home/paul/.local/lib/python3.8/site-packages/scipy/spatial/../../scipy.libs/libopenblasp-r0-085ca80a.3.9.so

The backtrace I get is:

(gdb) backtrace
#0  0x00007fffc9347737 in saxpy_kernel_16 ()
   from /home/paul/.local/lib/python3.8/site-packages/scipy/spatial/../../scipy.libs/libopenblasp-r0-085ca80a.3.9.so
#1  0x00007fffc934792f in saxpy_k_ZEN ()
   from /home/paul/.local/lib/python3.8/site-packages/scipy/spatial/../../scipy.libs/libopenblasp-r0-085ca80a.3.9.so
#2  0x00007fffc84402cb in saxpy_ ()
   from /home/paul/.local/lib/python3.8/site-packages/scipy/spatial/../../scipy.libs/libopenblasp-r0-085ca80a.3.9.so
#3  0x00007fffa0e81782 in ?? ()
   from /home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec_corpusfile.cpython-38-x86_64-linux-gnu.so
#4  0x00007fffa0e8243f in ?? ()
   from /home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec_corpusfile.cpython-38-x86_64-linux-gnu.so
#5  0x00000000005f17e5 in cfunction_call_varargs (kwargs=<optimized out>, args=<optimized out>, 
    func=<built-in function d2v_train_epoch_dm>) at ../Objects/call.c:772
#6  PyCFunction_Call (func=<built-in function d2v_train_epoch_dm>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:772
#7  0x00000000005f2406 in _PyObject_MakeTpCall (callable=<built-in function d2v_train_epoch_dm>, args=<optimized out>, 
    nargs=<optimized out>, keywords=<optimized out>) at ../Include/internal/pycore_pyerrors.h:13
#8  0x000000000056cfd4 in _PyObject_Vectorcall (kwnames=('doctag_vectors', 'doctag_locks'), nargsf=<optimized out>, 
    args=<optimized out>, callable=<built-in function d2v_train_epoch_dm>) at ../Include/cpython/abstract.h:125
#9  _PyObject_Vectorcall (kwnames=('doctag_vectors', 'doctag_locks'), nargsf=<optimized out>, args=<optimized out>, 
    callable=<built-in function d2v_train_epoch_dm>) at ../Include/cpython/abstract.h:115
#10 call_function (kwnames=('doctag_vectors', 'doctag_locks'), oparg=<optimized out>, pp_stack=<synthetic pointer>, 
    tstate=<optimized out>) at ../Python/ceval.c:4987
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3515
#12 0x0000000000565972 in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7ffd34001710, for file /home/paul/.local/lib/python3.8/site-packages/gensim/models/doc2vec.py, line 1199, in _do_train_epoch (self=<Doc2Vec(sg=0, alpha=<float at remote 0x7fffa19bd8d0>, window=5, random=<numpy.random.mtrand.RandomState at remote 0x7fffa09f4640>, min_alpha=<float at remote 0x7fffa19bd910>, hs=0, negative=5, ns_exponent=<float at remote 0x7fffa19bd930>, cbow_mean=1, compute_loss=False, running_training_loss=<float at remote 0x7fffa19bd870>, min_alpha_yet_reached=<float at remote 0x7fffa19bd8d0>, corpus_count=9643078, corpus_total_words=1099181249, vector_size=250, workers=5, epochs=40, train_count=0, total_train_time=0, batch_words=10000, model_trimmed_post_training=False, callbacks=(), load=<function at remote 0x7ffff412f310>, dbow_words=0, dm_concat=0, dm_tag_count=1, vocabulary=<Doc2VecVocab(max_vocab_size=None, min_count=10, sample=<float at remote 0x7fffa12f9670>, sorted_vocab=True, null_word=0, cum_table=<numpy.ndarray at remote 0x7fffa0998c10>, raw_vocab={}, max_final_vocab=None,...(truncated)) at ../Python/ceval.c:741
#13 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>, kwargs=0x7fffa0a01d68, kwcount=<optimized out>, kwstep=1, defs=0x7fffa12ad0f8, defcount=4, kwdefs=0x0, closure=0x0, name='_do_train_epoch', 
    qualname='Doc2Vec._do_train_epoch') at ../Python/ceval.c:4298
#14 0x00000000005f1d85 in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffa0a01d30, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#15 0x0000000000507729 in _PyObject_Vectorcall (
    kwnames=('total_examples', 'total_words', 'start_alpha', 'end_alpha', 'word_count', 'compute_loss', 'offsets', 'start_doctags'), nargsf=7, args=0x7fffa0a01d30, 
    callable=<function at remote 0x7fffa12ba3a0>) at ../Include/cpython/abstract.h:127
#16 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, 
    kwnames=('total_examples', 'total_words', 'start_alpha', 'end_alpha', 'word_count', 'compute_loss', 'offsets', 'start_doctags')) at ../Objects/classobject.c:89
#17 0x00000000005f1107 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<method at remote 0x7fff9d8c7600>) at ../Objects/call.c:199
#18 PyObject_Call (callable=<method at remote 0x7fff9d8c7600>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:227
#19 0x0000000000568e1f in do_call_core (
    kwdict={'total_examples': 9643078, 'total_words': 1099181249, 'start_alpha': <float at remote 0x7fffa19bd8d0>, 'end_alpha': <float at remote 0x7fffa19bd910>, 'word_count': 0, 'compute_loss': False, 'offsets': [0, 1186792315, 2373585688, 3560378663, 4747171525], 'start_doctags': [0, 1296629, 3235497, 5388103, 7520884]}, 
    callargs=('yelp_tripadvisor_linesentence.txt', 4, <float at remote 0x7fff98254b10>, <gensim.models.word2vec_corpusfile.CythonVocab at remote 0x7fff9dde38e0>, (<numpy.ndarray at remote 0x7fff9de26170>, <numpy.ndarray at remote 0x7fff9de26990>), 0), func=<method at remote 0x7fff9d8c7600>, tstate=<optimized out>)
    at ../Python/ceval.c:5034
#20 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3559
#21 0x0000000000565972 in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7ffd34000ba0, for file /home/paul/.local/lib/python3.8/site-packages/gensim/models/base_any2vec.py, line 940, in _worker_loop_corpusfile (self=<Doc2Vec(sg=0, alpha=<float at remote 0x7fffa19bd8d0>, window=5, random=<numpy.random.mtrand.RandomState at remote 0x7fffa09f4640>, min_alpha=<float at remote 0x7fffa19bd910>, hs=0, negative=5, ns_exponent=<float at remote 0x7fffa19bd930>, cbow_mean=1, compute_loss=False, running_training_loss=<float at remote 0x7fffa19bd870>, min_alpha_yet_reached=<float at remote 0x7fffa19bd8d0>, corpus_count=9643078, corpus_total_words=1099181249, vector_size=250, workers=5, epochs=40, train_count=0, total_train_time=0, batch_words=10000, model_trimmed_post_training=False, callbacks=(), load=<function at remote 0x7ffff412f310>, dbow_words=0, dm_concat=0, dm_tag_count=1, vocabulary=<Doc2VecVocab(max_vocab_size=None, min_count=10, sample=<float at remote 0x7fffa12f9670>, sorted_vocab=True, null_word=0, cum_table=<numpy.ndarray at remote 0x7fffa0998c10>, raw_vocab={}, max_final...(truncated)) at ../Python/ceval.c:741
#22 _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=<optimized out>, 
    kwnames=<optimized out>, kwargs=0x7fffa0a01ce0, kwcount=<optimized out>, kwstep=1, defs=0x7fffa178ba58, defcount=3, kwdefs=0x0, closure=0x0, 
    name='_worker_loop_corpusfile', qualname='BaseAny2VecModel._worker_loop_corpusfile') at ../Python/ceval.c:4298
#23 0x00000000005f1d85 in _PyFunction_Vectorcall (func=<optimized out>, stack=0x7fffa0a01cb0, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:435
#24 0x0000000000507729 in _PyObject_Vectorcall (
    kwnames=('start_alpha', 'end_alpha', 'word_count', 'compute_loss', 'offsets', 'start_doctags', 'cur_epoch', 'total_examples', 'total_words'), nargsf=6, 
    args=0x7fffa0a01cb0, callable=<function at remote 0x7fffa17970d0>) at ../Include/cpython/abstract.h:127
#25 method_vectorcall (method=<optimized out>, args=<optimized out>, nargsf=<optimized out>, 
    kwnames=('start_alpha', 'end_alpha', 'word_count', 'compute_loss', 'offsets', 'start_doctags', 'cur_epoch', 'total_examples', 'total_words'))
    at ../Objects/classobject.c:89
#26 0x00000000005f1107 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<method at remote 0x7fff9d84cdc0>) at ../Objects/call.c:199
#27 PyObject_Call (callable=<method at remote 0x7fff9d84cdc0>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:227
#28 0x0000000000568e1f in do_call_core (
    kwdict={'start_alpha': <float at remote 0x7fffa19bd8d0>, 'end_alpha': <float at remote 0x7fffa19bd910>, 'word_count': 0, 'compute_loss': False, 'offsets': [0, 1186792315, 2373585688, 3560378663, 4747171525], 'start_doctags': [0, 1296629, 3235497, 5388103, 7520884], 'cur_epoch': 0, 'total_examples': 9643078, 'total_words': 1099181249}, 
    callargs=('yelp_tripadvisor_linesentence.txt', 4, <float at remote 0x7fff98254b10>, <gensim.models.word2vec_corpusfile.CythonVocab at remote 0x7fff9dde38e0>, <Queue(maxsize=0, queue=<collections.deque at remote 0x7fff9dde3d00>, mutex=<_thread.lock at remote 0x7fff98710420>, not_empty=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dde3ca0>) at remote 0x7fff98710460>, not_full=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dde3c40>) at remote 0x7fff987104c0>, all_tasks_done=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock objec...(truncated), func=<method at remote 0x7fff9d84cdc0>, tstate=<optimized out>) at ../Python/ceval.c:5034
#29 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3559
#30 0x00000000005f1b8b in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7fff9f3ee740, for file /usr/lib/python3.8/threading.py, line 870, in run (self=<Thread(_target=<method at remote 0x7fff9d84cdc0>, _name='Thread-5', _args=('yelp_tripadvisor_linesentence.txt', 4, <float at remote 0x7fff98254b10>, <gensim.models.word2vec_corpusfile.CythonVocab at remote 0x7fff9dde38e0>, <Queue(maxsize=0, queue=<collections.deque at remote 0x7fff9dde3d00>, mutex=<_thread.lock at remote 0x7fff98710420>, not_empty=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dde3ca0>) at remote 0x7fff98710460>, not_full=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dd...(truncated)) at ../Python/ceval.c:741
#31 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
#32 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:410
#33 0x00000000005677c7 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff9f35c7b8, callable=<function at remote 0x7ffff732e9d0>)
    at ../Include/cpython/abstract.h:127
#34 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xac0530) at ../Python/ceval.c:4987
#35 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#36 0x00000000005f1b8b in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7fff9f35c640, for file /usr/lib/python3.8/threading.py, line 932, in _bootstrap_inner (self=<Thread(_target=<method at remote 0x7fff9d84cdc0>, _name='Thread-5', _args=('yelp_tripadvisor_linesentence.txt', 4, <float at remote 0x7fff98254b10>, <gensim.models.word2vec_corpusfile.CythonVocab at remote 0x7fff9dde38e0>, <Queue(maxsize=0, queue=<collections.deque at remote 0x7fff9dde3d00>, mutex=<_thread.lock at remote 0x7fff98710420>, not_empty=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dde3ca0>) at remote 0x7fff98710460>, not_full=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at rem...(truncated)) at ../Python/ceval.c:741
#37 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
#38 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:410
#39 0x00000000005677c7 in _PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7fff9f3ee6f8, callable=<function at remote 0x7ffff732eca0>)
    at ../Include/cpython/abstract.h:127
#40 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0xac0530) at ../Python/ceval.c:4987
#41 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at ../Python/ceval.c:3486
#42 0x00000000005f1b8b in PyEval_EvalFrameEx (throwflag=0, 
    f=Frame 0x7fff9f3ee580, for file /usr/lib/python3.8/threading.py, line 890, in _bootstrap (self=<Thread(_target=<method at remote 0x7fff9d84cdc0>, _name='Thread-5', _args=('yelp_tripadvisor_linesentence.txt', 4, <float at remote 0x7fff98254b10>, <gensim.models.word2vec_corpusfile.CythonVocab at remote 0x7fff9dde38e0>, <Queue(maxsize=0, queue=<collections.deque at remote 0x7fff9dde3d00>, mutex=<_thread.lock at remote 0x7fff98710420>, not_empty=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x7fff9dde3ca0>) at remote 0x7fff98710460>, not_full=<Condition(_lock=<_thread.lock at remote 0x7fff98710420>, acquire=<built-in method acquire of _thread.lock object at remote 0x7fff98710420>, release=<built-in method release of _thread.lock object at remote 0x7fff98710420>, _waiters=<collections.deque at remote 0x...(truncated)) at ../Python/ceval.c:741
#43 function_code_fastcall (globals=<optimized out>, nargs=<optimized out>, args=<optimized out>, co=<optimized out>) at ../Objects/call.c:283
#44 _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:410
#45 0x000000000050722c in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>)
    at ../Include/cpython/abstract.h:127
#46 method_vectorcall (method=<optimized out>, args=0x7ffff7634058, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/classobject.c:89
#47 0x00000000005f1107 in PyVectorcall_Call (kwargs=<optimized out>, tuple=<optimized out>, callable=<method at remote 0x7fff9d8c7540>) at ../Objects/call.c:199
#48 PyObject_Call (callable=<method at remote 0x7fff9d8c7540>, args=<optimized out>, kwargs=<optimized out>) at ../Objects/call.c:227
#49 0x000000000064fb98 in t_bootstrap (boot_raw=boot_raw@entry=0x7fff9f33a150) at ../Modules/_threadmodule.c:1002
#50 0x000000000066ee14 in pythread_wrapper (arg=<optimized out>) at ../Python/thread_pthread.h:237
#51 0x00007ffff7d96609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#52 0x00007ffff7ed2103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I can provide the corpus on request.

Versions

Linux-5.4.0-45-generic-x86_64-with-glibc2.29
Python 3.8.2 (default, Jul 16 2020, 14:00:26) 
[GCC 9.3.0]
Bits 64
NumPy 1.19.2
SciPy 1.5.2
gensim 3.8.3
FAST_VERSION 1

Paul-E changed the title from "Segfault when trainding doc2vec" to "Segfault when training doc2vec" on Sep 10, 2020

gojomo (Collaborator) commented Sep 11, 2020

Thanks for the detailed report! Can you say a little more about the corpus size? If you enable logging at the INFO level, how much progress is shown before the fault? Does it always fault at the same point?
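
For reference, a minimal sketch of turning on INFO-level logging before building the model, using only the standard library. The format string is just a common choice, not something the thread prescribes, and `force=True` assumes Python 3.8 or newer (the version in this report).

```python
import logging

# Route all log records at INFO level and above to stderr.
# force=True (Python 3.8+) replaces any handlers that were already installed,
# so this call takes effect even if logging was configured earlier.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
    force=True,
)

# gensim reports vocabulary scanning and per-epoch training progress through
# loggers under the "gensim" name, which inherit the root level set above.
logging.getLogger("gensim").info("INFO logging enabled")
```

Run this before constructing the model so the progress messages leading up to the fault are captured.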

Paul-E (Author) commented Sep 11, 2020

The corpus has 9,643,078 documents and 1,099,181,249 total words.

I forgot to include that I am running this on Ubuntu 20.04.

Attached is the output from setting logging to INFO:

train_logs.txt

gojomo (Collaborator) commented Sep 11, 2020

This may be the same issue as #2894, which is fixed in the develop branch. If you're able to test with a development code checkout (which might require other changes in your code, though not in the single line of instantiation code you've shown above), you might not see the crash.

Essentially, instead of using a package from the PyPI or Conda repos: do a git checkout; ensure your system has key Ubuntu packages like build-essential and Python packages like Cython; then run pip install -e . from within the project directory.

gojomo (Collaborator) commented Sep 11, 2020

(Also: that bug is only in the corpus_file path, so another workaround could be to supply your docs via the traditional iterable-of-TaggedDocument-instances API. That path won't achieve as much utilization/throughput with as many workers, so it will be slower, but if training succeeds, that would confirm the problem is specific to corpus_file and probably the same as #2894.)
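
A rough sketch of that iterable-based workaround, assuming the corpus file holds one whitespace-tokenized document per line and that each document is tagged with its line number (which is how corpus_file mode assigns tags). The names `iter_line_sentences`, `TaggedLineCorpus`, and `train_from_iterable` are hypothetical helpers, not part of gensim; the hyperparameters are copied from the report.

```python
from typing import Iterator, List, Tuple


def iter_line_sentences(path: str) -> Iterator[Tuple[int, List[str]]]:
    """Yield (line_number, tokens) for a file with one document per line."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            yield line_number, line.split()


class TaggedLineCorpus:
    """Restartable iterable of TaggedDocument objects (hypothetical helper).

    Doc2Vec iterates over the corpus once to build the vocabulary and once
    per epoch, so a one-shot generator is not enough; __iter__ reopens the
    file on every pass.
    """

    def __init__(self, path: str) -> None:
        self.path = path

    def __iter__(self):
        # Imported here so the plain reader above can be used without gensim.
        from gensim.models.doc2vec import TaggedDocument

        for line_number, tokens in iter_line_sentences(self.path):
            yield TaggedDocument(words=tokens, tags=[line_number])


def train_from_iterable(path: str):
    """Train through the documents= path instead of corpus_file=."""
    from gensim.models.doc2vec import Doc2Vec

    return Doc2Vec(
        documents=TaggedLineCorpus(path),
        vector_size=250,
        min_count=10,
        epochs=40,
        workers=5,
    )
```

Calling `train_from_iterable("yelp_tripadvisor_linesentence.txt")` would then exercise the non-corpus_file training code; expect lower throughput than corpus_file mode with the same number of workers.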

Paul-E (Author) commented Sep 15, 2020

I have successfully trained my model by installing gensim from GitHub. Thank you :)
