Cannot use gensim 3.8.x when nltk package is installed #2697

Closed
yapus opened this issue Dec 4, 2019 · 8 comments · Fixed by #3012

@yapus

yapus commented Dec 4, 2019

Problem description

What are you trying to achieve? What is the expected result? What are you seeing instead?

In my script I'm trying to import gensim.models.keyedvectors and also import another package that requires nltk internally. Whenever NLTK is installed in the same environment (I'm not using a virtualenv, but a Docker image), gensim fails to import.

Steps/code/corpus to reproduce

# pip list | grep -E 'gensim|nltk'
gensim                        3.8.1

# pip install nltk
Processing /root/.cache/pip/wheels/96/86/f6/68ab24c23f207c0077381a5e3904b2815136b879538a24b483/nltk-3.4.5-cp36-none-any.whl
Requirement already satisfied: six in /usr/local/lib/python3.6/site-packages (from nltk) (1.13.0)
Installing collected packages: nltk
Successfully installed nltk-3.4.5

# pip list | grep -E 'gensim|nltk'
gensim                        3.8.1
nltk                          3.4.5

# python
Python 3.6.8 (default, Jun 11 2019, 01:16:11)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/gensim/__init__.py", line 5, in <module>
    from gensim import parsing, corpora, matutils, interfaces, models, similarities, summarization, utils  # noqa:F401
  File "/usr/local/lib/python3.6/site-packages/gensim/corpora/__init__.py", line 14, in <module>
    from .wikicorpus import WikiCorpus  # noqa:F401
  File "/usr/local/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 539, in <module>
    class WikiCorpus(TextCorpus):
  File "/usr/local/lib/python3.6/site-packages/gensim/corpora/wikicorpus.py", line 577, in WikiCorpus
    def __init__(self, fname, processes=None, lemmatize=utils.has_pattern(), dictionary=None,
  File "/usr/local/lib/python3.6/site-packages/gensim/utils.py", line 1614, in has_pattern
    from pattern.en import parse  # noqa:F401
  File "/usr/local/lib/python3.6/site-packages/pattern/text/en/__init__.py", line 61, in <module>
    from pattern.text.en.inflect import (
  File "/usr/local/lib/python3.6/site-packages/pattern/text/en/__init__.py", line 80, in <module>
    from pattern.text.en import wordnet
  File "/usr/local/lib/python3.6/site-packages/pattern/text/en/wordnet/__init__.py", line 57, in <module>
    nltk.data.find("corpora/" + token)
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 673, in find
    return find(modified_name, paths)
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 660, in find
    return ZipFilePathPointer(p, zipentry)
  File "/usr/local/lib/python3.6/site-packages/nltk/compat.py", line 228, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 506, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "/usr/local/lib/python3.6/site-packages/nltk/compat.py", line 228, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 1055, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "/usr/local/lib/python3.6/zipfile.py", line 1131, in __init__
    self._RealGetContents()
  File "/usr/local/lib/python3.6/zipfile.py", line 1198, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Versions

# python
Python 3.6.8 (default, Jun 11 2019, 01:16:11)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import platform; print(platform.platform())
Linux-5.0.0-050000rc8-generic-x86_64-with-debian-9.11
>>> import sys; print("Python", sys.version)
Python 3.6.8 (default, Jun 11 2019, 01:16:11)
[GCC 6.3.0 20170516]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.17.4
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 1.3.3
>>> import gensim; print("gensim", gensim.__version__)
gensim 3.8.1
>>> from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION)
FAST_VERSION 1
@piskvorky
Owner

piskvorky commented Dec 4, 2019

Looks like an issue with pattern (a 3rd party optional dependency) again. Thanks for reporting.

@mpenkov how about we either A) drop pattern altogether, or B) import it only in functions where it's really needed? (as opposed to "import at module-level scope").
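
Option B) could look roughly like the sketch below. It is an illustration only, not Gensim's code: `optional_import` and `tokenize` are hypothetical names, and catching `Exception` instead of just `ImportError` is an assumption based on the traceback above, where the import failed with `zipfile.BadZipFile`.

```python
import importlib


def optional_import(module_name):
    """Import an optional dependency; return the module, or None on failure.

    Hypothetical helper. It catches Exception rather than only ImportError
    because a broken third-party package can fail at import time with
    other errors (BadZipFile in the traceback above).
    """
    try:
        return importlib.import_module(module_name)
    except Exception:
        return None


def tokenize(text, lemmatize=False, backend_name="pattern.en"):
    """Hypothetical function that touches the backend only when asked to."""
    tokens = text.lower().split()
    if not lemmatize:
        return tokens  # the optional dependency is never imported
    backend = optional_import(backend_name)  # imported here, not at module scope
    if backend is None:
        raise RuntimeError(f"optional dependency {backend_name!r} is unavailable")
    # The actual call into `backend` is omitted in this sketch.
    raise NotImplementedError("backend call omitted in this sketch")


print(tokenize("Hello World"))  # ['hello', 'world']
```

With this shape, a broken optional package only affects callers that explicitly request lemmatization.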

@piskvorky
Owner

piskvorky commented Dec 4, 2019

Looking at the traceback more closely, B) doesn't seem possible, because of that top-level lemmatize=utils.has_pattern() call. So I'm in favour of A) then, or setting a lemmatize=False default.

And I guess there's also C) "wait for pattern maintainers to fix their stuff", but that sounds like a continued headache for us…
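
The obstacle to B) is that Python evaluates default argument values once, when the `def` statement runs, which for a method means when the enclosing module is imported. The sketch below reproduces that behaviour with stand-ins: `has_backend`, `EagerCorpus`, `LazyCorpus` and `calls` are hypothetical names, and the `None` sentinel is one common workaround, not necessarily the fix Gensim adopted.

```python
calls = []  # records every time the availability probe runs


def has_backend():
    """Stand-in for an availability probe such as utils.has_pattern()."""
    calls.append("probe")
    return False


class EagerCorpus:
    # The default is evaluated once, while the class body executes,
    # i.e. as soon as the enclosing module is imported.
    def __init__(self, lemmatize=has_backend()):
        self.lemmatize = lemmatize


print(calls)  # ['probe'] although no corpus has been created yet


class LazyCorpus:
    # None is a sentinel meaning "decide when an instance is created".
    def __init__(self, lemmatize=None):
        if lemmatize is None:
            lemmatize = has_backend()
        self.lemmatize = lemmatize
```

Defining `LazyCorpus` does not run the probe. It runs only when an instance is created without an explicit `lemmatize` value, so a plain import of the module never reaches the optional dependency.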

@yapus
Author

yapus commented Dec 4, 2019

@piskvorky So is this actually an NLTK problem, not a gensim one? Should I report it to the NLTK authors?

@piskvorky
Owner

piskvorky commented Dec 4, 2019

According to your traceback, it is some problem inside the pattern library trying to use NLTK's wordnet:

File "/usr/local/lib/python3.6/site-packages/pattern/text/en/__init__.py", line 80, in <module>
    from pattern.text.en import wordnet
File "/usr/local/lib/python3.6/site-packages/pattern/text/en/wordnet/__init__.py", line 57, in <module>
    nltk.data.find("corpora/" + token)

I have no idea what that is, it's something internal to pattern.
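
The last frames of the traceback show NLTK opening a file as a zip archive and the standard library rejecting it. One possible cause, which the thread does not confirm, is a damaged or partially downloaded archive in an NLTK data directory. A standard-library check along these lines could list such files; `find_bad_zips` is a hypothetical helper and the directory to scan is left to the reader.

```python
import zipfile
from pathlib import Path


def find_bad_zips(root):
    """Return the .zip files under `root` that zipfile cannot recognise."""
    bad = []
    for path in sorted(Path(root).rglob("*.zip")):
        # is_zipfile() checks for the zip end-of-central-directory record.
        if not zipfile.is_zipfile(path):
            bad.append(path)
    return bad
```

With NLTK installed, the directories it searches are listed in `nltk.data.path`, so each of those would be a candidate for `root`.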

@yapus
Author

yapus commented Dec 4, 2019

OK, looks like it's actually already fixed in clips/pattern@master, so I'm good for now, thanks. As far as I understand, it's somehow related to Python 3.7.

@piskvorky
Owner

piskvorky commented Dec 4, 2019

Your traceback shows Python 3.6, not 3.7.

Anyway thanks for confirming. I'll leave this ticket open because we probably still want to avoid using pattern. Or at least defer importing it until absolutely the last moment possible (definitely not at Gensim import time, when people might not need it at all).

@astridesa

Hey, have you resolved this error? I have the same error and have no idea how to fix it...

@yapus
Author

yapus commented Dec 16, 2020

@GaoFansakura I just used the clips/pattern@master module (not clips/pattern from pip).
