Resolve NumPy compatibility hell #3231

Closed
wants to merge 12 commits into from
94 changes: 15 additions & 79 deletions .github/workflows/build-wheels.yml
@@ -21,93 +21,18 @@ jobs:
os: [ubuntu-latest, macos-latest, windows-latest]
platform: [x64]
include:
#
# We want the _oldest_ possible manylinux version to ensure our
# wheels work on the widest possible range of distros. Version 1
# seems to break for certain Python versions under Linux and Windows,
# so we use 2010, which is the next oldest.
#
# When selecting the numpy version to build against, we need to satisfy
# two conditions. First, we want the wheel to be available for the
# version of Python we're building against, because building numpy
# wheels on our own is too much work.
#
# Second, in order to guarantee compatibility with the greatest range
# of numpy versions, we want to build against the oldest possible numpy
# version, as long as it's 1.17.0 or newer. Building against versions earlier
# than 1.17.0 yields wheels that are incompatible with some newer
# versions of numpy. See https://github.com/RaRe-Technologies/gensim/issues/3226
# for details.
#
# The logic for numpy version selection is based on
# https://github.com/scipy/oldest-supported-numpy/blob/master/setup.cfg
# with the exception that we enforce the minimum version to be 1.17.0.
#
- os: ubuntu-latest
manylinux-version: 2010
python-version: 3.6
build-depends: numpy==1.17.0

- os: ubuntu-latest
manylinux-version: 2010
python-version: 3.7
build-depends: numpy==1.17.0

- os: ubuntu-latest
manylinux-version: 2010
python-version: 3.8
build-depends: numpy==1.17.3

- os: ubuntu-latest
manylinux-version: 2010
python-version: 3.9
build-depends: numpy==1.19.3

- os: macos-latest
travis-os-name: osx
manylinux-version: 1
python-version: 3.6
build-depends: numpy==1.17.0

- os: macos-latest
travis-os-name: osx
manylinux-version: 1
python-version: 3.7
build-depends: numpy==1.17.0

- os: macos-latest
travis-os-name: osx
manylinux-version: 1
python-version: 3.8
build-depends: numpy==1.21.0

- os: macos-latest
travis-os-name: osx
travis-os-name: osx # For multibuild
manylinux-version: 1
python-version: 3.9
build-depends: numpy==1.21.0

- os: windows-latest
manylinux-version: 2010
python-version: 3.6
build-depends: numpy==1.17.0

- os: windows-latest
manylinux-version: 2010
python-version: 3.7
build-depends: numpy==1.17.0

- os: windows-latest
manylinux-version: 2010
python-version: 3.8
build-depends: numpy==1.17.3

- os: windows-latest
manylinux-version: 2010
python-version: 3.9
build-depends: numpy==1.19.3
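
The selection rule described in the workflow comment (take the oldest numpy that ships a wheel for the target Python, but never go below 1.17.0) can be sketched in a few lines of Python. This is only an illustration: the `OLDEST_WHEEL` table holds placeholder values and is not copied from oldest-supported-numpy, and `build_numpy_version` is a hypothetical helper name.

```python
# Illustrative placeholder table: oldest numpy release assumed to ship a wheel
# for each Python version. NOT copied from oldest-supported-numpy.
OLDEST_WHEEL = {"3.6": "1.13.3", "3.7": "1.14.5", "3.8": "1.17.3", "3.9": "1.19.3"}

# Floor enforced by the workflow comment.
MINIMUM = "1.17.0"


def parse(version):
    """Turn '1.17.3' into (1, 17, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def build_numpy_version(python_version, oldest_wheel=OLDEST_WHEEL, minimum=MINIMUM):
    """Return the numpy pin to build against (hypothetical helper)."""
    candidate = oldest_wheel[python_version]
    # Pick whichever is newer: the oldest available wheel or the enforced floor.
    return max(candidate, minimum, key=parse)


for py in sorted(OLDEST_WHEEL):
    print(py, "-> numpy==" + build_numpy_version(py))
```

With the placeholder table above, the floor only takes effect for Python 3.6 and 3.7; the newer interpreters already start at or above 1.17.0.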

env:
SKIP_NETWORK_TESTS: 1
PKG_NAME: gensim
REPO_DIR: gensim
BUILD_COMMIT: HEAD
@@ -117,11 +42,10 @@ jobs:
TEST_DEPENDS: Morfessor==2.0.2a4 python-levenshtein==0.12.0 visdom==0.1.8.9 pytest mock cython nmslib pyemd testfixtures scikit-learn pyemd
DOCKER_TEST_IMAGE: multibuild/xenial_x86_64
TRAVIS_OS_NAME: ${{ matrix.travis-os-name }}
SKIP_NETWORK_TESTS: 1
MB_ML_VER: ${{ matrix.manylinux-version }}
WHEELHOUSE_UPLOADER_USERNAME: ${{ secrets.AWS_ACCESS_KEY_ID }}
WHEELHOUSE_UPLOADER_SECRET: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
BUILD_DEPENDS: ${{ matrix.build-depends }}
BUILD_DEPENDS: oldest-supported-numpy

steps:
- uses: actions/checkout@v2
@@ -162,6 +86,18 @@ jobs:
install_run ${{ matrix.PLAT }}
echo ::endgroup::
#
# Guarantee maximum compatibility with older numpy versions and prevent regression of #3226
#
- name: Test wheel against oldest-supported-numpy (Multibuild)
if: matrix.os != 'windows-latest'
env:
TEST_DEPENDS: oldest-supported-numpy
run: |
source multibuild/common_utils.sh
source multibuild/travis_steps.sh
source config.sh
install_run ${{ matrix.PLAT }}
#
# We can't use multibuild on Windows, so we have to roll our own build script.
# Adapted from
# https://github.com/RaRe-Technologies/gensim-wheels/commit/084b863390edee05bbe15d4ec05d1ab726e52202
6 changes: 3 additions & 3 deletions gensim/models/keyedvectors.py
@@ -183,7 +183,7 @@

from gensim import utils, matutils # utility fnc for pickling, common scipy operations etc
from gensim.corpora.dictionary import Dictionary
from gensim.utils import deprecated
from gensim.utils import deprecated, default_rng


logger = logging.getLogger(__name__)
@@ -2018,8 +2018,8 @@ def prep_vectors(target_shape, prior_vectors=None, seed=0, dtype=REAL):
if prior_vectors.shape == target_shape:
return prior_vectors
target_count, vector_size = target_shape
rng = np.random.default_rng(seed=seed) # use new instance of numpy's recommended generator/algorithm
new_vectors = rng.random(target_shape, dtype=dtype) # [0.0, 1.0)
rng = default_rng(seed)
new_vectors = rng.random(target_shape).astype(dtype) # [0.0, 1.0)
new_vectors *= 2.0 # [0.0, 2.0)
new_vectors -= 1.0 # [-1.0, 1.0)
new_vectors /= vector_size
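
The initialization in this hunk can be tried in isolation. The sketch below is a simplified stand-in, not the real `prep_vectors`: `random_init` is a hypothetical name, and it ignores `prior_vectors` entirely. It shows why the change replaces `rng.random(shape, dtype=dtype)` with `rng.random(shape).astype(dtype)`: in current numpy, `Generator.random` accepts a `dtype` keyword while `RandomState.random` does not, so drawing float64 values and casting afterwards works with either object.

```python
import numpy as np


def default_rng(seed=42):
    # Mirrors the helper this PR adds to gensim/utils.py.
    cls = getattr(np.random, 'default_rng', np.random.RandomState)
    return cls(seed)


def random_init(target_shape, seed=0, dtype=np.float32):
    """Hypothetical, simplified stand-in for the random part of prep_vectors."""
    _, vector_size = target_shape
    rng = default_rng(seed)
    # Draw float64 values in [0.0, 1.0), then cast. Note that the cast to
    # float32 can round a value extremely close to 1.0 up to exactly 1.0.
    vectors = rng.random(target_shape).astype(dtype)
    vectors *= 2.0          # roughly [0.0, 2.0)
    vectors -= 1.0          # roughly [-1.0, 1.0)
    vectors /= vector_size  # shrink by the dimensionality
    return vectors


vectors = random_init((5, 4))
print(vectors.dtype, vectors.shape)
```

Because the generator is re-created from the seed on every call, two calls with the same arguments return identical arrays.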
7 changes: 3 additions & 4 deletions gensim/models/word2vec.py
@@ -196,7 +196,7 @@
from numpy import float32 as REAL
import numpy as np

from gensim.utils import keep_vocab_item, call_on_class_only, deprecated
from gensim.utils import keep_vocab_item, call_on_class_only, deprecated, default_rng
from gensim.models.keyedvectors import KeyedVectors, pseudorandom_weak_vector
from gensim import utils, matutils

@@ -384,7 +384,7 @@ def __init__(

self.window = int(window)
self.shrink_windows = bool(shrink_windows)
self.random = np.random.RandomState(seed)
self.random = default_rng(seed)

self.hs = int(hs)
self.negative = int(negative)
@@ -1956,8 +1956,7 @@ def _load_specials(self, *args, **kwargs):
if not hasattr(self.wv, 'vectors_lockf') and hasattr(self.wv, 'vectors'):
self.wv.vectors_lockf = np.ones(1, dtype=REAL)
if not hasattr(self, 'random'):
# use new instance of numpy's recommended generator/algorithm
self.random = np.random.default_rng(seed=self.seed)
self.random = default_rng(self.seed)
if not hasattr(self, 'train_count'):
self.train_count = 0
self.total_train_time = 0
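
The `hasattr` checks in `_load_specials` backfill attributes that objects saved by older versions may lack. A minimal sketch of that pattern, assuming a hypothetical `TinyModel` class rather than the real `Word2Vec`, could look like this:

```python
import numpy as np


def default_rng(seed=42):
    # Mirrors the helper this PR adds to gensim/utils.py.
    cls = getattr(np.random, 'default_rng', np.random.RandomState)
    return cls(seed)


class TinyModel:
    """Hypothetical stand-in for a model restored from an older save file."""

    def __init__(self, seed=1):
        self.seed = seed
        self.random = default_rng(seed)
        self.train_count = 0

    def _load_specials(self):
        # Only fill in what is missing, so newer saves are left untouched.
        if not hasattr(self, 'random'):
            self.random = default_rng(self.seed)
        if not hasattr(self, 'train_count'):
            self.train_count = 0


# Simulate an object that was saved before 'random' and 'train_count' existed:
# create it without running __init__ and set only the old attribute.
old = TinyModel.__new__(TinyModel)
old.seed = 7
old._load_specials()
print(type(old.random).__name__, old.train_count)
```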
14 changes: 12 additions & 2 deletions gensim/utils.py
@@ -57,8 +57,18 @@
)
"""An exception that gensim code raises when Cython extensions are unavailable."""

#: A default, shared numpy-Generator-based PRNG for any/all uses that don't require seeding
default_prng = np.random.default_rng()

#
# Numpy versions older than 1.17.0 use an older method for pseudo-random number generation.
# The older method (RandomState) is available everywhere but has performance issues on some platforms.
# The newer method (default_rng) is preferable, when it is available.
#
def default_rng(seed=42):
cls = getattr(np.random, 'default_rng', np.random.RandomState)
return cls(seed)


default_prng = default_rng()
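
One consequence of this fallback is that callers may receive either a `numpy.random.Generator` or a legacy `numpy.random.RandomState`, and the two classes do not expose identical methods. For example, integers come from `Generator.integers` but from `RandomState.randint`. The sketch below shows one way a caller could stay portable; `random_ints` is a hypothetical shim and is not part of this PR.

```python
import numpy as np


def default_rng(seed=42):
    # Same fallback as in the diff: prefer the newer Generator factory and
    # fall back to the legacy RandomState class when it is unavailable.
    cls = getattr(np.random, 'default_rng', np.random.RandomState)
    return cls(seed)


def random_ints(rng, low, high, size):
    """Draw integers in [low, high) from either kind of generator (hypothetical shim)."""
    if hasattr(rng, 'integers'):               # numpy.random.Generator
        return rng.integers(low, high, size=size)
    return rng.randint(low, high, size=size)   # numpy.random.RandomState


first = random_ints(default_rng(1), 0, 10, 5)
second = random_ints(default_rng(1), 0, 10, 5)
print(list(first) == list(second))  # same seed, same class, same draws
```

The two classes also produce different streams for the same seed, so results seeded under one will generally not match results seeded under the other.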


def get_random_state(seed):
8 changes: 7 additions & 1 deletion setup.py
@@ -315,7 +315,13 @@ def run(self):
'pandas',
]

NUMPY_STR = 'numpy >= 1.17.0'
#
# The numpy version here is especially significant because it affects our ability to build
# wheels that are backwards-compatible with older numpy versions. If we restrict this to
# e.g. numpy>=1.17.0, then our wheel building process becomes significantly more
# complicated. Related issue: #3266.
#
NUMPY_STR = 'oldest-supported-numpy'
#
# We pin the Cython version for reproducibility. We expect our extensions
# to build with any sane version of Cython, so we should update this pin
2 changes: 1 addition & 1 deletion tox.ini
@@ -82,7 +82,7 @@ commands = flake8-rst gensim/ docs/ {posargs}
basepython = python3
recreate = True

deps = numpy
deps = oldest-supported-numpy
commands = python setup.py build_ext --inplace

