Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve NumPy compatibility hell #3231

Closed
wants to merge 12 commits into from

Conversation

menshikh-iv
Copy link
Contributor

@menshikh-iv menshikh-iv commented Sep 13, 2021

Idea:

  • Instead of manual pinning of numpy version everywhere, use oldest-supported-numpy everywhere (on every python and platform)
  • Build always on oldest, test on 2: oldest and newest

Goal: Make gensim works with numpy correctly (avoid issues like #3226).

Fixes #3226.

@menshikh-iv menshikh-iv changed the title [FIX 3226] Attempt to resolve numpy hell WIP: [FIX 3226] Attempt to resolve numpy hell Sep 13, 2021
@menshikh-iv menshikh-iv changed the title WIP: [FIX 3226] Attempt to resolve numpy hell [FIX 3226] Attempt to resolve numpy hell Sep 13, 2021
@piskvorky
Copy link
Owner

piskvorky commented Sep 13, 2021

Instead of manual pinning of numpy version everywhere, use oldest-supported-numpy everywhere (on every python and platform)

Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since oldest-supported-numpy)

If so, that doesn't sound like a desirable solution.

For example, this PR seems to introduce a regression: #2864 and #3220 fixed default_rng, this PR brings the problem back.

@@ -1956,8 +1956,7 @@ def _load_specials(self, *args, **kwargs):
if not hasattr(self.wv, 'vectors_lockf') and hasattr(self.wv, 'vectors'):
self.wv.vectors_lockf = np.ones(1, dtype=REAL)
if not hasattr(self, 'random'):
# use new instance of numpy's recommended generator/algorithm
self.random = np.random.default_rng(seed=self.seed)
self.random = np.random.RandomState(self.seed)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regresses #2782.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed ab53e62

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since oldest-supported-numpy)

It doesn't have to. Where we can easily replicate the new API functions, we should do so (as @menshikh-iv has done with gensim.utils.default_rng function). Where we can't (so far, there are no such cases, but there may be in the future), we could do something like:

def np_cool_new_function(*args, **kwargs):
    try:
        fn = np.cool_new_function
    except AttributeError:
        raise RuntimeError("your numpy version is too old, please upgrade")
    else:
        return fn(*args, **kwargs)

Copy link
Owner

@piskvorky piskvorky Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps. Although this smells of re-implementing dependency resolution internally in Gensim, which would be extra hell.

But I guess as long as setup.py requires an actual working version of numpy (not just oldest-supported-numpy), it should be OK.

But requiring some dependency version in setup.py, and then raising RuntimeErrors dynamically suggesting that version is too old, would not be OK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I guess as long as setup.py requires an actual working version of numpy (not just oldest-supported-numpy), it should be OK.

The problem with going that way is that it prevents us from building against oldest-supported-numpy. We then have to reinvent that particular wheel in our build system infrastructure.

Copy link
Owner

@piskvorky piskvorky Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if I got this right:

numpy API (setup.py): oldest-support-numpy numpy API (setup.py): 1.17.0
numpy ABI (wheels): oldest-support-numpy this #3231 current develop
numpy ABI (wheels): 1.17.0 ? desired, #3226 (comment)

Building against the oldest numpy we actually support makes more sense to me than building against oldest-supported-numpy and then throwing RuntimeError because it's too old, or maintaining separate code paths for deprecated functions we don't really need/want (RandomState).

@mpenkov what exactly was the problem with the "desired" quadrant? Too much code?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not exactly code, it's build system configuration.

We're essentially replicating this with slightly different business logic.

Copy link
Owner

@piskvorky piskvorky Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Reading the comment in your .yml link, that sounds exactly right, exactly like what we want = the "desired" state. Why does that not work?

@gojomo can you check too? I may have gotten lost in the PRs and comments.

@mpenkov not to get too philosophical – let's release bugfix 4.1.1 ASAP. We can discuss a "proper" solution once the fire is put out.

@menshikh-iv
Copy link
Contributor Author

Wouldn't that preclude us from using any new API functions from numpy? (new = introduced since oldest-supported-numpy)

If we'll have tests for oldest & newest numpy in CI, these will be detected automatically

setup.py Show resolved Hide resolved
Adding some comments to the code
@piskvorky piskvorky changed the title [FIX 3226] Attempt to resolve numpy hell Resolve NumPy compatibility hell Sep 14, 2021
@mpenkov
Copy link
Collaborator

mpenkov commented Mar 31, 2022

Closing this, because we've moved on.

I also recall a problem with oldest-supported-numpy, but can't remember what it is at this stage.

@menshikh-iv Thank you for your effort and for guiding us through these numpy issues.

@mpenkov mpenkov closed this Mar 31, 2022
@menshikh-iv
Copy link
Contributor Author

🙏

@menshikh-iv menshikh-iv deleted the numpy-hell branch March 31, 2022 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

numpy 1.19.2 incompatible with gensim 4.1.0
3 participants