-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop Py2 support #2548
Comments
👍 I spent A LOT of time for py2-only workarounds when I was a maintainer, this is a really deep hole with new surprises every time (esp. now, when our dependencies drop py2 support).
So, what's important here
|
Are there any major LTS distributions that still ship py2 as the default? Otherwise the plan looks good. +1 on moving to at least 3.3 (because of optimized memory for unicode strings in 3.3), and possibly even later. What are the pros and cons of different choices of |
Gensim doesn't support |
I don't know, to be honest. Why does it matter?
I don't recommend 3.3. We explicitly do not support Python 3 less than 3.5. Adding support for those currently unsupported versions could be a lot of work. Nobody has ever asked for that support either, so it could end up being a complete waste of time.
Like @menshikh-iv said, our only real option at the moment is x=5. |
Because we care about what people use (and lots of people use LTS). PyPI also provides some sort of download statistics AFAIR (py version, maybe even OS / distro?).
Why are I'd prefer to make this choice based on facts, not emotions. |
Regarding LTS, I still fail to see why it matters. People with Py2 will always have two options:
I think that's sufficient. Do you disagree? Regarding the versions, sorry, my wording was awkward. I substituted x=5 into 3.x+. In plain English, we will support Py3.5 and above. |
Because I like to make decisions based on facts. What Python versions come out of the box in major LTS distributions, or how popular / hard to install they are across OSes, is an important consideration.
My question was "what are the pros and cons of the different choices?". What we will support is yet to be decided. |
OK, but the facts are not stopping numpy, pytest and a growing number of our dependencies from abandoning Py2. Why would the situation be any different for us? |
My question has little to do with abandoning py2:
In particular, 3.6+ looks enticing because of f-strings. I saw someone already started using Of course, what I find enticing as a developer is not nearly as significant as what users actually use and need. |
Also, it's worth pointing out that we're touching upon two different questions here:
Personally, I think 1) is the urgent issue here. It's causing us (myself and @menshikh-iv, previously) significant grief. To keep the scope manageable, I think we can leave 2) for another ticket. It's an independent issue. |
I agree with @mpenkov, it doesn't matter which default versions, because
About 3.x, definitely 3.5, because
|
I don't think there's any outstanding dispute here, is there? I'd like to see the stats of current Gensim usage (by Python version, distro…) and LTS support, to understand the impact on users. Otherwise I already +1'ed your plan in my first comment.
Not quite. When cleaning up the code syntax after removing 2.x, we should decide what to modernize the constructs with. Doing this piece-meal is unnecessary overhead. |
OK, glad we align on the Py2 issue, it wasn't obvious to me when I wrote my previous message.
The majority of our Py2 support in the code happens via the six library. Dropping Py2 support from the code base will involve (from @menshikh-iv's comment above):
From my understanding, in the vast majority of cases, the above work will look identical regardless of whether we're working with Py3.5, Py3.6 or Py3.7, because it will involve functionality common to all Py3 versions. This is why I think we can move the Py3.x discussion out of scope. It's hard for me to quantify "vast majority" without actually digging into the code and trying to remove six. |
Yes, but stats and facts are important too. Let me summarize the current arguments (ignoring unsubstantiated / emotional ones):
I'm personally leaning toward 3.6 because of the f-strings, but 3.5 sounds OK too.
Removing |
Just to state the obvious, in general, as you go down your table:
Next, I think there are two important Py3.5+ pros missing from your table:
I don't think Py3.5 being 4 years old is much of a con from gensim's point of view (it may be from the user's POV, but I think that's a different question here). As maintainers, we don't expend that much effort supporting Py3.5 specifically: it doesn't have any quirky behavior requiring special handling like Py2.7 does. It does cost us in terms of CI effort (see discussion above) but that is minimal. Similarly, the pros you listed for 3.7 aren't really pros from the gensim point of view, either. They're pros for the user. They don't really matter for us as project maintainers.
I strongly disagree here. We should modernize, but not when we make the switch. We should do it after. I'll expand below. Ideally, yes, we should modernize our code base to 3.x properly. We should update the code to use the full feature set of the Python 3.x language. However, this requires significant effort: to do this properly and completely, we'll have to comb through the entire code base, looking for potential improvements. Each improvement we make increases the size of the change set (it's already going to be a very large PR), increasing the load on developers and reviewers, and pushing back the date when we can finally say goodbye to Py2. None of the above improvements are strictly necessary or time-sensitive. On the other hand, dropping Py2 is both strictly necessary and time-sensitive. More and more of our dependencies will drop Py2 support, breaking our builds and requiring us to divert effort to fix them.
I think this optimization of the number of code review passes is misleading. If we do everything in a single step (as opposed to "next steps") we have the following pros and cons:
So, in practice, doing everything in one step isn't going to save us anything. It's going to cost us more, because maintaining Py2 support has a price tag, and the price is growing each day. To summarize, I think the best way forward is to do the absolute bare minimum to say goodbye to Py2, as soon as possible (it's already going to be a fairly large change as it is). Once we've successfully done that, we can consolidate and make any required improvements. |
Sure, the table is woefully incomplete. That's my point. If there's "something important missing", then add it. Ideally grounded in facts: how many users, what OSes or distros. CC @gojomo for thoughts. Again, none of your plan sounds wrong to me. We do want to get rid of py2 in the new major release. Py3.5 does sound acceptable. I'd just like a more evidence-based decision making, so we can make educated choices and prepare for potential impacts better. For example, your "Least surprising" and "Least effort" points apply near-equally well to any 3.x. So they are not much of an argument either way: 3.3 was even less "surprising" and I don't see its extra "effort" quantified anywhere… yet we dropped it. 3.6+ requires no additional "effort" at all (Python is backward compatible) and nobody bothered to quantify the extra "surprise"… yet I keep hearing "must support 3.5".
This is a terrible, unacceptable mindset. Please snap out of it.
I don't feel strongly either way, but for the record, I see the opposite as true. The effort to remove py2 will be significant and so also fixing py3-style (comprehensions, f-strings; probably not annotations) if done at the same time is minimal. But significant again if done later, separately. There's also some risk that it will be done never. Gensim has seen enough hacky "upgrades" and refactorings over the past 4 years, due to weak project oversight and steering. But I'm not too worried now, things are falling in place again. |
Dropping Py2 support for new features & major releases makes good sense to me. However, since it's possible we'd still want to do high-priority bugfix micro-releases for the last version that supports Py2 (eg: a "gensim 3.9.0-2" or "gensim 3.9.2"), I wouldn't rush to remove all Py2-isms and cross-version support (like that enabled by For which Python 3.x version to target as a minimum: 3.4 is already past its official "end-of-life" from the Python Software Foundation. 3.5 hits its "end-of-life" in less-than-a-year: 2020-09-20. While it'd be of maximum-generosity to lagging users to still support Python 3.5 a little longer (and it'd match some of our dependencies), I doubt that group of users is very large, and such users may be OK, like Python 2.x users, with just the recommendation to use a slightly-older It'd be interesting to find some summary of which distros have moved to which minimum Python, but ultimately being fully dependent on the system Python, rather than installing a project-specific Python yourself, is suboptimal practice: somewhat fragile & limiting, especially when considering production deployments over time. I think it's OK to nudge people towards the "best practice" of explicitly choosing, and installing, the right Python for your project – separate from the system Python. (Conda, especially, makes this very very easy.) So overall I'd say adopting a minimum-target for new-work as Python 3.6 is defensible. And, by 2020-09-20, it should become a no-brainer – as 3.5 hits end-of-life and even our dependencies are likely to increment their minimum-targets. |
@piskvorky Hang on, what's so terrible and unacceptable about what I said? Which of those two features (speed and bug fixes) affect any of the gensim code? I'm not aware of any part of gensim that relies on Py3.7-specific behavior. Contrast that to other features you listed as pros e.g. f-strings. This particular feature actually matters to us as project maintainers, because we get to actually use it in the gensim code and obtain tangible improvements. In this case, f-strings are a pro for us as maintainers.
When speaking of "least surprising" and "least effort", I'm comparing each option to the status quo. For example, we currently explicitly support Py3.5, 3.6, 3.7 and above. If we continue to do so, then we don't surprise anybody. If we decide to go with Py3.6 and above, we surprise all of our Py3.5 users. If we go with Py3.7 and above, we surprise the Py3.5 and 3.6 crowd. The same thing goes for effort. For Py3.3 and 3.4, which are currently unsupported, we have to apply additional effort to get them working with gensim. For 3.5, we don't have to anything, because it's the status quo. For 3.6 and 3.7, we "have" to modify the code base to support the new features offered by those versions. I put "have" in quotation marks above because it's not strictly necessary to use the new features. But if we don't use the new features, what is the point of requiring our users to use the newer Python version? For example, if we don't use f-strings, then what do we gain by requiring people use Py3.6 as opposed to Py3.5? There are multiple ways to quantify the sizes of each crowd, here's one.
Those figures are for 30 days. You can see that while 3.6 has overtaken 3.5, the latter is still fairly common. |
@gojomo OK, so how about we remove Py2 support in multiple stages? This is slightly different to my original proposal, but still works well, in my opinion. The first stage would be to warn people that we're dropping Py2 support via a warning message. We could include this in the next minor release, 3.9.0. The second stage would be to drop support for Py2 officially. This would be 4.0.0. We'll stop running Py2 builds on our CI. The code will still continue to work on Py2 via the previous mechanisms, e.g. six, but we will avoid providing any support for it via github or the mailing lists (except for bugfix releases like 3.9.1, if necessary). The third and final stage will remove Py2 support entirely: replacing six with standard library code. We will add an explicit assertion that people are running Py3.x in setup.py, preventing Py2.7 users from even attempting to install and run the latest version of gensim. We could do this in 4.1.0 or a later version. @piskvorky What do you think about this alternative? |
When we do drop py2 (4.0.0 = your "stage 2"), I'd prefer to drop it fully. As in, trying to install gensim 4.0.0 on py2 should fail. A clear unambiguous transition. I don't feel particularly strongly about it, I just don't like wishy-washy changes. It leads to confusion and more trouble down the road. I don't anticipate us doing any backports either, we don't have the resources for that. |
What is the benefit of this approach? From my point of view, as one of the people likely to perform this change, I see a few demerits:
|
I listed the perceived benefits in my message = clarity for users. I don't believe it should take more than a few lines in the build configs + documentation to officially drop py2. This is assuming we're still talking about the same topic: The first and third stages sound fine. |
I'd agree that for clarity the 1st version that no longer officially supports Python 2 (eg gensim=4.0.0) should resist being installed under Python 2. (Otherwise, I'd expect confusion where maybe some classes still work, for some operations, but other activities start throwing errors & generating bug-reports & support-questions.) I'm not sure any "1st stage" "warning message" is necessary. Users tend to ignore warnings. The truly in-your-face notice would be when someone tries to install a Python 3 version (gensim>=4.0.0) and gets a clear error message that's not supported (and they should consider using an older gensim version). Such a warning could be especially annoying to devs who have explicitly chosen the "last-good-on-Py2" version. I don't think any explicit "3rd stage", to "remove Py2 support entirely", is necessary. Few quality developers have time for a cleanup of many lines of old code when such a cleanup, alone, offers no new features or performance benefits. Code that works shouldn't be touched. Even harmless-looking refactorings have a risk of breaking things (and definitely make diffs/histories harder to read, as with blanket style cleanups). Even setting such a cleanup as an unassigned goal could attract low-quality mechanical-edit contributions needing review, and incurring other risks/costs. Instead, since tests will no longer run in a Python 2 environment, and contributors will have the "green light" to use Py3 features, new work will just never-add Py2-specific stuff (like the use of Finally, even if there's a preference & explicit commitment that "gensim=3.9.0 is the last version to support Python 2 and no further 3.9.xxx releases are planned", it's possible something might make a 3.9.1 (etc) release desirable. Maybe there's a serious security issue, or an egregious correctness/performance error with a simple fix. Maybe even such a fix comes as a high-quality PR from elsewhere. So, even while |
Good points. I agree on no 1st stage warnings on py2, haven't thought of that ( Summarizing: The rest is all agreed, I think. Or was there anything else? @mpenkov the final decision is yours, I don't feel too strongly about any of these remaining points. |
@piskvorky you didn't include my PoV to summarization, let me do that myself: @menshikh-iv : 4.0.0 = py3.5, hard cliff, get rid all "python2" stuff. |
Slight quibble: I'm not quite "keep Py2 around indefinitely", but rather "only remove Py2 stuff when code gets touched for other reasons". I don't get the case for searching-and-removing all historic |
@gojomo It means less code and a cleaner code base, the benefit of which are increased maintainer sanity and easier future maintenance and code evolution = the main reasons for dropping py2 in the first place. We're not dropping py2 just to spite our py2 user base (which is still by far the largest out of all Python versions, according to @mpenkov 's stats above). Also, explaining to new contributors why they should use some constructs and not others, when the existing code around is not doing so, will be tricky. Py2 will proliferate, not be reduced over time. People like to follow existing code standards and patterns they see. Plus data science people are typically not great SW engineers, familiar with the ins and outs of Python versions (or programming as such). So it's better fixed by core devs, familiar with the project objectives and having a wider perspective, and IMO ideally done in one go = when dropping py2, if drop it we must. |
The PR simply disabled CI builds. Reopening this because there is still work to do. |
RIP Py2.7, closing. @piskvorky FYI |
An increasing number of our dependencies are dropping support for Py2. Numpy and pytest are the more visible examples, but there are others. Dealing with this requires identifying such packages (usually, through failed builds) and working around the problem. This hits us hard on two fronts:
I recommend that we prioritize removing Py2 support. To get there:
I think the above plan is a reasonable compromise:
@piskvorky @menshikh-iv Thoughts?
The text was updated successfully, but these errors were encountered: