Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed deprecated contrib files and replace them with PEP-562 getattr #26153

Merged
merged 1 commit into from
Sep 5, 2022

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented Sep 4, 2022

The conrib modules are carried from Airflow 1.10 and we cannot
remove the classes that were used there without bumping Airlfow
to 3, however we can replace the files with dynamic attribute
retrieval following PEP-562. This way we will have far less number
of files in contrib and users will not get auto-import and
autocomplete when they will be developing their DAGs.

This does not break compatibility but it providers much stronger
signal to the users that they should switch to new classes:

  • they will not see the classes in autocomplete and autoimport
    any more
  • it will seem that the classes are not existing in contrib until
    they are used
  • if anyone performs static analysis on the DAGs, the analysis
    will fail (and this is an intended behaviour)

^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented Sep 4, 2022

I attempted to replace all the contrib classes with PEP-562 ones. Looking forward to comments.
I left the tool that I used to generate "__deprecated_classas" dictionaries as there are quite a number of non-contrib deprecations we could remove as next step, so it might be handy. But I did not want to apply it just yet to the other files - they need a bit more selective approach, so I wanted to make sure the approach is cool.

Also the presence of the tool might be good as part of review - to see that I have not missed anything. This change is -10K lines, so likely easier to review the "method", than the result. All the __deprecated_classes were generated using this tool.

Few things I would like to get comments on (It's the first time I apply PEP-562):

  • I only dynamically generate modules and classes in the modules. I left packages in "contrib" folders as they were. Likely it would be possible to also dynamically generate packages - but this is beyond PEP-562 andI think this is a good balance between number of files/lines left and "explicitness" of the solution. But if others have comments / experience here to generate also packages, I woudl love to hear it.

  • I had to remove all the tests for the classes moved to __deprecated_classes. The reason is that our tests attempt to reload the modules, and fail with those errors:

ModuleNotFoundError: spec not found for the module 'airflow.contrib.operators.s3_to_gcs_operator'

And if you think of it - it makes sense. We generated the modules out of thin air, so they do not have spec to use to reload them. And those tests are I think quite a bit redundant now - since we know the warnings are generated by __getattr__ and we can test it manually (I did), there is no reason it should fail for the others. So keeping extra tests for those seems quite redundant.

@potiuk
Copy link
Member Author

potiuk commented Sep 4, 2022

BTW. This loooks really cool:

Screenshot 2022-09-05 at 00 39 39

@potiuk potiuk force-pushed the remove-deprecated-contrib-classes branch 3 times, most recently from 3f3cca2 to 669173c Compare September 5, 2022 07:12
@potiuk potiuk added this to the Airflow 2.4.0 milestone Sep 5, 2022

def add_deprecated_classes(module_imports: Dict[str, Dict[str, str]], module: str):
for submodule_name, imports in module_imports.items():
module_type = ModuleType(f"{module}.{submodule_name}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that it matters, but this approach does "pollute" sys.modules with all deprecated modules when any contrib module is loaded.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It does. That was my worry too.

But I think this does not really matter - those are super small modules (just getattr methods and the dictionaries) and i would say price to pay to access contrib for the users :). We WANT the friction to be there :).

Do you think it's worth to do it differently ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it works, but can you put a __getattr__ on airflow.contrib.hooks that "loads" each module on demand instead?

Copy link
Member Author

@potiuk potiuk Sep 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it, and it does not work, module loader does not use __getattr__, it uses much more complex machinery to find modules. By default (with defalt modulespec and loader) it expects the Package object (which should be added to sys.modules[package])' to return pathattribiute (list of paths to search). I even went as far as creating this dynamically created package but then I have nopaths` to provide it to :).

I think we would have to hack and change the loader implementation to not search in __path__ but implement ModuleSpec based on https://peps.python.org/pep-0451 I believe. This would also eventually enable complete removal of sub-packages in contrib, but I somehow feel it's not worth it as it's a bit too magical IMHO and we do not need the whole machinery to reload the modules which is behind it).

The nice thing about this solution is that it's fairly straightforward to get how it works and it's pretty much based on simple standards (PEP 562 and the fact that all modules should land in sys.modules).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW. I updated the naming in the proposed solution - replacing module with package and submodule with module 🤦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, totally not worth building a custom loader mechanism for this!

The conrib modules are carried from Airflow 1.10 and we cannot
remove the classes that were used there without bumping Airlfow
to 3, however we can replace the files with dynamic attribute
retrieval following PEP-562. This way we will have far less number
of files in contrib and users will not get auto-import and
autocomplete when they will be developing their DAGs.

This does not break compatibility but it providers much stronger
signal to the users that they should switch to new classes:

* they will not see the classes in autocomplete and autoimport
  any more
* it will seem that the classes are not existing in contrib until
  they are used
* if anyone performs static analysis on the DAGs, the analysis
  will fail (and this is an intended behaviour)
@potiuk potiuk force-pushed the remove-deprecated-contrib-classes branch from 669173c to bcfeb47 Compare September 5, 2022 10:54
@potiuk
Copy link
Member Author

potiuk commented Sep 5, 2022

This is what we get now:

image

@potiuk potiuk merged commit 5264b63 into apache:main Sep 5, 2022
@potiuk potiuk deleted the remove-deprecated-contrib-classes branch September 5, 2022 15:24
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 6, 2022
The apache#26153 missed the classes that were added by deriving a new
class from the imported ones. This was mostly about non-consistent
spelling of class names.

This PR fixes it and brings missing contrib classes back.
ashb pushed a commit that referenced this pull request Sep 7, 2022
The #26153 missed the classes that were added by deriving a new
class from the imported ones. This was mostly about non-consistent
spelling of class names.

This PR fixes it and brings missing contrib classes back.
@ephraimbuddy ephraimbuddy added the type:misc/internal Changelog: Misc changes that should appear in change log label Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools type:misc/internal Changelog: Misc changes that should appear in change log
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants