Update Python libraries for GCP services #21019

Closed
damccorm opened this issue Jun 4, 2022 · 9 comments · Fixed by #24599

Comments

@damccorm
Contributor

damccorm commented Jun 4, 2022

Some libraries in the GCP requirements of the Apache Beam Python package are pinned to old versions, e.g. Bigtable and Spanner, which have had several major version bumps since.

The setup file describes those dependencies as only being required by tests, but it looks like some transforms are referencing them. Therefore I'm not sure of the real cost behind upgrading them.

I'm currently using the Spanner client in a custom transform, but I have to stick to the google-cloud-spanner version used by Apache Beam. Would it be possible to upgrade those dependencies?

Cheers,

Flo

Imported from Jira BEAM-12817. Original Jira may contain additional context.
Reported by: flovouin.

@thclark

thclark commented Jul 13, 2022

Yeah, these dependencies really need to be updated. As it stands, it is nearly impossible to do anything with current versions of Google libraries, because installation is blocked by dependencies that aren't even used in the actual install, only in tests.

As a simple mitigation step, the test dependencies could be split into a separate section (e.g. gcp-test) that only gets installed when actually testing Beam, so that it does not constrain dependencies for a regular install.
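A minimal sketch of what such a split could look like, assuming a setuptools-style `extras_require` mapping. The extra name `gcp-test` comes from the suggestion above; which packages go in which group, and the version ranges shown, are illustrative assumptions and not Beam's actual requirement lists.

```python
# Hypothetical split of runtime and test-only GCP requirements.
# Package placement and version ranges are illustrative assumptions.
GCP_REQUIREMENTS = [
    "google-auth>=1.18.0,<3",
    "google-cloud-pubsub>=2.1.0,<3",
]

# Assumed to be needed only by tests, so they should not constrain
# a regular `pip install apache-beam[gcp]`.
GCP_TEST_REQUIREMENTS = [
    "google-cloud-spanner",
    "google-cloud-bigquery",
]

EXTRAS_REQUIRE = {
    "gcp": GCP_REQUIREMENTS,
    # The test extra is a superset of the runtime extra.
    "gcp-test": GCP_REQUIREMENTS + GCP_TEST_REQUIREMENTS,
}

# In a setup.py this mapping would be passed as
# setuptools.setup(..., extras_require=EXTRAS_REQUIRE).
```

With a layout like this, only `pip install 'apache-beam[gcp-test]'` would pull in the test-only clients.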

My current workaround is to install plain apache-beam instead of apache-beam[gcp], then install the GCP dependencies manually with pinned versions, which also avoids the infinite install described in this closely related issue. It's a real pain to do that, though.

Here's an install_requires (in the setup.py) for a package that needs to avoid this problem:

    install_requires=[
        "apache-beam==2.40.0",  # actually I need apache-beam[gcp] but see https://github.com/apache/beam/issues/21019
        "cachetools>=3.1.0,<5",  # from apache-beam[gcp] - update if you update apache-beam
        "google-apitools>=0.5.31,<0.5.32",  # from apache-beam[gcp] - update if you update apache-beam
        "google-auth>=1.18.0,<3",  # from apache-beam[gcp] - update if you update apache-beam
        "google-auth-httplib2>=0.1.0,<0.2.0",  # from apache-beam[gcp] - update if you update apache-beam
        "google-cloud-datastore>=1.8.0,<2",  # from apache-beam[gcp] - update if you update apache-beam
        "google-cloud-pubsub>=2.1.0,<3",  # from apache-beam[gcp] - update if you update apache-beam
        "google-cloud-pubsublite>=1.2.0,<2",  # from apache-beam[gcp] - update if you update apache-beam
        # ... and I don't bother installing the unused test dependencies which include tight bigquery and spanner constraints
    ]

@tvalentyn
Contributor

tvalentyn commented Aug 3, 2022

> I'm currently using the Spanner client in a custom transform, but I have to stick to the google-cloud-spanner version used by Apache Beam. Would it be possible to upgrade those dependencies?

We are evaluating the current usage of the Spanner dependency. It is currently used by an experimental IO, which only works with a very old client. If we find that this IO has little usage, we may have to drop the dependency, since we plan to switch to the cross-language (X-Lang) Spanner IO.

I know this is not convenient, but, assuming you don't use the experimental Spanner IO, you can use a custom build of Apache Beam that does not require google-cloud-spanner, or just force-install the version you need even if there is a dependency conflict.

@tvalentyn
Contributor

We are also working on a better process for managing Python dependencies, to avoid having outdated libraries in our dependency chain.

@ElTav

ElTav commented Sep 19, 2022

@tvalentyn Are there any plans to update the PubSub Python dependency?

@tvalentyn
Contributor

@amardeep

amardeep commented Mar 3, 2023

Any update on this?

Currently, when I install Beam, I get google-cloud-bigquery 1.28.2, which is really old.

Here are the steps to reproduce:

conda create -p ./env -c conda-forge python=3.10 poetry
conda activate ./env
poetry init -q
poetry add 'apache-beam[gcp]=2.45.0'

This results in the following versions of Google libraries:

$ pip list | grep google
google-api-core                 2.8.2
google-apitools                 0.5.31
google-auth                     2.16.2
google-auth-httplib2            0.1.0
google-cloud-bigquery           1.28.2
google-cloud-bigquery-storage   2.16.0
google-cloud-bigtable           1.7.3
google-cloud-core               2.3.2
google-cloud-datastore          1.15.5
google-cloud-dlp                3.9.0
google-cloud-language           1.3.2
google-cloud-pubsub             2.13.7
google-cloud-pubsublite         1.7.0
google-cloud-recommendations-ai 0.7.1
google-cloud-spanner            3.22.0
google-cloud-storage            2.1.0
google-cloud-videointelligence  1.16.3
google-cloud-vision             3.1.2
google-crc32c                   1.5.0
google-resumable-media          1.3.3
googleapis-common-protos        1.56.4
grpc-google-iam-v1              0.12.4
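The same listing can be produced without pip or grep, which may help when comparing environments in a script. This is a sketch using only the standard library; the function name `installed_versions` is a hypothetical helper, not part of Beam or pip.

```python
from importlib import metadata


def installed_versions(needle="google"):
    """Return {name: version} for installed distributions whose name contains needle."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with broken metadata (no Name field).
        if name and needle in name.lower():
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    # Roughly mirrors `pip list | grep google`.
    for name, version in installed_versions().items():
        print(f"{name:<32}{version}")
```

The substring match (rather than a prefix match) is deliberate, so that packages such as grpc-google-iam-v1 are included, as they are in the grep output above.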

tvalentyn added this to the 2.47.0 Release milestone Mar 3, 2023
@tvalentyn
Contributor

We will have an update for many dependencies before the next release.

@tvalentyn
Contributor

cc: @AnandInguva

@AnandInguva
Contributor

Yes, PR #24599 addresses this issue.
