Making version work in envs where get_distribution does not. #561

Closed
dhermes wants to merge 1 commit into GoogleCloudPlatform:master from dhermes:version-for-gae

dhermes
Contributor

@dhermes dhermes commented Jan 22, 2015

See http://stackoverflow.com/a/28095663/1068170 for context.

@tseaver I haven't thought deeply about this solution and I'd wager you've got some experience here.

  1. Maybe there is a more preferred alternative?
  2. Should we hardcode the __version__ instead of using None? (It will result in gcloud-python/None as the user-agent.)

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Jan 22, 2015
@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 269728a on dhermes:version-for-gae into 1bfa469 on GoogleCloudPlatform:master.

@tseaver
Contributor

tseaver commented Jan 22, 2015

I don't think making import pkg_resources conditional will fix the error:

  File "/Users/sheridangray/Projects/adt-bundle-mac-x86_64-20140702/sdk/google-cloud-sdk/platform/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/setuptools-0.6c11/pkg_resources.py", line 565, in resolve
    raise DistributionNotFound(req)  # XXX put more info here
DistributionNotFound: gcloud

pkg_resources is present, but it cannot find the installation metadata for gcloud.
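
A minimal sketch of the fallback this PR is after, assuming the helper names `package_version` and `user_agent` (both hypothetical, not taken from the gcloud source). It catches `DistributionNotFound` instead of relying on `import pkg_resources` failing, since the module is importable on App Engine:

```python
def package_version(dist_name):
    """Return the installed version of dist_name, or None if it is unknown."""
    try:
        import pkg_resources
    except ImportError:
        # setuptools is not available at all in this environment.
        return None
    try:
        return pkg_resources.get_distribution(dist_name).version
    except pkg_resources.DistributionNotFound:
        # pkg_resources imported fine, but the egg-info / dist-info
        # metadata for dist_name is not on the path.
        return None


def user_agent(version):
    """Build the agent string; a missing version shows up as 'None'."""
    return 'gcloud-python/{0}'.format(version)
```

With the metadata missing, `user_agent(package_version('gcloud'))` would give `gcloud-python/None`, which is the default discussed in this thread.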

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

Good catch! I didn't realize pkg_resources shipped with App Engine. I double checked and it also imports in production GAE:

<module 'pkg_resources' from '/base/data/home/runtimes/python27/python27_lib/versions/third_party/setuptools-0.6c11/pkg_resources.pyc'>

I now wonder why my hack works.

I fixed and pushed again (erased history).

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling 6ac6a81 on dhermes:version-for-gae into 927c534 on GoogleCloudPlatform:master.

@coveralls

Coverage Status

Coverage remained the same at 100.0% when pulling a126550 on dhermes:version-for-gae into 927c534 on GoogleCloudPlatform:master.

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

@tseaver Any remaining issues?

@tseaver
Contributor

tseaver commented Jan 22, 2015

No issues. I don't know if having 'None' in the agent string will be useful to the folks who consume it (the back-end logs, I would guess), but can't think of a better default.

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

We could hardcode it there and have a Travis script check that it matches the one in setup.py. I suppose by that logic we could just remove the try...except altogether.

WDYT?

@tseaver
Contributor

tseaver commented Jan 22, 2015

Maybe we should be adding more stuff to the agent string: for example, if we can tell via the environment that we are on GAE / GCE, maybe that should be there too? Then the 'None' wouldn't be as puzzling: somebody who saw it would be able to find this issue.
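
A rough sketch of what that could look like, assuming detection through the `SERVER_SOFTWARE` environment variable that the App Engine Python 2.7 runtime sets (`Google App Engine/...` in production, `Development/...` under the dev server). The function names and the layout of the agent string are hypothetical, and GCE detection (which would need the metadata server) is left out:

```python
import os


def detect_environment(environ=None):
    """Return a short label for the hosting environment, or None."""
    environ = os.environ if environ is None else environ
    server_software = environ.get('SERVER_SOFTWARE', '')
    if server_software.startswith('Google App Engine/'):
        return 'GAE'
    if server_software.startswith('Development/'):
        return 'GAE-dev'
    return None


def build_user_agent(version, environ=None):
    """Append the environment label (if any) to the agent string."""
    parts = ['gcloud-python/{0}'.format(version)]
    label = detect_environment(environ)
    if label is not None:
        parts.append('({0})'.format(label))
    return ' '.join(parts)
```

On App Engine with no resolvable version this would produce `gcloud-python/None (GAE)`, so the `None` comes with a hint about where it originated.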

@silvolu
Contributor

silvolu commented Jan 22, 2015

The P0 is having gcloud-python in the user-agent; the version can be helpful once we have many existing versions to understand the size of each segment of usage, but it's fine if we have a None value or the proposed GCE/GAE.

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

Well, here the None corresponds to GAE (or at least GAE is the motivation). I want to resolve the issue at hand first (the code won't run when manually copied from a virtualenv for GAE); then we can muck around with the user agent.

I'm a fan of adding something like this to setup.py

with open(os.path.join(here, 'gcloud', '__init__.py')) as file_obj:
    PACKAGE_INIT = file_obj.read()

if PACKAGE_INIT.count('__version__') != 1:
    raise EnvironmentError('Expected exactly one version.')

VERSION = '0.3.0'
VERSION_LINE = '\n__version__ = %r\n' % (VERSION,)
LOC = PACKAGE_INIT.find('__version__')
if VERSION_LINE != PACKAGE_INIT[LOC - 1:LOC - 1 + len(VERSION_LINE)]:
    raise EnvironmentError('Version defined in package disagrees '
                           'with %r.' % (VERSION,))

This would cause failures in tox since it will run setup.py every time, so we would be very aware of it.
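
The same consistency check can be sketched as a function over the module source, which makes it easy to try outside setup.py. The name `check_version` and the use of a regular expression are assumptions for illustration, not part of the snippet above:

```python
import re

# Matches a line of the form: __version__ = '0.3.0'
VERSION_RE = re.compile(
    r"^__version__ = (['\"])(?P<version>[^'\"]+)\1$", re.MULTILINE)


def check_version(init_source, expected):
    """Raise if init_source does not define exactly the expected version."""
    found = [match.group('version')
             for match in VERSION_RE.finditer(init_source)]
    if len(found) != 1:
        raise EnvironmentError('Expected exactly one version.')
    if found[0] != expected:
        raise EnvironmentError('Version defined in package disagrees '
                               'with %r.' % (expected,))
    return found[0]
```

A Travis step or setup.py could read `gcloud/__init__.py` and pass its contents in along with the version from setup.py.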


The environment detail seems like something we would put on the Connection class, since our connections (e.g. from oauth2client.GoogleCredentials.get_default_credentials) know what environment they belong in.

@tseaver
Contributor

tseaver commented Jan 22, 2015

That seems like a large amount of complexity in setup.py to deal with a pretty edgy case (why is the appengine tooling not copying the egg-info from the virtualenv, anyway?). Or is it a PEBKAC?

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

App Engine doesn't support Python packaging in the traditional sense, so what people typically do is copy-paste.

It may be a PEBKAC (fingers crossed) but I have never encountered people copying egg-info in the wild. Let me muck around a bit with my sample and I'll report back.
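
For context, `get_distribution` resolves a name through the metadata directory that sits next to the package (`*.egg-info` or `*.dist-info`), not through the package folder itself, which is why copying only the `gcloud` folder is not enough. A small sketch for checking whether a vendored folder carries that metadata; the function name is hypothetical:

```python
import glob
import os


def find_metadata_dirs(lib_dir):
    """List the egg-info / dist-info entries found directly in lib_dir."""
    found = []
    for pattern in ('*.egg-info', '*.dist-info'):
        found.extend(glob.glob(os.path.join(lib_dir, pattern)))
    return sorted(os.path.basename(path) for path in found)
```

An empty result for the folder holding the copied packages would be consistent with the `DistributionNotFound: gcloud` error above.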

@dhermes
Contributor Author

dhermes commented Jan 22, 2015

@tseaver It was PEBKAC indeed :) I didn't know enough about what lives inside the egg; learning more every day!

It turns out my old pal @jonparrott has solved this problem with Darth Vendor.

I filed #566 to incorporate the suggestion and am closing this out.

I've updated the project and will hopefully update the StackOverflow answer to reflect the "ease" of using Darth Vendor.

@dhermes dhermes closed this Jan 22, 2015
@dhermes dhermes deleted the version-for-gae branch January 22, 2015 23:19
@theacodes
Contributor

@dhermes give a shout if anything needs to be added to Darth to help out this use case. Happy to help.

@dhermes
Contributor Author

dhermes commented Jan 23, 2015

@jonparrott Will do. I just ignorantly ignored it on my first pass; it works quite well for third party deps.

@timoh

timoh commented Mar 9, 2018

@jonparrott @dhermes I am trying to get Google Cloud BigQuery working on AWS Glue so that I could write from an AWS Glue sandbox to a GC BigQuery dataset. Since AWS Glue doesn't seem to support doing anything on the CLI, and you need to install packages as .zip files on S3 (see here), I have been looking for a way to package all 3rd party libs in a single zip file and then give that zip file through S3 to PySpark.

I have managed to get it to work with most packages, although it has been quite a struggle. With the Google Cloud Python libs, however, the problem is that they seem to use Implicit Namespace Packages (PEP 420). Since AWS Glue runs Python 2.7 and does not support the regular route of pip install, importing breaks: it cannot find the libraries even though the packages have been loaded and other libs in the same zip, like requests, load fine.

I've worked around this by adding empty __init__.py files in the folders google and google/cloud, but since the google/cloud/bigquery folder already has an __init__.py file that uses pkg_resources.get_distribution(), I get this error:

raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'google-api-core' distribution was not found and is required by the application

Darth Vendor seems to be GAE specific, so is there anything I could do to make Google Cloud for BigQuery work for AWS Glue?

To prove that this isn't just something I'm trying to do in isolation, here's Eran Kampf's tutorial on the subject (especially relevant is the section with the heading "Handling 3rd Party Dependencies"): https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f

@theacodes
Contributor

We're not using implicit namespace packages, we're using pkg_resources style packages (see here). When pip installs these, it does environment detection and most of the time will install the packages without their __init__.pys and create a -nspkg.pth file in site-packages. The interpreter picks these up via the site module at startup. Darth Vendor makes these work because it calls site.addsitedir, the same function the site module uses for site-packages.

For getting pkg_resources to also work you need to do one more step (and we could even add this to darth vendor, there's no reason not to): tell pkg_resources about the installed packages. Use pkg_resources.working_set.add_entry('lib').
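
Put together, the two steps could look like the sketch below. `site.addsitedir` and `pkg_resources.working_set.add_entry` are the real calls; the function name `add_vendor_dir` and the `lib` folder name are assumptions for illustration:

```python
import os
import site


def add_vendor_dir(path):
    """Make packages copied into path importable and visible to pkg_resources."""
    path = os.path.abspath(path)
    # Adds path to sys.path and processes any .pth files in it,
    # including the -nspkg.pth files that set up namespace packages.
    site.addsitedir(path)
    try:
        import pkg_resources
    except ImportError:
        # Without setuptools there is no working set to update.
        return path
    # Scan path for egg-info / dist-info metadata so that
    # get_distribution() can resolve the vendored distributions.
    pkg_resources.working_set.add_entry(path)
    return path
```

Calling `add_vendor_dir('lib')` early, before anything imports `google.cloud`, is the intended usage; this only helps if the metadata directories were copied into `lib` along with the packages.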

@timoh

timoh commented Mar 21, 2018

@jonparrott thanks for your response.

This shell script (came up with it after this discussion here: https://twitter.com/ekampf/status/973623247785795584) is what finally got it to work for me on AWS Glue on Python 2.7:

pip install -U -r requirements.txt --install-option --install-lib="/absolute/path/to/deps/"
echo "" > ./deps/google/__init__.py
echo "" > ./deps/google/cloud/__init__.py
sed -i .bak "s/get_distribution('google-cloud-bigquery').version/'0.31.0'/g" ./deps/google/cloud/bigquery/__init__.py
sed -i .bak "s/get_distribution('google-cloud-core').version/'0.26.0'/g" ./deps/google/cloud/_http.py
sed -i .bak "s/pkg_resources.get_distribution('grpcio').version/'1.10.0'/g" ./deps/google/api_core/gapic_v1/client_info.py
sed -i .bak "s/pkg_resources.get_distribution('google-api-core').version/'1.1.0'/g" ./deps/google/api_core/gapic_v1/client_info.py
sed -i .bak "s/get_distribution('google-api-core').version/'1.1.0'/g" ./deps/google/api_core/__init__.py

@theacodes
Contributor

Oof, I would not recommend that. Again, the right solution is to use pkg_resources.working_set.add_entry or ping the PySpark folks to properly support package installation.

vchudnov-g pushed a commit that referenced this pull request Sep 20, 2023
Source-Link: googleapis/synthtool@c4dd595
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:ce3c1686bc81145c81dd269bd12c4025c6b275b22d14641358827334fddb1d72
parthea pushed a commit that referenced this pull request Oct 21, 2023
parthea pushed a commit that referenced this pull request Oct 21, 2023
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
parthea pushed a commit that referenced this pull request Oct 21, 2023
Source-Link: googleapis/synthtool@d52e638
Post-Processor: gcr.io/cloud-devrel-public-resources/owlbot-python:latest@sha256:4f9b3b106ad0beafc2c8a415e3f62c1a0cc23cabea115dbe841b848f581cfe99

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Labels
cla: yes This human has signed the Contributor License Agreement. packaging