Making version work in envs where get_distribution does not. #561
Conversation
I don't think making

```
  File "/Users/sheridangray/Projects/adt-bundle-mac-x86_64-20140702/sdk/google-cloud-sdk/platform/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/lib/setuptools-0.6c11/pkg_resources.py", line 565, in resolve
    raise DistributionNotFound(req)  # XXX put more info here
DistributionNotFound: gcloud
```
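The traceback above is the motivating failure: `pkg_resources.get_distribution('gcloud')` raises `DistributionNotFound` when no egg metadata is present, as under App Engine's bundled setuptools. A minimal sketch of the guarded lookup the PR title describes (the function name and the `None` fallback here are illustrative, not the actual patch):

```python
import pkg_resources


def get_version(package='gcloud'):
    """Return the installed version of ``package``, or None when the
    distribution metadata is missing (the DistributionNotFound case
    shown in the traceback above).
    """
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


# The user-agent then degrades gracefully instead of crashing on import:
USER_AGENT = 'gcloud-python/%s' % (get_version(),)
```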
Force-pushed from 269728a to 6ac6a81.
Good catch! I didn't realize. I now wonder why my hack works. I fixed and pushed again (erased history).
Force-pushed from 6ac6a81 to a126550.
@tseaver Any remaining issues?
No issues. I don't know if having 'None' in the agent string will be useful to the folks who consume it (the back-end logs, I would guess), but I can't think of a better default.
We could hardcode it there and have a Travis script check that the two match. WDYT?
Maybe we should be adding more stuff to the agent string: for example, if we can tell via the environment that we are on GAE / GCE, maybe that should be there too? Then the 'None' wouldn't be as puzzling: somebody who saw it would be able to find this issue.
The P0 is having gcloud-python in the user-agent; the version can be helpful once we have many existing versions, to understand the size of each segment of usage, but it's fine if we have a None value or the proposed GCE/GAE.
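As a sketch of the environment detection floated above: on the App Engine standard runtime, the `SERVER_SOFTWARE` environment variable identifies the platform ('Google App Engine/x.y' in production, 'Development/x.y' under the local dev server). The tag names and the agent-string formatting below are my own illustration, not anything the repo ships:

```python
import os


def platform_tag():
    """Best-effort guess at the hosting platform for the agent string."""
    server = os.environ.get('SERVER_SOFTWARE', '')
    if server.startswith('Google App Engine/'):
        return 'GAE'
    if server.startswith('Development/'):
        return 'GAE-dev'  # local dev_appserver
    return None  # unknown; GCE detection would need a metadata-server probe


def user_agent(version=None):
    """E.g. 'gcloud-python/0.4.0 (GAE)', or plain 'gcloud-python/None'."""
    base = 'gcloud-python/%s' % (version,)
    tag = platform_tag()
    return '%s (%s)' % (base, tag) if tag else base
```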
Well, I'm a fan of adding something like this to

```python
import os

# ``here`` is the directory containing this script (the usual setup.py idiom).
here = os.path.abspath(os.path.dirname(__file__))

with open(os.path.join(here, 'gcloud', '__init__.py')) as file_obj:
    PACKAGE_INIT = file_obj.read()
if PACKAGE_INIT.count('__version__') != 1:
    raise EnvironmentError('Expected exactly one version.')
VERSION = '0.3.0'
VERSION_LINE = '\n__version__ = %r\n' % (VERSION,)
LOC = PACKAGE_INIT.find('__version__')
if VERSION_LINE != PACKAGE_INIT[LOC - 1:LOC - 1 + len(VERSION_LINE)]:
    raise EnvironmentError('Version defined in package disagrees '
                           'with %r.' % (VERSION,))
```

This would cause failures in

It seems like something we would put on the
That seems like a large amount of complexity.
App Engine doesn't support Python packaging in the traditional sense, so what people typically do is copy-paste. It may be PEBKAC (fingers crossed), but I have never encountered people copying egg-info in the wild. Let me muck around a bit with my sample and I'll report back.
@tseaver It was PEBKAC indeed :) I didn't know enough about what lives inside the egg; learning more every day! It turns out my old pal @jonparrott has solved this problem with Darth Vendor. I filed #566 to incorporate the suggestion and am closing this out. I've updated the project and will hopefully update the StackOverflow answer to reflect the "ease" of using Darth Vendor.
@dhermes give a shout if anything needs to be added to Darth to help out this use case. Happy to help.
@jonparrott Will do. I just ignorantly ignored it on my first pass; it works quite well for third-party deps.
@jonparrott @dhermes I am trying to get Google Cloud BigQuery working on AWS Glue so that I can write from an AWS Glue sandbox to a GC BigQuery dataset. Since AWS Glue doesn't seem to support doing anything on the CLI, and you need to install packages as .zip files on S3 (see here), I have been looking for a way to package all 3rd-party libs in a single zip file and then give that zip file through S3 to PySpark. I have managed to get it to work with most packages, although it has been quite a struggle. However, with the Google Cloud Python libs, the problem is that they seem to use implicit namespace packages (PEP 420), and since AWS Glue is Python 2.7 and does not support the regular route of doing pip install, importing breaks; it just cannot find the libraries even when the packages have been loaded, while other libs in the same zip, like the Python package requests, load a-ok. I've worked around this with adding empty
Darth Vendor seems to be GAE-specific, so is there anything I could do to make Google Cloud for BigQuery work for AWS Glue? To prove that this isn't just something I'm trying to do in isolation, here's Eran Kampf's tutorial on the subject (especially relevant is the section with the heading "Handling 3rd Party Dependencies"): https://developerzen.com/best-practices-writing-production-grade-pyspark-jobs-cb688ac4d20f
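Assuming the truncated workaround above means dropping empty `__init__.py` files into the vendored `google.*` directories (the classic Python 2.7 shim for the missing PEP 420 support), it can be scripted. This is a hedged sketch of that shim with a placeholder layout, and it is a workaround, not the recommended fix:

```python
import os


def add_init_shims(vendor_dir):
    """Touch an empty __init__.py in every subdirectory of ``vendor_dir``
    so Python 2.7 (which lacks PEP 420 implicit namespaces) can import
    the vendored packages from a zip. Returns the files created.
    """
    created = []
    for dirpath, _, filenames in os.walk(vendor_dir):
        if dirpath == vendor_dir:
            continue  # the vendor root goes on sys.path, not in a package
        if '__init__.py' not in filenames:
            init_path = os.path.join(dirpath, '__init__.py')
            open(init_path, 'a').close()  # create an empty file
            created.append(init_path)
    return created
```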
We're not using implicit namespace packages; we're using pkg_resources-style packages (see here). When pip installs these, it does environment detection and most of the time will install the packages without their

For getting
@jonparrott thanks for your response. This shell script (came up with it after this discussion here: https://twitter.com/ekampf/status/973623247785795584) is what finally got it to work for me on AWS Glue on Python 2.7:
Oof, I would not recommend that. Again, the right solution is to use
See http://stackoverflow.com/a/28095663/1068170 for context.

@tseaver I haven't thought deeply about this solution and I'd wager you've got some experience here. `__version__` instead of using `None`? (It will result in `gcloud-python/None` as the user-agent.)