BigQuery: add create_job method that takes any kind of job config (and public configuration property to job classes) #14

Closed
3 tasks
tswast opened this issue Nov 12, 2019 · 1 comment · Fixed by #32
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tswast
Contributor

tswast commented Nov 12, 2019

Is your feature request related to a problem? Please describe.

Some job failures are caused by transient conditions, and the job will succeed if it is restarted from the beginning. The problem is that this isn't a simple "retry" because the job configuration is mutable.

For example, BigQuery automatically populates a destination table if one is not set. In this case, the destination table should not be part of the retried request. In my opinion, this kind of logic is outside the scope of the client libraries, because the job resource alone does not make clear when the destination table needs to be cleared.

Describe the solution you'd like

  • Provide a create_job method that takes any JobConfig object.
  • Add a configuration property to job classes.
  • Add an example of retrying a job that has failed due to 403 rateLimitExceeded (possibly resetting the destination table if it was a query job).
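As a rough illustration of the third item, restarting a job from its configuration could look like the sketch below. Everything named here is an assumption made for the example: `start_job` is a hypothetical callable (for instance, a thin wrapper around the proposed create_job method) that submits a configuration and blocks until the job finishes, and `RateLimitExceeded` is a hypothetical stand-in for a 403 rateLimitExceeded failure, not a class from the library.

```python
import copy
import time


class RateLimitExceeded(Exception):
    """Hypothetical stand-in for a 403 rateLimitExceeded job failure."""


def run_with_restart(start_job, job_config, max_attempts=3,
                     base_delay=1.0, sleep=time.sleep):
    """Start a job from its configuration, restarting on rate-limit errors.

    start_job is a hypothetical callable that submits the configuration
    and returns the finished job (or raises RateLimitExceeded).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # Submit a deep copy so a failed attempt cannot mutate the
            # caller's configuration before the next attempt.
            return start_job(copy.deepcopy(job_config))
        except RateLimitExceeded:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the sketch easy to test; a caller would normally leave it at the default.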

Describe alternatives you've considered

Add a .retry() method to job classes. This is problematic, primarily because the configuration may have changed since the job was initially created, leading to unintended consequences or hard-to-debug failures.

Additional context

See the customer request for a method to retry any job at googleapis/google-cloud-python#5555.

@yan-hic

yan-hic commented Nov 12, 2019

Great. Here is some raw info from our implementation that may help.

Retriable job conditions:

if 'Could not serialize access' in e.message or 'Exceeded rate limits' in e.message or 'RESOURCE_EXHAUSTED' in e.message
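The condition above could be wrapped in a small helper. This is a minimal sketch that assumes the error text is available as a plain string; `RETRIABLE_MARKERS` and `is_retriable` are illustrative names, not part of the client library.

```python
# Substrings taken from the condition above; each marks a failure that may
# succeed when the job is started again.
RETRIABLE_MARKERS = (
    "Could not serialize access",
    "Exceeded rate limits",
    "RESOURCE_EXHAUSTED",
)


def is_retriable(message):
    """Return True if the error message contains any retriable marker."""
    return any(marker in message for marker in RETRIABLE_MARKERS)
```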

Code:

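from uuid import uuid4

# bq is a BigQuery client; job_id identifies the failed job.
di = bq.get_job(job_id).to_api_repr()
di['jobReference']['jobId'] = str(uuid4())  # new ID so the retry is a distinct job
jo = bq.job_from_resource(di)
jo._begin()  # private method that starts the job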
  • Clear destination_table when statement_type is SELECT or CTAS.

For SCRIPT jobs, do not attempt to retry or review the child jobs individually. If one fails, it is best to retry the whole script.
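The rules above can be sketched on the plain job resource dict, without calling the service. This is only an illustration under some assumptions: the dict follows the REST job resource layout (`jobReference.jobId`, `configuration.query.destinationTable`, `statistics.query.statementType`, `statistics.parentJobId`), CTAS is reported as `CREATE_TABLE_AS_SELECT`, and `resource_for_retry` is a hypothetical helper name.

```python
import copy
import uuid

# Statement types whose destination table should not be resubmitted.
CLEAR_DESTINATION_FOR = {"SELECT", "CREATE_TABLE_AS_SELECT"}


def resource_for_retry(resource):
    """Return a copy of a job resource dict prepared for resubmission."""
    stats = resource.get("statistics", {})
    if "parentJobId" in stats:
        # Child jobs of a script are not retried on their own.
        raise ValueError("child job of a script; retry the parent script instead")

    new = copy.deepcopy(resource)
    # A fresh job ID, so the retry is not rejected as a duplicate.
    new["jobReference"]["jobId"] = str(uuid.uuid4())

    statement_type = stats.get("query", {}).get("statementType")
    if statement_type in CLEAR_DESTINATION_FOR:
        query_config = new.get("configuration", {}).get("query", {})
        query_config.pop("destinationTable", None)
    return new
```

The returned dict could then be handed to something like `job_from_resource`, as in the snippet above.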

@HemangChothani HemangChothani self-assigned this Nov 18, 2019
@plamut plamut transferred this issue from googleapis/google-cloud-python Feb 4, 2020
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Feb 4, 2020
@plamut plamut added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. api: bigquery Issues related to the googleapis/python-bigquery API. and removed api: bigquery Issues related to the googleapis/python-bigquery API. labels Feb 4, 2020

4 participants