Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add progress bar for to_arrow method #352

Merged
merged 10 commits into from
Nov 16, 2020

Conversation

HemangChothani
Copy link
Contributor

Fixes #343

  • PR open for the feedback and suggestions.

  • Currently implemented for 'QueryJob.to_arrow' method

  • When the cacheHit for result is true at time query_plan is blank so can't implement progress bar.

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Oct 29, 2020
@@ -20,6 +20,9 @@
import copy
import re
import threading
import tqdm
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be an optional import. See:

try:
import tqdm
except ImportError: # pragma: NO COVER
tqdm = None

@@ -3275,6 +3283,27 @@ def result(
rows._preserve_order = _contains_order_by(self.query)
return rows

def _get_progress_bar(self, progress_bar_type, description, total, unit):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is very close to the logic in table.py

def _get_progress_bar(self, progress_bar_type):

I'd suggest creating a _tqdm_helpers.py module similar to our pandas helpers to hold this logic for both Table and Job logic.

@HemangChothani HemangChothani marked this pull request as ready for review November 2, 2020 07:51
@HemangChothani HemangChothani requested a review from a team November 2, 2020 07:51
@@ -3337,7 +3339,49 @@ def to_arrow(

..versionadded:: 1.17.0
"""
return self.result().to_arrow(
if self.query_plan and progress_bar_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic necessary if we are calling to_arrow? Can't we rely on to_arrow's progress bar support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't get this point, are you talking about table.to_arrow() progress bar support?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I was getting this confused with RowIterator, where to_dataframe relies on to_arrow's progress bar support. I do wonder if some of this "waiting for the query to finish" logic could be refactored into a _tqdm_helpers.py function, since it should be identical between to_dataframe and to_arrow

@@ -3406,7 +3450,46 @@ def to_dataframe(
Raises:
ValueError: If the `pandas` library cannot be imported.
"""
return self.result().to_dataframe(
query_plan = self.query_plan
if query_plan and progress_bar_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_plan can get updated as the job progresses. I'd prefer if this always created a progress bar, but had a generic message when the job is still pending and there isn't a query plan.

)

try:
query_result = self.result(timeout=0.5)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make constants for this (PROGRESS_BAR_UPDATE_INTERVAL, for example)

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. A few nits regarding naming and indentation.

return None


def _query_job_result_helper(query_job, progress_bar_type=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use active verbs in this helper name.

Suggested change
def _query_job_result_helper(query_job, progress_bar_type=None):
def wait_for_query(query_job, progress_bar_type=None):


def _query_job_result_helper(query_job, progress_bar_type=None):
"""Return query result and display a progress bar while the query running, if tqdm is installed."""
if progress_bar_type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can reduce the level of indentation if we exit early. Also, I think we'll want to pass through some keyword arguments to result(), correct?

Suggested change
if progress_bar_type:
if progress_bar_type is None:
return query_job.result()

Aside (not relevant for this PR): we'll eventually want to pass additional arguments to result() whenever we implement #296

_PROGRESS_BAR_UPDATE_INTERVAL = 0.5


def _get_progress_bar(progress_bar_type, description, total, unit):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to use these helpers from other modules, let's remove the (redundant because of private module) leading _.

Suggested change
def _get_progress_bar(progress_bar_type, description, total, unit):
def get_progress_bar(progress_bar_type, description, total, unit):

progress_bar = _get_progress_bar(
progress_bar_type, "Query is running", 1, "query"
)
if query_job.query_plan:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be one level deeper. I'd like to see the while True loop, even when query_plan is not initially populated.

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect the while loop to contain the following steps:

  1. Call result() with a short-ish timeout.
  2. Call reload()
  3. Update the progress bar (different cases for query_plan present or not)

"""Return query result and display a progress bar while the query running, if tqdm is installed."""
if progress_bar_type is None:
query_result = query_job.result()
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that you could return query_job.result() here. The else statement is then unnecessary, saving us 1 level of indentation.

i += 1
continue
else:
query_result = query_job.result()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a timeout here. It's not clear to me why the above try block is even in the if statement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the only difference is the presence of total, which could be set to a default value in the case where query_plan is not present.

while True:
if query_job.query_plan:
total = len(query_job.query_plan)
query_job.reload() # Refreshes the state via a GET request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't we only want to call reload after result times out?

Also, why would we only call reload inside this if statement? The job might not have a query_plan until after reload is called in many cases.

@HemangChothani
Copy link
Contributor Author

system test failed not related to changes.

Copy link
Contributor

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

P.S. I have opened #388 to fix those failing tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: progress bar during query in QueryJob.to_dataframe and QueryJob.to_arrow
2 participants