
Add --fail-fast argument for dbt run and dbt test #2224

Merged
6 commits merged into dbt-labs:dev/octavius-catto on Mar 24, 2020

Conversation

@Raalsky (Contributor) commented Mar 20, 2020

resolves #1649

Description

Adds an optional flag, -x or --fail-fast, which makes a dbt run stop at the first error (or at the first test failure during dbt test). It's really beneficial for long runs and during development (Continuous Integration pipelines will be grateful).
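For illustration, a short/long flag pair like this is wired up roughly as follows (a standalone argparse sketch, not the actual dbt argument-parser code):

```python
import argparse

parser = argparse.ArgumentParser(prog="dbt")
subparsers = parser.add_subparsers(dest="command")

for command in ("run", "test"):
    sub = subparsers.add_parser(command)
    # Stop at the first failing node instead of continuing with the rest of the graph.
    sub.add_argument("-x", "--fail-fast", dest="fail_fast", action="store_true")

args = parser.parse_args(["run", "--fail-fast"])
print(args.fail_fast)  # True
```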

I'm currently looking for suggestions on how to (unit/integration) test the linker, the graph parts, and of course the test severity levels. 🍺

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@drewbanin (Contributor)

Thanks for opening this PR @Raalsky!! I'll let @beckjake handle the review, but I'm super excited about this feature :D

@cla-bot check

cla-bot bot commented Mar 20, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Standard.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email email@example.com`
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot bot commented Mar 20, 2020

The cla-bot has been summoned, and re-checked this pull request!

dbt-labs deleted two comments from cla-bot bot Mar 20, 2020
@drewbanin (Contributor)

Hmmm... it looks like the CLA bot is mad about those first two commits in this PR. If you're able to rebase this PR and force-push, I think that issue should go away.

@Raalsky (Contributor, Author) commented Mar 20, 2020

@drewbanin I've amended the first commit, but it seems the old commits got pushed again after the rebase. Maybe I should switch to a new branch 😢?

@beckjake (Contributor) commented Mar 20, 2020

Here's what I do for that kind of problem:

  • `git rebase -i dev/octavius-catto` - this will bring up a text editor
  • find the bad commit (I assume it'll be the first commit in the list) and change the word `pick` on it to `edit`
  • save + quit
  • you'll get dumped back to your prompt
  • `git commit --amend --reset-author`
  • `git rebase --continue`
  • `git push -f` once the rebase is done

You can also use `-i` to interactively squash commits, if that's something you're looking to clean up a bit.

@beckjake (Contributor) left a comment

This looks good! Can you please rebase your branch on top of dev/octavius-catto? I merged the latest dev/barbara-gittings changes into it, and I think it will make this diff a bit easier for me to read.

@Raalsky force-pushed the fail-fast branch 5 times, most recently from 19a4882 to 6f33e6c (March 20, 2020 19:18)
@Raalsky (Contributor, Author) commented Mar 20, 2020

@beckjake OK. I've cleaned up the duplicate commits, squashed all of mine, and rebased. Now it looks pretty neat. Enjoy 🚀

@beckjake (Contributor) left a comment

This is shaping up to be a great PR!

> I'm currently looking for suggestions on how to (unit/integration) test the linker, the graph parts, and of course the test severity levels. 🍺

Unfortunately, cancellation is always a high-risk, hard-to-test change. The postgres RPC tests should exercise a decent chunk of this code's existing behavior, in particular the stuff around KeyboardInterrupt handling. So if those all pass I'll feel pretty good about us not having regressions!

One path for a kind of hacky integration test I can think of (a rough sketch in code follows this list):

  • In setup, create a new audit table in the schema that has a `model_name` column or something
  • Create a pre-hook that inserts the current model's name into the audit table
  • Create a few models that all should fail at runtime (maybe something like `select id from {{ this.schema }}.table_that_does_not_exist`?)
  • Run them with something like `self.run_dbt(['run', '--threads', '1'])`. You will need to wrap the call in a `with self.assertRaises(FailFastError):`... I think.
  • assert that there is only one row in the audit table
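Roughly, such a test could look like this. It's an untested sketch: the base-class import path, the `run_sql`/`{schema}` helpers, the `--fail-fast` placement, and the exception name `FailFastException` are all assumptions here, not necessarily the test that was actually added.

```python
# Rough, untested sketch in the shape of dbt's integration test suite.
from test.integration.base import DBTIntegrationTest, use_profile
from dbt.exceptions import FailFastException


class TestFailFast(DBTIntegrationTest):
    @property
    def schema(self):
        return "fail_fast"

    @property
    def models(self):
        # models/ holds two models that fail at runtime, e.g.
        #   select id from {{ this.schema }}.table_that_does_not_exist
        # each with a pre-hook inserting its own name into an audit table
        # created during setUp().
        return "models"

    @use_profile("postgres")
    def test__postgres_fail_fast(self):
        # With a single worker thread, the second model should never start
        # once the first one fails and --fail-fast raises.
        with self.assertRaises(FailFastException):
            self.run_dbt(["run", "--threads", "1", "--fail-fast"])

        audit_rows = self.run_sql("select * from {schema}.audit", fetch="all")
        self.assertEqual(len(audit_rows), 1)
```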

@@ -153,6 +155,7 @@ def mark_done(self, node_id):
self.graph.remove_node(node_id)
self._find_new_additions()
self.inner.task_done()
self.some_task_done.notify_all()
Contributor

Do we need to change the context manager here to use `with self.some_task_done` instead of `with self.lock`? I assume it's equivalent, since `some_task_done` just owns the lock, and that's what `queue.Queue` does.
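For context, the equivalence in question: a `threading.Condition` built on an existing `Lock` acquires that same lock when used as a context manager, so either `with` statement guards the same critical section. A small standalone sketch (not dbt code):

```python
import threading

lock = threading.Lock()
some_task_done = threading.Condition(lock)  # the Condition wraps the same lock

def mark_done_with_lock():
    # Acquires `lock` directly, then notifies waiters on the condition.
    with lock:
        some_task_done.notify_all()

def mark_done_with_condition():
    # Acquires the *same* lock via the Condition's own context manager.
    with some_task_done:
        some_task_done.notify_all()
```

`queue.Queue` uses the same pattern internally: one mutex shared by its `not_empty`, `not_full`, and `all_tasks_done` conditions.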

Contributor Author

That's great advice 🥇. I was thinking it was at least safe and working; I know it should be more consistent with `self.lock`. I'll work on it 😉. Thanks a lot for the suggestion.

Contributor Author

OK. You're right that it's the same to use `self.lock`, as the Condition is actually built on top of it. I've changed it to be more consistent with the other GraphQueue methods, as suggested.

@@ -248,6 +264,34 @@ def _handle_result(self, result):
cause = None
self._mark_dependent_errors(node.unique_id, result, cause)

def _cancel_connections(self, pool):
Contributor

This is a good idea, we should have split this out a long time ago!

@beckjake (Contributor)

You don't need to do this in this PR unless you're feeling ambitious, but a note for us: once this merges, we should add the `fail_fast` flag as a parameter to various (all?) RPC methods and add handling. Someday we'll have the RPC and the CLI flags properly unified, but that day is not today, and instead it's an irritating manual process :(.
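For illustration only, "adding it as a parameter" would mean something along these lines; the class and field names below are hypothetical, not dbt's actual RPC contracts:

```python
# Hypothetical sketch: each RPC task's parameter object grows a fail_fast
# field mirroring the new CLI flag, which the task then passes to the runner.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RPCRunParameters:
    threads: Optional[int] = None
    models: Optional[List[str]] = None
    exclude: Optional[List[str]] = None
    fail_fast: bool = False
```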

@Raalsky (Contributor, Author) commented Mar 20, 2020

I was thinking about the RPC part, but currently I'm not comfortable with even running an RPC server/client (I've only spent a few moments with the dbt Cloud editor) 😢

@beckjake (Contributor)

> I was thinking about the RPC part, but currently I'm not comfortable with even running an RPC server/client (I've only spent a few moments with the dbt Cloud editor) 😢

No problem at all! That's on me, I have been putting off an argument unification issue/PR for a long time and fixing it up every time we add a new flag is just the price I have to pay for that 😄.

@@ -281,6 +281,17 @@ def __init__(self, message, project=None, result_type='invalid_project'):
self.result_type = result_type


class FailFastException(Exception):
Contributor Author

@beckjake What about this part? I'm not feeling 100% OK with all of this Exception/Error logic; I was aiming for something similar to the other ones, but without a deep understanding of what should go here.

Contributor

Hmm, that's an interesting question. Maybe derive from `RuntimeException` and skip `__init__`/`__str__`? It would be nice to attach the failed node to the exception, if we can!

@beckjake (Contributor) Mar 20, 2020

So when raising it, something like this (if it works - I have not tested literally anything about it!):
`raise FailFastException(result.error, node=getattr(result, 'node', None))`
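Putting both ideas together, a minimal sketch (assuming `RuntimeException` lives in `dbt.exceptions` and accepts a `node` keyword, as the raise above implies; not necessarily what should be merged as-is):

```python
# Sketch: derive from RuntimeException and reuse its __init__/__str__,
# attaching the failed node when one is available.
from dbt.exceptions import RuntimeException


class FailFastException(RuntimeException):
    """Raised when --fail-fast is set and a node fails, to stop the run."""
    pass


# At the call site in the runner, roughly as suggested above:
# raise FailFastException(result.error, node=getattr(result, 'node', None))
```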

Contributor Author

Ok. Thanks

Contributor Author

Done 😉

cla-bot bot commented Mar 22, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Rafał Jankowski.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email email@example.com`
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot bot removed the cla:yes label Mar 22, 2020
cla-bot bot added the cla:yes label Mar 22, 2020
@Raalsky changed the title from "WIP: Add --fail-fast argument for dbt run and dbt test" to "Add --fail-fast argument for dbt run and dbt test" Mar 22, 2020
@Raalsky (Contributor, Author) commented Mar 22, 2020

> One path for a kind of hacky integration test I can think of: [...]

I've added such an integration test.
I also noticed that, before adding `pg_sleep(random())`, it was possible for 2 of 2 models to run before the main thread ended (during the adapter connection cancellation). I think the reason is that `--threads 1` doesn't actually mean one thread, but main + 1 worker. The worker could pick up the next task and execute its hooks right before being stopped by the connection cancellation. I don't think it's a real problem, and I don't see a way to deal with it without raising inside the workers.

@beckjake (Contributor)

> I think the reason is that `--threads 1` doesn't actually mean one thread, but main + 1 worker. The worker could pick up the next task and execute its hooks right before being stopped by the connection cancellation.

That's totally correct, great point.

For the tests, it's almost like the [gw1] thread in the test suite went haywire 😕 - I've kicked it off again, hopefully it's just the flaky/weird part of the RPC suite acting up again.

@beckjake beckjake merged commit 8487166 into dbt-labs:dev/octavius-catto Mar 24, 2020
Successfully merging this pull request may close these issues:

Feature Request: --fail-fast argument for dbt run and dbt test