
Add --fail-fast argument for dbt run and dbt test #2224

Merged
6 commits merged into dbt-labs:dev/octavius-catto on Mar 24, 2020

Conversation

@Raalsky (Contributor) commented Mar 20, 2020

resolves #1649

Description

Adds an optional flag, -x or --fail-fast, which makes a dbt run stop at the first error (or at the first test failure during dbt test). It's really beneficial for long runs and during development (Continuous Integration pipelines will be grateful).
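For illustration, a short/long flag pair like this is wired up roughly as follows (a standalone argparse sketch, not the actual dbt argument-parser code):

```python
import argparse

parser = argparse.ArgumentParser(prog="dbt")
subparsers = parser.add_subparsers(dest="command")

for command in ("run", "test"):
    sub = subparsers.add_parser(command)
    # Stop at the first failing node instead of continuing with the rest of the graph.
    sub.add_argument("-x", "--fail-fast", dest="fail_fast", action="store_true")

args = parser.parse_args(["run", "--fail-fast"])
print(args.fail_fast)  # True
```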

I'm currently looking for suggestions on how to (unit/integration) test the linker, the graph parts, and of course the test severity levels. 🍺

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt next" section.

@drewbanin (Contributor)

Thanks for opening this PR @Raalsky!! I'll let @beckjake handle the review, but I'm super excited about this feature :D

@cla-bot check

cla-bot bot commented Mar 20, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Standard.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email email@example.com`
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot bot commented Mar 20, 2020

The cla-bot has been summoned, and re-checked this pull request!

dbt-labs deleted two comments from cla-bot bot Mar 20, 2020
@drewbanin (Contributor)

Hmmm... it looks like the CLA bot is mad about those first two commits in this PR. If you're able to rebase this PR and force-push, I think that issue should go away.

@Raalsky (Contributor, Author) commented Mar 20, 2020

@drewbanin I've amended the first commit, but it seems the old commits got pushed again after the rebase. Maybe I should switch to a new branch 😢?

@beckjake (Contributor) commented Mar 20, 2020

Here's what I do for that kind of problem:

  • `git rebase -i dev/octavius-catto` - this will bring up a text editor
  • find the bad commit (I assume it'll be the first commit in the list) and change the word `pick` on it to `edit`
  • save + quit
  • you'll get dumped back to your prompt
  • `git commit --amend --reset-author`
  • `git rebase --continue`
  • `git push -f` once the rebase is done

You can also use `-i` to interactively squash commits, if that's something you're looking to clean up a bit.

@beckjake (Contributor) left a comment

This looks good! Can you please rebase your branch on top of dev/octavius-catto? I merged the latest dev/barbara-gittings changes into it, and I think it will make this diff a bit easier for me to read.

@Raalsky force-pushed the fail-fast branch 5 times, most recently from 19a4882 to 6f33e6c (March 20, 2020 19:18)
@Raalsky (Contributor, Author) commented Mar 20, 2020

@beckjake OK. I've cleaned up the duplicate commits, squashed all of mine, and rebased. Now it looks pretty neat. Enjoy 🚀

@beckjake (Contributor) left a comment

This is shaping up to be a great PR!

> I'm currently looking for suggestions on how to (unit/integration) test the linker, the graph parts, and of course the test severity levels. 🍺

Unfortunately, cancellation is always a high-risk, hard-to-test change. The postgres RPC tests should exercise a decent chunk of this code's existing behavior, in particular the stuff around KeyboardInterrupt handling. So if those all pass I'll feel pretty good about us not having regressions!

One path for a kind of hacky integration test I can think of (a rough sketch in code follows this list):

  • In setup, create a new audit table in the schema that has a `model_name` column or something
  • Create a pre-hook that inserts the current model's name into the audit table
  • Create a few models that all should fail at runtime (maybe something like `select id from {{ this.schema }}.table_that_does_not_exist`?)
  • Run them with something like `self.run_dbt(['run', '--threads', '1'])`. You will need to wrap the call in a `with self.assertRaises(FailFastError):`... I think.
  • assert that there is only one row in the audit table
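Roughly, such a test could look like this. It's an untested sketch: the base-class import path, the `run_sql`/`{schema}` helpers, the `--fail-fast` placement, and the exception name `FailFastException` are all assumptions here, not necessarily the test that was actually added.

```python
# Rough, untested sketch in the shape of dbt's integration test suite.
from test.integration.base import DBTIntegrationTest, use_profile
from dbt.exceptions import FailFastException


class TestFailFast(DBTIntegrationTest):
    @property
    def schema(self):
        return "fail_fast"

    @property
    def models(self):
        # models/ holds two models that fail at runtime, e.g.
        #   select id from {{ this.schema }}.table_that_does_not_exist
        # each with a pre-hook inserting its own name into an audit table
        # created during setUp().
        return "models"

    @use_profile("postgres")
    def test__postgres_fail_fast(self):
        # With a single worker thread, the second model should never start
        # once the first one fails and --fail-fast raises.
        with self.assertRaises(FailFastException):
            self.run_dbt(["run", "--threads", "1", "--fail-fast"])

        audit_rows = self.run_sql("select * from {schema}.audit", fetch="all")
        self.assertEqual(len(audit_rows), 1)
```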

@@ -153,6 +155,7 @@ def mark_done(self, node_id):
self.graph.remove_node(node_id)
self._find_new_additions()
self.inner.task_done()
self.some_task_done.notify_all()
Contributor

Do we need to change the context manager here to use `with self.some_task_done` instead of `with self.lock`? I assume it's equivalent, since `some_task_done` just owns the lock, and that's what `queue.Queue` does.
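For context, the equivalence in question: a `threading.Condition` built on an existing `Lock` acquires that same lock when used as a context manager, so either `with` statement guards the same critical section. A small standalone sketch (not dbt code):

```python
import threading

lock = threading.Lock()
some_task_done = threading.Condition(lock)  # the Condition wraps the same lock

def mark_done_with_lock():
    # Acquires `lock` directly, then notifies waiters on the condition.
    with lock:
        some_task_done.notify_all()

def mark_done_with_condition():
    # Acquires the *same* lock via the Condition's own context manager.
    with some_task_done:
        some_task_done.notify_all()
```

`queue.Queue` uses the same pattern internally: one mutex shared by its `not_empty`, `not_full`, and `all_tasks_done` conditions.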

Contributor Author

That's great advice 🥇. I was thinking it was at least safe and working; I know it should be more consistent with `self.lock`. I'll work on it 😉. Thanks a lot for the suggestion.

Contributor Author

OK. You're right that it's the same to use `self.lock`, as the Condition is actually built on top of it. I've changed it to be more consistent with the other GraphQueue methods, as suggested.

@@ -248,6 +264,34 @@ def _handle_result(self, result):
cause = None
self._mark_dependent_errors(node.unique_id, result, cause)

def _cancel_connections(self, pool):
Contributor

This is a good idea, we should have split this out a long time ago!

@beckjake (Contributor)

You don't need to do this in this PR unless you're feeling ambitious, but a note for us: once this merges, we should add the `fail_fast` flag as a parameter to various (all?) RPC methods and add handling. Someday we'll have the RPC and the CLI flags properly unified, but that day is not today, and instead it's an irritating manual process :(.
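For illustration only, "adding it as a parameter" would mean something along these lines; the class and field names below are hypothetical, not dbt's actual RPC contracts:

```python
# Hypothetical sketch: each RPC task's parameter object grows a fail_fast
# field mirroring the new CLI flag, which the task then passes to the runner.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class RPCRunParameters:
    threads: Optional[int] = None
    models: Optional[List[str]] = None
    exclude: Optional[List[str]] = None
    fail_fast: bool = False
```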

@Raalsky (Contributor, Author) commented Mar 20, 2020

I was thinking about the RPC part, but currently I'm not comfortable with even running an RPC server/client (I've only spent a few moments with the dbt Cloud editor) 😢

@beckjake (Contributor)

> I was thinking about the RPC part, but currently I'm not comfortable with even running an RPC server/client (I've only spent a few moments with the dbt Cloud editor) 😢

No problem at all! That's on me, I have been putting off an argument unification issue/PR for a long time and fixing it up every time we add a new flag is just the price I have to pay for that 😄.

@@ -281,6 +281,17 @@ def __init__(self, message, project=None, result_type='invalid_project'):
self.result_type = result_type


class FailFastException(Exception):
Contributor Author

@beckjake What about this part? I'm not feeling 100% OK with all of this Exception/Error logic; I was aiming for something similar to the other ones, but without a deep understanding of what should go here.

Contributor

Hmm, that's an interesting question. Maybe derive from `RuntimeException` and skip `__init__`/`__str__`? It would be nice to attach the failed node to the exception, if we can!

@beckjake (Contributor) Mar 20, 2020

So when raising it, something like this (if it works - I have not tested literally anything about it!):
`raise FailFastException(result.error, node=getattr(result, 'node', None))`
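Putting both ideas together, a minimal sketch (assuming `RuntimeException` lives in `dbt.exceptions` and accepts a `node` keyword, as the raise above implies; not necessarily what should be merged as-is):

```python
# Sketch: derive from RuntimeException and reuse its __init__/__str__,
# attaching the failed node when one is available.
from dbt.exceptions import RuntimeException


class FailFastException(RuntimeException):
    """Raised when --fail-fast is set and a node fails, to stop the run."""
    pass


# At the call site in the runner, roughly as suggested above:
# raise FailFastException(result.error, node=getattr(result, 'node', None))
```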

Contributor Author

Ok. Thanks

Contributor Author

Done 😉

cla-bot bot commented Mar 22, 2020

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Rafał Jankowski.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check whether your git client is configured with an email to sign commits: `git config --list | grep email`
  2. If not, set it up using `git config --global user.email email@example.com`
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

cla-bot bot removed the cla:yes label Mar 22, 2020
cla-bot bot added the cla:yes label Mar 22, 2020
@Raalsky changed the title from "WIP: Add --fail-fast argument for dbt run and dbt test" to "Add --fail-fast argument for dbt run and dbt test" Mar 22, 2020
@Raalsky (Contributor, Author) commented Mar 22, 2020

> One path for a kind of hacky integration test I can think of: [...]

I've added such an integration test.
I also noticed that, before adding `pg_sleep(random())`, it was possible for 2 of 2 models to run before the main thread ended (during the adapter connection cancellation). I think the reason is that `--threads 1` doesn't actually mean one thread, but main + 1 worker. The worker could pick up the next task and execute its hooks right before being stopped by the connection cancellation. I don't think it's a real problem, and I don't see a way to deal with it without raising inside the workers.

@beckjake (Contributor)

> I think the reason is that `--threads 1` doesn't actually mean one thread, but main + 1 worker. The worker could pick up the next task and execute its hooks right before being stopped by the connection cancellation.

That's totally correct, great point.

For the tests, it's almost like the [gw1] thread in the test suite went haywire 😕 - I've kicked it off again, hopefully it's just the flaky/weird part of the RPC suite acting up again.

@beckjake beckjake merged commit 8487166 into dbt-labs:dev/octavius-catto Mar 24, 2020
Successfully merging this pull request may close these issues:

Feature Request: --fail-fast argument for dbt run and dbt test