Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly handle an empty dbt_project.yml in dbt debug #2120

Merged

Conversation

nchammas
Copy link
Contributor

@nchammas nchammas commented Feb 11, 2020

It's confusing to access the dbt project from within _load_profile() since the project has not been validated yet. If the project is invalid, the code validating the profile fails.

I've tried to handle this case gracefully using the existing patterns.

Old behavior:

$ dbt debug
Running with dbt=0.15.2
dbt version: 0.15.2
python version: 3.7.6
python path: .../venv/bin/python
os info: Darwin-18.7.0-x86_64-i386-64bit
Using profiles.yml file at .../.dbt/profiles.yml

Encountered an error:
'NoneType' object does not support item assignment

New behavior:

$ dbt debug
Running with dbt=0.16.0-b1
dbt version: 0.16.0-b1
python version: 3.7.6
python path: .../dbt/env/bin/python
os info: Darwin-18.7.0-x86_64-i386-64bit
Using profiles.yml file at .../.dbt/profiles.yml
Using dbt_project.yml file at .../dbt/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [ERROR invalid]

Project loading failed for the following reason:
Runtime Error
  dbt_project.yml does not parse to a dictionary

Required dependencies:
 - git [OK found]

Could not load dbt_project.yml

Fixes #2116.

@cla-bot
Copy link

cla-bot bot commented Feb 11, 2020

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, don't hesitate to ping @drewbanin.

CLA has not been signed by users: @nchammas

@@ -189,6 +189,12 @@ def _raw_project_from(project_root: str) -> Dict[str, Any]:
)

project_dict = _load_yaml(project_yaml_filepath)
Copy link
Contributor Author

@nchammas nchammas Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, we should just validate the complete project schema here, perhaps using something like StrictYAML.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool! I'll have to look into it. Sadly it looks like they don't support anchors, but maybe we'd live without that. We do validate the parsed project using hologram, but I would love a smaller, more close-to-the-input schema definition for your yaml files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are anchors?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

foo: &my_anchor
  a: 1
  b: [2, 3]
  c: false
bar: *my_anchor

Is equivalent to:

foo:
  a: 1
  b: [2, 3]
  c: false
bar:
  a: 1
  b: [2, 3]
  c: false

There's also a field-override syntax, which is very convenient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes, StrictYAML removes a lot of YAML features like that.

I'm not familiar with Hologram, but I have heard of Cerberus as an option for flexible data validation.

core/dbt/task/debug.py Outdated Show resolved Hide resolved
@drewbanin
Copy link
Contributor

Thanks for opening this PR @nchammas! @cla-bot check

Adding @beckjake to review :)

@cla-bot cla-bot bot added the cla:yes label Feb 11, 2020
@cla-bot
Copy link

cla-bot bot commented Feb 11, 2020

The cla-bot has been summoned, and re-checked this pull request!

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nchammas, this looks great! Thanks for your contribution. Do you think you could add a test? You should be able to copy+paste and change one of the tests in test/integration/049_debug_tests/test_debug.py to generate an empty profile (I'd probably start with the test_postgres_invalid_project_outside_current_dir test).

@@ -189,6 +189,12 @@ def _raw_project_from(project_root: str) -> Dict[str, Any]:
)

project_dict = _load_yaml(project_yaml_filepath)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty cool! I'll have to look into it. Sadly it looks like they don't support anchors, but maybe we'd live without that. We do validate the parsed project using hologram, but I would love a smaller, more close-to-the-input schema definition for your yaml files.

core/dbt/task/debug.py Show resolved Hide resolved
@nchammas
Copy link
Contributor Author

@beckjake - I'm trying to run that test locally, but it seems the way we've setup DBTIntegrationTest.database_host() makes it so you can't run tests locally:

$ pytest -k test_postgres_invalid_project_outside_current_dir
...
>           raise dbt.exceptions.FailedToConnectException(str(e))
E           dbt.exceptions.FailedToConnectException: Database Error
E             could not translate host name "database" to address: nodename nor servname provided, or not known

.../dbt/plugins/postgres/dbt/adapters/postgres/connections.py:111: FailedToConnectException

Here's the culprit: https://github.com/fishtown-analytics/dbt/blob/4e58589afdbb5e74701753e239536e318c5d8db7/test/integration/base.py#L128-L131

Why does that method return database? And what's the proper way for me to override that? The test.env.sample does not show anything for Postgres, and trying stuff like POSTGRES_TEST_HOST or POSTGRESQL_TEST_HOST in my test.env doesn't seem to change how dbt tries to connect to the database.

@beckjake
Copy link
Contributor

@nchammas I recommend using docker-compose! The test assume you're using the database in our docker-compose.yml and that you've run the setup script. From the dbt repository checkout:

docker-compose up -d postgres
docker-compose run --rm test /bin/bash ./test/setup_db.sh
docker-compose run --rm test tox -e explicit-py36 -- -s -m profile_postgres ./test/integration/049_dbt_debug_test/test_debug.py

I think that will get your tests working properly.

@nchammas
Copy link
Contributor Author

OK, I'll try that. The contributing guide isn't clear that this is a requirement for running the tests, but then again it doesn't suggest calling pytest directly either. 😄

@nchammas
Copy link
Contributor Author

I believe you meant docker-compose up -d database, no?

And on the third step you provided I'm getting the same error I reported in #2128.

@beckjake
Copy link
Contributor

Oops, yes, you're correct - postgres is the image name, database is the service name!

As I commented on #2128, something is broken with docker image building (I obviously don't do it often!). The circleci tests use fishtownjacob/test-container:3 for their image, instead of building the local docker image - you can always substitute that in to the docker-compose file to get running locally.

@nchammas
Copy link
Contributor Author

I'm getting the same error with fishtownjacob/test-container:3. Are you able to reproduce the issue locally?

@beckjake
Copy link
Contributor

Yeah, I am. I didn't really want to spend my day on this but I guess I'm going to have to figure out what's wrong since I can't run tests anymore.

@nchammas nchammas force-pushed the dbt-debug-empty-project branch 2 times, most recently from 440b9ba to cf90232 Compare February 13, 2020 20:24
@nchammas
Copy link
Contributor Author

nchammas commented Feb 13, 2020

Tests pass locally with:

docker-compose run --rm test tox -e explicit-py36 -- -s -m profile_postgres \
    ./test/integration/049_dbt_debug_test/test_debug.py::TestDebugInvalidProject

By the way, that set of 4 tests alone takes 12 seconds to run. Is that normal?

@nchammas
Copy link
Contributor Author

Not sure why some of the checks aren't running, but the ones that did run, passed.

Copy link
Contributor

@beckjake beckjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why some of the checks aren't running, but the ones that did run, passed.

The ones that aren't running require manual intervention to run. I'll kick them off, thanks for your contribution!

@beckjake
Copy link
Contributor

@nchammas before I merge/kick off tests, could you rebase this against dev/barbara-gittings, and squash it into one commit? Those tests shouldn't fail if these passed.

@nchammas
Copy link
Contributor Author

I did recently merge dev/barbara-gittings into this branch as part of
5595267, and GitHub lets you squash and merge PRs, which is what I normally do to merge PRs.

Is that not good enough? If not, I can manually rebase and squash as you request, but it seems like something we shouldn't need to do.

@beckjake
Copy link
Contributor

@nchammas sorry, I haven't been attentive to github! We have squash and merge disabled so I actually can't do it at merge-time.

@nchammas
Copy link
Contributor Author

OK, done.

@beckjake
Copy link
Contributor

Thanks! I've kicked off the tests. Once they pass, I'll merge this.

@beckjake beckjake merged commit ea4948f into dbt-labs:dev/barbara-gittings Feb 18, 2020
@nchammas nchammas deleted the dbt-debug-empty-project branch February 18, 2020 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbt debug cannot handle an empty dbt_project.yml
3 participants