Add testing #22

Merged · 31 commits into dbt-labs:main · Aug 11, 2022
Conversation

JCZuurmond (Contributor)

This PR is a proposal for adding testing using pytest. I would like to try this first here. If we are happy with the set-up, I want to move it into a Python package so that it can be used for all dbt-spark projects.

JCZuurmond (Contributor, Author) left a comment

@jtcohen6 : I would love to hear what you think about this set-up!

The main goal is to find a way to test our macros more easily. I expect that this could be a lightweight set-up that allows dbt-spark users to define pytest tests for their macros.

There are some integration points with dbt-core that I would like your feedback on.

dbt.tracking.active_user = User(os.getcwd())


class _SparkConnectionManager(SparkConnectionManager):
JCZuurmond (Contributor, Author)

I have opened an issue in dbt-spark to add a PySpark connection to the package. Once that lands, this workaround is no longer needed.

Collaborator

Really interesting; responded in the issue.

@pytest.mark.parametrize(
"macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_spark_session(
JCZuurmond (Contributor, Author)

This is what I consider the power of this set-up: write a pytest unit test to test your dbt macros!
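
For context, a minimal sketch of how the truncated test excerpted above might read in full, assuming the macro_generator and spark_session fixtures provided by this set-up. The test body (table name and assertion) is an illustrative guess, not the PR's actual code, and the MacroGenerator import path may vary by dbt version:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession


@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_spark_session(
    macro_generator: MacroGenerator, spark_session: SparkSession
) -> None:
    # Create a table in the in-process Spark session, then check that the
    # macro under test reports it.
    expected_table = "default.example"
    spark_session.sql(f"CREATE TABLE {expected_table} (id int) USING parquet")
    tables = macro_generator()
    assert tables == [expected_table]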

jtcohen6 (Collaborator) · Jan 4, 2022

This is really neat!!!

We've long wanted/needed a way to write reasonable unit tests for dbt macros. (We've done it before, but in ways that are pretty hard to untangle/replicate.)

If we could find a way to do this over and over, with minimal setup, it would open up better testing for:

  • adapter plugin development (write + test each macro as you go)
  • package development (such as this package)
  • custom macro/materialization development in users' own projects

from dbt.parser.manifest import ManifestLoader
from dbt.tracking import User
from pyspark.sql import SparkSession
from sodaspark.scan import Connection
JCZuurmond (Contributor, Author)

See the issue in dbt-spark about replacing the Connection with a reference to the SparkSession.

from sodaspark.scan import Connection


dbt.tracking.active_user = User(os.getcwd())
JCZuurmond (Contributor, Author)

dbt complained about not having an active_user (it was None). If possible, I would like to remove this.

jtcohen6 (Collaborator) · Jan 4, 2022

Did you see these complaints when you instantiated RuntimeConfig? Most of our integration tests expect active_user to be None (i.e., anonymous tracking disabled).

args = Args()

# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)
JCZuurmond (Contributor, Author)

dbt is written as a command-line tool. The entrypoint of dbt, the dbt CLI, leans heavily on the parsed arguments. I tried to strike a balance between reusing what is in dbt-core and keeping dependencies minimal: here we still use the from_args class method to create a RuntimeConfig, but with a smaller Args class.
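
As an illustration, a minimal sketch of such an Args stand-in. Which attributes RuntimeConfig.from_args actually reads off the args object is an assumption here, and it differs between dbt-core versions:

from dataclasses import dataclass, field
from typing import Dict, Optional

from dbt.config import RuntimeConfig


@dataclass
class Args:
    # A small stand-in for dbt's parsed CLI arguments. Every field and
    # default below is an illustrative assumption.
    project_dir: str = "."
    profiles_dir: str = "."
    profile: Optional[str] = None
    target: Optional[str] = None
    threads: Optional[int] = None
    vars: Dict[str, str] = field(default_factory=dict)


# Sets the adapter plugin in dbt.adapters.factory.FACTORY as a side effect.
config = RuntimeConfig.from_args(Args())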

jtcohen6 (Collaborator)

Heard. For our part, we know that:

JCZuurmond (Contributor, Author) · Jan 13, 2022

Yes, that would help. Also, see the comment (in the updated code):

requires a profile in your project which also exists in your profiles file

I would like to not require a profile to be present. Especially for dbt packages this does not make sense. Also, when running the test suite in CI, a profile is not necessarily present.

# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)

register_adapter(config)
JCZuurmond (Contributor, Author)

This stores the Spark adapter in a global variable (dbt.adapters.factory.FACTORY).

def adapter(config: RuntimeConfig) -> AdapterContainer:
adapter = get_adapter(config)

connection_manager = _SparkConnectionManager(adapter.config)
JCZuurmond (Contributor, Author)

As mentioned before, I would like to move this into dbt-spark.
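
A rough sketch of how the fixture excerpted above might continue, reusing the _SparkConnectionManager subclass from earlier in the diff. The connections attribute and set_connection_name method follow dbt-core's BaseAdapter and BaseConnectionManager, but treat the exact wiring as an assumption:

import pytest
from dbt.adapters.factory import get_adapter


@pytest.fixture
def adapter(config):
    adapter = get_adapter(config)
    # Swap in the SparkSession-backed connection manager so that macros run
    # against an in-process Spark session instead of a real server.
    adapter.connections = _SparkConnectionManager(adapter.config)
    adapter.connections.set_connection_name()  # open the default connection
    return adapter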

jtcohen6 (Collaborator) left a comment

@JCZuurmond Very neat, thanks for sketching this out! I remember your issue last year that encouraged us to lean much more into pytest built-ins (dbt-labs/dbt-core#3193), reinventing fewer wheels and making it easier for external contributors (especially ones with some python know-how) to jump in.

As you mention, a lot of the initial code in this PR is easier setup/mocking for a dbt-spark connection in python. Let's continue that conversation over in dbt-labs/dbt-spark#272. While it's obviously relevant to this PR/package, I sense it would be a much lighter lift for other adapters/databases.

So, the best bits are right at the end: I'm particularly taken by how you've used @pytest.fixture to mock macro inputs and outputs. This feels incredibly compelling to me. @gshank @emmyoop I'd be especially interested in getting your eyes and thoughts on this approach, as we think about our next-generation approach to testing dbt (with modularity and extensibility top of mind).

JCZuurmond (Contributor, Author)

As just said in dbt-labs/dbt-spark#272, I will have a first go at a package for unit testing dbt-spark logic at the end of the month.

I would like some guidance on where to integrate with dbt-core; let's discuss the integration points in the comments above.

JCZuurmond (Contributor, Author) left a comment

@jtcohen6 : Could you have another look? I have created a package (GitHub repo) that contains the set-up logic for testing. It is written as a pytest plugin.

I think it would be best to add the Spark session stuff to the dbt-spark repo. That makes pytest-dbt-core warehouse-independent. I'll continue the discussion about this in this issue.

This PR is not ready to be merged until that issue is resolved.

jtcohen6 (Collaborator) · Feb 7, 2022

@JCZuurmond This is really neat!! Your work has sparked and anticipated many conversations internally, as we're working to refactor our own testing framework.

If you're interested, check out dbt-labs/dbt-core#4691. We're just exiting the exploratory phase, and experimenting toward a set of concrete proposals. I'd love to hear your feedback over there, and (if you're up for it) find ways to include you in our ongoing work.

JCZuurmond (Contributor, Author)

@jtcohen6 : Curious to get feedback again! Do you think this way of testing benefits the project? Will you merge this PR?

jtcohen6 (Collaborator) left a comment

@JCZuurmond I don't have any real opposition to including this sort of testing, as a neat experiment of what's possible for macro unit testing when you have a "database" that can run in-memory. Some neat possibilities for SQLite / DuckDB, perhaps? (cc @dbeatty10)

In practice, this package isn't one I'm able to regularly maintain, so I can't promise taking this any further than where you've brought it to date. I am interested in finding other folks who can help out — at dbt Labs, or potentially in the wider community.

JCZuurmond (Contributor, Author)

Thanks for reviewing the PR, @jtcohen6. I have not tested the library against adapters other than Spark. The lib is written to be adapter-agnostic, but I expect it is easiest to use with in-memory databases.

I am keen to include others on the project, or to move parts of it to another project; if you could help with that, that would be great. I have ideas about what I want to change further, but I prefer to include others first, to hear what they think and to let the library grow into a community-based project.

jtcohen6 (Collaborator) left a comment

@JCZuurmond Last ask before we merge this: Could you update the README with a quick overview of this test? Plus, some code annotation in tests/test_macros.py?

JCZuurmond (Contributor, Author)

@jtcohen6 : Could you have another look?

JCZuurmond (Contributor, Author)

The CI fails due to tests/functional/test_utils.py expecting a dbt.tests.adapter module.


- name: Run pytest
  shell: bash
  run: DBT_PROFILES_DIR=$PWD pytest tests/
jtcohen6 (Collaborator)

I see - this tries to run tests/functional/test_utils.py as well (added in #25), which has other requirements (specified in dev-requirements.txt).

Options:

  1. Unify: Add python -m pip install -r dev-requirements.txt into the "Install dependencies" step above—or better yet, unify dev-requirements + test-requirements into a single file—and try running the tests with --profile session. I imagine that a few may fail (as in dbt-spark), but the rest should succeed, and we can mark the failing ones to skip on the "session" profile (see the sketch after this list).
  2. Separate: Treat these as two totally independent testing frameworks, which is really what they are. Create tests/functional for the tests using the dbt-core functional-testing framework, and tests/unit for the tests using the pytest-dbt-core unit-testing framework. Have clearly separate READMEs and files used by each: separate requirements, separate profile setup, all that jazz.

What do you think? I lean toward option 2, "separate" — and then tests/unit could be a clear, self-contained, and real-world demonstration of the potential of pytest-dbt-core
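
For option 1, a hypothetical illustration of that skip marking, modeled on the skip_profile marker from dbt's adapter functional-testing framework; the marker name and profile name are assumptions here:

import pytest


@pytest.mark.skip_profile("session")  # cannot run against an in-process session
def test_needs_a_real_cluster():
    ...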

Contributor

+1 to option 2, "separate" (especially different folders tests/unit vs. tests/functional).

Separate folders won't undermine the functionality at all, and it will make the delineations between different frameworks more clear.

Side note: I don't see a problem with them sharing a single dev-requirements.txt though.

JCZuurmond (Contributor, Author)

I moved test_macros.py into a unit subfolder, and I merged test-requirements.txt into dev-requirements.txt. I think it is confusing to have a test-requirements.txt that does not contain all the dependencies for the functional tests.

I had to move the conftest.py into the functional subdirectory because it was interfering with the unit tests.
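
The resulting layout, roughly, as inferred from the comments above:

tests/
├── unit/
│   └── test_macros.py       # pytest-dbt-core unit tests
└── functional/
    ├── conftest.py
    └── test_utils.py        # dbt-core functional-testing framework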

jtcohen6 (Collaborator) left a comment

Huzzah!! 🎉

@JCZuurmond Thanks for all your work to get this over the last mile. Excited to have this package as a demo of different approaches to testing complex macros.

jtcohen6 merged commit 143c1e7 into dbt-labs:main on Aug 11, 2022.