Add testing #22

Merged · 31 commits into dbt-labs:main · Aug 11, 2022
Conversation

JCZuurmond (Contributor)

This PR is a proposal for adding testing using pytest. I would like to try this first here. If we are happy with the set-up, I want to move it into a Python package so that it can be used for all dbt-spark projects.

JCZuurmond (Contributor, Author) left a comment

@jtcohen6 : I would love to hear what you think about this set-up!

The main goal is to find a way to test our macros more easily. I expect that this could be a lightweight set-up that allows dbt-spark users to define pytest tests for their macros.

There are some integration points with dbt-core that I would like your feedback on.

dbt.tracking.active_user = User(os.getcwd())


class _SparkConnectionManager(SparkConnectionManager):
JCZuurmond (Contributor, Author)

I have opened an issue in dbt-spark to add a PySpark connection to the package. Once that lands, this workaround is no longer needed.

Collaborator

Really interesting; responded in the issue.

@pytest.mark.parametrize(
"macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_spark_session(
JCZuurmond (Contributor, Author)

This is what I consider the power of this set-up: write a pytest unit test to test your dbt macros!
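
For context, a minimal sketch of how the truncated test excerpted above might read in full, assuming the macro_generator and spark_session fixtures provided by this set-up. The test body (table name and assertion) is an illustrative guess, not the PR's actual code, and the MacroGenerator import path may vary by dbt version:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession


@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_spark_session(
    macro_generator: MacroGenerator, spark_session: SparkSession
) -> None:
    # Create a table in the in-process Spark session, then check that the
    # macro under test reports it.
    expected_table = "default.example"
    spark_session.sql(f"CREATE TABLE {expected_table} (id int) USING parquet")
    tables = macro_generator()
    assert tables == [expected_table]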

jtcohen6 (Collaborator) · Jan 4, 2022

This is really neat!!!

We've long wanted/needed a way to write reasonable unit tests for dbt macros. (We've done it before, but in ways that are pretty hard to untangle/replicate.)

If we could find a way to do this over and over, with minimal setup, it would open up better testing for:

  • adapter plugin development (write + test each macro as you go)
  • package development (such as this package)
  • custom macro/materialization development in users' own projects

from dbt.parser.manifest import ManifestLoader
from dbt.tracking import User
from pyspark.sql import SparkSession
from sodaspark.scan import Connection
JCZuurmond (Contributor, Author)

See the issue in dbt-spark about replacing the Connection with a reference to the SparkSession.

from sodaspark.scan import Connection


dbt.tracking.active_user = User(os.getcwd())
JCZuurmond (Contributor, Author)

dbt complained about not having an active_user (it was None). If possible, I would like to remove this.

jtcohen6 (Collaborator) · Jan 4, 2022

Did you see these complaints when you instantiated RuntimeConfig? Most of our integration tests expect active_user to be None (i.e., anonymous tracking disabled).

args = Args()

# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)
JCZuurmond (Contributor, Author)

dbt is written as a command-line tool. The entrypoint of dbt, the dbt CLI, leans heavily on the parsed arguments. I tried to strike a balance between reusing what is in dbt-core and keeping dependencies minimal: here we still use the from_args class method to create a RuntimeConfig, but with a smaller Args class.
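
As an illustration, a minimal sketch of such an Args stand-in. Which attributes RuntimeConfig.from_args actually reads off the args object is an assumption here, and it differs between dbt-core versions:

from dataclasses import dataclass, field
from typing import Dict, Optional

from dbt.config import RuntimeConfig


@dataclass
class Args:
    # A small stand-in for dbt's parsed CLI arguments. Every field and
    # default below is an illustrative assumption.
    project_dir: str = "."
    profiles_dir: str = "."
    profile: Optional[str] = None
    target: Optional[str] = None
    threads: Optional[int] = None
    vars: Dict[str, str] = field(default_factory=dict)


# Sets the adapter plugin in dbt.adapters.factory.FACTORY as a side effect.
config = RuntimeConfig.from_args(Args())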

jtcohen6 (Collaborator)

Heard. For our part, we know that:

JCZuurmond (Contributor, Author) · Jan 13, 2022

Yes, that would help. Also, see the comment (in the updated code):

requires a profile in your project which also exists in your profiles file

I would like to not require a profile to be present. Especially for dbt packages this does not make sense. Also, when running the test suite in CI, a profile is not necessarily present.

# Sets the Spark plugin in dbt.adapters.factory.FACTORY
config = RuntimeConfig.from_args(args)

register_adapter(config)
JCZuurmond (Contributor, Author)

This stores the Spark adapter in a global variable (dbt.adapters.factory.FACTORY).

def adapter(config: RuntimeConfig) -> AdapterContainer:
adapter = get_adapter(config)

connection_manager = _SparkConnectionManager(adapter.config)
JCZuurmond (Contributor, Author)

As mentioned before, I would like to move this into dbt-spark.
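
A rough sketch of how the fixture excerpted above might continue, reusing the _SparkConnectionManager subclass from earlier in the diff. The connections attribute and set_connection_name method follow dbt-core's BaseAdapter and BaseConnectionManager, but treat the exact wiring as an assumption:

import pytest
from dbt.adapters.factory import get_adapter


@pytest.fixture
def adapter(config):
    adapter = get_adapter(config)
    # Swap in the SparkSession-backed connection manager so that macros run
    # against an in-process Spark session instead of a real server.
    adapter.connections = _SparkConnectionManager(adapter.config)
    adapter.connections.set_connection_name()  # open the default connection
    return adapter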

jtcohen6 (Collaborator) left a comment

@JCZuurmond Very neat, thanks for sketching this out! I remember your issue last year that encouraged us to lean much more into pytest built-ins (dbt-labs/dbt-core#3193), reinventing fewer wheels and making it easier for external contributors (especially ones with some python know-how) to jump in.

As you mention, a lot of the initial code in this PR is easier setup/mocking for a dbt-spark connection in python. Let's continue that conversation over in dbt-labs/dbt-spark#272. While it's obviously relevant to this PR/package, I sense it would be a much lighter lift for other adapters/databases.

So, the best bits are right at the end: I'm particularly taken by how you've used @pytest.fixture to mock macro inputs and outputs. This feels incredibly compelling to me. @gshank @emmyoop I'd be especially interested in getting your eyes and thoughts on this approach, as we think about our next-generation approach to testing dbt (with modularity and extensibility top of mind).

JCZuurmond (Contributor, Author)

As just said in dbt-labs/dbt-spark#272, I will have a first go at a package for unit testing dbt-spark logic at the end of the month.

I would like some guidance on where to integrate with dbt-core; let's discuss the integration points in the comments above.

JCZuurmond (Contributor, Author) left a comment

@jtcohen6 : Could you have another look? I have created a package (GitHub repo) that contains the set-up logic for testing. It is written as a pytest plugin.

I think it would be best to add the Spark session stuff to the dbt-spark repo. That makes pytest-dbt-core warehouse-independent. I'll continue the discussion about this in this issue.

This PR is not ready to be merged until that issue is resolved.

jtcohen6 (Collaborator) · Feb 7, 2022

@JCZuurmond This is really neat!! Your work has sparked and anticipated many conversations internally, as we're working to refactor our own testing framework.

If you're interested, check out dbt-labs/dbt-core#4691. We're just exiting the exploratory phase, and experimenting toward a set of concrete proposals. I'd love to hear your feedback over there, and (if you're up for it) find ways to include you in our ongoing work.

JCZuurmond (Contributor, Author)

@jtcohen6 : Curious to get feedback again! Do you think this way of testing benefits the project? Will you merge this PR?

jtcohen6 (Collaborator) left a comment

@JCZuurmond I don't have any real opposition to including this sort of testing, as a neat experiment of what's possible for macro unit testing when you have a "database" that can run in-memory. Some neat possibilities for SQLite / DuckDB, perhaps? (cc @dbeatty10)

In practice, this package isn't one I'm able to regularly maintain, so I can't promise taking this any further than where you've brought it to date. I am interested in finding other folks who can help out — at dbt Labs, or potentially in the wider community.

JCZuurmond (Contributor, Author)

Thanks for reviewing the PR, @jtcohen6. I have not tested the library against adapters other than Spark. The lib is written to be adapter-agnostic, but I expect it is easiest to use with in-memory databases.

I am keen to include others on the project, or to move parts of it to another project; if you could help with that, that would be great. I have ideas about what I want to change further, but I prefer to include others first, to hear what they think and to let the library grow into a community-based project.

jtcohen6 (Collaborator) left a comment

@JCZuurmond Last ask before we merge this: Could you update the README with a quick overview of this test? Plus, some code annotation in tests/test_macros.py?

JCZuurmond (Contributor, Author)

@jtcohen6 : Could you have another look?

JCZuurmond (Contributor, Author)

The CI fails due to tests/functional/test_utils.py expecting a dbt.tests.adapter module.


- name: Run pytest
  shell: bash
  run: DBT_PROFILES_DIR=$PWD pytest tests/
jtcohen6 (Collaborator)

I see - this tries to run tests/functional/test_utils.py as well (added in #25), which has other requirements (specified in dev-requirements.txt).

Options:

  1. Unify: Add python -m pip install -r dev-requirements.txt into the "Install dependencies" step above—or better yet, unify dev-requirements + test-requirements into a single file—and try running the tests with --profile session. I imagine that a few may fail (as in dbt-spark), but the rest should succeed, and we can mark the failing ones to skip on the "session" profile (see the sketch after this list).
  2. Separate: Treat these as two totally independent testing frameworks, which is really what they are. Create tests/functional for the tests using the dbt-core functional-testing framework, and tests/unit for the tests using the pytest-dbt-core unit-testing framework. Have clearly separate READMEs and files used by each: separate requirements, separate profile setup, all that jazz.

What do you think? I lean toward option 2, "separate" — and then tests/unit could be a clear, self-contained, and real-world demonstration of the potential of pytest-dbt-core
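
For option 1, a hypothetical illustration of that skip marking, modeled on the skip_profile marker from dbt's adapter functional-testing framework; the marker name and profile name are assumptions here:

import pytest


@pytest.mark.skip_profile("session")  # cannot run against an in-process session
def test_needs_a_real_cluster():
    ...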

Contributor

+1 to option 2, "separate" (especially different folders tests/unit vs. tests/functional).

Separate folders won't undermine the functionality at all, and it will make the delineations between different frameworks more clear.

Side note: I don't see a problem with them sharing a single dev-requirements.txt though.

JCZuurmond (Contributor, Author)

I moved test_macros.py into a unit subfolder, and I merged test-requirements.txt into dev-requirements.txt. I think it is confusing to have a test-requirements.txt that does not contain all the dependencies for the functional tests.

I had to move the conftest.py into the functional subdirectory because it was interfering with the unit tests.
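
The resulting layout, roughly, as inferred from the comments above:

tests/
├── unit/
│   └── test_macros.py       # pytest-dbt-core unit tests
└── functional/
    ├── conftest.py
    └── test_utils.py        # dbt-core functional-testing framework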

jtcohen6 (Collaborator) left a comment

Huzzah!! 🎉

@JCZuurmond Thanks for all your work to get this over the last mile. Excited to have this package as a demo of different approaches to testing complex macros.

jtcohen6 merged commit 143c1e7 into dbt-labs:main on Aug 11, 2022.