Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

populate the cache on compile, too (#1705) #1737

Closed

Conversation

beckjake
Copy link
Contributor

Fixes #1705

dbt compile will now populate the cache before the run, but after the manifest has been built, just like dbt run/dbt test already do.

I don't think dbt ls will need this as it doesn't compile nodes, it just compiles a manifest.

I didn't implement lazy-loading, though we could definitely also choose to lazy-load the cache on a per-schema basis. That would help if you had a project with many schemas and only a couple get_relation/is_incremental/similar calls.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: wrong PR :)

@drewbanin
Copy link
Contributor

drewbanin commented Sep 11, 2019

Ok - I thought about this one a bunch more. How much is involved in lazily loading the relation cache? I think that this implementation is sort of a toss-up: it might make dbt compile faster for some projects and slower for others. It really does depend on 1) the proportion of schemas which get_relation() is called against and 2) the number of times that method is called against each particular schema.

Some examples:

  • 1 schema, no calls to get_relation()
    • old: 0 queries
    • new: 1 query
  • 1 schema, 10 calls to get_relation()
    • old: 10 queries
    • new: 1 query
  • 5 schemas, 1 call to get_relation()
    • old: 1 query
    • new: 5 queries
  • 5 schemas, 10 calls to get_relation()
    • old: 10 queries
    • new: 5 queries

So, it's not super clear that this will actually be better in the general case. If we're able to lazily populate the cache, then I do think that implementation would be strictly better than what we have in place currently.

If lazy loading is a significant project, then it might make sense to hold on that for now. Speeding up dbt compile is a worthwhile endeavor, but we don't necessarily have to do it right this moment.

@drewbanin
Copy link
Contributor

closing this - see #1705

@drewbanin drewbanin closed this Sep 12, 2019
@beckjake beckjake deleted the feature/compile-populate-cache branch April 23, 2020 20:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement the relation cache for dbt compile and dbt ls
2 participants