[CT-711] [Draft] Add Structured Logs to Materializations #5325
Agree in principle. I'm on the fence about where exactly to put this. Definitely curious to hear Gerda's thoughts on how to factor this code. Also tagging @dataders for visibility / food for thought: to what extent must we shy away from high-impact logging, or other important metadata, living in the adapter?
The important ones I can find: we use it in our functional test utilities as well, here and here.

Logging relation info as soon as it's built: this was another thing we discussed earlier today, related to this effort but possibly its own ticket. Let's add structured logging about the relations produced by the materialization, as soon as it finishes building them (here). Materializations already return the set of relations they create/update, for the purposes of updating dbt's cache. Why not share the wealth with programmatic consumers of dbt metadata? (Will serializing Relation objects be an absolute nightmare? Relation objects can be reimplemented by adapter, of course, though they all inherit from a common base class.) For now, the only really valuable information included on the relation object is database location.

Put it all together, and we'll be able to provide realer-time access to catalog info, rather than trying to grab it all in one big memory-intensive batch.
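As a rough illustration of the serialization question raised above, here is a minimal sketch of turning a relation object into a JSON payload suitable for a structured log line. The `Relation` shape and field names here are assumptions for illustration only, not dbt-core's actual `Relation` class:

```python
# Hypothetical sketch: serializing relation info into a structured log
# payload as soon as a materialization finishes building relations.
# The Relation dataclass below is a stand-in, not dbt-core's real class.
import dataclasses
import json
from typing import Optional


@dataclasses.dataclass
class Relation:
    database: Optional[str]
    schema: str
    identifier: str
    type: str  # e.g. "table" or "view"


def serialize_relation(relation: Relation) -> str:
    # dataclasses.asdict recursively converts to plain dicts,
    # which keeps the payload consumable by downstream tools.
    return json.dumps(dataclasses.asdict(relation))


rel = Relation(database="analytics", schema="prod", identifier="orders", type="table")
print(serialize_relation(rel))
```

If relations stay dataclass-like per adapter, serialization along these lines is plausible; the hard part flagged above is that each adapter can reimplement the class with extra fields.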
I've come around on this, given the way that we've implemented Python models on some adapters. We'd still want to handle a few remaining cases.
In Python models, we currently add a decorator to log things.
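For readers unfamiliar with the decorator approach mentioned here, a minimal sketch looks like the following. The logger name and messages are illustrative assumptions, not dbt's actual implementation:

```python
# Minimal sketch of a logging decorator like the one described for
# Python models. Names and log messages are illustrative only.
import functools
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dbt.materialization")


def log_execution(fn):
    """Log before and after the wrapped function runs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logger.info("Started: %s", fn.__name__)
        result = fn(*args, **kwargs)
        logger.info("Finished: %s", fn.__name__)
        return result
    return wrapper


@log_execution
def materialize_model(name: str) -> str:
    # Stand-in for the real materialization work.
    return f"created table {name}"


print(materialize_model("orders"))
```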
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Assigning myself to reopen this as 2-3 separate issues, and then close this one.
Motivation
Today, `run_results.json` is a static artifact written at the end of a run, containing a summary of the objects that `dbt run` created and updated, along with an incomplete view of the SQL from each materialization. This information should be more granular and visible during a run. Structured logging is a formal interface for programmatically consuming metadata about a dbt run, and it is where this more granular information will be streamed to outside consumers.

Context
The materialization macro and the adapter functions it calls are natural places to put these log lines. However, many adapters override materializations directly (i.e. snowflake, bigquery, spark), and others override key adapter functions such as `adapter.execute` (i.e. bigquery), which would require these and future adapter maintainers to remember to copy our log lines exactly. Requiring adapter maintainers to take direct action to maintain the reliability of the structured logging interface is something we'd like to avoid, which is why this ticket puts these log lines before and after the call to execute in `statement.sql` (source). Since `statement.sql` is not a Python file, structured logging must be accessible to jinja, and today it is not.

Implementation
Add a jinja-accessible function `log_event` like `log` (source) so that instead of taking a string `msg`, it takes a string `event_name` and a list of values of type `List[Any]`. This will fire an event with the same name as the string passed, constructed with the parameters in the list. (Using `Event.__subclasses__()` and the splat operator for lists (i.e. `*[foo, bar]`) may help here.) Mypy will not be able to catch any type mismatches here, since the event constructors aren't passed concrete values until runtime. Since these log lines will run with every materialization, tests should catch any issues with them.

Fire an event containing the full `adapter_response` object and the node name. Today we stringify the adapter response first, making it much harder to consume. Special attention will have to be paid to serialization, since each warehouse will be putting different values into their own responses. Using examples from today's warehouses as inputs to serialization tests is a good idea. The name of the currently running node is accessible via the "jinja god context" (TODO: link example of how to access the jinja god context). (TODO: `adapter.execute` is called in other places too. We should add log lines there too. Link exact lines here.)
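To make the `log_event` proposal concrete, here is a hedged sketch of the name-based dispatch it describes: look up an `Event` subclass by name via `Event.__subclasses__()` and construct it with the splat operator. The `Event` hierarchy and the `SQLQueryStatus` class below are stand-ins for illustration, not dbt-core's actual event types:

```python
# Hedged sketch of the proposed log_event(event_name, values) dispatch.
# The Event classes here are illustrative stand-ins only.
from typing import Any, List


class Event:
    def message(self) -> str:
        raise NotImplementedError


class SQLQueryStatus(Event):
    def __init__(self, status: str, elapsed: float):
        self.status = status
        self.elapsed = elapsed

    def message(self) -> str:
        return f"SQL status: {self.status} in {self.elapsed}s"


def log_event(event_name: str, values: List[Any]) -> Event:
    # Find the concrete Event subclass whose name matches the string.
    for cls in Event.__subclasses__():
        if cls.__name__ == event_name:
            # Mypy cannot type-check this call: the values are only
            # known at runtime, as the ticket notes.
            return cls(*values)
    raise ValueError(f"unknown event: {event_name}")


evt = log_event("SQLQueryStatus", ["OK", 0.12])
print(evt.message())
```

Note the trade-off the ticket calls out: because dispatch and construction happen at runtime, a typo in `event_name` or a mismatched value list only surfaces when the line actually fires, so test coverage of every materialization path is what keeps this reliable.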