dbt docs generate does not add database to tables #2108

Closed
1 of 5 tasks
danielwlogan opened this issue Feb 7, 2020 · 9 comments
Labels
bigquery, bug (Something isn't working)

Comments

@danielwlogan

danielwlogan commented Feb 7, 2020

Describe the bug

When running dbt docs generate, an error is thrown:

Encountered an error:
Runtime Error
  404 Not found: Dataset XXX:YYY was not found in location US

However, the dataset YYY does not exist in database XXX. The yml file that describes the source declares it as follows:

schema: YYY
database: 'ZZZ'

The SQL that dbt docs generate builds automatically omits the source's database, so the dataset is resolved against the default database, XXX, from profiles.yml.

Steps To Reproduce

  1. Create a source yml file with a dataset in a different database than the default database specified in profiles.yml
  2. Run: dbt docs generate

Expected behavior

The SQL that dbt docs generate builds and runs should prefix every dataset reference with the correct database, as defined in the source yml files.
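
As a minimal sketch of the expected precedence (the helper names `resolve_database` and `tables_relation` are hypothetical and are not dbt's actual code), a database declared on the source would win over the default from profiles.yml, and the result would be used to qualify the `__TABLES__` reference:

```python
from typing import Optional


def resolve_database(source_database: Optional[str], default_database: str) -> str:
    """Prefer the database declared on the source; otherwise use the profile default."""
    return source_database or default_database


def tables_relation(schema: str, source_database: Optional[str], default_database: str) -> str:
    """Build a fully qualified BigQuery __TABLES__ reference (project is backtick-quoted)."""
    database = resolve_database(source_database, default_database)
    return f"`{database}`.{schema}.__TABLES__"


# Source declares database 'ZZZ'; profiles.yml default is 'XXX'.
print(tables_relation("YYY", "ZZZ", "XXX"))  # `ZZZ`.YYY.__TABLES__
```

With no database on the source, the same call would fall back to the profile default, XXX.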

Screenshots and log output

13:47 $ dbt docs generate
Running with dbt=0.15.2

13:47:57 | Concurrency: 4 threads (target='dev')
13:47:57 | 
13:48:17 | Done.
13:48:17 | Building catalog
Encountered an error:
Runtime Error
  404 Not found: Dataset XXX:YYY was not found in location US
...
tables as (
    select
       project_id as table_database,
       dataset_id as table_schema,
       table_id as original_table_name,

       concat(project_id, '.', dataset_id, '.', table_id) as relation_id,

       row_count,
       size_bytes as size_bytes,
       case
           when type = 1 then 'table'
           when type = 2 then 'view'
           else 'external'
       end as table_type,

        REGEXP_CONTAINS(table_id, '^.+[0-9]{8}$') and coalesce(type, 0) = 1 as is_date_shard,
        REGEXP_EXTRACT(table_id, '^(.+)[0-9]{8}$') as shard_base_name,
        REGEXP_EXTRACT(table_id, '^.+([0-9]{8})$') as shard_name

   from YYY.__TABLES__

 ),

I would like the last line to be:

from ZZZ.YYY.__TABLES__

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery (checked)
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.15.2
   latest version: 0.15.2

The operating system you're using:
Ubuntu 18.04.4 LTS

The output of python --version:
Python 3.7.4

Additional context

A workaround is to change the source ymls from:

schema: YYY
database: 'ZZZ'

to

schema: "`ZZZ`.YYY"
database: 'ZZZ'

danielwlogan added the bug and triage labels on Feb 7, 2020
drewbanin added the bigquery label and removed the triage label on Feb 11, 2020

@drewbanin
Contributor

Hi @danielwlogan - are you certain that you were using 0.15.2 when you encountered this issue? I just gave this a spin locally with dbt v0.15.2 and found that the database (project) name from my .yml file was correctly injected into dbt's catalog query.

For me, this looks like:

...
                REGEXP_CONTAINS(table_id, '^.+[0-9]{8}$') and coalesce(type, 0) = 1 as is_date_shard,
                REGEXP_EXTRACT(table_id, '^(.+)[0-9]{8}$') as shard_base_name,
                REGEXP_EXTRACT(table_id, '^.+([0-9]{8})$') as shard_name

            from `dbt-dev-project`.dbt_dbanin.__TABLES__
...

So dbt does inject the project ID into this query in at least some cases, but it's entirely possible that a bug is affecting your dbt setup. If you can confirm that this occurred on dbt v0.15.2, then we'd certainly be happy to look deeper into it!

@danielwlogan
Author

Yes, I can confirm that I am running 0.15.2. However, I think you may have misunderstood my original post. My problem is that the database name from my profiles.yml file is being incorrectly used in the dbt catalog query. That is, I do not want the default database from ~/.dbt/profiles.yml to be used as the project name; I want the database name defined in the source yml file to be populated instead.

@danielwlogan
Author

Any progress on this front?

@drewbanin
Contributor

@danielwlogan great timing! I actually just pulled this up while prioritizing the next release. I did some digging here after I saw your last comment, but I was not able to replicate the behavior you're describing here. I'll revisit this and prioritize it accordingly :)

@danielwlogan
Author

Great. Let me know if there is anything else I can provide or help with in any way.

@drewbanin
Contributor

@danielwlogan I was very puzzled here for a very long time, but I just thought about this issue while reviewing a different bug report! Do you by any chance specify a quoting: config either in:

  1. your dbt_project.yml file
  2. a schema.yml file that specifies sources?

If so, is the database quoting config set to false in either of those places? Check out #2188 -- this bug causes dbt to use the "quote policy" in place of the "include policy", which would indeed lead to the behavior you're seeing here!
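
To illustrate the kind of mix-up described here, the following is a toy model only, not dbt's actual code: the `Policy` class and both render functions are hypothetical, and the model assumes only the database component is affected. It shows how passing the quote policy where the include policy is expected would drop the database whenever database quoting is set to false:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One boolean per relation component (hypothetical stand-in for a dbt policy)."""
    database: bool = True
    schema: bool = True
    identifier: bool = True


def render_tables_target(database: str, schema: str, include: Policy, quote: Policy) -> str:
    """Render the FROM target; 'include' decides presence, 'quote' decides backticks."""
    parts = []
    if include.database:
        parts.append(f"`{database}`" if quote.database else database)
    parts.append(schema)
    parts.append("__TABLES__")
    return ".".join(parts)


def render_tables_target_buggy(database: str, schema: str, include: Policy, quote: Policy) -> str:
    """Same rendering, but the quote policy is mistakenly used as the include policy."""
    return render_tables_target(database, schema, include=quote, quote=quote)


include_all = Policy()
no_quotes = Policy(database=False, schema=False, identifier=False)

print(render_tables_target("ZZZ", "YYY", include_all, no_quotes))        # ZZZ.YYY.__TABLES__
print(render_tables_target_buggy("ZZZ", "YYY", include_all, no_quotes))  # YYY.__TABLES__
```

In this model, setting database quoting to true makes the buggy and correct paths agree, which is consistent with a quoting change acting as a workaround.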

@danielwlogan
Author

@drewbanin, the answer is yes, we do have the following structure in our schema ymls:

sources:
  - name: XXX
    schema: YYY
    database: 'ZZZ'
    quoting:
      database: false
      schema: false
      identifier: false

@danielwlogan
Author

Based on your suggestion, we moved the quoting config into dbt_project.yml and set all three values to true. We then dropped the quotes around the {{ source(...) }} calls, and it seems to work.

Thanks @drewbanin for the tip.

@drewbanin
Contributor

Cool, glad to hear it! I'm going to close this issue, as #2188 tracks the actual code change that needs to happen here. This is going to be fixed in the next release of dbt.
