No table names shown in Dataset creation window / SQL Lab for Apache Kyuubi #28743

Open
3 tasks done
Miosp opened this issue May 28, 2024 · 5 comments
Labels
data:connect:hive (Related to Hive) · data:connect:kyuubi · sqllab (Namespace | Anything related to the SQL Lab)

Comments


Miosp commented May 28, 2024

Bug description

I'm trying to connect Apache Kyuubi to Superset. I settled on the Hive connector through PyHive (the issues with the MySQL and Trino connectors are even worse). However, my tables are not shown in SQL Lab. I can still write a SQL query and fetch the data manually, but for the sake of correctness the table list should work.

How to reproduce the bug

  1. Install Superset via Helm with minimal configuration (just set SUPERSET_SECRET_KEY, which the chart requires)
  2. Connect Kyuubi with the URI: hive://<kyuubi_ip>:<kyuubi_spark_thrift_port>/default
  3. In the Dataset creation window / SQL Lab, choose the connection name and the schema default
  4. Observe that instead of your tables, the list shows a single view named 'default'
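To separate a connector problem from a Superset problem, the same URI can be inspected outside Superset. The following is a minimal sketch, assuming `sqlalchemy` and `pyhive` are installed in the environment; `list_relations` and `inspect_kyuubi` are hypothetical helper names, and the host and port in the usage note are placeholders.

```python
def list_relations(inspector, schema="default"):
    """Return the table and view names an inspector reports for a schema.

    `inspector` is any object exposing SQLAlchemy's Inspector methods
    get_table_names(schema=...) and get_view_names(schema=...).
    """
    return {
        "tables": sorted(inspector.get_table_names(schema=schema)),
        "views": sorted(inspector.get_view_names(schema=schema)),
    }


def inspect_kyuubi(uri, schema="default"):
    """Connect with the given SQLAlchemy URI and list tables and views.

    Imports are local so that list_relations can be used (and tested)
    without sqlalchemy/pyhive installed.
    """
    from sqlalchemy import create_engine, inspect

    engine = create_engine(uri)
    return list_relations(inspect(engine), schema=schema)
```

Calling `inspect_kyuubi("hive://<kyuubi_ip>:<kyuubi_spark_thrift_port>/default")` with the real host and port shows what the PyHive dialect itself reports. Superset's Hive engine spec may fetch view names through its own code path, so the result is a point of comparison rather than proof of what Superset sees.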

Screenshots/recordings

[Screenshot 2024-05-28 225447]

Superset version

4.0.1

Python version

3.9

Node version

I don't know

Browser

Chrome

Additional context

When I click the "refresh" button next to the table list in Superset, I can see the following SQL queries executed in Kyuubi's Spark driver log:
USE default
SHOW VIEWS IN default
USE default
USE default
SHOW VIEWS IN default

There are no errors in Superset's logs.
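One possible reading of the log above, offered as a hypothesis rather than a confirmed diagnosis: Spark SQL's SHOW VIEWS returns three columns (namespace, viewName, isTemporary), so a client that treats the first column of each row as the view name would list the schema name instead of the views. The sketch below illustrates that effect with made-up rows. The function names and the sample view names are hypothetical, and this is not Superset's or PyHive's actual code.

```python
def names_from_first_column(rows):
    """Take column 0 of every row as the relation name (the assumed behaviour)."""
    return {row[0] for row in rows}


def names_from_column(rows, index):
    """Take an explicit column of every row as the relation name."""
    return {row[index] for row in rows}


# Hypothetical result of `SHOW VIEWS IN default` on a Spark-backed engine:
# (namespace, viewName, isTemporary)
spark_style_rows = [
    ("default", "orders_v", False),
    ("default", "customers_v", False),
]

first_column = names_from_first_column(spark_style_rows)
second_column = names_from_column(spark_style_rows, 1)
print(first_column)   # the schema name, collapsed to one entry
print(second_column)  # the actual view names
```

If this is what happens, it would match the symptom in step 4: a single entry named 'default' in place of the real relation names.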

Checklist

  • I have searched Superset docs and Slack and didn't find a solution to my problem.
  • I have searched the GitHub issue tracker and didn't find a similar bug report.
  • I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
@dosubot dosubot bot added data:connect:hive Related to Hive sqllab Namespace | Anything related to the SQL Lab labels May 28, 2024

dosubot bot commented May 28, 2024

@Miosp Hello there! I'm a bot that can help you with bugs, questions, and contributing. I'm here to assist you while you wait for a human maintainer.

To resolve the issue where tables are not displayed in the SQL Lab when connecting Apache Kyuubi to Superset using the Hive connector through PyHive, ensure that the "Expose database in SQL Lab" option is enabled in the database settings. This setting allows the database to be queried in SQL Lab.

Here is the relevant code snippet that shows how to enable this option:

<IndeterminateCheckbox
  id="expose_in_sqllab"
  indeterminate={false}
  checked={!!db?.expose_in_sqllab}
  onChange={onInputChange}
  labelText={t('Expose database in SQL Lab')}
/>
<InfoTooltip
  tooltip={t('Allow this database to be queried in SQL Lab')}
/>

Make sure that the expose_in_sqllab property is set to true for the database configuration. This can be done through the Superset UI in the database settings under the "SQL Lab" section [1].

Additionally, ensure that the Hive engine specification is correctly configured in Superset. The HiveEngineSpec class in superset/db_engine_specs/hive.py should be properly set up to handle the connection and data fetching. This includes methods for fetching data, handling cursor operations, and managing schema and table information. If there are any issues with the Hive engine specification, it could also lead to tables not being displayed in SQL Lab [2].

Moreover, Superset handles schema and table caching through properties defined in the Database class. Specifically, it uses the metadata_cache_timeout property to manage cache timeouts for catalog, schema, and table metadata. These properties are:

  • catalog_cache_enabled and catalog_cache_timeout
  • schema_cache_enabled and schema_cache_timeout
  • table_cache_enabled and table_cache_timeout

These caching mechanisms can affect the visibility of tables in SQL Lab. If the cache is enabled and the timeout is set, the metadata (including schema and table information) will be cached for the specified duration. This means that any changes to the schema or tables in the database may not be immediately visible in SQL Lab until the cache expires or is refreshed [3].
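As an illustration of the caching settings described above, the cache timeouts are usually supplied through the database's "Extra" JSON field. This is a minimal sketch, assuming the `metadata_cache_timeout` key with `schema_cache_timeout` and `table_cache_timeout` entries in seconds; `build_extra` is a hypothetical helper, and the values are examples only.

```python
import json


def build_extra(schema_timeout_s, table_timeout_s):
    """Build an "Extra" JSON string with metadata cache timeouts in seconds."""
    extra = {
        "metadata_cache_timeout": {
            "schema_cache_timeout": schema_timeout_s,
            "table_cache_timeout": table_timeout_s,
        }
    }
    return json.dumps(extra, indent=2)


# Example: cache schema and table lists for ten minutes each.
extra_json = build_extra(600, 600)
print(extra_json)
```

The resulting string can be pasted into the database's Extra field. Note that caching only delays visibility of changes, so it would not by itself explain a table list that stays wrong after a manual refresh.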

Lastly, there is a known issue with the SHOW TABLES command when used with Apache Kyuubi. Specifically, the SHOW TABLES command does not work as expected with Apache Kyuubi, which might be contributing to the problem [2].


@tyagiaksh

@dosu I'm still not able to resolve the above issue.

@tyagiaksh

@Miosp Have you resolved this issue?


Miosp commented Jul 14, 2024

No. To work around this, I had to create a data source from a SQL query and use that. SQL queries execute just fine.

@tyagiaksh

Yes, I have also worked around it that way.
