fix(SQL Lab): hang when result set size is too big #30522

Draft
wants to merge 3 commits into
base: master

anamitraadhikari
Contributor

@anamitraadhikari anamitraadhikari commented Oct 5, 2024

SUMMARY

Users encountered performance issues in SQL Lab: large query result payloads caused the browser to hang. The problem occurred even when a result set contained only a few rows (e.g., 1-10 rows), as long as the payload itself was very large. Such payloads overwhelmed the browser’s ability to render the data, causing it to freeze or crash.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before:
Screenshot 2024-10-06 at 3 05 03 AM

After:
Screenshot 2024-10-06 at 3 11 17 AM

TESTING INSTRUCTIONS

Run a query that generates a large result (e.g., > 100 MB) and verify that an appropriate error message is shown when the result is too large to handle efficiently. The size limit that triggers this handling is set in the Superset config file via SQL_LAB_PAYLOAD_MAX_MB.
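
For reference, a minimal sketch of what the config entry might look like. The key name follows this PR description (a reviewer below suggests renaming it), and the value 100 is only an illustrative assumption, not a recommended default:

```python
# superset_config.py (sketch)
# Max payload size (MB) for SQL Lab results.
# The PR's default is None, which leaves the check disabled.
# 100 is a hypothetical value chosen to match the "> 100 MB" test scenario.
SQL_LAB_PAYLOAD_MAX_MB = 100
```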

ADDITIONAL INFORMATION

NA

@dosubot dosubot bot added the sqllab Namespace | Anything related to the SQL Lab label Oct 5, 2024
@anamitraadhikari anamitraadhikari marked this pull request as draft October 5, 2024 01:18
@anamitraadhikari anamitraadhikari changed the title from "[Draft] fix(sql lab): SQL Lab hangs when result set size is too big" to "(fix) SQL Lab hangs when result set size is too big" Oct 5, 2024

codecov bot commented Oct 5, 2024

Codecov Report

Attention: Patch coverage is 27.77778% with 13 lines in your changes missing coverage. Please review.

Project coverage is 77.32%. Comparing base (76d897e) to head (76bebbc).
Report is 837 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| superset/sql_lab.py | 18.75% | 13 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #30522       +/-   ##
===========================================
+ Coverage   60.48%   77.32%   +16.83%     
===========================================
  Files        1931      533     -1398     
  Lines       76236    38559    -37677     
  Branches     8568        0     -8568     
===========================================
- Hits        46114    29814    -16300     
+ Misses      28017     8745    -19272     
+ Partials     2105        0     -2105     
| Flag | Coverage Δ |
|---|---|
| hive | 48.99% <22.22%> (-0.18%) ⬇️ |
| javascript | ? |
| mysql | 76.77% <27.77%> (?) |
| postgres | 76.87% <27.77%> (?) |
| presto | 53.47% <22.22%> (-0.33%) ⬇️ |
| python | 77.32% <27.77%> (+13.83%) ⬆️ |
| sqlite | 76.32% <27.77%> (?) |
| unit | ? |


@anamitraadhikari anamitraadhikari changed the title from "(fix) SQL Lab hangs when result set size is too big" to "fix(SQL Lab): hang when result set size is too big" Oct 6, 2024
Member

@villebro villebro left a comment

First pass comments

@@ -408,3 +408,4 @@ zipp==3.19.0
# via importlib-metadata
zstandard==0.22.0
# via flask-compress
PyHive==0.7.0

Unrelated change?

Suggested change
PyHive==0.7.0
PyHive==0.7.0

@@ -955,6 +955,9 @@ class D3TimeFormat(TypedDict, total=False):
SQLLAB_SAVE_WARNING_MESSAGE = None
SQLLAB_SCHEDULE_WARNING_MESSAGE = None

# Max payload size (MB) for SQL Lab to prevent browser hangs with large results.
SQL_LAB_PAYLOAD_MAX_MB = None

To keep the config keys in line with the pre-existing ones above, let's remove the underscore

Suggested change
- SQL_LAB_PAYLOAD_MAX_MB = None
+ SQLLAB_PAYLOAD_MAX_MB = None

@@ -214,7 +215,7 @@ def execute_sql_statement( # pylint: disable=too-many-statements, too-many-loca
insert_rls = (
insert_rls_as_subquery
if database.db_engine_spec.allows_subqueries
and database.db_engine_spec.allows_alias_in_select
and database.db_engine_spec.allows_alias_in_select

I assume this will be flagged by the linter

Suggested change
and database.db_engine_spec.allows_alias_in_select
and database.db_engine_spec.allows_alias_in_select

Comment on lines +608 to +610
if config.get("SQL_LAB_PAYLOAD_MAX_MB"):
    serialized_payload_size = sys.getsizeof(serialized_payload)
    sql_lab_payload_max_mb = config["SQL_LAB_PAYLOAD_MAX_MB"]

nit: walrus operator can be used to simplify:

Suggested change
- if config.get("SQL_LAB_PAYLOAD_MAX_MB"):
-     serialized_payload_size = sys.getsizeof(serialized_payload)
-     sql_lab_payload_max_mb = config["SQL_LAB_PAYLOAD_MAX_MB"]
+ if sql_lab_payload_max_mb := config.get("SQL_LAB_PAYLOAD_MAX_MB"):
+     serialized_payload_size = sys.getsizeof(serialized_payload)
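
To make the walrus suggestion concrete, here is a self-contained sketch of how the check could read once the limit is bound inside the condition. The helper name `exceeds_payload_limit` and the use of a plain mapping for `config` are assumptions for illustration; the PR performs this check inline in `superset/sql_lab.py`.

```python
import sys
from typing import Any, Mapping


def exceeds_payload_limit(config: Mapping[str, Any], serialized_payload: bytes) -> bool:
    """Return True when the payload is larger than the configured limit.

    A missing key, None, or 0 means no limit is enforced.
    """
    # The walrus operator reads the limit and tests its truthiness at once.
    if sql_lab_payload_max_mb := config.get("SQL_LAB_PAYLOAD_MAX_MB"):
        # sys.getsizeof mirrors the PR; it reports the size of the Python
        # object, so it includes a small per-object overhead.
        serialized_payload_size = sys.getsizeof(serialized_payload)
        return serialized_payload_size > sql_lab_payload_max_mb * 1024 * 1024
    return False


# No limit configured: the check is skipped.
print(exceeds_payload_limit({}, b"x" * 10))
# 2 MB payload against a 1 MB limit: the check trips.
print(exceeds_payload_limit({"SQL_LAB_PAYLOAD_MAX_MB": 1}, b"x" * (2 * 1024 * 1024)))
```

The walrus form avoids the second `config[...]` lookup and keeps the limit's name in scope only where it is used.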

logger.info("Result size exceeds the allowed limit.")
raise SupersetErrorException(
    SupersetError(
        message=f"Result size ({serialized_payload_size / (1024 * 1024):.2f} MB) exceeds the allowed limit of {sql_lab_payload_max_mb} MB.",

nit: we're rewriting 1024 * 1024 here and in many places after this. To keep the codebase DRY, let's define it only once as a const.
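
One possible shape for that refactor, as a sketch only: the names `BYTES_IN_MB`, `bytes_to_mb`, and `format_limit_message` are hypothetical and do not come from the PR or the Superset codebase.

```python
# Hypothetical module-level constant, defined once and reused everywhere
# a bytes <-> MB conversion is needed.
BYTES_IN_MB = 1024 * 1024


def bytes_to_mb(size_in_bytes: int) -> float:
    """Convert a byte count to megabytes."""
    return size_in_bytes / BYTES_IN_MB


def format_limit_message(size_in_bytes: int, limit_mb: int) -> str:
    """Build the error text used when a result exceeds the limit."""
    return (
        f"Result size ({bytes_to_mb(size_in_bytes):.2f} MB) "
        f"exceeds the allowed limit of {limit_mb} MB."
    )


print(format_limit_message(3 * BYTES_IN_MB, 2))
# Result size (3.00 MB) exceeds the allowed limit of 2 MB.
```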

Comment on lines +133 to +137
def test_execute_sql_statement_exceeds_payload_limit_log_check(mocker: MockerFixture, caplog) -> None:
    """
    Test for `execute_sql_statements` when the result payload size exceeds the limit,
    and check if the correct log message is captured without raising the exception.
    """

What is the purpose of this test? Shouldn't we always raise an exception if the payload size exceeds the threshold?
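
If the intent is that an oversized payload always raises, a single test can assert both the log line and the exception. The sketch below uses only the standard library so it runs on its own; `PayloadTooLargeError` and `check_payload_size` are hypothetical stand-ins for `SupersetErrorException` and the inline check in the PR, not the actual Superset code.

```python
import logging
import sys
import unittest

logger = logging.getLogger("sql_lab_sketch")


class PayloadTooLargeError(Exception):
    """Hypothetical stand-in for SupersetErrorException."""


def check_payload_size(serialized_payload: bytes, max_mb: int) -> None:
    """Log and raise when the payload exceeds max_mb megabytes."""
    size = sys.getsizeof(serialized_payload)
    if size > max_mb * 1024 * 1024:
        logger.info("Result size exceeds the allowed limit.")
        raise PayloadTooLargeError(
            f"Result size ({size / (1024 * 1024):.2f} MB) "
            f"exceeds the allowed limit of {max_mb} MB."
        )


class PayloadLimitTest(unittest.TestCase):
    def test_oversized_payload_logs_and_raises(self) -> None:
        oversized = b"x" * (2 * 1024 * 1024)
        # Both behaviours are checked together: the log line is emitted
        # and the exception still propagates to the caller.
        with self.assertLogs(logger, level="INFO") as captured:
            with self.assertRaises(PayloadTooLargeError):
                check_payload_size(oversized, max_mb=1)
        self.assertIn("Result size exceeds the allowed limit.", captured.output[0])

    def test_small_payload_passes(self) -> None:
        # Returns None without raising when under the limit.
        check_payload_size(b"ok", max_mb=1)
```

Saved to a file, this can be run with `python -m unittest <file>`; in the Superset test suite the same assertions would more likely be written with `pytest.raises` and `caplog`.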

Labels
review:draft size/L sqllab Namespace | Anything related to the SQL Lab
3 participants