Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(presto): add support for user impersonation #13214

Merged
merged 11 commits into from
Feb 22, 2021

Conversation

rijojoseph07
Copy link
Contributor

SUMMARY

Fix for issue #11359, #9406

When using LDAP authentication with presto/trino, the current behavior is just to modify the URL to replace the username which will result in an unauthorized exception. This PR will fix this by updating the connection argument with the effective user.

TEST PLAN

Step 1: Setup database connection to presto without credentials in URL.
Step 2: Provide admin(who can impersonate as any other user in presto) credentials in connection properties via extras

"engine_params": {
           "connect_args":{
              "protocol": "https",
              "username":"<admin>",
              "password":"<admin_password>"
           }
}

Step 3: Enable impersonation for the database connection.
Step 4: Log in with a different user and run the query via SQL lab and you will see the principal user as admin and user as the logged-in user in presto UI.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

@rijojoseph07 rijojoseph07 changed the title fix(wip): Changes to support presto impersionation with ldap fix: Changes to support presto impersionation with ldap Feb 18, 2021
@willbarrett
Copy link
Member

@rijojoseph07 would it be add to add a test duplicating the bug to both verify the fix and prevent future regressions?

@rijojoseph07
Copy link
Contributor Author

@rijojoseph07 would it be add to add a test duplicating the bug to both verify the fix and prevent future regressions?

@willbarrett I am trying to write a unit test for this but struggling to get connect_params from the SQLAlchemy engine object to check if the new principal_username is added.

I have tested this code in our environment and is running in our prod system.

@willbarrett
Copy link
Member

A way forward if it's not possible to get the connect args out of SQLAlchemy would be to mock the create_engine call on line 364 in superset/models/core.py and assert on the arguments passed in. These systems for connecting to many different database types are quite challenging the community to test, so comprehensive unit test coverage is important for quality.

@rijojoseph07
Copy link
Contributor Author

A way forward if it's not possible to get the connect args out of SQLAlchemy would be to mock the create_engine call on line 364 in superset/models/core.py and assert on the arguments passed in. These systems for connecting to many different database types are quite challenging the community to test, so comprehensive unit test coverage is important for quality.

@willbarrett Thanks for your suggestion. I have added a unit test by mocking create_engine.

@codecov-io
Copy link

codecov-io commented Feb 19, 2021

Codecov Report

Merging #13214 (c082af8) into master (2ce7982) will increase coverage by 13.61%.
The diff coverage is 53.82%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #13214       +/-   ##
===========================================
+ Coverage   53.06%   66.67%   +13.61%     
===========================================
  Files         489      493        +4     
  Lines       17314    29168    +11854     
  Branches     4482        0     -4482     
===========================================
+ Hits         9187    19447    +10260     
- Misses       8127     9721     +1594     
Flag Coverage Δ
cypress ?
python 66.67% <53.82%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/constants.py 100.00% <ø> (ø)
superset/examples/birth_names.py 73.19% <ø> (ø)
...43f2fdb_add_granularity_to_charts_where_missing.py 0.00% <0.00%> (ø)
...s/260bf0649a77_migrate_x_dateunit_in_time_range.py 0.00% <0.00%> (ø)
...ons/versions/41ce8799acc3_rename_pie_label_type.py 0.00% <0.00%> (ø)
superset/connectors/sqla/views.py 62.43% <4.34%> (ø)
superset/views/datasource.py 88.70% <16.66%> (ø)
superset/views/core.py 72.91% <66.66%> (ø)
superset/charts/commands/exceptions.py 92.85% <77.77%> (ø)
superset/db_engine_specs/elasticsearch.py 90.24% <87.50%> (ø)
... and 938 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 94893ff...c082af8. Read the comment docs.

@junlincc junlincc added the new:contributor The author is a new contributor label Feb 20, 2021
@rijojoseph07
Copy link
Contributor Author

@willbarrett @villebro Can you guys please review this PR.

Thanks

Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @rijojoseph07! I think what we should do is refactor the existing get_configuration_for_impersonation to be more in line with the new method you're proposing - get_configuration_for_impersonation is just mutating the configuration parameter in connect_args, but this seems highly specific to Hive, and it really should be mutating connect_args (the parent). Instead of adding the proposed method, let's rather change the existing logic so that BaseEngineSpec has a method update_connect_args_for_impersonation, which for Hive then just adds the configuration property like before, and for Presto adds the principal_username property.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Feb 22, 2021
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor change request

superset/db_engine_specs/base.py Outdated Show resolved Hide resolved
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few last comments

superset/db_engine_specs/base.py Outdated Show resolved Hide resolved
superset/db_engine_specs/base.py Outdated Show resolved Hide resolved
superset/models/core.py Outdated Show resolved Hide resolved
@villebro villebro changed the title fix: Changes to support presto impersionation with ldap feat(presto): add support for user impersonation Feb 22, 2021
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@DRavikanth
Copy link

DRavikanth commented May 3, 2021

@rijojoseph07, I have upgraded to superset v1.1.0 & updated my connection settings & have corresponding impersonation settings in Presto as well. However, when clicking Impersonate logged in User button on the database settings it throws an exception on the UI and nothing is visible in the backend logs. Details below:

  1. Upgraded to v1.1.0
  2. Presto 341 with impersonation rules as follows:
    "impersonation": [ { "originalUser": "admin", "newUser": ".*" } ],
    Above means any user can be impersonated by admin user
  3. Removed username/password from DB Connection string
  4. Updated Extra as follows:
    "connect_args": { "protocol": "https", "username":"admin", "password":"adminPwd", "session_props": { "omit_datetime_type_precision": true }
  5. Check Impersonate Logged In User (Presto & Hive)
  6. Click on Save. UI throws following exception:

An error occurred while fetching databases: "Connection failed, please check your connection settings"

This is very useful feature which I have been waiting for a long time. Any help in debugging this further will be of great help. Also, please let me know if you need any further information to understand the problem.

@rijojoseph07
Copy link
Contributor Author

rijojoseph07 commented May 4, 2021

Hi @DRavikanth, were you able to connect using admin credentials without impersonation? Also it will be helpful if you can run superset in debug mode and share the logs. If possible share the complete connection extras after masking sensitive data.

@DRavikanth
Copy link

DRavikanth commented May 4, 2021 via email

@rijojoseph07
Copy link
Contributor Author

rijojoseph07 commented May 4, 2021

Yes the admin credentials without impersonation works with no issues. I'm running Superset in debug mode and I don't see any logs associated with this. Thanks, Ravi

On Mon, May 3, 2021 at 22:46 rijojoseph07 @.***> wrote: Hi @DRavikanth https://github.com/DRavikanth, were you able to connect using admin credentials without impersonation? Also it will be helpful if you can run superset in debug mode and share the logs. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#13214 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACRGFYILCSOHKHUGPSLITI3TL6C2PANCNFSM4X2W3ZIA .

@DRavikanth I just test this feature with superset 1.1.0 and presto 347 and it is working fine. Can you please check if impersonation is happening working as expected from the presto side via cli (this support was recently added to presto so you may have to upgrade presto)?

My connection URL looks like this : presto://presto-dns:443/hive/dev

Extra :

{
"metadata_params": {},
"cost_estimate_enabled": true,
"engine_params": {
"connect_args":{
"protocol": "https",
"username":"admin",
"password":""
}
},
"metadata_cache_timeout": {},
"schemas_allowed_for_csv_upload": ["dev"]
}

I use ranger with presto, and have enabled impersonation from ranger. You can also check presto server logs to see if your request is reaching presto.

@DRavikanth
Copy link

@rijojoseph07 I have exactly what you are testing against. I don't have Ranger. However, I don't think that's an issue. The only difference I see is the Presto version. I am using 341 and from the documentation I see impersonation is supported in that version.

I don't see any request being coming into the presto coordinator logs as well when the exception was initiated. I think this is failing in the UI itself and debug logs of Superset doesn't show anything related to this.

I am looking into the UI Console and see the following:
In Headers section:
Status: 422 Unprocessable Entity

In Response:
message "Connection failed, please check your connection settings"

Request seems to be good though. Any help in debugging this further?

@rijojoseph07
Copy link
Contributor Author

@DRavikanth Can you please confirm if the user which is creating this dataset with impersonation exist is a valid user for presto ?
What I am trying to say is that : lets say you are logged in as supersetadmin and you add this dataset and test the connection, it will add supersetadmin as principal_username and if supersetadmin is not a valid user to presto, this might fail.

@DRavikanth
Copy link

@rijojoseph07, Yes, the user(admin is the user name in this case) creating this connection in Superset does exist in Presto and is enabled with impersonation. Please note that, I am able to login to Presto and execute the queries using this admin user with no issues. Also, please note that, if I uncheck Impersonation option in connection object, everything works as expected i.e the dashboards built on top of this connection will load the dashboards/charts from Presto.

@rijojoseph07
Copy link
Contributor Author

@DRavikanth Can we connect over slack to discuss this in detail? Please join superset slack channel.

@DRavikanth
Copy link

DRavikanth commented May 5, 2021 via email

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels new:contributor The author is a new contributor size/L 🚢 1.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants