Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Authentication: Enable user impersonation for Superset to HiveServer2 using hive.server2.proxy.user (a.fernandez) #3652

Merged
merged 1 commit into from
Oct 17, 2017

Conversation

afernandez
Copy link
Contributor

@afernandez afernandez commented Oct 12, 2017

Superset today has a config for impersonation when creating/editing a datasource.
When used with Presto, it actually creates a connection on behalf of the logged on user.
For Hive, we instead want to connect as the superuser (superset service account) but use the hive.server2.proxy.user property in the URI to enable impersonation.

Unit Tests passed.

cc @timifasubaa @mistercrunch

@afernandez afernandez changed the title DI-1113. Authentication: Enable user impersonation for Superset to HiveServer2 using hive.server2.proxy.user (a.fernandez) Authentication: Enable user impersonation for Superset to HiveServer2 using hive.server2.proxy.user (a.fernandez) Oct 12, 2017
@afernandez
Copy link
Contributor Author

Need to fix unit tests

@@ -184,6 +186,28 @@ def select_star(cls, my_db, table_name, schema=None, limit=100,
sql = sqlparse.format(sql, reindent=True)
return sql

@classmethod
def modify_url_for_impersonation(cls, url, impersonate_user, username):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base class has methods for how to modify a URI and URL object for impersonation

url.query["hive_server2_proxy_user"] = username

@classmethod
def get_uri_for_impersonation(cls, uri, impersonate_user, username):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python Engine Spec overrides the methods for how to modify a URI and URL object for impersonation

self.sqlalchemy_uri = str(conn) # hides the password

def get_effective_user(self, url, user_name=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this to its own method

elif hasattr(g, 'user') and g.user.username:
effective_username = g.user.username
return effective_username

def get_sqla_engine(self, schema=None, nullpool=False, user_name=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed uri to url since it was of type SQLAlchemy.URL

self.db_engine_spec.modify_url_for_impersonation(url, self.impersonate_user, effective_username)

masked_url = self.get_password_masked_url(url)
logging.info("Database.get_sqla_engine(). Masked URL: {0}".format(masked_url))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to mask the URL while logging since it may contain a password

@@ -168,6 +168,7 @@ def handle_error(msg):
session.merge(query)
session.commit()
logging.info("Set query to 'running'")
conn = None
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some code paths, conn was not defined

uri = db_engine.get_uri_for_impersonation(uri, impersonate_user, username)
masked_url = database.get_password_masked_url_from_uri(uri)

logging.info("Superset.testconn(). Masked URL: {0}".format(masked_url))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important to mask the URL while logging since it may contain a password

@@ -282,7 +282,8 @@ def test_testconn(self):
# validate that the endpoint works with the password-masked sqlalchemy uri
data = json.dumps({
'uri': database.safe_sqlalchemy_uri(),
'name': 'main'
'name': 'main',
'impersonate_user': False
Copy link
Contributor Author

@afernandez afernandez Oct 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit test is failing since g.user.username is missing. Will fix soon.

Copy link
Contributor Author

@afernandez afernandez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial annotations

@afernandez
Copy link
Contributor Author

Unit tests passed,

Ran 168 tests in 268.075s

OK
Submitting coverage to coveralls.io...
Coverage submitted!
Job ##8777.1
https://coveralls.io/jobs/30207773

@coveralls
Copy link

coveralls commented Oct 12, 2017

Coverage Status

Coverage increased (+0.08%) to 70.191% when pulling 4e9263e804d4297a94edf81aaeaf1b1f39cfd04d on afernandez:afernandez_impersonate into 52a9f27 on apache:master.

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Highlighted a few minor things but LGTM otherwise. Please lint according to our .pylinrc. We used to have automation with Landscape.io but switched it off since we move the repo to Apache...

:param impersonate_user: Bool indicating if impersonation is enabled
:param username: Effective username
"""
if impersonate_user and username is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: I'd go without the is not None as None evals to False

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

backend_name = url.get_backend_name()

# Must be Hive connection, enable impersonation, and set param auth=LDAP|KERBEROS
if backend_name == "hive" and "auth" in url.query.keys() and impersonate_user is True and username is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some very long lines here. PEP8 says 80, our pylint say 90. If you want to lint your PR only you can git diff master... | flake8 --diff thought that's flake8 not pylint. There's also git-lint which can lint your diff as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

:param impersonate_user: Bool indicating if impersonation is enabled
:param username: Effective username
"""
if impersonate_user is True and "auth" in url.query.keys() and username is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: -is not None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

…veServer2 using hive.server2.proxy.user (a.fernandez)
@coveralls
Copy link

coveralls commented Oct 13, 2017

Coverage Status

Coverage decreased (-0.02%) to 70.364% when pulling f310817 on afernandez:afernandez_impersonate into bad6938 on apache:master.

@afernandez
Copy link
Contributor Author

@mistercrunch would you be able to re-review these changes? Does the decrease in coveralls mean I need to add unit tests?

@mistercrunch mistercrunch merged commit adef519 into apache:master Oct 17, 2017
@afernandez
Copy link
Contributor Author

Thanks for reviewing. I figured out how to do it without needing changes to PyHive. Will submit another PR soon.
@mistercrunch

michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
…veServer2 using hive.server2.proxy.user (a.fernandez) (apache#3652)
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.5 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.20.5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants