[FEATURE] Support AAD Passthrough for ADLS mounts #63

Closed
stuartleeks opened this issue May 21, 2020 · 12 comments
Comments

@stuartleeks
Contributor

Is your feature request related to a problem? Please describe.
Currently the ADLS mount resource only supports creating mounts with service principal details, but for some scenarios we want to be able to provision mounts using AAD Passthrough: https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough#--mount-azure-data-lake-storage-to-dbfs-using-credential-passthrough

Current ADLS Gen 2 mount resource:

resource "databricks_azure_adls_gen2_mount" "mount" {
  cluster_id             = ""
  container_name         = ""
  storage_account_name   = ""
  directory              = ""
  mount_name             = ""
  tenant_id              = ""
  client_id              = ""
  client_secret_scope    = ""
  client_secret_key      = ""
  initialize_file_system = true
}

Describe the solution you'd like
I'd like to be able to specify that the mount should use AAD Passthrough rather than passing client_id etc.

The proposed change to the resource is shown below.

Service principal:

resource "databricks_azure_adls_gen2_mount" "mount" {
  cluster_id             = ""
  container_name         = ""
  storage_account_name   = ""
  directory              = ""
  mount_name             = ""
  initialize_file_system = true
  mount_type             = "ServicePrincipal"
  service_principal {
    tenant_id           = ""
    client_id           = ""
    client_secret_scope = ""
    client_secret_key   = ""
  }
}

AAD Passthrough:

resource "databricks_azure_adls_gen2_mount" "mount" {
  cluster_id             = ""
  container_name         = ""
  storage_account_name   = ""
  directory              = ""
  mount_name             = ""
  initialize_file_system = true
  mount_type             = "AADPassthrough"
}
@stuartleeks stuartleeks added the feature New feature or request label May 21, 2020
@stuartleeks
Contributor Author

@stikkireddy - this is the suggestion I mentioned to you. Does this seem like a reasonable approach?

I'm thinking that the properties inside service_principal (e.g. tenant_id) would be marked as required, but the service_principal block itself would be optional, with a validation function to check that it is specified when mount_type is set to ServicePrincipal.

@stikkireddy
Contributor

@stuartleeks I like the approach. At this point I don't have visibility into how many people use this resource to provision mounts, so I think it would be best to make these "breaking" changes asap: the provider is still on a minor version, and breaking changes are expected to happen. The service_principal block seems like a good way to make the resource a bit cleaner.

@stuartleeks
Contributor Author

Cool. Thanks for your response - I'm hoping to work on this tomorrow :-)

@stuartleeks
Contributor Author

Hey @stikkireddy

I've coded this up and am trying to test it now. I've set this on the cluster I'm testing the AAD Passthrough mount with:

spark_conf = {
  "spark.databricks.passthrough.enabled" : "true"
}

When I run the test I get an error: Error: com.databricks.backend.daemon.data.client.adl.AzureCredentialNotFoundException: Could not find ADLS Gen2 Token

I noticed that if I create a cluster via the UI, the cluster spec also gets a single_user_name property set to my email address, but looking at the provider (and the API docs!), this isn't a documented property.

Is there something I'm missing? Are there any docs on setting up a cluster for passthrough via the API?

Thanks!

@stikkireddy
Contributor

Hey @stuartleeks, I was on PTO till today; I had some time to look at this.

Passthrough is generally used on a shared "high concurrency" cluster, and it seems the following spark options are what create these "high concurrency" clusters. You may need these enabled to enable passthrough, or at least to test passthrough with ADLS.

spark.databricks.cluster.profile serverless
spark.databricks.repl.allowedLanguages python,sql
spark.databricks.passthrough.enabled true
spark.databricks.pyspark.enableProcessIsolation true
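Pulling the options above into the provider's cluster resource, a high-concurrency passthrough cluster might look like the following. This is a hedged sketch: the runtime version, node type, and worker count are placeholder values, not something prescribed by this thread.

```hcl
# Sketch of a "high concurrency" cluster configured for AAD passthrough,
# combining the four spark options listed above. spark_version, node_type_id
# and num_workers are placeholder values for illustration only.
resource "databricks_cluster" "passthrough_hc" {
  cluster_name  = "aad-passthrough-hc"
  spark_version = "6.6.x-scala2.11" # placeholder runtime
  node_type_id  = "Standard_DS3_v2" # placeholder node type
  num_workers   = 2

  spark_conf = {
    "spark.databricks.cluster.profile" : "serverless",
    "spark.databricks.repl.allowedLanguages" : "python,sql",
    "spark.databricks.passthrough.enabled" : "true",
    "spark.databricks.pyspark.enableProcessIsolation" : "true"
  }
}
```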

@stikkireddy
Contributor

Standard clusters require the "single_user_name" field to be populated in the cluster model in the POST request for passthrough to work appropriately.
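For the standard-cluster case, a minimal sketch of what that could look like in the provider is below, assuming single_user_name is exposed on the cluster resource (the change tracked in #71); the user, runtime, and node values are placeholders.

```hcl
# Sketch of a standard (single-user) cluster for AAD passthrough.
# single_user_name names the user whose AAD token is passed through;
# all concrete values here are placeholders for illustration.
resource "databricks_cluster" "passthrough_std" {
  cluster_name     = "aad-passthrough-standard"
  spark_version    = "6.6.x-scala2.11"  # placeholder runtime
  node_type_id     = "Standard_DS3_v2"  # placeholder node type
  num_workers      = 1
  single_user_name = "user@example.com" # placeholder user

  spark_conf = {
    "spark.databricks.passthrough.enabled" : "true"
  }
}
```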

@stuartleeks
Contributor Author

@stikkireddy - I've added #71 to track adding single_user_name to the cluster resource

@stikkireddy
Contributor

@stuartleeks not sure what the status of this is. Is it closable? I believe we merged single_user_name.

@stuartleeks
Contributor Author

We hit issues automating mounts for AAD Passthrough when running under a service principal account.
We switched to using direct ADLS access in this area of our project, so the need for this feature went away.
If it is possible to use a service principal account to set up a mount configured for AAD Passthrough, then we may be able to get this fixed in a PR.

@nfx
Contributor

nfx commented Sep 27, 2020

@stuartleeks is this one still relevant? Will close if it's not.

@kyrios

kyrios commented Jun 6, 2021

Hi, I stumbled upon this issue. We'd like to use the feature. Is there any workaround on how to make passthrough auth work? Can the Feature Request be re-opened?

@alexott
Contributor

alexott commented Jun 7, 2021

@kyrios it will be fixed as part of #497 - you can already look into the generic-mounts branch; it has an example of how to mount with passthrough.

@databricks databricks locked and limited conversation to collaborators Jun 12, 2021