Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Azure Fusion env misses credentials when no key or SAS provided #5287

Closed

Conversation

adamrtalbot
Copy link
Collaborator

@adamrtalbot adamrtalbot commented Sep 5, 2024

  • Fix: Azure Fusion missing authentication if accountKey is provided
  • Catch when service principal exists but no keys or SAS

Changes logic of Azure Environment set up:

1. Is there an account name? (no: error)
2. Is there an accountKey or an accountSas or Managed identity? (no: error)
3. If there is a managed identity
    yes: return
    no: Create or add a SAS -> return

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Copy link

netlify bot commented Sep 5, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit f93e25b
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/66f3206eb370e8000847153f

Comment on lines 69 to 71
def mockStorageObject = Mock(Object) {
getOrCreateSasToken() >> 'generatedSasToken'
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pditommaso o great Spock wizard, why does it not mock the getOrCreateSasToken method properly here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't it be Mock(AzStorageOpts)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried that and I get the same result, env.AZURE_STORAGE_SAS_TOKEN is sv=2024-05-04&ss=bf&srt=sco&se=2024-09-07T11%3A24%3A38Z&sp=rwdlacut&sig=SIGNATURE%3D

// If a Managed Identity or Service Principal is configured, Fusion only needs to know the account name
if (cfg.managedIdentity().isConfigured() || cfg.activeDirectory().isConfigured()) {
// If a Managed Identity is configured, Fusion only needs to know the account name
if (cfg.managedIdentity().isConfigured()) {
return result
}

// If a SAS token is configured, instead, Fusion also requires the token value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adamrtalbot Could we please update this comment to something like:

        // Otherwise, Fusion also requires a SAS Token
        // (yes, even if `cfg.storage().sasToken` has not been explicitly defined)

so that this doesn't happen again?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this:

// If Fusion does not use a managed identity, get or create a SAS token for Fusion to use

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not much different from what the code itself already tells. I would prefer to have a explicit comment telling us why it's needed despite the configuration not requesting it explicitly.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
…Env.groovy

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@pditommaso
Copy link
Member

@alberto-miranda
Copy link
Contributor

I tested this patch interactively and it fails when running nextflow from an Azure VM using Service Principal authentication:

java.lang.NullPointerException: 'accountKey' cannot be null.
        at java.base/java.util.Objects.requireNonNull(Objects.java:235)
        at com.azure.storage.common.StorageSharedKeyCredential.<init>(StorageSharedKeyCredent
ial.java:57)
        at nextflow.cloud.azure.batch.AzHelper.memoizedMethodPriv$getOrCreateBlobServiceWithK
eyStringString(AzHelper.groovy:181)
        at nextflow.cloud.azure.batch.AzHelper.access$0(AzHelper.groovy)
        at nextflow.cloud.azure.batch.AzHelper$__clinit__closure1.doCall(AzHelper.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccesso
rImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMetho
dAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaCla
ss.java:279)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1007)
        at groovy.lang.Closure.call(Closure.java:433)
        at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.lambda$call$0(Memoize.
java:137)
        at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentComm
onCache.java:137)
        at org.codehaus.groovy.runtime.memoize.ConcurrentCommonCache.getAndPut(ConcurrentComm
onCache.java:113)
        at org.codehaus.groovy.runtime.memoize.Memoize$MemoizeFunction.call(Memoize.java:136)
        at nextflow.cloud.azure.batch.AzHelper.getOrCreateBlobServiceWithKey(AzHelper.groovy)
        at nextflow.cloud.azure.batch.AzHelper.generateAccountSas(AzHelper.groovy:173)
        at nextflow.cloud.azure.config.AzStorageOpts.getOrCreateSasToken(AzStorageOpts.groovy
:68)
[...]

The reason is that the current implementation of getOrCreateSasToken() needs accountKey to be defined to generate the SAS token string, and there's no accountKey when using a Service Principal. I checked the Azure SDK and it seems to be possible to generate SAS tokens with any method of authentication (i.e not just a Shared Access Key) so it should suffice to extend the getOrCreateSasToken() to cover the Managed Identity and Service Principal cases.

I'm currently working on a fix to test this theory.

…method

Signed-off-by: Alberto Miranda <alberto.miranda@seqera.io>
@alberto-miranda
Copy link
Contributor

alberto-miranda commented Sep 6, 2024

Ok, I think I finally got it working. I just pushed a fix that works for all the interactive tests I could think of using an Azure VM. @pditommaso @adamrtalbot could you please test it with your workflows and see if I finally got it right? 🥺

@pditommaso
Copy link
Member

Is this PR still valid following the outcome of the discussion on the related Fusion thread?

@adamrtalbot
Copy link
Collaborator Author

Is this PR still valid following the outcome of the discussion on the related Fusion thread?

Yes - we need to make sure the workers have a SAS key when using a managed identity. I believe these changes make Nextflow always generate a storage account SAS, even when using a Managed Identity, is that correct @alberto-miranda?

Currently using a Managed Identity + Fusion will fail.

@alberto-miranda
Copy link
Contributor

Is this PR still valid following the outcome of the discussion on the related Fusion thread?

Yes - we need to make sure the workers have a SAS key when using a managed identity. I believe these changes make Nextflow always generate a storage account SAS, even when using a Managed Identity, is that correct @alberto-miranda?

Currently using a Managed Identity + Fusion will fail.

It is indeed still valid. In fact, it doesn't make sense to proceed with the implementation mentioned in https://github.com/seqeralabs/fusion/issues/526 unless the current PR is confirmed to work. Did this fix pass all the failing tests?

@pditommaso
Copy link
Member

Integration tests looks OK in this branch

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Copy link
Member

I tried to run the e2e tests by adding [platform stage] comment but it didn't run

if: ${{ contains(github.event.head_commit.message, '[platform prod]') || contains(github.event.head_commit.message, '[platform stage]') }}

Any chance you can look at that?

@ewels
Copy link
Member

ewels commented Sep 23, 2024

Any chance you can look at that?

@pditommaso who is this to? @alberto-miranda ?

@alberto-miranda
Copy link
Contributor

Any chance you can look at that?

@pditommaso who is this to? @alberto-miranda ?

Hopefully not, because I have no idea how that works 😅

@adamrtalbot
Copy link
Collaborator Author

Tested with:

  • Keys
  • Service Principal
  • Managed Identity

All worked with and without Fusion. Will dig into it a bit more too see if any edge cases pop up it's acceptable to me now.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot
Copy link
Collaborator Author

adamrtalbot commented Sep 23, 2024

I tried to run the e2e tests by adding [platform stage] comment but it didn't run

if: ${{ contains(github.event.head_commit.message, '[platform prod]') || contains(github.event.head_commit.message, '[platform stage]') }}

Any chance you can look at that?

Tried again with 3139dae. See https://github.com/nextflow-io/nextflow/actions/runs/10992221379/job/30516198828?pr=5287 for an example workflow run 🤞

@pditommaso
Copy link
Member

Still not working, not understanding why

@adamrtalbot
Copy link
Collaborator Author

@adamrtalbot
Copy link
Collaborator Author

I would switch it to be a separate workflow, triggered on Github comment containing [test platform prod] or [test platform stage], then add an automatic PR comment to the end of a successful test workflow. That way you get a daisy chained workflow but you can also manually kick it off for a PR.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot
Copy link
Collaborator Author

Here's a test run with the if-statement disabled. Gonna revert that commit now.

https://github.com/nextflow-io/nextflow/actions/runs/10993010551/job/30518841423?pr=5287

This reverts commit cab3ffe.

Signed-off-by: adamrtalbot <12817534+adamrtalbot@users.noreply.github.com>
@pditommaso
Copy link
Member

I would switch it to be a separate workflow

That make sense, any chance to give it a try?

@adamrtalbot
Copy link
Collaborator Author

I would switch it to be a separate workflow

That make sense, any chance to give it a try?

I can but can we do a separate PR so we don't block this one?

@pditommaso
Copy link
Member

Yes, sure

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@pditommaso
Copy link
Member

It looks tests are not running because is an external PR. Closing in favour of #5328

@pditommaso pditommaso closed this Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure storage authentication errors in 24.08.0-edge with Fusion enabled
5 participants