Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Usage doc for configuring Nomad OIDC with AWS IAM #23845

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

tunzor
Copy link
Contributor

@tunzor tunzor commented Aug 19, 2024

Reference: JIRA ticket
Preview: Build link

@aimeeu aimeeu added the theme/docs Documentation issues and enhancements label Aug 21, 2024
Copy link
Contributor

@boruszak boruszak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This draft LGTM!

website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
```hcl
resource "aws_lb_listener" "example" {
load_balancer_arn = <LB_ARN>
port = "443"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nomad agents are listening on port 4646, but I don't see how we're mapping port 443 to that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original guide doesn't make any distinction so I tried my best to convert it as is. But looking at the terraform docs, we could either use a redirect to provide a port (but looks like we'd need to list all the hosts) or use the authenticate-oidc action which seems a little too perfect for the use-case. For the oidc action, I think the config would be something like this with client_id, and client_secret coming from AWS and the rest from Nomad but I can't find in the docs what the endpoints would be.

authenticate_oidc {
  authorization_endpoint = "https://example.com/authorization_endpoint"
  client_id              = "client_id"
  client_secret          = "client_secret"
  issuer                 = "https://<NOMAD_CLUSTER_DOMAIN>"
  token_endpoint         = "https://example.com/token_endpoint"
  user_info_endpoint     = "https://example.com/user_info_endpoint"
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure which "original guide" you're referring to here, but I think I also assumed this had been verified end-to-end and not just ported from some other source.

Copy link
Contributor Author

@tunzor tunzor Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the original guide from the support team, it does expect the user to have some knowledge and provides them with higher level steps with the user filling in the details. I'm trying to verify it now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe we should pull in @SuyashHashiCorp to take a look at this doc as well then? If I recall correctly that guide may have also been inspired by work @schmichael did.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really excited to see workload identity federation (WIF) documentation for another cloud provider, but I'm afraid this content probably needs to be moved to https://github.com/hashicorp/tutorials if it is intended to be complementary to our GCP WIF tutorial: https://developer.hashicorp.com/nomad/tutorials/fed-workload-identity/integration-gcp

Source for that is here https://github.com/hashicorp/tutorials/blob/main/content/tutorials/nomad/integration-gcp.mdx and the corresponding jobspecs/terraform are here: https://github.com/hashicorp-education/learn-nomad-workload-identity-federation/tree/main

proxy.nomad.hcl is what handles proxying JWKS requests from the cloud vendor to Nomad's HTTP endpoint. You should be able to use that jobspec for any deployment in the cloud or onprem. The flow in GCP is:

GCP  ---https (443)---> LB ---http (80)---> Proxy ---task api (unix socket)---> Nomad API

...and I assume a similar flow works for any cloud.


If there is a desire to keep this content in docs instead of tutorials, I think we should move the GCP tutorial over here and create a subdirectory under Operations for both. Ideally we'd add more WIF examples in the future. This all feels more like tutorials to me, but I can understand the desire and simplicity of reference docs. If we have any hard and fast rules about what makes something a tutorial vs a doc, I don't know it, so as long as we're consistent I'm ok with WIF content in either place as long as we're consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael that's totally valid. Education is working on a larger initiative to better organize tutorials and docs and we've got some ongoing plans. @boruszak or I can loop you in for better context but the gist is: there are lots of tutorials/docs that could and should be moved around and edited to better fit in their respective places.

This doc was being brought over from the support team's knowledge base to make it more visible and we did intend to shuffle it around as part of the larger edu goal when we get there. 😅

@SuyashHashiCorp
Copy link

Hello Team,

Thank you for your patience and collaboration on this. I’ve thoroughly tested the documentation in my lab environment, and overall, the steps are well-structured and functional. However, I encountered issues with some of the Terraform code blocks while executing the setup.

I have provided detailed comments outlining the specific errors encountered and proposed the corrected Terraform configuration that resolves these issues. The updated code has been rigorously tested and is now fully functional in my environment.

Kindly update the documentation with the revised Terraform code changes, review the modifications, and let me know if any additional adjustments or refinements are required.

@tgross
Copy link
Member

tgross commented Sep 20, 2024

@SuyashHashiCorp I don't see any of your comments here in the PR. Maybe you missed submitting them as a review?

@SuyashHashiCorp
Copy link

Hey Team,
Here are the updates that should be incorporated into the respective sections of the tutorial documentation. Thanks to @tgross for his guidance.

  • Document Section: Create a Hosted Zone
    While running the Terraform code for creating the hosted zone, I encountered the following error during initialization:

$ terraform init
Initializing the backend...

│ Error: Invalid quoted type constraints
│ on variable.tf line 2, in variable "domain_name":
│ 2: type = "string"

│ Terraform 0.11 and earlier required type constraints to be given in quotes, but that form is now deprecated and will be removed in a future version of Terraform. Remove the quotes around "string".

To resolve this, I removed the double quotes from the type parameter as follows:

 variable "domain_name" {
  type    = string
  default = "<DOMAIN_NAME>"
}
  • Document Section: Generate SSL Certificates
    The documentation lacks the code to automate ACM validation using Route 53. Without it, ACM remains in a "Pending Validation" state and never gets issued.

I added an aws_route53_record resource to validate the domain through Route 53. Below is the working Terraform code that resolves this issue:

variable "zone_id" {
  type    = string
  default = "<HOSTED_ZONE_ID>"
}

resource "aws_route53_record" "cert_dns" {
  allow_overwrite = true
  name            = tolist(aws_acm_certificate.example.domain_validation_options)[0].resource_record_name
  records         = [tolist(aws_acm_certificate.example.domain_validation_options)[0].resource_record_value]
  type            = tolist(aws_acm_certificate.example.domain_validation_options)[0].resource_record_type
  zone_id         = var.zone_id
  ttl             = 60
}

resource "aws_acm_certificate_validation" "example" {
  certificate_arn         = aws_acm_certificate.example.arn
  validation_record_fqdns = [aws_route53_record.cert_dns.fqdn]
}
  • Document Section: Create an OIDC Identity Provider
    While testing the OIDC Connect Provider, I encountered an error related to an invalid attribute name:

Error: Invalid attribute name
on oidc.tf line 13, in resource "aws_iam_openid_connect_provider" "nomad":
13: thumbprint_list = [data.tls_certificate.example.certificates.[0].sha1_fingerprint]
​An attribute name is required after a dot.​

To fix this, the correct syntax for accessing the first element in a list should be:
thumbprint_list = [data.tls_certificate.example.certificates.0.sha1_fingerprint]

  • Document Section: Create an IAM Policy for OIDC Federated Users
    During testing of the IAM policy setup, I received multiple errors:

​│ Error: creating IAM Role (s3_all_access_role): operation error IAM: CreateRole, https response error StatusCode: 400, RequestID: c9c76811-0a78-44b3-80c3-6d8ae4431194, MalformedPolicyDocument: Has prohibited field Resource
​│
​│ with aws_iam_role.s3_all_access_role,
​│ on assume_role.tf line 20, in resource "aws_iam_role" "s3_all_access_role":
​│ 20: resource "aws_iam_role" "s3_all_access_role" {
​│ ​
​│ Error: creating IAM Policy (nomad-oidc-policy): operation error IAM: CreatePolicy, https response error StatusCode: 400, RequestID: 370b176e-7a63-49ab-b267-cca64c249e04, MalformedPolicyDocument: Policy document should not specify a principal.​
​│ ​
​│ with aws_iam_policy.policy,​
​│ on assume_role.tf line 44, in resource "aws_iam_policy" "policy":​
​│ Error: Incorrect attribute value type Create an IAM policy for OIDC Federated Users​
​│
​│ on assume_role.tf line 15, in data "aws_iam_policy_document" "assume_role":
​│ 15: values = "aws"​

I corrected these issues by modifying the policy structure. Here is the working Terraform code:

# Variables for OIDC provider and AWS account
variable "oidc_provider" {
  description = "The OIDC provider URL"
  type        = string
  default     = "<DOMAIN_NAME>"
}

variable "aws_account_id" {
  description = "AWS account ID"
  type        = string
  default     = "<AWS_ACCOUNT_ID>"
}

# Assume role policy document for allowing OIDC to assume the role
data "aws_iam_policy_document" "assume_role" {
  statement {
    effect = "Allow"

    principals {
      type        = "Federated"
      identifiers = ["arn:aws:iam::${var.aws_account_id}:oidc-provider/${var.oidc_provider}"] # Use the correct variable for OIDC provider
    }

    actions = ["sts:AssumeRoleWithWebIdentity"]

    condition {
      test     = "StringEquals"
      variable = "${var.oidc_provider}:aud"
      values   = ["aws"]
    }
  }
}

# Create an IAM role with the assume role policy generated above
resource "aws_iam_role" "s3_all_access_role" {
  name               = "s3_all_access_role"
  assume_role_policy = data.aws_iam_policy_document.assume_role.json

  tags = {
    tag-key = "tag-value"
  }
}

# Inline policy that defines what the role can do (full S3 access)
data "aws_iam_policy_document" "s3_access_policy" {
  statement {
    effect = "Allow"

    actions = [
      "s3:*",
      "s3-object-lambda:*"
    ]

    resources = ["*"] # You can scope this down to specific S3 buckets if necessary
  }
}

# Create a policy resource from the inline policy document above
resource "aws_iam_policy" "policy" {
  name        = "nomad-oidc-policy"
  description = "A policy for federated Nomad OIDC"
  policy      = data.aws_iam_policy_document.s3_access_policy.json
}

# Attach the S3 access policy to the IAM role
resource "aws_iam_role_policy_attachment" "test-attach" {
  role       = aws_iam_role.s3_all_access_role.name
  policy_arn = aws_iam_policy.policy.arn
}

Please update the documentation with these corrections. Let me know if you need any further adjustments or clarifications.

Thank you!!

@tunzor
Copy link
Contributor Author

tunzor commented Sep 20, 2024

Thanks @SuyashHashiCorp! I'll get your changes integrated 👍

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

website/content/docs/operations/aws-oidc-provider.mdx Outdated Show resolved Hide resolved
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
theme/docs Documentation issues and enhancements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants