Insufficient error handling when Control Tower fails to create an account #456

Open
evan10s opened this issue May 9, 2024 · 1 comment
Labels
bug Something isn't working pending investigation Issue needs further investigation

evan10s commented May 9, 2024

Terraform Version & Prov:
Terraform 1.5.5, open-source

AFT Version: 1.12.1

Terraform Version & Provider Versions
Please provide the outputs of terraform version and terraform providers from within your AFT environment

I can provide these if needed, but leaving them out because it's non-trivial to run the AFT terraform locally, and I don't think this issue is related to Terraform.

Bug Description
High-level: Sometimes, an AFT request for an account is syntactically correct, but Control Tower fails to actually create the account. Today, we had an account request where this happened, but the only error we got from AFT was about a failed call to DescribeAccount, even though the true error was from Control Tower and visible in Service Catalog in our management account.

More specifically: The aft-invoke-aft-account-provisioning-framework Lambda does not check whether the CreateManagedAccount call failed in Control Tower. When it fails, the event data sent to the Lambda lists the account ID as Not Available. The Lambda then calls AWS Organizations' DescribeAccount method with the account ID Not Available, which causes boto3 to throw an exception (An error occurred (InvalidInputException) when calling the DescribeAccount operation: You provided a string that exceeds that maximum length.).

This does cause a failure to propagate in AFT, which is good, but proactively catching and erroring with a more descriptive message than the DescribeAccount error would make this easier to debug. Even saying that the account creation request failed would have sent my debugging in a more productive direction.
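
For context on why the error mentions length: AWS account IDs are 12-digit strings, and the placeholder Not Available is 13 characters long, which is consistent with the InvalidInputException above. A minimal sketch of a local format check that would flag the placeholder before any API call is made (the helper name `is_valid_account_id` is hypothetical and not part of AFT):

```python
import re

# AWS account IDs are exactly 12 decimal digits.
ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{12}")


def is_valid_account_id(value: str) -> bool:
    """Return True only if value looks like a 12-digit AWS account ID."""
    return ACCOUNT_ID_PATTERN.fullmatch(value) is not None


print(len("Not Available"))                  # 13
print(is_valid_account_id("Not Available"))  # False
print(is_valid_account_id("123456789012"))   # True
```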

To Reproduce
Steps to reproduce the behavior:

  1. Create an AFT account request where the account email is already used as the root account email for another AWS account.
  2. Commit the change so AFT's pipelines will apply the Terraform and AFT will try to create the account.
  3. The aft-failure-notifications SNS topic should receive a message with the DescribeAccount error mentioned above.

Expected behavior
If the account fails to create in Control Tower, AFT should error with that information rather than trying to continue with an invalid account ID.

  • If I have to trace through the entire AFT pipeline step by step, it makes debugging AFT failures take longer than it should.
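
As a rough illustration of the expected behavior, the Lambda could inspect the Control Tower event before calling DescribeAccount. This is only a sketch, not AFT's actual code: the function and exception names are hypothetical, and it assumes the event has the same shape as the redacted event JSON under "Additional context".

```python
from typing import Any, Dict


class AccountCreationFailedError(Exception):
    """Raised when Control Tower reports a failed CreateManagedAccount."""


def get_created_account_id(event: Dict[str, Any]) -> str:
    """Return the new account ID, or raise a descriptive error on failure.

    Assumes the event layout shown in this issue:
    detail.serviceEventDetails.createManagedAccountStatus
    """
    status = event["detail"]["serviceEventDetails"]["createManagedAccountStatus"]
    if status.get("state") == "FAILED":
        account_name = status.get("account", {}).get("accountName", "unknown")
        raise AccountCreationFailedError(
            f"Control Tower failed to create account '{account_name}': "
            f"{status.get('message')} "
            "Check the provisioned product in Service Catalog in the "
            "management account for the underlying error."
        )
    return status["account"]["accountId"]
```

With a guard like this, the failure notification would carry the Control Tower message instead of the DescribeAccount length error.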

Related Logs
I think I shared everything that's relevant elsewhere in this issue, but I can grab more logs if that would be useful.

Additional context
We had an AFT account request fail on initial creation due to this error, which I found in Service Catalog's provisioned products list:

AWS Control Tower cannot create an account using email user@company.com because an AWS account with that email already exists, but it is not part of your AWS Control Tower organization.

However, the error sent from the AFT failures SNS topic just had this less useful error:

AFT account request failed

An error occurred in the 'aft-invoke-aft-account-provisioning-framework' Lambda function.
For more information, search AWS Request ID 'c3e55225-a997-47a6-b3d7-6be2e2eea65d' in CloudWatch log group '/aws/lambda/aft-invoke-aft-account-provisioning-framework'
Error Message: An error occurred (InvalidInputException) when calling the DescribeAccount operation: You provided a string that exceeds that maximum length.

After looking at the source code for the aft-invoke-aft-account-provisioning-framework Lambda, I found that it calls the DescribeAccount operation on a Control Tower event, but I couldn't figure out what the actual problematic event content was.

Tracing back, I noted that the aft-controltower-event-logger EventBridge rule triggers the aft-invoke-aft-account-provisioning-framework Lambda. I went to look at the EventBridge rule and saw it also triggers the aft-controltower-event-logger Lambda, which I noted writes to the aft-controltower-events DynamoDB table, so I went to that table. Finally, I found the problematic event:
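
For anyone retracing these steps, the table lookup could be scripted instead of browsed in the console. The sketch below is hedged: `iter_failed_account_events` is a hypothetical helper, and it assumes items in aft-controltower-events are stored in the same shape as the event shown in this issue, which I have not verified against the table schema. `table` is expected to behave like a boto3 DynamoDB Table resource, for example `boto3.resource("dynamodb").Table("aft-controltower-events")`.

```python
from typing import Any, Dict, Iterator


def iter_failed_account_events(table: Any) -> Iterator[Dict[str, Any]]:
    """Yield stored CreateManagedAccount events whose state is FAILED.

    Performs a full, paginated scan, which is acceptable for a small
    event table but not for a large one.
    """
    scan_kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**scan_kwargs)
        for item in page.get("Items", []):
            detail = item.get("detail", {})
            if detail.get("eventName") != "CreateManagedAccount":
                continue
            status = detail.get("serviceEventDetails", {}).get(
                "createManagedAccountStatus", {}
            )
            if status.get("state") == "FAILED":
                yield item
        # DynamoDB returns LastEvaluatedKey while more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        scan_kwargs["ExclusiveStartKey"] = last_key
```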

Partially redacted event JSON

{
 "id": "99f09435-b193-e673-a50e-bf61ee0fa086",
 "time": "2024-05-09T19:45:19Z",
 "account": "redacted",
 "detail": {
  "awsRegion": "us-east-1",
  "eventCategory": "Management",
  "eventID": "b2e5ef71-f7cc-46f9-b0fe-597e36413ebb",
  "eventName": "CreateManagedAccount",
  "eventSource": "controltower.amazonaws.com",
  "eventTime": "2024-05-09T19:45:19Z",
  "eventType": "AwsServiceEvent",
  "eventVersion": "1.08",
  "managementEvent": true,
  "readOnly": false,
  "recipientAccountId": "redacted",
  "requestParameters": null,
  "responseElements": null,
  "serviceEventDetails": {
   "createManagedAccountStatus": {
    "account": {
     "accountId": "Not Available",
     "accountName": "redacted-sandbox"
    },
    "completedTimestamp": "2024-05-09T19:45:19+0000",
    "message": "AWS Control Tower failed to create an enrolled account.",
    "organizationalUnit": {
     "organizationalUnitId": "Not Available",
     "organizationalUnitName": "Sandbox"
    },
    "requestedTimestamp": "2024-05-09T19:44:54+0000",
    "state": "FAILED"
   }
  },
  "sourceIPAddress": "AWS Internal",
  "userAgent": "AWS Internal",
  "userIdentity": {
   "accountId": "redacted",
   "invokedBy": "AWS Internal"
  }
 },
 "detail-type": "AWS Service Event via CloudTrail",
 "region": "us-east-1",
 "resources": [
 ],
 "source": "aws.controltower",
 "version": "0"
}

evan10s added the bug and pending investigation labels May 9, 2024

snebhu3 (Collaborator) commented Jun 21, 2024

@evan10s thank you for reporting this.
I will create an internal backlog to evaluate this request further.
