Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(aws): Improving AWS EC2 instance types API integration and caching, feat(aws): Adding archi type to images API and caching #5609

Merged
merged 4 commits into from
Aug 24, 2022

Conversation

pdk27
Copy link
Contributor

@pdk27 pdk27 commented Dec 21, 2021

Changes:

  • Improving AWS EC2 instance types API integration and caching:
    • Replaced AWS EC2 pricing docs integration with AWS EC2 describe-instance-types API. Pricing docs precedes the API. Instance types retrieved via API are up-to-date and include other associated metadata.
    • Added instance type information to the cache.
  • Adding architecture type to images API and caching

Clouddriver API response before-after:

Before:

After:

Use cases / reasoning for the changes:

Instructions to be included in release (preview) notes:

@pdk27
Copy link
Contributor Author

pdk27 commented Dec 21, 2021

Reviewers, please also review the Instructions to be included in release (preview) notes in PR overview.

@pdk27 pdk27 requested a review from cfieber December 21, 2021 22:02
@mattgogerly
Copy link
Member

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

@pdk27
Copy link
Contributor Author

pdk27 commented Dec 23, 2021

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :)
I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

@pdk27 pdk27 force-pushed the aws-ec2-instance-types-integration branch from 7a4cbbb to bfa9ce6 Compare January 10, 2022 17:40
@pdk27 pdk27 requested review from dbyron-sf and removed request for ajordens and aravindmd January 11, 2022 16:41
@mattgogerly
Copy link
Member

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

attributes.put("account", account.getName());
attributes.put("region", region);
attributes.put("name", i.getInstanceType());
attributes.put("defaultVCpus", i.getVCpuInfo().getDefaultVCpus());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can .getVCpuInfo() or getMemoryInfo() ever return null? There's a null check for getInstanceStorageInfo() below so wanted to be sure these were different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getVCpuInfo() and getMemoryInfo() can never be null. getInstanceStorageInfo() can return null for instance types that don't support instance storage, and support EBS-only for example. Here are some examples - https://aws.amazon.com/ec2/instance-types/

}

if (i.getNetworkInfo() != null) {
attributes.put("ipv6Supported", i.getNetworkInfo().getIpv6Supported());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to have this attribute not be optional? It's always a bit clunky consuming these values from the map when you have to do something like if (attributes.get("key") != null && attributes.get("key") == true) or equivalent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The i - InstanceTypeInfo here is a result object returned from the AWS API DescribeInstanceTypes and networkInfo is an optional parameter. Also, not all instance types support IPv6, hence, the null check. Hope this adds clarification.

@link108
Copy link
Member

link108 commented Mar 9, 2022

Hi @pdk27, just wanted to follow up on this PR, I saw the clouddriver feature was merged (#5610) and want to ensure this gets in as well 👍

@pdk27
Copy link
Contributor Author

pdk27 commented Mar 28, 2022

Hi @pdk27, just wanted to follow up on this PR, I saw the clouddriver feature was merged (#5610) and want to ensure this gets in as well 👍

Thanks for the heads-up @link108. Will address the comments here soon.

@pdk27
Copy link
Contributor Author

pdk27 commented Apr 4, 2022

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

@mattgogerly
Its needed to refresh the cache, and to be able to use the features in this PR.

@mattgogerly
Copy link
Member

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.

Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.

@pdk27
Copy link
Contributor Author

pdk27 commented Apr 5, 2022

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.

Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.

Gotcha! That doesn't sound ideal. As part of testing these changes, I couldn't see the new cache fields without flushing caches in my setup. Any ideas on how to confirm if flushing caches can be avoided? I will try a few things on my setup and update here.

@pdk27
Copy link
Contributor Author

pdk27 commented Apr 5, 2022

Haven't had a chance to look at the code yet, but for the release note - not everybody deploys Spinnaker using Halyard. Are there instructions for other installation methods (i.e. what is that flag actually doing?)

Thanks for the callout @mattgogerly. That is where I need help :) I believe hal deploy apply --flush-infrastructure-caches refreshes the caches. Someone familiar with it can add more details. I will post about this PR and ask for help in the Slack channel too.

All I can find is that it wipes the Redis instance, but that would only work if you're using the built in Redis. Why is it necessary to do that?

@mattgogerly Its needed to refresh the cache, and to be able to use the features in this PR.

Does the cache not get updated by normal clouddriver caching? The equivalent to the hal command for SQL/non-Halyard would be to wipe the database, at least of AWS data, which isn't ideal operationally.

Gotcha! That doesn't sound ideal. As part of testing these changes, I couldn't see the new cache fields without flushing caches in my setup. Any ideas on how to confirm if flushing caches can be avoided? I will try a few things on my setup and update here.

I was able to verify in my setup that flushing caches is required in order to see the changes to the cached data. May be that is too extreme? Do you know if there is a softer option like a refresh? Will reach on Slack again to see if anyone has a solution - https://spinnakerteam.slack.com/archives/C091CCWRJ/p1649183109750109

Without running hal deploy apply --flush-infrastructure-caches, the images API response just didn't include the new architecture field and the response looked identical to one from before the change.
I think the question here is related to cache refresh instructions (no matter what the change to the cache is / how cache is setup) and it doesn't need to block this PR from getting merged. I can followup and a note to release notes. Do you agree?

@pdk27 pdk27 force-pushed the aws-ec2-instance-types-integration branch from 7073938 to 1d3b82e Compare April 5, 2022 18:50
@mattgogerly
Copy link
Member

@pdk27

Hm, that seems weird (or a bug?) Tagging @german-muzquiz who might be more familiar with cache data schema updates

@german-muzquiz
Copy link
Contributor

I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.

Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like last_modified fields to know that it should be refreshed?

@pdk27
Copy link
Contributor Author

pdk27 commented Apr 25, 2022

I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.

Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like last_modified fields to know that it should be refreshed?

Thanks for the pointer @german-muzquiz. I will dig into this further and get back to you.

@pdk27
Copy link
Contributor Author

pdk27 commented Apr 27, 2022

I'm not too familiar with the AWS provider, but as a general principle the cache in clouddriver should be able to automatically rebuild itself if the underlying datastore is deleted (sql or redis), which according to the comments is working fine.
Then what I'm hearing is that the cache for instance types only gets populated once and then never gets updated. Probably that was expected in the beginning since instance types is not something that changes regularly, but then I think ideally this PR needs to automatically refresh that cache. If there are no caching agents for instance types, maybe refresh it upon startup or use something like last_modified fields to know that it should be refreshed?

Thanks for the pointer @german-muzquiz. I will dig into this further and get back to you.

@german-muzquiz
Just to give you a brief context, there are 2 API related changes in this PR: instance type API, and images API

Within images API, there are 2 kinds:
(1) find by name like http://localhost:8084/images/find?q=hello*
(2) find by ID like http://localhost:8084/images/find?q=ami-23751835
The change in this PR includes adding a new attribute architecture to both the images APIs. I can see the new field in the API response for (2) without flushing the caches, but not in the API response for (1) which comes from namedImages, built in ImageCachingAgent. Flushing caches is non-ideal for certain types of cache setups as @mattgogerly pointed out that flushing caches means wiping the SQL/ non-Halyard database.

My observations:

  • Flushing infrastructure caches via hal deploy apply --service-names clouddriver --flush-infrastructure-caches is the only way I could get the public images in the caches to reload with the new architecture field. Re-deploying clouddriver didn't help.

Some questions we need help with:

  • Is this a problem in all types of cache setups? (I could verify the behavior only for Halyard + Redis cache)
    May be there are other ways to reload images cache in other setups, without the need for wiping the whole cache?
  • How are new attributes added to cached data in general?
  • With the given findings, how do we proceed?

@german-muzquiz
Copy link
Contributor

@pdk27 I don't see anything wrong in the code leading to the issue of the architecture not showing up in http://localhost:8084/images/find?q=hello without flushing caches, the ImageCachingAgent should be rebuilding the cache each time for both named images and ami images.

My suggestion is to investigate as follows:

  • Start with an empty redis
  • Run clouddriver without your changes, so that images are initially cached without the architecture field.
  • Verify in redis that images don't have the architecture field. This can be done through redis-cli with a command like this:
mget "com.netflix.spinnaker.clouddriver.aws.provider.AwsProvider:namedImages:attributes:aws:namedImages:{account}:{image name}"

Where you replace {account} and {image name} for their respective values.

  • Verify that the request to clouddriver http://localhost:7002/aws/images/find?q={image name} doesn't include the architecture field
  • Stop clouddriver and then run it with your changes, without flushing the caches
  • Check again redis, this time the architecture field should have been stored
  • Check again the response of clouddriver REST API, this time the architecture field should be included

The best approach is to not having to delete the cache (redis or sql) in order for this change to work, because the images cache is rebuilt every 30 seconds anyway.

@pdk27
Copy link
Contributor Author

pdk27 commented Aug 22, 2022

Apologies for the delay following-up on this.

@mattgogerly @dbyron-sf I tried the instructions provided by @german-muzquiz and it worked. i.e. I didn't need to flush infrastructure caches to see the architecture type in response.

There is no other pending action from my side. Can you please help me identify the right reviewers for this PR?

@pdk27
Copy link
Contributor Author

pdk27 commented Aug 24, 2022

Apologies for the delay following-up on this.

@mattgogerly @dbyron-sf I tried the instructions provided by @german-muzquiz and it worked. i.e. I didn't need to flush infrastructure caches to see the architecture type in response.

There is no other pending action from my side. Can you please help me identify the right reviewers for this PR?

Here is a demo of the testing done:

Spinnaker-imagesApi-demo-reduced.mov

@pdk27
Copy link
Contributor Author

pdk27 commented Aug 24, 2022

@dbyron-sf ✅ Updated release preview PR.

@dbyron-sf dbyron-sf added the ready to merge Approved and ready for a merge label Aug 24, 2022
@mergify mergify bot added the auto merged Merged automatically by a bot label Aug 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto merged Merged automatically by a bot ready to merge Approved and ready for a merge target-release/1.29
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants