Convert VMs to machine images after they have been stopped for a while #53

Open
Kalmalyzer opened this issue Sep 4, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@Kalmalyzer
Contributor

Google has support for machine images in Beta.

This allows snapshotting an instance, including its disks and all its metadata, and storing the result more cheaply than keeping the persistent disks around. The instance can later be recreated from the machine image, including labels etc.

time gcloud beta compute machine-images create build-linux-static --source-instance build-linux-static
	-> 29s
time gcloud beta compute machine-images create build-win64-static --source-instance build-win64-static
	-> 24s

time gcloud beta compute instances create build-linux-static --source-machine-image=build-linux-static
time gcloud beta compute instances create build-win64-static --source-machine-image=build-win64-static

	Instance creation took about 1 minute; the instance goes straight to the "started" state (probably possible to control with various options)

We should do a bit of math on the cost ($0.05 per GB per month) of a machine image. It's the same price as for custom images; is it the same as for snapshots? Is it charged only for the used portion of the disk? How does this change pricing for a build machine that is active, say, 3h/day?
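A rough way to do that math is sketched below. It assumes that at any moment either the disk or the machine image exists (never both), and that the image is billed for `IMAGE_GB`, which may or may not be only the used portion of the disk (that is still an open question). The `monthly_cost` function name and the disk prices in the usage example are assumptions, not values from this issue; only the $0.05 per GB per month image price is quoted above.

```sh
#!/bin/sh
# monthly_cost DISK_GB IMAGE_GB DISK_PRICE IMAGE_PRICE DISK_HOURS_PER_DAY
#
# Prints the estimated monthly cost in USD for one build machine.
#   DISK_GB            provisioned size of the persistent disk
#   IMAGE_GB           size the machine image is billed for (assumption)
#   DISK_PRICE         USD per GB per month for the disk type (assumption)
#   IMAGE_PRICE        USD per GB per month for machine images
#   DISK_HOURS_PER_DAY hours per day during which the disk exists;
#                      the image is assumed to exist the rest of the day
monthly_cost() {
  awk -v disk_gb="$1" -v image_gb="$2" -v disk_price="$3" \
      -v image_price="$4" -v disk_hours="$5" 'BEGIN {
    disk_fraction = disk_hours / 24
    disk_cost = disk_gb * disk_price * disk_fraction
    image_cost = image_gb * image_price * (1 - disk_fraction)
    printf "%.2f\n", disk_cost + image_cost
  }'
}
```

For example, with a 600 GB disk at an assumed $0.17 per GB per month, and 511 GB billed for the image:

```sh
monthly_cost 600 0 0.17 0.05 24     # disk always exists: prints 102.00
monthly_cost 600 511 0.17 0.05 4    # 3h active + 1h idle per day: prints 38.29
```

The second number ignores the time spent in the transitions themselves and any minimum billing periods, so it should be read as a lower bound rather than a quote.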

@Kalmalyzer Kalmalyzer added the enhancement New feature or request label Sep 4, 2021
@Kalmalyzer
Contributor Author

Also -- are the above times affected by disk size? Are they affected by the amount of unique data on the disk? (There is some form of de-duplication between snapshots happening in the GCE backend)

@Kalmalyzer
Contributor Author

With a 600 GB pd-balanced disk...

time gcloud beta compute machine-images create build-linux-static --source-instance build-linux-static
with 111 GB of random data
-> 214s
with 511 GB of random data
-> 225s

time gcloud beta compute instances create test --source-machine-image=test-with-random-511g
-> 61s

Not sure why the previous test showed ~30 second image creation times and this is at ~220s.

I wonder if there's a difference when using a pd-ssd?

@Kalmalyzer
Contributor Author

With a 600 GB pd-ssd disk...

Create machine image from 600 GB pd-ssd disk with 511 GB of random data:
-> 210s

Create 600 GB pd-ssd disk from machine image with 511 GB of random data:
-> 72s

@Kalmalyzer
Contributor Author

Kalmalyzer commented Sep 12, 2021

So the statistics look like this:

  • Disk type probably has no impact on image creation / disk creation times
  • Create machine image from disk probably takes 3-4 mins in our normal situations
  • Create disk from machine image probably takes 70 seconds in our normal situations

Based on this,

  • Having a second persistence step where we convert stopped VMs into machine images can be worthwhile. We could do this after, say, 1h of idleness. This would enable the use of pd-ssd volumes for rarely-run jobs (since the pd-ssd doesn't exist most of the time). It would also cut away most of the disk space cost for rarely-run jobs. Expected savings: ~50% on disk costs?
  • Going from machine image -> running VM adds ~70s to the startup time but does not add complexity (the VM gets started automatically when it gets created)
  • We would need extra logic for managing the machine images (deleting them as necessary) in parallel with other ops
  • Going from VM -> machine image means extra steps in Jenkins
  • The transitions take so long that we probably need a different mechanism for timing them (and timing them out) than the one we use for the stop-after-idle threshold. Something like the clean-lost-nodes worker would fit well (but that one hasn't been updated to work properly yet either).
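
The second persistence step could look roughly like the sketch below. The function names, the state names and the `archive`/`restore` action names are hypothetical; the two `create` commands are the ones timed earlier in this issue, while the `delete` commands are an assumption about how cleanup would be done. It also assumes a default zone is configured and that the instance's disks are set to auto-delete with the instance. `print_commands` is a dry run: it only prints the gcloud commands and does not execute them.

```sh
#!/bin/sh
# Idle time after which a stopped VM gets converted ("say, 1h of idleness").
IDLE_THRESHOLD_SECONDS=3600

# next_action STATE IDLE_SECONDS
# STATE is "running", "stopped", or "archived" (only a machine image exists).
# Prints "archive" when a stopped VM has been idle long enough, else "none".
next_action() {
  state="$1"
  idle_seconds="$2"
  if [ "$state" = "stopped" ] && [ "$idle_seconds" -ge "$IDLE_THRESHOLD_SECONDS" ]; then
    echo "archive"
  else
    echo "none"
  fi
}

# print_commands ACTION NAME
# Dry run: prints the gcloud commands for ACTION instead of running them.
print_commands() {
  action="$1"
  name="$2"
  case "$action" in
    archive)
      # VM -> machine image, then drop the VM and its disks.
      echo "gcloud beta compute machine-images create $name --source-instance $name"
      echo "gcloud compute instances delete $name --quiet"
      ;;
    restore)
      # Machine image -> running VM, then drop the image.
      echo "gcloud beta compute instances create $name --source-machine-image=$name"
      echo "gcloud beta compute machine-images delete $name --quiet"
      ;;
  esac
}
```

A worker in the style of clean-lost-nodes could call `next_action` periodically for each agent and run the printed commands with its own timeout, since creating the image alone took 3-4 minutes in the measurements above.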
