
GCP Help

Anirudh Vegesana edited this page Feb 26, 2021 · 6 revisions

General Use Case (Creating the VM and TPU) in Google Cloud Shell

Get the project name from your team leader and substitute it for the right-hand $PROJECT_NAME in the export lines below.

To create a VM for an existing TPU

export PROJECT_NAME=$PROJECT_NAME
export VM_NAME=yolo
export CPU_SIZE=e2-standard-16
export TPU_ZONE=us-central1-a
gcloud config set project $PROJECT_NAME

ctpu up --vm-only --name=$VM_NAME --project=$PROJECT_NAME --zone=$TPU_ZONE --disk-size-gb=50 --machine-type=$CPU_SIZE --tf-version=2.4.1
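The commands above depend on several exported variables, and `export PROJECT_NAME=$PROJECT_NAME` leaves the variable empty if it was never set. A minimal sketch of a fail-fast check, assuming a POSIX shell; `require_vars` is a hypothetical helper, not part of gcloud or ctpu:

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup through eval keeps this POSIX sh compatible.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required variable: $name" >&2
      return 1
    fi
  done
}
```

For example, `require_vars PROJECT_NAME VM_NAME CPU_SIZE TPU_ZONE && echo ready` could be run just before `ctpu up`.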

Inside the VM:

export TPU_NAME=fat-boy
export BRANCH_NAME=yolo_debug
export EXPERIMENT_NAME=yolov4-coco-tpuV2-256
export MODEL_DIR=$EXPERIMENT_NAME
export LOG_FILE=$EXPERIMENT_NAME

git clone https://github.com/PurdueCAM2Project/TensorFlowModels
cd TensorFlowModels
git checkout $BRANCH_NAME
pip install -r yolo/requirements.txt
nohup python3 -m yolo.train_vm --mode=train_and_eval --experiment=yolo_custom --config_file=yolo/configs/experiments/$EXPERIMENT_NAME.yaml --model_dir="gs://tensorflow2/$MODEL_DIR" > $LOG_FILE &
tail -f $LOG_FILE
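Because the training job is started in the background with nohup, a mistyped EXPERIMENT_NAME only shows up later in the log. A small pre-flight sketch, assuming it is run from the TensorFlowModels directory; `experiment_config` and `check_experiment` are hypothetical helpers:

```shell
# Hypothetical helper: print the config path used by the training command.
experiment_config() {
  echo "yolo/configs/experiments/$1.yaml"
}

# Hypothetical pre-flight check: succeed only if the config file exists.
check_experiment() {
  config="$(experiment_config "$1")"
  if [ ! -f "$config" ]; then
    echo "No config found at $config" >&2
    return 1
  fi
  echo "Using config $config"
}
```

One way to use it is to run `check_experiment "$EXPERIMENT_NAME"` before the nohup line and launch training only if it succeeds.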

To create a VM and a TPU at the same time

export PROJECT_NAME=$PROJECT_NAME
export VM_NAME=yolo
export TPU_SIZE=v2-64
export CPU_SIZE=e2-standard-16
export TPU_ZONE=us-central1-a
gcloud config set project $PROJECT_NAME
ctpu up --tpu-size=$TPU_SIZE --name=$VM_NAME --project=$PROJECT_NAME --zone=$TPU_ZONE --disk-size-gb=50 --machine-type=$CPU_SIZE --tf-version=2.4.1

Inside the VM:

export VM_NAME=yolo
export TPU_NAME=$VM_NAME
export BRANCH_NAME=yolo_debug
export EXPERIMENT_NAME=yolov4-coco-tpuV2-256
export MODEL_DIR=$EXPERIMENT_NAME
export LOG_FILE=$EXPERIMENT_NAME

git clone https://github.com/PurdueCAM2Project/TensorFlowModels
cd TensorFlowModels
git checkout $BRANCH_NAME
pip install -r yolo/requirements.txt
nohup python3 -m yolo.train_vm --mode=train_and_eval --experiment=yolo_custom --config_file=yolo/configs/experiments/$EXPERIMENT_NAME.yaml --model_dir="gs://tensorflow2/$MODEL_DIR" > $LOG_FILE &
tail -f $LOG_FILE
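The TPU and VM keep running after training finishes, so it may help to tear them down from Cloud Shell. A rough sketch, assuming gcloud is installed and that VM_NAME and TPU_ZONE hold the values used with `ctpu up`; `teardown` is a hypothetical helper that only prints the commands unless RUN=1 is set:

```shell
# Hypothetical helper: print (default) or run the teardown commands.
# Assumption: the TPU is named $TPU_NAME, or $VM_NAME when TPU_NAME is unset.
teardown() {
  tpu="${TPU_NAME:-$VM_NAME}"
  for cmd in \
    "gcloud compute tpus delete $tpu --zone=$TPU_ZONE --quiet" \
    "gcloud compute instances stop $VM_NAME --zone=$TPU_ZONE"
  do
    if [ "${RUN:-0}" = "1" ]; then
      $cmd        # relies on word splitting; names must not contain spaces
    else
      echo "$cmd" # dry run: show what would be executed
    fi
  done
}
```

Calling `teardown` shows the two commands for review, and `RUN=1 teardown` would execute them. Deleting the TPU while leaving the VM stopped is a design choice here, so the VM disk is kept for later runs.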

Installing Google Cloud SDK (Optional)

Follow the installation instructions at https://cloud.google.com/sdk/docs/install and then download ctpu into the bin folder of the Cloud SDK, as described in this README: https://github.com/tensorflow/tpu/blob/master/tools/ctpu/README.md

Creating VMs and TPUs in GCP

First, read through Google's documentation on creating and deleting TPUs and on linking buckets to the VM. It shows how to do this in the GCP console, in your terminal, and in Cloud Shell (ctpu commands).

https://cloud.google.com/tpu/docs/creating-deleting-tpus

Authorize the TPU to edit GCP buckets (only needs to be done once per project)

https://cloud.google.com/tpu/docs/storage-buckets

Run this command in Cloud Shell:

gcloud beta services identity create --service tpu.googleapis.com --project $PROJECT_NAME
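After the service identity exists, the linked storage-buckets guide covers granting it access to a bucket. A hedged sketch of that step, assuming the Cloud TPU service account follows the usual `service-PROJECT_NUMBER@cloud-tpu.iam.gserviceaccount.com` pattern; `tpu_service_account` is a hypothetical helper, and the `objectAdmin` role is an assumption rather than a requirement from this page:

```shell
# Hypothetical helper: build the Cloud TPU service account email
# from a numeric project number (assumed naming pattern).
tpu_service_account() {
  echo "service-$1@cloud-tpu.iam.gserviceaccount.com"
}

# Assumed usage (needs gcloud and gsutil; bucket name taken from the training command):
# PROJECT_NUMBER="$(gcloud projects describe "$PROJECT_NAME" --format='value(projectNumber)')"
# gsutil iam ch "serviceAccount:$(tpu_service_account "$PROJECT_NUMBER"):objectAdmin" gs://tensorflow2
```

The commented lines are left inactive so the account name can be checked against the GCP console before any permission is changed.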

Creating an SSH Key (only needs to be done once per project)

https://cloud.google.com/compute/docs/instances/adding-removing-ssh-keys#createsshkeys

Logging in through the SSH button in the GCP console creates a new VM user for the Google account you are signed in with. That is fine if you are signed in as cam2tensorflow, but if you are signed in with your personal account, you need to create your own key to log in as the cam2tensorflow user.

ssh-keygen -t rsa -f ~/.ssh/gcp -C cam2tensorflow

Add the contents of the generated ~/.ssh/gcp.pub to the project's SSH key metadata at https://console.cloud.google.com/compute/metadata
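Once the public key is in the metadata, the private key can be used to log in as cam2tensorflow. A minimal sketch, assuming the key path from the ssh-keygen command above; `gcp_ssh_cmd`, `GCP_KEY`, and `GCP_USER` are hypothetical names, and the IP address is a placeholder:

```shell
# Defaults match the ssh-keygen command above; override if your key lives elsewhere.
GCP_KEY="${GCP_KEY:-$HOME/.ssh/gcp}"
GCP_USER="${GCP_USER:-cam2tensorflow}"

# Hypothetical helper: print the ssh command for a VM's external IP.
gcp_ssh_cmd() {
  echo "ssh -i $GCP_KEY $GCP_USER@$1"
}

# Assumed way to look up the external IP with gcloud:
# gcloud compute instances describe "$VM_NAME" --zone="$TPU_ZONE" \
#   --format='get(networkInterfaces[0].accessConfigs[0].natIP)'
```

For example, `gcp_ssh_cmd 203.0.113.10` prints the command to review or paste into a terminal.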

Creating VMs and TPUs in Python (Optional)

Creating an IAM key

https://cloud.google.com/iam/docs/creating-managing-service-account-keys

Go to https://console.cloud.google.com/iam-admin/serviceaccounts, click on the developer service account, and create a key for it.

pip install google-cloud-storage
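Google's client libraries, including google-cloud-storage, read the key file location from the GOOGLE_APPLICATION_CREDENTIALS environment variable. A small sketch of setting it safely, assuming the key was downloaded as a JSON file; `use_service_key` is a hypothetical helper and the path in the usage note is a placeholder:

```shell
# Hypothetical helper: export the credentials variable only if the key file exists.
use_service_key() {
  if [ ! -f "$1" ]; then
    echo "Key file not found: $1" >&2
    return 1
  fi
  export GOOGLE_APPLICATION_CREDENTIALS="$1"
}
```

For example, `use_service_key "$HOME/keys/developer.json"` would make the key visible to Python scripts started from the same shell.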

TODO: Understand the Python