Merge pull request #61 from captain-pool/add_image_retraining_tpu

Updating TPU trainer Sample
captain-pool · Aug 25, 2019 · 67c9447
2 parents 1e11a80 + a18ad71
commit 67c9447

Showing 3 changed files with 293 additions and 212 deletions.
diff --git a/E1_TPU_Sample/README.md b/E1_TPU_Sample/README.md
@@ -6,7 +6,7 @@
 ### Cloud TPU
 
 **TPU Type:** v2.8
-**Tensorflow Version:** Nightly
+**Tensorflow Version:** 1.14
 
 ### Cloud VM
 
@@ -17,7 +17,7 @@
 Launching Instance and VM
 ---------------------------
 - Open Google Cloud Shell
-- `ctpu up -tf-version nightly`
+- `ctpu up -tf-version 1.14`
 - If cloud bucket is not setup automatically, create a cloud storage bucket
 with the same name as TPU and the VM
 - enable HTTP traffic for the VM instance
@@ -26,35 +26,6 @@ with the same name as TPU and the VM
   - `pip3 install -r requirements.txt`
   - `export CTPU_NAME=<common name of the tpu, vm and bucket>`
 
-Changing TensorFlow Source Code to Support Cloud TPU:
---------------------------------------------------------
-TPU is not officially supported in TensorFlow 2.0, so it is not exposed in the public API.
-However, in the code, the Python files containing the required modules are imported explicitly.
-There is a small bug in `CrossShardOptimizer`: it tries to use OptimizerV1, while all optimizers
-available in the public API are V2. To support V2 optimizers, a small code fragment must
-be changed in CrossShardOptimizer's `apply_gradients(...)` function.
-To do that:
-- Browse (`cd`) to the installation directory of TensorFlow.
-
-**To find the installation directory:**
-```python3
->>> import os
->>> import tensorflow as tf
->>> print(os.path.dirname(str(tf).split(" ")[-1][1:]))
-```
-
-- `cd` to `python/tpu` inside the installation directory
-- open `tpu_optimizer.py` in an editor
-- change line no. 173 (For Tensorflow 2.0 Beta)
-**From**
-```python3
-     return self._opt.apply_gradients(summed_grads_and_vars, global_step, name)
-```
-**To**
-```python3
-     return self._opt.apply_gradients(summed_grads_and_vars, name=name)
-```
-- Save Changes
 
 Running Tensorboard:
 ----------------------
@@ -74,11 +45,30 @@ To view Tensorboard, Browse to the Public IP of the VM Instance
 
 Running the Code:
 ----------------------
+#### Train The Model
+
 ```bash
 $ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
---model_dir gs://$CTPU_NAME/model_dir \
---data_dir gs://$CTPU_NAME/data_dir \
---batch_size 16 \
---iterations 4 \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--num_steps 2000 \
 --dataset horses_or_humans
 ```
+Training saves a single checkpoint at the end of the run. This checkpoint can be loaded
+later to export a SavedModel.
+
+#### Export Model
+
+```bash
+$ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--dataset horses_or_humans \
+--export_only \
+--export_path modeldir/model
+```
+Exporting SavedModel of trained model
+----------------------------
+The trained model is saved at `gs://$CTPU_NAME/modeldir/model` by default if no path is given with `--export_path`.
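
Reviewer note: the default export location described in the last added README line can be sketched in Python as follows. This is a minimal illustration, not code taken from `image_retraining_tpu.py`. The helper name `resolve_export_path`, its parameters, and the bucket name `my-tpu` are hypothetical, and the sketch assumes the default is simply `model` appended to the model directory, as the README text suggests.

```python
import posixpath


def resolve_export_path(modeldir, export_path=None):
    """Hypothetical helper: pick the SavedModel export location.

    Assumption (from the README text, not from the script itself):
    when --export_path is not given, the model is exported to
    "<modeldir>/model".
    """
    if export_path:
        # An explicit --export_path always wins over the default.
        return export_path
    # posixpath.join keeps forward slashes, which gs:// URLs need
    # regardless of the local operating system.
    return posixpath.join(modeldir, "model")


if __name__ == "__main__":
    # "my-tpu" stands in for $CTPU_NAME and is only a placeholder.
    print(resolve_export_path("gs://my-tpu/modeldir"))
    print(resolve_export_path("gs://my-tpu/modeldir", "gs://my-tpu/exports/v1"))
```

Whether the script resolves a relative `--export_path` against the bucket or against the local working directory is not stated in the README, so the sketch returns an explicit path unchanged.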
diff --git a/E1_TPU_Sample/image_retraining_tpu.py b/E1_TPU_Sample/image_retraining_tpu.py