Merge pull request #61 from captain-pool/add_image_retraining_tpu
Updating TPU trainer Sample
captain-pool authored Aug 25, 2019
2 parents 1e11a80 + a18ad71 commit 67c9447
Showing 3 changed files with 293 additions and 212 deletions.
60 changes: 25 additions & 35 deletions E1_TPU_Sample/README.md
@@ -6,7 +6,7 @@
 ### Cloud TPU

 **TPU Type:** v2-8
-**Tensorflow Version:** Nightly
+**Tensorflow Version:** 1.14

 ### Cloud VM

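This commit moves the sample from Tensorflow nightly to Tensorflow 1.14. Below is a minimal sketch for confirming that the VM's installed TensorFlow belongs to that release before training. It assumes Python 3.8+ (for `importlib.metadata`) and a pip distribution named `tensorflow`. The helpers `matches_release` and `installed_tf_matches` are hypothetical and not part of the sample.

```python
from importlib import metadata


def matches_release(version: str, expected: str) -> bool:
    """Return True if `version` is on the `expected` release line.

    Compares dotted components, so "1.14.0" matches "1.14",
    but "1.140.0" and "2.0.0" do not.
    """
    want = expected.split(".")
    have = version.split(".")
    return have[: len(want)] == want


def installed_tf_matches(expected: str = "1.14", dist: str = "tensorflow") -> bool:
    """Check the installed distribution's version (assumed dist name: tensorflow).

    Returns False when the distribution is not installed at all.
    """
    try:
        version = metadata.version(dist)
    except metadata.PackageNotFoundError:
        return False
    return matches_release(version, expected)


if __name__ == "__main__":
    print("Tensorflow 1.14 installed:", installed_tf_matches())
```

Comparing components rather than calling `str.startswith` avoids a false match such as `1.140.0` against `1.14`.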
@@ -17,7 +17,7 @@
 Launching Instance and VM
 ---------------------------
 - Open Google Cloud Shell
-- `ctpu up -tf-version nightly`
+- `ctpu up -tf-version 1.14`
 - If a cloud bucket is not set up automatically, create a Cloud Storage bucket
 with the same name as TPU and the VM
 - Enable HTTP traffic for the VM instance
@@ -26,35 +26,6 @@ with the same name as TPU and the VM
 - `pip3 install -r requirements.txt`
 - `export CTPU_NAME=<common name of the tpu, vm and bucket>`

-Changing Tensorflow Source Code For Support to Cloud TPU:
---------------------------------------------------------
-TPU is not officially supported in Tensorflow 2.0, so it is not exposed in the public API.
-However, the code explicitly imports the Python files that contain the required modules.
-There's a small bug in `CrossShardOptimizer`: it calls the wrapped optimizer with the OptimizerV1
-signature, but all optimizers available in the public API are V2. To support V2 optimizers, a small
-code fragment needs to be changed in CrossShardOptimizer's `apply_gradients(...)` function.
-To do that:
-- Browse (`cd`) to the installation directory of Tensorflow.
-
-**To find the installation directory:**
-```python3
->>> import os
->>> import tensorflow as tf
->>> print(os.path.dirname(str(tf).split(" ")[-1][1:]))
-```
-
-- `cd` to `python/tpu` inside the installation directory
-- Open `tpu_optimizer.py` in an editor
-- Change line no. 173 (for Tensorflow 2.0 Beta)
-**From**
-```python3
-return self._opt.apply_gradients(summed_grads_and_vars, global_step, name)
-```
-**To**
-```python3
-return self._opt.apply_gradients(summed_grads_and_vars, name=name)
-```
-- Save changes

 Running Tensorboard:
 ----------------------
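The deleted section patched `CrossShardOptimizer` to work around a signature mismatch. V1 optimizers define `apply_gradients(grads_and_vars, global_step=None, name=None)`. The V2 optimizers in TensorFlow 2.0 define `apply_gradients(grads_and_vars, name=None)` and count steps themselves through their `iterations` attribute.

The sketch below reproduces only that mismatch. It uses two hypothetical stand-in classes (not TensorFlow code), so it runs without TensorFlow installed. `original_call` and `patched_call` mirror the "From" and "To" lines of the deleted instructions.

```python
class V1StyleOptimizer:
    """Stand-in with the V1 signature (global_step is accepted)."""

    def apply_gradients(self, grads_and_vars, global_step=None, name=None):
        return {"api": "v1", "grads_and_vars": list(grads_and_vars),
                "global_step": global_step, "name": name}


class V2StyleOptimizer:
    """Stand-in with the V2 signature (no global_step parameter)."""

    def __init__(self):
        self.iterations = 0  # V2-style optimizers keep their own step counter

    def apply_gradients(self, grads_and_vars, name=None):
        self.iterations += 1
        return {"api": "v2", "grads_and_vars": list(grads_and_vars), "name": name}


def original_call(opt, summed_grads_and_vars, global_step, name):
    # Same call shape as the unpatched line: global_step passed positionally.
    return opt.apply_gradients(summed_grads_and_vars, global_step, name)


def patched_call(opt, summed_grads_and_vars, global_step, name):
    # Same call shape as the patched line: global_step is dropped and
    # name is passed by keyword, which both signatures accept.
    return opt.apply_gradients(summed_grads_and_vars, name=name)


if __name__ == "__main__":
    grads = [(0.5, "w")]
    try:
        original_call(V2StyleOptimizer(), grads, 7, "step")
    except TypeError as err:
        print("original call with a V2-style optimizer fails:", err)
    print(patched_call(V2StyleOptimizer(), grads, 7, "step"))
```

The patched call also drops `global_step` for V1 optimizers, so it only suits the V2 optimizers that the removed section targeted.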
@@ -74,11 +45,30 @@ To view Tensorboard, Browse to the Public IP of the VM Instance

 Running the Code:
 ----------------------
+#### Train The Model
+
 ```bash
 $ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
---model_dir gs://$CTPU_NAME/model_dir \
---data_dir gs://$CTPU_NAME/data_dir \
---batch_size 16 \
---iterations 4 \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--num_steps 2000 \
 --dataset horses_or_humans
 ```
+Training saves a single checkpoint at the end of training. This checkpoint can be loaded
+later to export a SavedModel.
+
+#### Export Model
+
+```bash
+$ python3 image_retraining_tpu.py --tpu $CTPU_NAME --use_tpu \
+--modeldir gs://$CTPU_NAME/modeldir \
+--datadir gs://$CTPU_NAME/datadir \
+--logdir gs://$CTPU_NAME/logdir \
+--dataset horses_or_humans \
+--export_only \
+--export_path modeldir/model
+```
+Exporting SavedModel of trained model
+----------------------------
+The trained model is saved to `gs://$CTPU_NAME/modeldir/model` by default, unless a different path is given with `--export_path`.
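Both commands derive every Cloud Storage path from `$CTPU_NAME`, because the bucket shares its name with the TPU and the VM. Below is a minimal sketch that assembles the same argument lists in Python, assuming `CTPU_NAME` is exported as in the setup step. `bucket_dirs`, `build_train_args` and `build_export_args` are hypothetical helpers. The flags and their order are copied from the commands above, not read from the script's own flag definitions.

```python
import os
import shlex


def bucket_dirs(ctpu_name: str) -> dict:
    """gs:// directories used by the commands, all inside the bucket named after the TPU."""
    bucket = f"gs://{ctpu_name}"
    return {
        "modeldir": f"{bucket}/modeldir",
        "datadir": f"{bucket}/datadir",
        "logdir": f"{bucket}/logdir",
    }


def _base_args(ctpu_name: str) -> list:
    # Flags shared by the training and export commands.
    dirs = bucket_dirs(ctpu_name)
    return ["python3", "image_retraining_tpu.py",
            "--tpu", ctpu_name, "--use_tpu",
            "--modeldir", dirs["modeldir"],
            "--datadir", dirs["datadir"],
            "--logdir", dirs["logdir"]]


def build_train_args(ctpu_name: str, num_steps: int = 2000,
                     dataset: str = "horses_or_humans") -> list:
    return _base_args(ctpu_name) + ["--num_steps", str(num_steps),
                                    "--dataset", dataset]


def build_export_args(ctpu_name: str, export_path: str = "modeldir/model",
                      dataset: str = "horses_or_humans") -> list:
    return _base_args(ctpu_name) + ["--dataset", dataset, "--export_only",
                                    "--export_path", export_path]


if __name__ == "__main__":
    name = os.environ.get("CTPU_NAME", "my-tpu")  # "my-tpu" is a placeholder
    print(shlex.join(build_train_args(name)))
    print(shlex.join(build_export_args(name)))
```

On the VM, the lists could be passed straight to `subprocess.run(args, check=True)`, which sidesteps shell quoting. `shlex.join` requires Python 3.8+.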
177 changes: 0 additions & 177 deletions E1_TPU_Sample/image_retraining_tpu.py

This file was deleted.
