This project demonstrates semantic image segmentation using U-Net and SegNet architectures.
- Algorithm: Ronneberger et al., U-Net: Convolutional Networks for Biomedical Image Segmentation
- Dataset: Oxford IIIT Pet Dataset (Kaggle version)
- Packages: TensorFlow, NumPy, scikit-learn, Pillow, imageio, matplotlib
```
├── notebooks/
├── data/
│   ├── images/
│   └── annotations/
│       └── trimaps/
├── visuals/
└── README.md
```
- Images: `.jpg` format
- Masks: `.png` and `.jpg` in `annotations/trimaps/`
- Script loads, sorts, resizes, and normalizes data for training.
```python
def LoadData(path1, path2):
    """Loads, sorts, and matches images/masks."""
    ...

def PreprocessData(img, mask, target_shape_img, target_shape_mask, path1, path2):
    """Preprocesses images and masks into arrays."""
    ...
```
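The full bodies are omitted here; below is a minimal sketch of how these two helpers could look, assuming plain Pillow/NumPy preprocessing (the exact trimap label shift used in the notebooks is an assumption):

```python
import os
import numpy as np
from PIL import Image

def LoadData(path1, path2):
    # Collect image and mask filenames and sort them so pairs line up by name.
    img_files = sorted(f for f in os.listdir(path1) if f.endswith('.jpg'))
    mask_files = sorted(f for f in os.listdir(path2) if f.endswith(('.png', '.jpg')))
    return img_files, mask_files

def PreprocessData(img, mask, target_shape_img, target_shape_mask, path1, path2):
    # Resize every image/mask pair and pack them into NumPy arrays.
    h, w, c = target_shape_img
    X = np.zeros((len(img), h, w, c), dtype=np.float32)
    y = np.zeros((len(mask), *target_shape_mask), dtype=np.int32)
    for i, (img_name, mask_name) in enumerate(zip(img, mask)):
        image = Image.open(os.path.join(path1, img_name)).convert('RGB').resize((w, h))
        X[i] = np.asarray(image, dtype=np.float32) / 255.0               # normalize to [0, 1]
        trimap = Image.open(os.path.join(path2, mask_name)).resize((w, h), Image.NEAREST)
        y[i] = np.asarray(trimap, dtype=np.int32).reshape(h, w, 1) - 1   # shift trimap labels into 0..2 (assumed mapping)
    return X, y
```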
The architecture uses encoder and decoder blocks with skip connections.
Key U-Net functions:
```python
def EncoderMiniBlock(inputs, n_filters=32, dropout_prob=0.3, max_pooling=True):
    ...

def DecoderMiniBlock(prev_layer_input, skip_layer_input, n_filters=32):
    ...

def UNetCompiled(input_size=(128, 128, 3), n_filters=32, n_classes=3):
    ...
```
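As a rough illustration (not necessarily identical to the notebook code), the three functions could be assembled along these lines; note that the final 1x1 convolution emits logits, which matches the `from_logits=True` loss used below:

```python
import tensorflow as tf
from tensorflow.keras import layers

def EncoderMiniBlock(inputs, n_filters=32, dropout_prob=0.3, max_pooling=True):
    # Two 3x3 convolutions, optional dropout, then 2x2 max pooling.
    conv = layers.Conv2D(n_filters, 3, activation='relu', padding='same',
                         kernel_initializer='he_normal')(inputs)
    conv = layers.Conv2D(n_filters, 3, activation='relu', padding='same',
                         kernel_initializer='he_normal')(conv)
    if dropout_prob > 0:
        conv = layers.Dropout(dropout_prob)(conv)
    next_layer = layers.MaxPooling2D(pool_size=(2, 2))(conv) if max_pooling else conv
    return next_layer, conv  # conv is kept as the skip connection

def DecoderMiniBlock(prev_layer_input, skip_layer_input, n_filters=32):
    # Upsample with a transposed convolution, concatenate the skip connection,
    # then refine with two 3x3 convolutions.
    up = layers.Conv2DTranspose(n_filters, 3, strides=2, padding='same')(prev_layer_input)
    merge = layers.concatenate([up, skip_layer_input], axis=-1)
    conv = layers.Conv2D(n_filters, 3, activation='relu', padding='same',
                         kernel_initializer='he_normal')(merge)
    conv = layers.Conv2D(n_filters, 3, activation='relu', padding='same',
                         kernel_initializer='he_normal')(conv)
    return conv

def UNetCompiled(input_size=(128, 128, 3), n_filters=32, n_classes=3):
    # 4-level U-Net; the final 1x1 convolution outputs per-class logits (no softmax).
    inputs = layers.Input(input_size)
    c1, s1 = EncoderMiniBlock(inputs, n_filters, dropout_prob=0.0)
    c2, s2 = EncoderMiniBlock(c1, n_filters * 2, dropout_prob=0.0)
    c3, s3 = EncoderMiniBlock(c2, n_filters * 4, dropout_prob=0.0)
    c4, s4 = EncoderMiniBlock(c3, n_filters * 8, dropout_prob=0.3)
    c5, _ = EncoderMiniBlock(c4, n_filters * 16, dropout_prob=0.3, max_pooling=False)
    u6 = DecoderMiniBlock(c5, s4, n_filters * 8)
    u7 = DecoderMiniBlock(u6, s3, n_filters * 4)
    u8 = DecoderMiniBlock(u7, s2, n_filters * 2)
    u9 = DecoderMiniBlock(u8, s1, n_filters)
    outputs = layers.Conv2D(n_classes, 1, padding='same')(u9)
    return tf.keras.Model(inputs=inputs, outputs=outputs)
```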
- Load and view data
- Preprocess to shapes `[128, 128, 3]` (images) and `[128, 128, 1]` (masks)
- Train/test split (20% held out)
- Model Architecture:
```python
unet = UNetCompiled(input_size=(128, 128, 3), n_filters=32, n_classes=3)
unet.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
results = unet.fit(X_train, y_train, batch_size=32, epochs=20,
                   validation_data=(X_valid, y_valid))
```
- Plots for `loss` and `accuracy` (train/val)
- Qualitative comparison: input, true mask, predicted mask
```python
fig, axis = plt.subplots(1, 2, figsize=(20, 5))
axis[0].plot(results.history["loss"], color='r', label='train loss')
axis[0].plot(results.history["val_loss"], color='b', label='val loss')
...
```
```python
def VisualizeResults(index):
    """Shows input, true mask, prediction."""
    ...
```
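A minimal sketch of such a helper, assuming `unet`, `X_valid`, and `y_valid` from the training step above (hypothetical body, not necessarily the notebook code):

```python
import numpy as np
import matplotlib.pyplot as plt

def VisualizeResults(index):
    # Plot a validation image, its ground-truth mask, and the model's prediction.
    img = X_valid[index]
    pred = unet.predict(img[np.newaxis, ...])        # (1, H, W, n_classes) logits
    pred_mask = np.argmax(pred, axis=-1)[0]          # (H, W) predicted class labels
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    axes[0].imshow(img)
    axes[0].set_title('Input image')
    axes[1].imshow(y_valid[index, ..., 0])
    axes[1].set_title('True mask')
    axes[2].imshow(pred_mask)
    axes[2].set_title('Predicted mask')
    plt.show()
```

`VisualizeResults_all(model, index)` in the SegNet section below is the same idea with the model passed in explicitly.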
Custom implementation for per-class Intersection over Union:
```python
def iou_metric(num_classes):
    ...

def per_class_iou(y_true, y_pred, num_classes):
    ...
```
Used in model compile:

```python
unet.compile(
    optimizer=...,
    loss=...,
    metrics=['accuracy', iou_metric(num_classes=3)]
)
```
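The bodies are elided above; a minimal sketch of what they could look like, assuming integer masks of shape `(H, W, 1)` and per-pixel class scores from the model (a hypothetical implementation, not necessarily the one used in the notebooks):

```python
import numpy as np
import tensorflow as tf

def iou_metric(num_classes):
    """Return a Keras metric that computes batch-wise mean IoU over all classes."""
    def mean_iou(y_true, y_pred):
        y_pred_lbl = tf.argmax(y_pred, axis=-1)                          # (B, H, W)
        y_true_lbl = tf.cast(tf.squeeze(y_true, axis=-1), y_pred_lbl.dtype)
        ious = []
        for c in range(num_classes):
            t = tf.equal(y_true_lbl, c)
            p = tf.equal(y_pred_lbl, c)
            inter = tf.reduce_sum(tf.cast(tf.logical_and(t, p), tf.float32))
            union = tf.reduce_sum(tf.cast(tf.logical_or(t, p), tf.float32))
            ious.append(inter / (union + 1e-7))
        return tf.reduce_mean(tf.stack(ious))
    mean_iou.__name__ = 'mean_iou'
    return mean_iou

def per_class_iou(y_true, y_pred, num_classes):
    """Per-class IoU on integer label maps (NumPy version for offline evaluation)."""
    ious = {}
    for c in range(num_classes):
        inter = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        ious[c] = inter / union if union > 0 else float('nan')
    return ious
```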
- Encoder: stacked Conv+ReLU+MaxPool (indices not used for unpooling here)
- Decoder: Upsampling, Conv, ReLU
```python
def EncoderBlock(x, filters, dropout=0.0):
    ...

def DecoderBlock(x, skip, filters):
    ...

def SegNet(input_shape=(128, 128, 3), n_classes=3):
    ...
```
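One possible shape for these functions, consistent with the description above (upsampling-based decoder, no pooling-index unpooling); the final layer uses a softmax so that it matches the string `'sparse_categorical_crossentropy'` loss below. This is a sketch, not necessarily the exact notebook code:

```python
import tensorflow as tf
from tensorflow.keras import layers

def EncoderBlock(x, filters, dropout=0.0):
    # Conv + BN + ReLU twice, then 2x2 max pooling (pooling indices are not kept).
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    if dropout > 0:
        x = layers.Dropout(dropout)(x)
    return layers.MaxPooling2D(pool_size=(2, 2))(x)

def DecoderBlock(x, skip, filters):
    # Upsample, optionally concatenate a skip tensor, then Conv + BN + ReLU twice.
    x = layers.UpSampling2D(size=(2, 2))(x)
    if skip is not None:
        x = layers.concatenate([x, skip], axis=-1)
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding='same')(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def SegNet(input_shape=(128, 128, 3), n_classes=3):
    inputs = layers.Input(input_shape)
    e1 = EncoderBlock(inputs, 64)
    e2 = EncoderBlock(e1, 128)
    e3 = EncoderBlock(e2, 256, dropout=0.3)
    d1 = DecoderBlock(e3, None, 256)
    d2 = DecoderBlock(d1, None, 128)
    d3 = DecoderBlock(d2, None, 64)
    outputs = layers.Conv2D(n_classes, 1, padding='same', activation='softmax')(d3)
    return tf.keras.Model(inputs, outputs)
```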
```python
segnet = SegNet(input_shape=(128, 128, 3), n_classes=3)
segnet.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy', iou_metric(num_classes=3)]
)
results_seg = segnet.fit(X_train, y_train, epochs=20, batch_size=32,
                         validation_data=(X_valid, y_valid))
```
Visualize predictions:

```python
def VisualizeResults_all(model, index):
    """Shows input, true mask, and segmentation."""
    ...

VisualizeResults_all(segnet, 150)
```
- Example accuracy: U-Net ≈ 89%, SegNet ≈ 85–89%
- Loss decreases smoothly for both models, with little overfitting observed.
- IoU by class (example run):
  - Class 0 (background): U-Net ~0.80, SegNet ~0.75
  - Class 1 (animal): U-Net ~0.88, SegNet ~0.85
  - Class 2 (contour): U-Net ~0.49, SegNet ~0.40
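Numbers of this kind can be reproduced with a helper like the `per_class_iou` sketch above (assuming `unet`, `X_valid`, and `y_valid` from the training step):

```python
import numpy as np

pred_labels = np.argmax(unet.predict(X_valid), axis=-1)   # (N, 128, 128) predicted classes
true_labels = y_valid[..., 0]                             # drop the channel axis
print(per_class_iou(true_labels, pred_labels, num_classes=3))
```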
Summary:
- U-Net outperforms SegNet, especially on fine structure (the contour class), thanks to its richer skip connections.
- Implement IoU as metric and for per-class evaluation
- Implement SegNet architecture
- Compare results (accuracy, loss, IoU breakdown)
- Try adding more data augmentation and regularization (see the sketch after this list)
- Test transfer learning or deeper backbone
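For the augmentation item, one possible direction is a `tf.data` pipeline that applies the same random transform to image and mask; a sketch reusing `X_train`/`y_train` from above (not part of the current notebooks):

```python
import tensorflow as tf

def augment(image, mask):
    # Apply the same random horizontal flip to image and mask so labels stay aligned.
    flip = tf.random.uniform(()) > 0.5
    image = tf.cond(flip, lambda: tf.image.flip_left_right(image), lambda: image)
    mask = tf.cond(flip, lambda: tf.image.flip_left_right(mask), lambda: mask)
    return image, mask

train_ds = (tf.data.Dataset.from_tensor_slices((X_train, y_train))
            .shuffle(512)
            .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))
# unet.fit(train_ds, epochs=20, validation_data=(X_valid, y_valid))
```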
- U-Net original paper (arXiv)
- Oxford IIIT Pet Dataset
- Kaggle: Pet Dataset page
- SegNet paper (arXiv)
- TensorFlow Documentation
- Image Segmentation with U-Net – Example Notebook
- Official U-Net implementation (Keras)
MIT License
For questions or discussion:
Gmail (perinadaria19@gmail.com)