
Open Cities AI Challenge

(so far 3rd place solution), PyTorch implementation

Project Organization

├── LICENSE
├── README.md                <- The top-level README for developers using this project.
├── config                   <- yaml configuration files 
│   ├── config.yaml          <- config for training
│   ├── config_eval.yaml     <- config for inference
│   ├── dataset.yaml         <- config for train data extraction 
├── models     
│   ├── get_models.sh        <- will download models       
├── requirements.txt         <- The requirements file for reproducing the analysis environment, e.g.
│                              generated with `pip freeze > requirements.txt` filtered to minimal set 
├── combine_images.py        <- final ensemble using model averaging
├── dataset.py               <- data loading / preprocessing
├── find_hyperparameters.py  <- find hyper-parameters for given model
├── inference.py             <- inference script for single model
├── model.py                 <- model definition
├── optim.py                 <- optimizers definition
├── train.py                 <- training script
├── space_net_data.py        <- utility script for GT extraction from SpaceNet dataset (#TODO - hardcoded paths)    

High level overview

Simple approach: end-to-end segmentation with a fully convolutional neural network:

step 1:

  • Net architecture: FPN [1] with an EfficientNet-b1 [2] backbone
  • Identify hyper-parameters (weight decay, learning rate)
  • train (one-cycle learning policy; optimizer: AdamW; losses: Focal Loss and Dice Loss in separate heads = train only once, see what works best, try to use all outputs in the final ensemble)
  • the network has 2 additional heads: a direct classification head (is there a building or not; later used for negative-example mining) and a scale regression head (the samples in the dataset come at different scales, so this was a (failed) attempt to make inference easier / better performing); see the sketch after this list
  • datasets: tier 1, plus some data from the SpaceNet dataset (7 folds)
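
The multi-head design can be pictured with the minimal sketch below. It is a toy stand-in, not the actual model.py: the dummy encoder replaces the FPN + EfficientNet-b1 trunk, and all shapes are assumptions.

```python
# Minimal sketch of the three-headed design, NOT the actual model.py:
# the encoder here is a dummy stand-in for the FPN + EfficientNet-b1 trunk.
import torch
import torch.nn as nn

class MultiHeadSegNet(nn.Module):
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # dummy encoder: the real solution uses EfficientNet-b1 + FPN [1, 2]
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # dense segmentation head, upsampled back to input resolution
        self.seg_head = nn.Sequential(
            nn.Conv2d(feat_ch, 1, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # image-level classification head: building present / absent
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 1))
        # scale regression head: predicts the sample's scale factor
        self.scale_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat_ch, 1))

    def forward(self, x):
        f = self.encoder(x)
        return self.seg_head(f), self.cls_head(f), self.scale_head(f)
```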

step 2: Try “Self-training with Noisy Student” [3], and negative mining

  • label test data and tier-2 data with soft labels using the model from step 1 (with TTA)
  • mine negative samples
  • repeat training with an EfficientNet-b2 backbone; for the soft labels, a KL divergence loss is used (see the sketch after this list)
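
For the soft-label term, a per-pixel binary KL divergence is the natural choice. The snippet below is a hedged sketch of one possible formulation, not necessarily the exact loss used in train.py:

```python
# Sketch of a per-pixel Bernoulli KL divergence between teacher soft masks
# and student predictions (assumed formulation; see train.py for the real one)
import torch

def soft_label_kl(student_logits, teacher_probs, eps=1e-6):
    p = teacher_probs.clamp(eps, 1 - eps)                  # teacher soft mask
    q = torch.sigmoid(student_logits).clamp(eps, 1 - eps)  # student prediction
    kl = p * (p / q).log() + (1 - p) * ((1 - p) / (1 - q)).log()
    return kl.mean()
```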

step 3: Standard “competition madness”:

  • average ensemble of models (from steps 1 and 2) with TTA augmentation (scale, flipping, transposing)

Step 2 comes with a big question mark: I think that just repeating step 1 with the b2 backbone would lead to the same results.

Requirements

  • python >= 3.6
  • torch >= 1.2.0
  • torchvision>=0.3.0
  • albumentations>=0.4.5
  • efficientnet-pytorch==0.6.3
  • geopandas
  • pandas
  • rasterio
  • opencv-python
  • hydra-core
  • numpy
  • scikit-learn
  • optuna (optional)

Inference

Download models (the following script will download all models used in the final submission):

cd models
sh get_models.sh

and run inference:

python inference.py data_dir=<path to test data> model_name=<efficientnet-b1|efficientnet-b2> model=<path to the model checkpoint>

for inference with the 'b1' model:

python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth

*Note: a run folder is generated, so relative paths are ../../models/..., or use absolute paths.*

for inference with the 'b2' model:

python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth use_context_block=True use_mish=True

Checkpoints used in the final ensemble and performance on the public LB

| Backbone | Model                 | Inference Resolution | Public Jaccard |
|----------|-----------------------|----------------------|----------------|
| b1       | models/m40000.pth     | 672                  | 0.8167         |
| b2       | models/ave.pth        | 1024                 | 0.8255         |
| b2       | models/ave2.pth       | 896                  | 0.8203         |
| b2       | models/model-b2-1.pth | 672                  | 0.8209         |
| b2       | models/mb2-m20000.pth | x                    | x              |
| b2       | models/mb2-m35000.pth | x                    | x              |
| b2       | models/model-b2-2.pth | x                    | x              |

Train

Data preparation

For training, data are converted to a standard format: an image (img) and the corresponding GT mask, with the naming convention (img.(tif|jpg), img_mask.png). Data can be generated by the dataset.py script:

python dataset.py base_dir=<path to dataset> out_dir=<output directory> 
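
A pairing helper following that naming convention could look like this (hypothetical snippet, not part of the repository):

```python
# Hypothetical helper illustrating the naming convention:
# <name>.(tif|jpg) sits next to <name>_mask.png in the same directory.
from pathlib import Path

def list_pairs(data_dir):
    pairs = []
    for pattern in ("*.tif", "*.jpg"):
        for img in sorted(Path(data_dir).glob(pattern)):
            mask = img.with_name(img.stem + "_mask.png")
            if mask.exists():
                pairs.append((img, mask))
    return pairs
```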

Data used in step 1:

python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=1
python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=2
python dataset.py base_dir=<path to dataset> out_dir=<output directory> scale=0.5

First mistake: over-sampling in this way (almost all buildings from the dataset are sampled) means there is no 'valid' validation dataset left for later use.

and some parts of the SpaceNet dataset (see the space_net_data.py utility script).

(optional) Find hyper-parameters

Parameters / hyper-parameters were tuned for 1 CPU / 2 threads and a GTX 1060 GPU. For better performance, set the num_workers parameter to CPUs * threads - 1 and:

  • test the max. batch size for your GPU (run train.py for a few steps, increasing the batch_size parameter until you get CUDA out of memory; the provided learning rate and weight decay are for batch size 5 and input width 512)
python find_hyperparameters.py batch_size=<?> data_dir=<path to train data>
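
optuna is listed as an optional requirement; a search over max_lr and weight_decay could be sketched like this (train_few_steps is a hypothetical stand-in, not a function from this repo, and find_hyperparameters.py may do things differently):

```python
# Hedged sketch of a lr / weight-decay search with Optuna
import optuna

def train_few_steps(lr, weight_decay):
    # hypothetical stand-in: run a short training, return validation loss
    return (lr - 2e-4) ** 2 + (weight_decay - 6e-7) ** 2  # dummy objective

def objective(trial):
    lr = trial.suggest_float("max_lr", 1e-5, 1e-2, log=True)
    wd = trial.suggest_float("weight_decay", 1e-8, 1e-3, log=True)
    return train_few_steps(lr, wd)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)  # feed max_lr / weight_decay into train.py
```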

Train model

For tracking the training process, the https://neptune.ai/ platform is used. (If you want to enable web logging, fill in the 'neptune..' fields in the config.yaml file.)

For 'b1' backbone:

python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b1

For 'b2' backbone:

python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b2 use_mish=True use_context_block=True
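
The one-cycle policy with AdamW mentioned in the overview maps directly onto stock PyTorch; a minimal sketch with the example values above (stand-in model, not the actual training loop from train.py):

```python
# Minimal sketch of AdamW + one-cycle scheduling with the example values above
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, 3, padding=1)  # stand-in for the segmentation net
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4,
                              weight_decay=6.322983921368948e-7)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=2e-4, total_steps=40000)

for step in range(40000):
    # ... forward pass, loss computation, loss.backward() would go here ...
    optimizer.step()
    scheduler.step()  # one-cycle: lr warms up, then anneals
```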

(optional) mine some negative samples from the test set and add them to the train dataset

python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=512 debug=False data_dir=<path to src(test) images> mine_empty=True output_dir_empty=<output directory for images>
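
Conceptually, the mining uses the image-level classification head: tiles the model is confident contain no buildings become extra negative examples. A hedged sketch (hypothetical helper; the mine_empty flag of inference.py is the real mechanism):

```python
# Hypothetical illustration of negative mining with the classification head
import torch

@torch.no_grad()
def mine_empty_tiles(model, loader, prob_thresh=0.05):
    # loader is assumed to yield (image batch, file paths)
    empty = []
    for images, paths in loader:
        _, cls_logits, _ = model(images)  # heads as in the earlier sketch
        probs = torch.sigmoid(cls_logits).squeeze(1)
        empty.extend(p for p, pr in zip(paths, probs.tolist())
                     if pr < prob_thresh)
    return empty
```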

(optional) label test data and data from tier_2 with soft labels using the best-performing model / model ensemble

Just run inference on test data:

python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=512 debug=False data_dir=<path to tier_2/test images> output_dir=outputs_masks

#next steps are with TTA 
python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=672 debug=False data_dir=<path to tier_2/test images> output_dir=outputs_masks

python inference.py model=<path to the project>/models/m40000.pth model_name=efficientnet-b1 width=672 debug=False data_dir=<path to tier_2/test images>  flip=2 output_dir=outputs_masks

...

python combine_images.py base_dir=../../outputs_masks add_mask_suffix=True

and train the model with soft masks:

python train.py batch_size=<your batch size / mine is 5> data_dir=<path to train data> max_lr=<lr found by find_hyperparameters / ~0.0002> weight_decay=<found by find_hyperparameters / ~6.322983921368948e-7> fold=0 model_name=efficientnet-b2 use_mish=True use_context_block=True drop_connect_rate=0.5 soft_labels_dir=<path to dir with generated soft labels>

(optional)

Run training on a different fold for a few epochs using an already-trained model ... add to the final ensemble ... mine more negative samples ...

Final ensemble

Nothing clever here, just averaging outputs, with TTA (image flipping, transposing, scaling); a sketch of the TTA scheme follows.
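
In code, the flip/transpose part of the TTA boils down to something like the sketch below (assuming a model that returns plain segmentation logits; the actual flip/transpose flags live in inference.py, and scaling is handled by running inference at different widths, as in the commands that follow):

```python
# Hedged sketch of flip/transpose TTA averaging
import torch

@torch.no_grad()
def tta_predict(model, x):
    variants = [
        (lambda t: t,                 lambda t: t),                 # identity
        (lambda t: t.flip(3),         lambda t: t.flip(3)),         # flip=3
        (lambda t: t.flip(2),         lambda t: t.flip(2)),         # flip=2
        (lambda t: t.flip(2).flip(3), lambda t: t.flip(2).flip(3)), # flip=23
        (lambda t: t.transpose(2, 3), lambda t: t.transpose(2, 3)), # transpose
    ]
    outs = [inv(torch.sigmoid(model(fwd(x)))) for fwd, inv in variants]
    return torch.stack(outs).mean(0)
```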

#efficientnet-b1
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=3 output_dir=outputs_final #TTA flip input over dim 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=2 output_dir=outputs_final #TTA flip input over dim 2
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=23 output_dir=outputs_final #TTA flip input over dim 2 and 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b1 model=../../models/m40000.pth width=672 threshold=0.56 flip=0 transpose=True output_dir=outputs_final #TTA transpose image

#efficientnet-b2
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=0 transpose=True use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=3 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 3
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=2 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 2
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave.pth width=1024 threshold=0.56 flip=23 transpose=True use_mish=True use_context_block=True output_dir=outputs_final #TTA flip 23

#To this point, the public LB should be: 0.8393

#Meta-algorithm for the next part (ideal for people with a sleeping disorder):
#1. run inference with a random TTA and a random checkpoint
#2. -> do something meaningful until results are generated (optional)
#3. add results to the ensemble
#4. submit; if it improves the LB score, keep the results
#5. go to step 1

#efficientnet-b2: some scales / different check-points ...
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=512 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final 
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=896 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/ave2.pth width=960 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=960 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=832 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m20000.pth width=1024 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-1.pth width=896 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/mb2-m35000.pth width=640 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
  
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=1024 threshold=0.56 use_mish=True use_context_block=True output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=800 threshold=0.56 use_mish=True use_context_block=True flip=23 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=992 threshold=0.56 use_mish=True use_context_block=True flip=3 output_dir=outputs_final
python inference.py data_dir=<path to test data> model_name=efficientnet-b2 model=../../models/model-b2-2.pth width=640 threshold=0.56 use_mish=True use_context_block=True flip=3 output_dir=outputs_final
 
#Great improvement: public LB  + 0.002 :) 
 
python combine_images.py base_dir=../../outputs_final
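
The combine step amounts to a plain mean over the saved per-run masks, roughly as below (sketch only; combine_images.py is the authoritative implementation, and the threshold value here is an assumption):

```python
# Rough sketch of averaging saved masks into a final binary prediction
import cv2
import numpy as np

def average_masks(mask_paths, threshold=0.5):
    acc = None
    for p in mask_paths:
        m = cv2.imread(str(p), cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
        acc = m if acc is None else acc + m
    prob = acc / len(mask_paths)           # mean probability per pixel
    return (prob > threshold).astype(np.uint8) * 255
```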

The final score is probably reachable using just 2 models: models/m40000.pth and models/ave.pth with TTA augmentation.

References

  1. Feature Pyramid Networks for Object Detection, CVPR 2017
  2. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, ICML 2019
  3. Self-training with Noisy Student improves ImageNet classification, CVPR 2020

Notes

Possible reasons why step 2 did not work (so well):

  • Noise in the network was increased only in the backbone part
  • Small amount of data with soft labels / the approach may not work well with the segmentation task