An Empirical Study of Remote Sensing Pretraining

Current applications

Scene Recognition: Please see Usage for a quick start;

Semantic Segmentation: Please see Remote Sensing Pretraining for Semantic Segmentation;

Object Detection: Please see Remote Sensing Pretraining for Object Detection;

Change Detection: Please see Remote Sensing Pretraining for Change Detection;

ViTAE: Please see ViTAE-Transformer;

Matting: Please see ViTAE-Transformer for matting;

Updates

11/04/2022

Baidu Yun links for the scene recognition models are provided.

07/04/2022

The paper is posted on arXiv!

06/04/2022

The pretrained models for ResNet-50, Swin-T and ViTAEv2-S are released. The code for pretraining and the scene recognition task is also provided for reference.

Introduction

This repository contains the code, models and test results for the paper "An Empirical Study of Remote Sensing Pretraining".

Aerial images are usually captured from a bird's-eye view by cameras mounted on aircraft or satellites. They cover a large extent of land use and land cover, and their scenes are often difficult to interpret because of interference from scene-irrelevant regions and the complicated spatial distribution of land objects. Deep learning has largely reshaped remote sensing research on aerial image understanding and achieved great success. However, most existing deep models are initialized with ImageNet-pretrained weights, and natural images inevitably present a large domain gap relative to aerial images, which probably limits fine-tuning performance on downstream aerial scene tasks. This issue motivates us to conduct an empirical study of remote sensing pretraining (RSP). To this end, we train different networks from scratch on MillionAID, the largest remote sensing scene recognition dataset to date, to obtain remote sensing pretrained backbones. These include both convolutional neural networks (CNNs) and vision transformers such as Swin and ViTAE, which have shown promising performance on computer vision tasks. We then investigate the impact of ImageNet pretraining (IMP) and RSP on a series of downstream tasks, including scene recognition, semantic segmentation, object detection and change detection, using both CNN and vision transformer backbones.

Results and Models

UCM (8:2)

Backbone	Input size	Acc@1 (μ±σ)	Model
RSP-ResNet-50-E300	224 × 224	99.48 ± 0.10	google & baidu
RSP-Swin-T-E300	224 × 224	99.52 ± 0.00	google & baidu
RSP-ViTAEv2-S-E100	224 × 224	99.90 ± 0.13	google & baidu

AID (2:8)

Backbone	Input size	Acc@1 (μ±σ)	Model
RSP-ResNet-50-E300	224 × 224	96.81 ± 0.03	google & baidu
RSP-Swin-T-E300	224 × 224	96.89 ± 0.08	google & baidu
RSP-ViTAEv2-S-E100	224 × 224	96.91 ± 0.06	google & baidu

AID (5:5)

Backbone	Input size	Acc@1 (μ±σ)	Model
RSP-ResNet-50-E300	224 × 224	97.89 ± 0.08	google & baidu
RSP-Swin-T-E300	224 × 224	98.30 ± 0.04	google & baidu
RSP-ViTAEv2-S-E100	224 × 224	98.22 ± 0.09	google & baidu

NWPU-RESISC (1:9)

Backbone	Input size	Acc@1 (μ±σ)	Model
RSP-ResNet-50-E300	224 × 224	93.93 ± 0.10	google & baidu
RSP-Swin-T-E300	224 × 224	93.02 ± 0.12	google & baidu
RSP-ViTAEv2-S-E100	224 × 224	94.41 ± 0.11	google & baidu

NWPU-RESISC (2:8)

Backbone	Input size	Acc@1 (μ±σ)	Model
RSP-ResNet-50-E300	224 × 224	95.02 ± 0.06	google & baidu
RSP-Swin-T-E300	224 × 224	94.51 ± 0.05	google & baidu
RSP-ViTAEv2-S-E100	224 × 224	95.60 ± 0.06	google & baidu
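To reuse one of the released backbones in your own code, the pretraining checkpoint usually has to be reduced to a plain backbone state dict first. The sketch below is written under assumptions this README does not confirm: weights may be nested under a "model" or "state_dict" key (as in Swin Transformer's checkpoints), parameter names may carry the "module." prefix added by DistributedDataParallel, and the 51-class MillionAID classifier is stored under "fc." (ResNet) or "head." (Swin / ViTAE). Inspect the keys of the file you download and adjust. The function name is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Dict, Iterable, Mapping


def extract_backbone_state_dict(
    checkpoint: Mapping[str, Any],
    head_prefixes: Iterable[str] = ("fc.", "head."),
) -> Dict[str, Any]:
    """Unwrap a training checkpoint into a backbone-only state dict.

    Assumed layout (verify against the actual file): the weights may sit under
    "model" or "state_dict", keys may start with "module.", and the pretraining
    classifier uses one of head_prefixes.
    """
    state: Mapping[str, Any] = checkpoint
    for wrapper_key in ("model", "state_dict"):
        if wrapper_key in state:
            state = state[wrapper_key]
            break

    prefixes = tuple(head_prefixes)
    cleaned: Dict[str, Any] = OrderedDict()
    for name, tensor in state.items():
        if name.startswith("module."):
            name = name[len("module."):]  # strip the DistributedDataParallel prefix
        if prefixes and name.startswith(prefixes):
            continue  # drop the 51-class pretraining head; downstream tasks need their own
        cleaned[name] = tensor
    return cleaned
```

With PyTorch installed, a typical call would be `model.load_state_dict(extract_backbone_state_dict(torch.load(path, map_location="cpu")), strict=False)`, where `model` is a backbone of the same architecture. Because `strict=False` silently tolerates mismatches, check the returned `missing_keys` and `unexpected_keys` to make sure only the classifier is missing.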

Usage

Installation

Create a conda virtual environment and activate it

conda create -n rsp python=3.8 -y
conda activate rsp
conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=10.2 -c pytorch
pip install timm==0.4.12
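The commands above pin specific package versions. If you want to confirm that an environment matches them before training, a small check along the following lines may help. It is a sketch that relies only on the standard library's `importlib.metadata` (available from Python 3.8); the `PINNED` table simply mirrors the versions in the commands above, and `check_versions` is a hypothetical helper, not part of this repository.

```python
from importlib import metadata

# Versions pinned by the conda/pip commands above.
PINNED = {
    "torch": "1.10.1",
    "torchvision": "0.11.2",
    "torchaudio": "0.10.1",
    "timm": "0.4.12",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Builds may append a local tag such as "+cu102"; compare the public part only.
        public = installed.split("+")[0] if installed else None
        if public != expected:
            mismatches[package] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = check_versions()
    if not problems:
        print("All pinned packages match.")
    for package, (expected, installed) in problems.items():
        print(f"{package}: expected {expected}, found {installed or 'not installed'}")
```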

Install apex (optional)

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Install other requirements:

pip install pyyaml yacs pillow

Clone this repo

git clone https://github.com/ViTAE-Transformer/RSP.git

Data Preparation

We use the MillionAID dataset for pretraining and fine-tune the pretrained models on the UCM, AID and NWPU-RESISC45 datasets. For each dataset, we first merge all images together and then split them into training and validation sets, whose information is recorded in train_label.txt and valid_label.txt, respectively. Note that we only consider the third-level categories (51 classes in total) of the MillionAID dataset. Each line of train_label.txt contains an image file name, a category name and a category ID, for example:

P0960374.jpg dry_field 0
P0973343.jpg dry_field 0
P0235595.jpg dry_field 0
P0740591.jpg dry_field 0
P0099281.jpg dry_field 0
P0285964.jpg dry_field 0
...

Here, 0 is the training ID of the category to which the corresponding image belongs.
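The label-file format described above can be read and written with a few lines of Python. The sketch also shows one way to produce the training/validation split. The README does not say how each dataset was split, so the per-class (stratified) split, the fixed seed and all helper names here are assumptions. It also assumes that neither file names nor category names contain whitespace, which holds for the examples shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# (image file name, category name, training ID), e.g. ("P0960374.jpg", "dry_field", 0)
Record = Tuple[str, str, int]


def parse_label_line(line: str) -> Record:
    """Parse one whitespace-separated line of train_label.txt / valid_label.txt."""
    filename, class_name, class_id = line.split()
    return filename, class_name, int(class_id)


def format_label_line(record: Record) -> str:
    """Inverse of parse_label_line: produce one line in the same format."""
    filename, class_name, class_id = record
    return f"{filename} {class_name} {class_id}"


def split_per_class(
    records: List[Record], train_ratio: float, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Hypothetical stratified split: about train_ratio of each class goes to training."""
    by_class: Dict[int, List[Record]] = defaultdict(list)
    for record in records:
        by_class[record[2]].append(record)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Record] = []
    valid: List[Record] = []
    for class_id in sorted(by_class):
        items = list(by_class[class_id])
        rng.shuffle(items)
        n_train = round(len(items) * train_ratio)
        train.extend(items[:n_train])
        valid.extend(items[n_train:])
    return train, valid
```

A (2:8) setting would presumably correspond to `train_ratio=0.2`. The two lists can then be written out with `"\n".join(format_label_line(r) for r in train)` into train_label.txt, and likewise for valid_label.txt.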

Training

For pretraining, taking ResNet-50 as an example, train on the MillionAID dataset with 4 GPUs and a total batch size of 512 (128 per GPU):

python -m torch.distributed.launch --nproc_per_node 4 --master_port 6666 main.py \
--dataset 'millionAID' --model 'resnet' --exp_num 1 \
--batch-size 128 --epochs 300 --img_size 224 --split 100 \
--lr 5e-4  --weight_decay 0.05 --gpu_num 4 \
--output [model save path]
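The numbers in the pretraining sentence are related by simple arithmetic: `--batch-size` is the per-GPU batch size, so 4 processes × 128 images give the global batch of 512. Swin Transformer's training script, on which this code is based, additionally scales the learning rate linearly by global batch / 512. Whether main.py in this repository keeps that rule is an assumption you should verify, and the helper functions below are illustrative only.

```python
def global_batch_size(per_gpu_batch: int, num_gpus: int, accumulation_steps: int = 1) -> int:
    """Number of images that contribute to one optimizer step across all processes."""
    return per_gpu_batch * num_gpus * accumulation_steps


def linear_scaled_lr(base_lr: float, global_batch: int, reference_batch: int = 512) -> float:
    """Linear scaling rule with a reference batch of 512, as in Swin Transformer's main.py.

    Whether this repository applies the same rule is an assumption to verify.
    """
    return base_lr * global_batch / reference_batch


if __name__ == "__main__":
    pretrain_batch = global_batch_size(per_gpu_batch=128, num_gpus=4)  # pretraining command
    finetune_batch = global_batch_size(per_gpu_batch=64, num_gpus=1)   # AID fine-tuning command
    print(pretrain_batch, linear_scaled_lr(5e-4, pretrain_batch))
    print(finetune_batch, linear_scaled_lr(5e-4, finetune_batch))
```

Under that assumption, the pretraining run would use `--lr 5e-4` unchanged, while the single-GPU fine-tuning run with `--batch-size 64` would effectively train at 5e-4 × 64 / 512 = 6.25e-5.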

To fine-tune the pretrained ViTAE model on the AID dataset with the (2:8) setting, repeated 5 times:

python -m torch.distributed.launch --nproc_per_node 1 --master_port 7777 main.py \
--dataset 'aid' --model 'vitae_win' --ratio 28 --exp_num 5 \
--batch-size 64 --epochs 200 --img_size 224 --split 1 \
--lr 5e-4  --weight_decay 0.05 --gpu_num 1 \
--output [model save path] \
--pretrained [pretrained vitae path]
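With `--exp_num 5` the fine-tuning is repeated five times, and the results tables report top-1 accuracy as μ±σ, presumably over such repeated runs. The README does not state whether σ is the population or the sample standard deviation, so the sketch below supports both. The per-run accuracies are whatever you collect from your own runs, and the helper names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarize_runs(accuracies: Sequence[float], sample: bool = False) -> Tuple[float, float]:
    """Mean and standard deviation of per-run top-1 accuracies.

    sample=False divides by n (population formula); sample=True divides by
    n - 1 (sample formula, needs at least two runs).
    """
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies) if sample else statistics.pstdev(accuracies)
    return mean, std


def format_mu_sigma(accuracies: Sequence[float], sample: bool = False, digits: int = 2) -> str:
    """Render the result in the "μ ± σ" style used by the tables."""
    mean, std = summarize_runs(accuracies, sample)
    return f"{mean:.{digits}f} ± {std:.{digits}f}"
```

For instance, `format_mu_sigma([1.0, 2.0, 3.0], sample=True)` returns `"2.00 ± 1.00"`, while the default population formula gives `"2.00 ± 0.82"`. For five runs the two conventions differ by a factor of √(5/4) ≈ 1.12.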

Inference

To evaluate an existing model:

python -m torch.distributed.launch --nproc_per_node 1 --master_port 8888 main.py \
--dataset 'nwpuresisc' --model 'vitae_win' --ratio 28 --exp_num 5 \
--batch-size 64 --epochs 200 --img_size 224 --split 100 \
--lr 5e-4  --weight_decay 0.05 --gpu_num 1 \
--output [log save path] \
--resume [model load path] \
--eval
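Evaluation reports top-1 accuracy (Acc@1): the percentage of images whose highest-scoring class equals the ground-truth training ID. The following is a dependency-free sketch of that metric, assuming per-image class scores and integer labels are given as plain Python lists; it is not the repository's own evaluation code.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Acc@1 in percent: share of samples whose arg-max class equals the label."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = 0
    for per_class_scores, target in zip(scores, targets):
        # Index of the highest score; ties resolve to the lowest class index.
        predicted = max(range(len(per_class_scores)), key=per_class_scores.__getitem__)
        correct += int(predicted == target)
    return 100.0 * correct / len(targets)
```

With PyTorch tensors, the same quantity (up to tie-breaking) is `(logits.argmax(dim=1) == targets).float().mean() * 100`.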

Note: When pretraining the Swin model, please uncomment _update_config_from_file(config, args.cfg) in config.py and add the following argument:

--cfg configs/swin_tiny_patch4_window7_224.yaml

Statement

This project is for research purposes only. For any other questions, please contact di.wang at gmail.com.

References

The code for the pretraining and recognition parts is mainly based on Swin Transformer.
