GCN_Image_Annotation

Implementation of a Graph Convolutional Network (GCN) for annotating Corel-5k images, using the PyTorch library

Dataset

The 'Corel-5k' folder contains the Corel-5k dataset: 5,000 real images (plus 13,500 fake images), with a vocabulary of 260 labels.

(for more information see CNN_Image_Annotation_dataset)

Data Augmentation

I used a multi-label data augmentation method based on Wasserstein GAN, which is fully described here: CNN_Image_Annotation_data_augmentation

Convolutional model

Of the CNNs I compared in my experiments, TResNet produced the best image features, so it is used as the feature extractor.

(more information can be found at CNN_Image_Annotation_convolutional_models)

Graph Convolutional Network

Objects normally co-occur in an image, so modeling the dependencies between labels can improve annotation performance. To capture these dependencies, I use a model based on graph convolutional networks (GCN), described below:

As described in the paper by Z-M. Chen et al., building the static GCN requires a static correlation matrix that encodes the explicit relationships between labels. This matrix is then used to enhance the word embeddings of the labels, which capture the implicit relationships between them.
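The idea above can be sketched as a two-layer GCN that turns label embeddings into per-label classifiers, which are then applied to the CNN image features. This is only a minimal sketch: the class names `GraphConvolution` and `LabelGCN`, the hidden size of 1024, the LeakyReLU slope, and the assumption that the adjacency matrix is already normalized are all illustrative choices, and the code in models.py may differ.

```python
import torch
import torch.nn as nn


class GraphConvolution(nn.Module):
    """One graph convolution step: H_out = A @ H_in @ W (hypothetical helper)."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_features, out_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (num_labels, in_features), adj: (num_labels, num_labels)
        return adj @ (h @ self.weight)


class LabelGCN(nn.Module):
    """Maps label embeddings to classifiers and scores image features with them."""

    def __init__(self, embed_dim: int = 300, hidden_dim: int = 1024, feature_dim: int = 2048):
        super().__init__()
        self.gc1 = GraphConvolution(embed_dim, hidden_dim)
        self.gc2 = GraphConvolution(hidden_dim, feature_dim)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, image_features, label_embeddings, adj):
        # image_features: (batch, feature_dim), produced by the CNN backbone
        h = self.act(self.gc1(label_embeddings, adj))
        classifiers = self.gc2(h, adj)            # (num_labels, feature_dim)
        return image_features @ classifiers.t()   # (batch, num_labels) logits


# Toy usage with random inputs; an identity matrix stands in for the
# real correlation matrix.
torch.manual_seed(0)
num_labels = 260
model = LabelGCN()
logits = model(torch.randn(4, 2048), torch.randn(num_labels, 300), torch.eye(num_labels))
```

Each row of `logits` holds one score per label for one image, so a multi-label loss can be applied to it directly.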

The structures of CNN-GCN & GCN are shown in the images below:

Static Correlation (Adjacency) Matrix:

Z-M. Chen et al. model the label correlation dependency as a conditional probability. For example, P(L(person) | L(surfboard)) denotes the probability that label L(person) occurs when label L(surfboard) appears. As shown in the image below, P(L(person) | L(surfboard)) is not equal to P(L(surfboard) | L(person)), so the correlation matrix is asymmetric.
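As a rough illustration, the conditional probabilities can be estimated by counting label co-occurrences in the training annotations. The function name `conditional_probability_matrix` and the toy annotations are hypothetical, and any further processing of the matrix (such as thresholding or re-weighting) is left out of this sketch.

```python
import numpy as np


def conditional_probability_matrix(labels: np.ndarray) -> np.ndarray:
    """Estimate P[i, j] = P(L_j | L_i) from a binary (num_images, num_labels) matrix."""
    labels = labels.astype(np.float64)
    cooccurrence = labels.T @ labels            # [i, j] = images containing both i and j
    occurrences = np.diag(cooccurrence).copy()  # [i] = images containing label i
    # Labels that never occur get a divisor of 1, so their rows stay all zero.
    safe = np.where(occurrences > 0, occurrences, 1.0)
    return cooccurrence / safe[:, None]


# Toy annotations: column 0 = person, column 1 = surfboard.
toy = np.array([[1, 1],
                [1, 0],
                [1, 0],
                [1, 1]])
P = conditional_probability_matrix(toy)
print(P[1, 0])  # P(person | surfboard)
print(P[0, 1])  # P(surfboard | person)
```

In the toy data every surfboard image also contains a person, but only half of the person images contain a surfboard, which is why the two entries differ and the matrix is asymmetric.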

Word Embedding:

As shown in the images above, the labels need to be vectorized before they are sent to GCN. There are many word embedding techniques, but for static GCN we use GloVe embeddings, which have shown better results than the other methods.

Different word embeddings hardly affect the accuracy, which suggests that the improvements come mainly from the GCN rather than from the semantic meanings derived from the word embeddings.
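A minimal sketch of the vectorization step, assuming the GloVe vectors are stored in the usual text format (one token followed by its values per line). The helper names, the three-dimensional sample vectors, and the choice of zero vectors for labels missing from the vocabulary are assumptions for illustration only; the real vectors are 300-dimensional.

```python
import numpy as np


def parse_glove_lines(lines):
    """Parse GloVe text-format lines into a {token: vector} dictionary."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array([float(x) for x in parts[1:]], dtype=np.float32)
    return vectors


def build_label_embeddings(labels, vectors, dim):
    """Stack one vector per label; unknown labels stay as zeros (an assumption)."""
    matrix = np.zeros((len(labels), dim), dtype=np.float32)
    for row, label in enumerate(labels):
        if label in vectors:
            matrix[row] = vectors[label]
    return matrix


# Tiny in-memory sample standing in for a real GloVe file.
sample = ["person 0.1 0.2 0.3", "surfboard 0.4 0.5 0.6"]
vectors = parse_glove_lines(sample)
embeddings = build_label_embeddings(["person", "surfboard", "unknownlabel"], vectors, dim=3)
```

The resulting matrix has one row per label and is what the GCN receives as its input node features.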

The image below shows the relations between labels after using the word embedding technique (t-SNE: 300d -> 2d):

The image below shows the relations between labels after training by GCN (t-SNE: 2048d -> 2d):

Evaluation Metrics

Precision, recall, F1-score, and N+ are the most common metrics for comparing models in image annotation papers. I report precision, recall, and F1-score both per-class (per-label) and per-image (overall), along with N+.
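The sketch below shows one way these metrics could be computed from binary ground-truth and prediction matrices. It rests on several assumptions: per-class values are averaged over labels, per-image values are averaged over images (evaluation_metrics.py may instead pool counts over all predictions), F1 is computed from the averaged precision and recall, and N+ counts the labels with at least one correct prediction. The function names are hypothetical.

```python
import numpy as np


def _safe_divide(num, den):
    """Element-wise num / den that returns 0 wherever den == 0."""
    num = np.asarray(num, dtype=np.float64)
    den = np.asarray(den, dtype=np.float64)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def annotation_metrics(y_true, y_pred):
    """y_true, y_pred: binary arrays of shape (num_images, num_labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    true_positive = y_true & y_pred

    def precision_recall_f1(axis):
        correct = true_positive.sum(axis=axis)
        precision = _safe_divide(correct, y_pred.sum(axis=axis)).mean()
        recall = _safe_divide(correct, y_true.sum(axis=axis)).mean()
        total = precision + recall
        f1 = 2 * precision * recall / total if total > 0 else 0.0
        return float(precision), float(recall), float(f1)

    return {
        "per_class": precision_recall_f1(axis=0),  # one value per label, then averaged
        "per_image": precision_recall_f1(axis=1),  # one value per image, then averaged
        "n_plus": int((true_positive.sum(axis=0) > 0).sum()),
    }


# Toy example: 2 images, 3 labels.
result = annotation_metrics([[1, 0, 1], [0, 1, 0]],
                            [[1, 0, 0], [0, 1, 1]])
```

In this toy example the third label is never predicted correctly, so N+ is 2 out of 3 labels.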

(check out CNN_Image_Annotation_evaluation_metrics for more information)

Train and Evaluation

To train the model in the Spyder IDE, run the command below:

run main.py --loss-function {select loss function}

Please note that {select loss function} must be replaced with BCELoss, FocalLoss, or AsymmetricLoss.

Using augmented data, you can train the model as follows:

run main.py --loss-function {select loss function} --augmentation

To evaluate the model in the Spyder IDE, run the command below:

run main.py --loss-function {select loss function} --evaluate
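For readers who want to see how the three flags fit together, here is a minimal argparse sketch that accepts the same command lines. It is an assumption about how such a parser could look, not a copy of main.py; the `build_parser` name and the help texts are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the flags used in the commands above (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Train or evaluate the GCN annotation model")
    parser.add_argument("--loss-function", required=True,
                        choices=["BCELoss", "FocalLoss", "AsymmetricLoss"],
                        help="loss used for training")
    parser.add_argument("--augmentation", action="store_true",
                        help="also train on the augmented (fake) images")
    parser.add_argument("--evaluate", action="store_true",
                        help="evaluate the model instead of training it")
    return parser


# Equivalent to: run main.py --loss-function AsymmetricLoss --evaluate
args = build_parser().parse_args(["--loss-function", "AsymmetricLoss", "--evaluate"])
```

argparse exposes `--loss-function` as `args.loss_function`, and the two switches default to `False` when they are omitted.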

Results

asymmetric loss (more information at asymmetric loss)

global-pooling	batch-size	num of training images	image-size	epoch time	𝛾+	𝛾-	m
avg	32	4500	448 * 448	135s	0	4	0.05
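To make the roles of 𝛾+, 𝛾-, and m concrete, the sketch below follows the commonly published formulation of asymmetric loss: positives are weighted by (1 - p)^𝛾+, while negatives use a probability shifted down by the margin m and are weighted by that shifted probability raised to 𝛾-. The function name, the `eps` clamp, and the sum reduction are assumptions; loss_function.py may be implemented differently.

```python
import torch


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0, margin=0.05, eps=1e-8):
    """Asymmetric loss sketch; defaults mirror the hyperparameters in the table above."""
    p = torch.sigmoid(logits)
    # Probability shifting: negatives with p <= margin are ignored entirely.
    p_shifted = (p - margin).clamp(min=0)
    loss_pos = targets * (1 - p) ** gamma_pos * torch.log(p.clamp(min=eps))
    loss_neg = (1 - targets) * p_shifted ** gamma_neg * torch.log((1 - p_shifted).clamp(min=eps))
    return -(loss_pos + loss_neg).sum()


# Toy batch: 2 images, 3 labels.
torch.manual_seed(0)
logits = torch.randn(2, 3)
targets = torch.tensor([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 1.0]])
loss = asymmetric_loss(logits, targets)
```

With both exponents and the margin set to 0, this expression reduces to the ordinary summed binary cross-entropy, which is a convenient sanity check.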

data	precision	recall	f1-score
testset per-image metrics	0.594	0.670	0.630
testset per-class metrics	0.453	0.495	0.473

data	N+
testset	175

References

Z-M. Chen, X-S. Wei, P. Wang, and Y. Guo. "Multi-Label Image Recognition with Graph Convolutional Networks." CVPR, 2019.
