Given an input Image we need to predict the Text in the Image with a reasonable accuracy >80% (Exact match with the actual Text Labels) and should have a good letter match accuracy.
- Accuracy
- Letter Accuracy
- CTC Loss
The Model used is Convolutional Recurrent Neural Network(CRNN) which is end-to-end trainable and its architecture consists of three parts:
- convolutional layers, which extract a feature sequence from the input image;
- recurrentlayers, which predict a label distribution for each frame;
- transcription layer, which translates the per-frame predictions into the final label sequence
Please check the iPython Notebook for detailed analysis and implementation of this project.