Bangla Digit Recognizer Part 1

This is part 1 of the "Bangla Digit Recognizer" series. By the end of this series, you will be able to build models that recognize Bangla digits in images. Our objective in this part is to show how little code it takes to build a model. We will go into the details of the concepts used here in later parts, so don't worry if you can't follow everything yet; just look at the three steps below and their implementation in Keras (a deep learning library). We will also learn Keras along the way. No need to worry!

Let's first divide the complete process of building a model into steps. We will always divide it into the 3 main steps below:

Step 1: Get the data.
Step 2: Design a model.
Step 3: Train the model.
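To make these three steps concrete before we touch Keras, here is a minimal, purely illustrative sketch of the same workflow on tiny invented data, using a nearest-centroid classifier in NumPy (this is not the Keras code used below; the data and numbers are made up):

```python
import numpy as np

# Step 1: Get the data -- tiny synthetic "images" (4-pixel vectors)
# for two made-up digit classes. This data is invented for illustration.
rng = np.random.RandomState(0)
X_train = np.vstack([rng.normal(0, 0.1, (20, 4)),   # class 0 samples
                     rng.normal(1, 0.1, (20, 4))])  # class 1 samples
y_train = np.array([0] * 20 + [1] * 20)

# Step 2: Design a model -- a nearest-centroid classifier: each class
# is represented by the mean of its training vectors.
centroids = np.vstack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

# Step 3: Train the model -- here "training" is just computing the
# centroids above. Prediction picks the nearest centroid.
def predict(x):
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(predict(np.full(4, 0.05)))  # a vector close to class 0
print(predict(np.full(4, 0.95)))  # a vector close to class 1
```

Real models like the one below are far more powerful, but the shape of the workflow (data, model, training) is the same.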

For this part, ignore the first 2 blocks (In [1] and In [2]) below; they just load some necessary functions we will use when implementing the three steps. Just keep in mind that this is something you will always have to do.

In [1]:
%matplotlib inline
from __future__ import division, print_function
import utils; reload(utils)
from utils import *
import vgg16; reload(vgg16)
from vgg16 import Vgg16
from keras.preprocessing import image
Using cuDNN version 6021 on context None
Mapped name None to device cuda: Tesla K80 (1928:00:00.0)
Using Theano backend.
In [2]:
path = "/home/thohid/data/bnist/"  # The location of the folder where you
                                   # keep your 'bangla digit' data
batch_size = 32

Step 1: Get the data

In [3]:
batches = get_batches(path + 'train', batch_size=batch_size)        # Gives you the train data
val_batches = get_batches(path + 'valid', batch_size=2*batch_size)  # Gives you the validation data
Found 16380 images belonging to 10 classes.
Found 7020 images belonging to 10 classes.

To complete step 1, you first need your data. Then you have to divide your data (Bangla digit images) into two parts: 1. train and 2. validation. Most of the data goes into the 'train' part. In our case, we have taken 16380 images as train data and 7020 as validation data. In the code above, 'get_batches()' does this job for you. The first line loads the train data and assigns it to 'batches'; the second line loads the validation data and assigns it to 'val_batches'.
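The train/validation split itself is simple to do by hand. Here is a minimal NumPy sketch of such a split on invented data, using roughly the same 70/30 ratio as the 16380/7020 split above (the counts here are made up for illustration):

```python
import numpy as np

rng = np.random.RandomState(42)
n_images = 100                       # pretend we have 100 labelled images
indices = rng.permutation(n_images)  # shuffle before splitting

n_train = int(0.7 * n_images)        # ~70% train, ~30% validation,
train_idx = indices[:n_train]        # roughly the 16380/7020 ratio above
valid_idx = indices[n_train:]

print(len(train_idx), len(valid_idx))  # 70 30
```

In practice, 'get_batches()' expects you to have already placed the image files into 'train' and 'valid' folders on disk; the split idea is the same.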

Step 2: Design the model

In [4]:
vgg = Vgg16()

There are two ways to design a model: either you design your own, or you use one designed by others. Vgg16 is a model designed by the Visual Geometry Group at the University of Oxford. Here, 'Vgg16()' gives us this model, and we assign it to the variable vgg. So vgg is our model. Now we have to train it.
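Conceptually, reusing a model like Vgg16 means taking a network whose earlier layers are already trained and (as 'finetune()' does in step 3) replacing only the final layer so it outputs the classes we care about. A toy illustration of that idea with invented "layers" (this is not the real Vgg16 internals; the layer sizes are made up):

```python
# Toy stand-in for a pretrained network: a list of "layers", where each
# layer is represented only by its output size. Numbers are invented,
# except that Vgg16's original final layer has 1000 ImageNet classes.
pretrained_layers = [4096, 4096, 1000]

def finetune(layers, n_classes):
    """Replace the final layer so the model predicts n_classes instead."""
    return layers[:-1] + [n_classes]

our_model = finetune(pretrained_layers, 10)  # 10 Bangla digit classes
print(our_model)  # [4096, 4096, 10]
```

The appeal is that the earlier layers already know how to see edges, shapes, and textures, so we only need to teach the last part about Bangla digits.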

Step 3: Train the model

In [5]:
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
Epoch 1/1
16380/16380 [==============================] - 519s - loss: 0.8408 - acc: 0.7366 - val_loss: 0.3461 - val_acc: 0.8850

These two lines do the training. We can define training as the process of gradually making a model better. We will learn the tricks of training a better model in later parts of this series.

Done! vgg is now our trained model that can recognize Bangla digits! The fun thing to observe here is that we implemented our three steps with just 5 lines of code. How awesome is that! This is all the code you need for Bangla digit recognition.

The model in action

Let's put the trained model into action: we will give it some (7020) images of Bangla digits, which we will call 'val_data', and use our model to recognize them. We will view 5 images to check. Don't worry if you don't understand the code now; we will explain it in the next part of the series.

In [6]:
val_data = get_data(path + 'valid')  # 'get_data()' gives you the 7020 bangla digit images, which we call 'val_data'
Found 7020 images belonging to 10 classes.
In [8]:
pred_label = vgg.model.predict_classes(val_data, batch_size*2) #Our model making predictions on val_data
7020/7020 [==============================] - 157s   
In [10]:
real_label = val_batches.classes       # What each image actually represents
In [13]:
filenames = val_batches.filenames
In [14]:
def plots_idx(idx, titles=None):
    plots([image.load_img(path + 'valid/'+ filenames[i]) for i in idx], titles=titles)
    
#Number of images to view for visualization 
n_view = 5
In [16]:
#View 5 images the model recognizes correctly 
correct = np.where(pred_label==real_label)[0]
idx = permutation(correct)[:n_view]
plots_idx(idx, pred_label[idx])

The title above each image is our model's prediction of what it thinks the image is. We can see that our model correctly predicted all 5 Bangla digit images above.
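The same comparison used above ('np.where(pred_label == real_label)') also gives us the overall accuracy. A small self-contained example with invented labels standing in for 'pred_label' and 'real_label':

```python
import numpy as np

# Invented labels standing in for pred_label and real_label above.
pred_label = np.array([0, 1, 2, 2, 3, 4, 4, 1])
real_label = np.array([0, 1, 2, 3, 3, 4, 0, 1])

correct = np.where(pred_label == real_label)[0]  # indices of correct preds
accuracy = len(correct) / float(len(real_label))
print(correct, accuracy)  # 6 of 8 predictions match -> 0.75
```

Run on the real 'pred_label' and 'real_label' from the notebook, this would reproduce the 'val_acc' figure printed by 'vgg.fit()' above.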

Ok, that's it for this part. In part 2, we will discuss step 1 and try to understand its implementation in Keras. Stay with the series, and if you don't understand any part, please ask for help. We will help you understand and implement all of it yourself.