This is part 1 of the "Bangla Digit Recognizer" series. By the end of this series, you will be able to build models that recognize Bangla digits from images. Our objective in this part is to show how simple it is to build a model with very little code. We will go into the details of the concepts used here in later parts, so don't worry if you can't follow everything yet; just watch the three steps I mentioned being implemented in Keras (a deep learning library). We will also learn about Keras as the series progresses. No need to worry!
For this part, ignore the first two blocks ( In[1] and In[2] ) below; they enable us to use some necessary functions we will need while implementing the three steps above. Just keep in mind that this is something you will always have to do.
%matplotlib inline
from __future__ import division, print_function  # must come before any other code in the cell

import utils; reload(utils)    # helper functions from the fast.ai course; reload picks up edits
from utils import *
import vgg16; reload(vgg16)    # the Vgg16 wrapper class
from vgg16 import Vgg16
from keras.preprocessing import image
path = "/home/thohid/data/bnist/"  # The location of the folder where you keep your 'bangla digit' data
batch_size = 32
batches = get_batches(path + 'train', batch_size=batch_size)        # Gives you the train data
val_batches = get_batches(path + 'valid', batch_size=2*batch_size)  # Gives you the validation data
To complete step 1, you first need your data. Then you have to divide your data (Bangla digit images) into two parts: 1. train and 2. validation. Most of the data goes into the 'train' part. In our case, we have taken 16380 images as training data and 7020 as validation data. In the code above, 'get_batches()' does this job for you: the first line loads the training data and assigns it to the variable 'batches', and the second line loads the validation data and assigns it to 'val_batches'.
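The train/valid split above is just a directory layout on disk: 'train' and 'valid' folders that each hold images. If you ever need to create such a split yourself, here is a minimal sketch; the function name, destination layout, and 90/10 ratio are illustrative assumptions, not what the series itself uses:

```python
import os
import random
import shutil

def split_train_valid(src_dir, dest_dir, valid_fraction=0.1, seed=42):
    """Copy files from src_dir into dest_dir/train and dest_dir/valid.

    A fixed seed makes the split reproducible between runs.
    """
    random.seed(seed)
    files = sorted(os.listdir(src_dir))
    random.shuffle(files)
    n_valid = int(len(files) * valid_fraction)
    for subset, names in [('valid', files[:n_valid]), ('train', files[n_valid:])]:
        out = os.path.join(dest_dir, subset)
        os.makedirs(out, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), os.path.join(out, name))
```

With 23400 images and `valid_fraction=0.3` you would get roughly the 16380 / 7020 split used here.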
vgg = Vgg16()
There are two ways to get a model: either you design your own, or you use one designed by others. Vgg16 is a model designed by the Visual Geometry Group at the University of Oxford. Here, 'Vgg16()' gives us this model and we assign it to the variable vgg. So vgg is our model. Now we have to train it.
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=1)
These two lines do the training. We can define training as the process of gradually making a model better at its task. We will learn the tricks of training a better model in later parts of this series.
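Conceptually, finetune() swaps the model's original 1000-class ImageNet output layer for a new one matching our 10 Bangla digits, and fit() then trains the model on our batches. A toy sketch of the "replace the last layer" idea; the layer names below are purely illustrative, not the real internals of Vgg16:

```python
def finetune_sketch(layers, num_classes):
    """Return a copy of `layers` with the final classifier layer replaced.

    Everything except the last entry (the original 1000-way ImageNet
    classifier) is kept as-is.
    """
    return layers[:-1] + ['dense_%d_softmax' % num_classes]

# Illustrative stand-in for the pretrained model's layer stack
imagenet_vgg = ['conv_block_1', 'conv_block_2', 'conv_block_3',
                'conv_block_4', 'conv_block_5',
                'dense_4096', 'dense_4096', 'dense_1000_softmax']

bangla_vgg = finetune_sketch(imagenet_vgg, 10)  # last layer now predicts 10 digits
```

This is why so little training is needed: almost all of the model's knowledge comes pretrained, and only the new final layer has to be learned from our data.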
Let's put the trained model into action: we will give it the 7020 validation images of Bangla digits, which we will call 'val_data', and use our model to recognize them. We will view 5 of the images to check. Don't worry if you don't understand the code now; we will explain it in the next part of the series.
val_data = get_data(path + 'valid')  # 'get_data()' loads the 7020 Bangla digit images we are calling 'val_data'
pred_label = vgg.model.predict_classes(val_data, batch_size*2) #Our model making predictions on val_data
real_label = val_batches.classes #What the image actually represents
filenames = val_batches.filenames
def plots_idx(idx, titles=None):
    plots([image.load_img(path + 'valid/' + filenames[i]) for i in idx], titles=titles)
#Number of images to view for visualization
n_view = 5
#View 5 images the model recognizes correctly
correct = np.where(pred_label==real_label)[0]
idx = permutation(correct)[:n_view]
plots_idx(idx, pred_label[idx])
The title above each image is our model's prediction of what it thinks the image is. We can see that our model correctly predicted all 5 of the Bangla digit images above.
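Beyond eyeballing a few images, you can score the model over the whole validation set by comparing pred_label with real_label, exactly as np.where() did above. A minimal NumPy sketch; the two arrays here are made-up stand-ins for the real predictions:

```python
import numpy as np

# Stand-ins for the model's predicted classes and the true classes
pred_label = np.array([3, 0, 7, 7, 2, 5, 1, 9])
real_label = np.array([3, 0, 7, 4, 2, 5, 1, 8])

correct = np.where(pred_label == real_label)[0]   # indices where the model was right
accuracy = len(correct) / float(len(real_label))  # fraction recognized correctly
print(accuracy)  # 0.75
```

The same two lines applied to our real pred_label and real_label give the model's validation accuracy.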