Deep Learning – rohit apte

Character level language models using Recurrent Neural Networks

In recent years Recurrent Neural Networks have shown great results in NLP tasks – generating text, neural machine translation, question answering, and a lot more.

In this post we will explore text generation – teaching computers to write in a certain style. This is based off (and a recreation of) Andrej Karpathy’s famous article The Unreasonable Effectiveness of Recurrent Neural Networks.

Predicting the next character in a sentence is a language model problem. Traditionally these were done using n-gram models. For example a unigram model would be the distribution of individual characters. At each time step we would predict a character using that probability distribution. A bigram model would take the probability distribution of 2 characters (for example, given the first letter a, what is the probability of the second letter is n). Mathematically

$P(W_n|W_{n-1}) = \frac{P(W_{n-1},W_n)}{P(W_{n-1})}$

Doing this at a word level has a disadvantage – how to handle out of vocabulary words. Character models don’t have this problem since they learn general distributions of the underlying text. However, the challenge with n-gram models (word and character) is that the memory required grows exponentially with each additional n. We therefore have a limit to how far back in a sequence we can look. In our example we use an alphabet size of 98 characters (small case and capital letters, and special characters like space, parenthesis etc). A bigram model would take have 9,604 possible letter pairs. With a trigram model it grows to 941,192 possible triplets. In our example we go back 30 characters. That would require us to store 5.46e59 possible combinations.

This is where we can leverage the use of RNNs. I’m assuming you have an understanding of LSTMs and I will only describe the network architecture here. There is an excellent article by Christopher Olah on understanding RNNs and LSTMs that goes into the details of the underlying math.

For this problem we take data in sequences of 30 characters and try to predict the next character for each letter. We are using stateful LSTMs – the data is fed in batches but each batch is a continuation of the previous one. We also save the state of the LSTM at the end of each batch and use this as the initial state for the next batch. The benefit of doing this is that the system can learn longer term dependencies like closing an open parenthesis or bracket, ending a sentence with a period, etc. The code is available on my GitHub, and you can tweak the model parameters to see how the results look.

The model is agnostic to the data. I ran it on 3 different datasets – Shakespeare, Aesop’s fables and a crawl of Paul Graham‘s website. The same code learns to write in each style after a few epochs. In each case, it learns formatting, which words are commonly used, to close open quotes and parenthesis, etc.

We generate sample data as follows – we sample a capital letter (“L” in our case) and then ask the RNN to predict the next letter. we take the n highest probabilities (2 in these examples, but its a parameter that can be adjusted) and generate the next letter. Using that letter we generate the next one, and so on. Here are samples of the data for each dataset.

Shakespeare – we can see that the model learns quickly. At the end of the first epoch its already learned to format the text, close parenthesis (past the 30 character input) and add titles and scenes. After 5 epochs it gets even better and at 60 epochs it generates very “Shakespeare like” text.

OLENTZA	My large of my soul so more that I will
	Then then will not so. The king of my lord, and we would
	to the more and my lord, that the word of this hand
	And than this way to this, and to be my lord.

	[Exeunt]

	THE TONY VENCT IV

SCENE I	A part off.

	[Enter MARCUTES and Servant]
	TREISARD PERTINE, LARD LEND, LECE,)

	THE SARDINAND, and Antondance of Encarder to the pain and the part on the caption.


	[Exeunt MERCUS and SIR ANDRONICUS, LEONATUS]

FARDINAL	How now, so, my lord.


	[Enter MARCUS and MARIUS]                                      Where is the stare?

	[Exeunt]

	THE TONY VELINE LENR ENT

ACT II

SCENE V	I have her heart of the heaver to the part.

	[Enter LARIUS, and Sending]

	[Exeunt LACIUS, and Servants]

		                                            
   Where is this would say you say,
	And where you have some thanks of this wild but
	Then we will not some many me thank you will
	That I would no more to my live to her the
	past of my hand. I would

OLENTZA My large of my soul so more that I will

Then then will not so. The king of my lord, and we would

to the more and my lord, that the word of this hand

And than this way to this, and to be my lord.

[Exeunt]

THE TONY VENCT IV

SCENE I A part off.

[Enter MARCUTES and Servant]

TREISARD PERTINE, LARD LEND, LECE,)

THE SARDINAND, and Antondance of Encarder to the pain and the part on the caption.

[Exeunt MERCUS and SIR ANDRONICUS, LEONATUS]

FARDINAL How now, so, my lord.

[Enter MARCUS and MARIUS] Where is the stare?

[Exeunt]

THE TONY VELINE LENR ENT

ACT II

SCENE V I have her heart of the heaver to the part.

[Enter LARIUS, and Sending]

[Exeunt LACIUS, and Servants]

Where is this would say you say,

And where you have some thanks of this wild but

Then we will not some many me thank you will

That I would no more to my live to her the

past of my hand. I would

ANDORE	With this the service of a servant to him, and
	the wind which hath been seen to take the countenance,
	the wind of thee and still to see his sine of him.

	[Exeunt LORD POLONIUS and LADY MACBETH]

	And so with you, my lord, I will not see
	The sun of thee that I have heard your sines. 
	[Exit]

	[Enter a Senator of the world of Winchester]

	How now, my lord, the king of England say so says
	The season will be married to thy heart.
	Therefore, the king, that would have seen the sense
	Of the worst, that we will never see him.

SILVIA	I will not have her better than he hath seen.

	[Exeunt BARDOLPH, and Attendants]

	That hath my signior shall not speak to me,
	To say that I have have the crown of men
	And shall I have the strength of this my love.
	I hope they will, and see thee to
 thy heart that thou dost send
	The traitors of the state of thine,
	And shall the service to the constancy.

	[Exeunt]

	KING HENRY VI
ACT III
SCENE III	A street of the part.

ANDORE With this the service of a servant to him, and

the wind which hath been seen to take the countenance,

the wind of thee and still to see his sine of him.

[Exeunt LORD POLONIUS and LADY MACBETH]

And so with you, my lord, I will not see

The sun of thee that I have heard your sines.

[Exit]

[Enter a Senator of the world of Winchester]

How now, my lord, the king of England say so says

The season will be married to thy heart.

Therefore, the king, that would have seen the sense

Of the worst, that we will never see him.

SILVIA I will not have her better than he hath seen.

[Exeunt BARDOLPH, and Attendants]

That hath my signior shall not speak to me,

To say that I have have the crown of men

And shall I have the strength of this my love.

I hope they will, and see thee to

thy heart that thou dost send

The traitors of the state of thine,

And shall the service to the constancy.

[Exeunt]

KING HENRY VI

ACT III

SCENE III A street of the part.

	[Enter another Messenger]

Messenger	Madam, I will not speak to them.


CORDELIA	I have a son of the person of the world.

	[Exit Servant]

		            How now, Montague,
	What says the word of the foul discretion?
	The king has been a maiden short of him.

	[Exeunt]

	THE TWO GENTLEMEN OF VERONA

ACT III

SCENE II	The forest.

	[Alarum. Enter KING LEAR, and others]

KING HENRY VIII	Then look upon the king and the street of the king,
	As the subject of my son is most deliver'd,
	And my the honest son of my honour,
	That I must see the king and tongue that shows
	That I have made them all as things as me.

KING HENRY VIII	What, will you see my heart and to my son?

KING HENRY VI	Why, then, I say an earl and the sense
	Who shall not stay the sun to heart to speak.
	I have not seen my heart, and shall be so.
	The king's son will not be a monster's son.

	[Exit]

[Enter another Messenger]

Messenger Madam, I will not speak to them.

CORDELIA I have a son of the person of the world.

[Exit Servant]

How now, Montague,

What says the word of the foul discretion?

The king has been a maiden short of him.

[Exeunt]

THE TWO GENTLEMEN OF VERONA

ACT III

SCENE II The forest.

[Alarum. Enter KING LEAR, and others]

KING HENRY VIII Then look upon the king and the street of the king,

As the subject of my son is most deliver'd,

And my the honest son of my honour,

That I must see the king and tongue that shows

That I have made them all as things as me.

KING HENRY VIII What, will you see my heart and to my son?

KING HENRY VI Why, then, I say an earl and the sense

Who shall not stay the sun to heart to speak.

I have not seen my heart, and shall be so.

The king's son will not be a monster's son.

[Exit]

Paul Graham posts – we have about 80% less data compared to Shakespeare and his writing style is more “diverse” so the model doesn’t do as well after the first epoch. Words are often incomplete. After 5 epochs we see a significant improvement – most words and the language structure are correct. The writing style is starting to resemble Paul Graham. After 60 epochs we see a big improvement overall but still have issues with some nonsensical words.

They've deal in the for the sere the searse to work of the first 
the progetting the founders to be the founders.

They're so exame the for to may be the finst a company. It's no 
have to grink to  better who have to get the form to be a seal 
anyone with to griet to be the startup work to be. The  fart a 
could be a startup wilh be they deal they work the strate 
they're the find on the straction that they would have to get 
the founders will be a stre finst invest the find that they was 
that the precest in an example who work to griet than you con'l 
your a startup work the serves to way to grow the first that 
they want to be a sear how and and the first people where the 
seee to be a lear how the first the perple the sere the first 
perple will be an exter is to be to be the serves the first 
that they'll the founders when you can't was any deal the 
four examply, and you can de a startup with to growth that 
while that's the prectice the seal a find the sere they want 
the for a seart prese

They've deal in the for the sere the searse to work of the first

the progetting the founders to be the founders.

They're so exame the for to may be the finst a company. It's no

have to grink to better who have to get the form to be a seal

anyone with to griet to be the startup work to be. The fart a

could be a startup wilh be they deal they work the strate

they're the find on the straction that they would have to get

the founders will be a stre finst invest the find that they was

that the precest in an example who work to griet than you con'l

your a startup work the serves to way to grow the first that

they want to be a sear how and and the first people where the

seee to be a lear how the first the perple the sere the first

perple will be an exter is to be to be the serves the first

that they'll the founders when you can't was any deal the

four examply, and you can de a startup with to growth that

while that's the prectice the seal a find the sere they want

the for a seart prese

It's not sure they were something that they're supposed to be 
a lot of things. The risk of the problems is a startup.

It's that they wanted to be too much to them and seem to be 
the product valuation of the street that would be a lot of 
people to seem that they could be able to do that.

It's the same problems to start a startup is an idea of the 
product valuetion of a startup in the companies to start a 
series. And that seens the press of a startup to be able to 
do it to start a startup in the same.

It would become things that wants to be the big commany.

The problem with a strett is a startup is that they're also 
being a straight of a smart startup ideas. The problem is 
that they're all to do it. It's the same thing to be some 
kind of person that was a lot of people who are to seem to 
be a lot of people to start any startup. They're so much 
that's what they're a large problem is to be the structural 
offer and the problem is the startup ideas. It's the 
sacrisical company they

It's not sure they were something that they're supposed to be

a lot of things. The risk of the problems is a startup.

It's that they wanted to be too much to them and seem to be

the product valuation of the street that would be a lot of

people to seem that they could be able to do that.

It's the same problems to start a startup is an idea of the

product valuetion of a startup in the companies to start a

series. And that seens the press of a startup to be able to

do it to start a startup in the same.

It would become things that wants to be the big commany.

The problem with a strett is a startup is that they're also

being a straight of a smart startup ideas. The problem is

that they're all to do it. It's the same thing to be some

kind of person that was a lot of people who are to seem to

be a lot of people to start any startup. They're so much

that's what they're a large problem is to be the structural

offer and the problem is the startup ideas. It's the

sacrisical company they

The programing language is the problem with the product than 
a lot of people to start a startup in the same. If you want 
to get the people you'll have a large pattern of an advantage, 
you can start to get a startup. It's not just the same thing 
that could be able to start a startup is the same patents. 
It's not as a sign that it's not just the biggest stagtup 
ideas. They won't get them a lot of them and the best 
startups to sell to a starting to build a series A round.

The reason I think it's the same. If the fundamental person 
to do that, the best people are also trying to do it will 
tend to be the same time.

This was the part of the first time they can do to start 
the startups than they wanted. They're also the only way 
to get a lot of people who want to do it. They'll be able 
to seem to be a great deal to start a startup that startups 
will succeed as a significant company.

It won't seem a lot of people to be so much that it would 
be. The people are so close that they were already true 
that they were a list of the same problems. They can't 
see the rest of their prevertions. To this control of 
startups are the same time and all they could get to the 
standard of a startup than they were startups. They won't 
believe it that's the serioss of their own startups. The 
people are a lot of the problem to start a startup to be 
the second is that it seems to be a good idea.

The programing language is the problem with the product than

a lot of people to start a startup in the same. If you want

to get the people you'll have a large pattern of an advantage,

you can start to get a startup. It's not just the same thing

that could be able to start a startup is the same patents.

It's not as a sign that it's not just the biggest stagtup

ideas. They won't get them a lot of them and the best

startups to sell to a starting to build a series A round.

The reason I think it's the same. If the fundamental person

to do that, the best people are also trying to do it will

tend to be the same time.

This was the part of the first time they can do to start

the startups than they wanted. They're also the only way

to get a lot of people who want to do it. They'll be able

to seem to be a great deal to start a startup that startups

will succeed as a significant company.

It won't seem a lot of people to be so much that it would

be. The people are so close that they were already true

that they were a list of the same problems. They can't

see the rest of their prevertions. To this control of

startups are the same time and all they could get to the

standard of a startup than they were startups. They won't

believe it that's the serioss of their own startups. The

people are a lot of the problem to start a startup to be

the second is that it seems to be a good idea.

Aesop’s fables – the dataset is quite small so the model takes a lot longer to train. But it also gives us an insight into how the RNN is learning. After 1 epoch it only learns the more common letters in the language. It took 15 epochs for it to start to put words together. After 60 epochs it does better, but still has non English words. But it does learn the writing style (animal names in capital, different formatting from the above examples, etc).

 ee e   oo ttt tt   ttt e  e  e eeee  e ee e  ee e    ee      
e   eee      eeee e        e  e    eee
  eeee   ee  ee e  ee ee   ee e ee e e  ee e e     ee   e  ee 
e ee  e  eeee     eeee ee  e    e eeee
eeee ee  e  ee  e  ee   ee        ee e e  ee e e    ee  e  e   
e  e  eeee ee     e e   e  e        e
ee   ee   ee eeee        e ee  eee         eee ee eee  e e  
eeee e e  eeeee ee ee  e e   e    e  ee 
 ee ee     ee  e e     e e  e ee e ee  ee ee e  eeeeee e     
e ee    e        e  e e e      ee     e
      e eeeee  e        ee ee  ee e e e  e       ee ee  e e e  
e eee  eeee  ee  ee e ee e   ee     e
 e e  e  e  ee eeee e ee   eeeee   e   e e ee   ee eee  eee e 
    eeee eee   e  e   ee e     e  e e 
 e  e e  e ee eee   e eeee      ee ee   ee e ee   e  e  e e e 
    e     ee e e eeeee ee  e  e e eee 
  e e  ee ee  eeee    ee ee  e eee  e  ee e       e e ee   e 
eeeee e e eeee  ee ee   eeeee   e e    
eee   e ee ee  e e   ee  e   ee   ee eee e e e     e    ee  
  ee  eee  ee e      eeeeee   e    e   e

ee e oo ttt tt ttt e e e eeee e ee e ee e ee

e eee eeee e e e eee

eeee ee ee e ee ee ee e ee e e ee e e ee e ee

e ee e eeee eeee ee e e eeee

eeee ee e ee e ee ee ee e e ee e e ee e e

e e eeee ee e e e e e

ee ee ee eeee e ee eee eee ee eee e e

eeee e e eeeee ee ee e e e e ee

ee ee ee e e e e e ee e ee ee ee e eeeeee e

e ee e e e e e ee e

e eeeee e ee ee ee e e e e ee ee e e e

e eee eeee ee ee e ee e ee e

e e e e ee eeee e ee eeeee e e e ee ee eee eee e

eeee eee e e ee e e e e

e e e e ee eee e eeee ee ee ee e ee e e e e e

e ee e e eeeee ee e e e eee

e e ee ee eeee ee ee e eee e ee e e e ee e

eeeee e e eeee ee ee eeeee e e

eee e ee ee e e ee e ee ee eee e e e e ee

ee eee ee e eeeeee e e e

"The Horse and the Fox and and to the bound and and the sard 
to the boute
r and and andered than to to him to he pare.
"I a pang a bound the Fox, and shat the Lound the boun the 
sored and the sout the Lound and to ther the wint the boune 
and the sore to his his.
An he coned than the sand the sound and and apled the Hore 
ander the soon the sord his his to care an tor him t
he bound the bout."
A Fan a cane a biged, a pard an a piged to him him his the 
pound an and and tor and to the sore an that her wan to the 
bore and ander the sore.
The Louted and the porned than her was to the sare.
The Loun to he sain the bound the porned to to here to to 
his here and anded and and apling to the boune ander and 
the south the sore.
Then his her ase and the Laon to the bon that he cand 
the porned the berter.
The Ling the bon the pare an a pored the Hores the bene 
to his her as andered the Horser.
An that then the Loond to to he parder and then was his 
to his his he care the bou

"The Horse and the Fox and and to the bound and and the sard

to the boute

r and and andered than to to him to he pare.

"I a pang a bound the Fox, and shat the Lound the boun the

sored and the sout the Lound and to ther the wint the boune

and the sore to his his.

An he coned than the sand the sound and and apled the Hore

ander the soon the sord his his to care an tor him t

he bound the bout."

A Fan a cane a biged, a pard an a piged to him him his the

pound an and and tor and to the sore an that her wan to the

bore and ander the sore.

The Louted and the porned than her was to the sare.

The Loun to he sain the bound the porned to to here to to

his here and anded and and apling to the boune ander and

the south the sore.

Then his her ase and the Laon to the bon that he cand

the porned the berter.

The Ling the bon the pare an a pored the Hores the bene

to his her as andered the Horser.

An that then the Loond to to he parder and then was his

to his his he care the bou

Then he took his back in their tails, and gave his to a prace, 
and suppesed, and threw his bitter than hapeled that he would 
not injore the mirk where the wheels said:
"I would gave a ply on the road, and that I have nothor you 
can ot ereath this tise to put out that he right horring 
towards the right.
When he came herd on the brance of the Bod had to give him a 
pignt of the Fox wert string themselves began to carry him 
and he had his closs, and brew him to his mouth of his coutin, 
but soon stopped the Hart was successing them the Fox intited 
the Stork to the Lion, we tould having his to himself the Lag 
once decaired it into the Pitcher.
At last, and stop and did the time of his boy of the Frogs, 
lifting his horns and suck unto dis and played up to him peck 
of his break.
"I have a sholl borst with the reach of the roges of too ore 
that he was doint down to his mach, a smanl day which the hand 
of her hounds and soon had to give them with his not like to her 
some in the country.

Then he took his back in their tails, and gave his to a prace,

and suppesed, and threw his bitter than hapeled that he would

not injore the mirk where the wheels said:

"I would gave a ply on the road, and that I have nothor you

can ot ereath this tise to put out that he right horring

towards the right.

When he came herd on the brance of the Bod had to give him a

pignt of the Fox wert string themselves began to carry him

and he had his closs, and brew him to his mouth of his coutin,

but soon stopped the Hart was successing them the Fox intited

the Stork to the Lion, we tould having his to himself the Lag

once decaired it into the Pitcher.

At last, and stop and did the time of his boy of the Frogs,

lifting his horns and suck unto dis and played up to him peck

of his break.

"I have a sholl borst with the reach of the roges of too ore

that he was doint down to his mach, a smanl day which the hand

of her hounds and soon had to give them with his not like to her

some in the country.

The source code is available on my GitHub for anyone who wants to play with it. Please make sure you have a GPU with CUDA and CUDNN installed, otherwise it will take forever to train. The model parameters can be changed using command line arguments.

I also added a file in the git called n-gram_model.py that lets you try the same exercise with n-grams to compare how well the deep learning method does vs different n-gram sizes (both speed and accuracy).

Image recognition on the CIFAR-10 dataset using deep learning

CIFAR-10 is an established computer vision dataset used for image recognition. Its a subset of 80 million tiny images collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. This is the link to the website.

The CIFAR-10 dataset consists of 60,000 32×32 color images of 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

As a first post, I wanted to write a deep learning algorithm to identify images in the CIFAR-10 database. This topic has been very widely covered – in fact Google’s Tensorflow tutorials cover this very well. However, they do a few things that made it difficult for me to follow their code

They split the code across multiple files making it difficult to follow.
They use a binary version of the file and a file stream to feed Tensorflow.

I downloaded the python version of the data and loaded all the variables into memory. There is some image manipulation done in the tensorflow tutorial that I recreated in the numpy arrays directly and we will discuss it below.

Prerequisites for this tutorial:
Other than Python (obviously!)

numpy
pickle
sklearn
tensorflow

For TensorFlow I strongly recommend the GPU version if you have the set-up for it. The code takes 6 hours on my dual GTX Titan X machine and running it on a CPU will probably take days or weeks!
Assuming you have everything working, lets get started!

Start with our import statements

import numpy as np
import pickle
from sklearn import preprocessing
import random
import math
import os
from six.moves import urllib
import tarfile
import tensorflow as tf

import numpy as np

import pickle

from sklearn import preprocessing

import random

import math

import os

from six.moves import urllib

import tarfile

import tensorflow as tf

Declare some global variables we will use. In our code we are using GradientDescentOptimizer with learning rate decay. I have tested the same code with the AdamOptimizer. Adam runs faster but gives slightly worse results. If you do decide to use the AdamOptimizer, drop the learning rate to 0.0001. This is the link to the paper on Adam optimization.

NUM_FILE_BATCHES=5
LEARNING_RATE = 0.1
LEARNING_RATE_DECAY=0.1
NUM_GENS_TO_WAIT=250.0
TRAINING_ITERATIONS = 30000
DROPOUT = 0.5
BATCH_SIZE = 500
IMAGE_TO_DISPLAY = 10

NUM_FILE_BATCHES=5

LEARNING_RATE = 0.1

LEARNING_RATE_DECAY=0.1

NUM_GENS_TO_WAIT=250.0

TRAINING_ITERATIONS = 30000

DROPOUT = 0.5

BATCH_SIZE = 500

IMAGE_TO_DISPLAY = 10

Create data directory and download data if it doesn’t exist – this code will not run if we have already downloaded the data.

data_dir='data'
if not os.path.exists(data_dir):
  os.makedirs(data_dir)
cifar10_url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'

data_file=os.path.join(data_dir, 'cifar-10-binary.tar.gz')
if os.path.isfile(data_file):
  pass
else:
  def progress(block_num, block_size, total_size):
    progress_info = [cifar10_url, float(block_num * block_size) / float(total_size) * 100.0]
    print('\r Downloading {} - {:.2f}%'.format(*progress_info), end="")
  filepath, _ = urllib.request.urlretrieve(cifar10_url, data_file, progress)
  tarfile.open(filepath, 'r:gz').extractall(data_dir)

data_dir='data'

if not os.path.exists(data_dir):

os.makedirs(data_dir)

cifar10_url='https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'

data_file=os.path.join(data_dir, 'cifar-10-binary.tar.gz')

if os.path.isfile(data_file):

pass

else:

def progress(block_num, block_size, total_size):

progress_info = [cifar10_url, float(block_num * block_size) / float(total_size) * 100.0]

print('\r Downloading {} - {:.2f}%'.format(*progress_info), end="")

filepath, _ = urllib.request.urlretrieve(cifar10_url, data_file, progress)

tarfile.open(filepath, 'r:gz').extractall(data_dir)

Load data into numpy arrays. The code below loads the labels from the batches.meta file, and the training and test data. The training data is split across 5 files. We also one hot encode the labels.

with open('data/cifar-10-batches-py/batches.meta',mode='rb') as file:
  batch=pickle.load(file,encoding='latin1')
label_names=batch['label_names']


def load_cifar10data(filename):
  with open(filename,mode='rb') as file:
    batch=pickle.load(file,encoding='latin1')
    features=batch['data'].reshape((len(batch['data']),3,32,32)).transpose(0,2,3,1)
    labels=batch['labels']
    return features,labels

x_train=np.zeros(shape=(0,32,32,3))
train_labels=[]
for i in range(1,NUM_FILE_BATCHES+1):
  ft,lb=load_cifar10data('data/cifar-10-batches-py/data_batch_'+str(i))
  x_train=np.vstack((x_train,ft))
  train_labels.extend(lb)


unique_labels=list(set(train_labels))
lb=preprocessing.LabelBinarizer()
lb.fit(unique_labels)
y_train=lb.transform(train_labels)

x_test_data,test_labels=load_cifar10data('data/cifar-10-batches-py/test_batch')
y_test=lb.transform(test_labels)

with open('data/cifar-10-batches-py/batches.meta',mode='rb') as file:

batch=pickle.load(file,encoding='latin1')

label_names=batch['label_names']

def load_cifar10data(filename):

with open(filename,mode='rb') as file:

batch=pickle.load(file,encoding='latin1')

features=batch['data'].reshape((len(batch['data']),3,32,32)).transpose(0,2,3,1)

labels=batch['labels']

return features,labels

x_train=np.zeros(shape=(0,32,32,3))

train_labels=[]

for i in range(1,NUM_FILE_BATCHES+1):

ft,lb=load_cifar10data('data/cifar-10-batches-py/data_batch_'+str(i))

x_train=np.vstack((x_train,ft))

train_labels.extend(lb)

unique_labels=list(set(train_labels))

lb=preprocessing.LabelBinarizer()

lb.fit(unique_labels)

y_train=lb.transform(train_labels)

x_test_data,test_labels=load_cifar10data('data/cifar-10-batches-py/test_batch')

y_test=lb.transform(test_labels)

Having more training data can improve our algorithms. Since we are confined to 50,000 training images (5,000 for each category) we can “manufacture” more images using small image manipulations. We do 3 transformations – flip the image horizontally, randomly adjust the brightness and randomly adjust the contrast. We also normalize the data. Note that there are different ways to do this, but standardization works best for image. However rescaling can be an option as well.

def updateImage(x_train_data,distort=True):
  x_temp=x_train_data.copy()
  x_output=np.zeros(shape=(0,32,32,3))
  for i in range(0,x_temp.shape[0]):
    temp=x_temp[i]
    if distort:
      if random.random()>0.5:
        temp=np.fliplr(temp)
      brightness=random.randint(-63,63)
      temp=temp+brightness
      contrast=random.uniform(0.2,1.8)
      temp=temp*contrast
    mean=np.mean(temp)
    stddev=np.std(temp)
    temp=(temp-mean)/stddev
    temp=np.expand_dims(temp,axis=0)
    x_output=np.append(x_output,temp,axis=0)
  return x_output
#update test data since we don't have to apply distortions
#for training data for each batch we will randomly distory before normalizing
x_test=updateImage(x_test_data,False)

def updateImage(x_train_data,distort=True):

x_temp=x_train_data.copy()

x_output=np.zeros(shape=(0,32,32,3))

for i in range(0,x_temp.shape[0]):

temp=x_temp[i]

if distort:

if random.random()>0.5:

temp=np.fliplr(temp)

brightness=random.randint(-63,63)

temp=temp+brightness

contrast=random.uniform(0.2,1.8)

temp=temp*contrast

mean=np.mean(temp)

stddev=np.std(temp)

temp=(temp-mean)/stddev

temp=np.expand_dims(temp,axis=0)

x_output=np.append(x_output,temp,axis=0)

return x_output

#update test data since we don't have to apply distortions

#for training data for each batch we will randomly distory before normalizing

x_test=updateImage(x_test_data,False)

Now comes the fun part. This is what our network looks like.

Lets define the various layers of the network. The last line of code (logits=tf.identity(final_output,name=’logits’)) is done in case you want to view the model in TensorBoard.

def truncated_normal_var(name, shape, dtype):
  return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.05)))
def zero_var(name, shape, dtype):
  return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))

x=tf.placeholder(tf.float32,shape=[None,x_train.shape[1],x_train.shape[2],x_train.shape[3]],name='x')
labels=tf.placeholder(tf.float32,shape=[None,y_train.shape[1]],name='labels')
keep_prob=tf.placeholder(tf.float32,name='keep_prob')

with tf.variable_scope('conv1') as scope:
  conv1_kernel=truncated_normal_var(name='conv1_kernel',shape=[5,5,3,64],dtype=tf.float32)
  strides=[1,1,1,1]
  conv1=tf.nn.conv2d(x,conv1_kernel,strides,padding='SAME')
  conv1_bias=zero_var(name='conv1_bias',shape=[64],dtype=tf.float32)
  conv1_add_bias=tf.nn.bias_add(conv1,conv1_bias)
  relu_conv1=tf.nn.relu(conv1_add_bias)

pool_size=[1,3,3,1]
strides=[1,2,2,1]
pool1=tf.nn.max_pool(relu_conv1,ksize=pool_size,strides=strides,padding='SAME',name='pool_layer1')
norm1=tf.nn.lrn(pool1,depth_radius=4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm1')

with tf.variable_scope('conv2') as scope:
  conv2_kernel=truncated_normal_var(name='conv2_kernel',shape=[5,5,64,64],dtype=tf.float32)
  strides=[1,1,1,1]
  conv2=tf.nn.conv2d(norm1,conv2_kernel,strides,padding='SAME')
  conv2_bias=zero_var(name='conv2_bias',shape=[64],dtype=tf.float32)
  conv2_add_bias=tf.nn.bias_add(conv2,conv2_bias)
  relu_conv2=tf.nn.relu(conv2_add_bias)

pool_size=[1,3,3,1]
strides=[1,2,2,1]
pool2=tf.nn.max_pool(relu_conv2,ksize=pool_size,strides=strides,padding='SAME',name='pool_layer2')
norm2=tf.nn.lrn(pool2,depth_radius=4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm2')

reshaped_output=tf.reshape(norm2, [-1, 8*8*64])
reshaped_dim=reshaped_output.get_shape()[1].value

with tf.variable_scope('full1') as scope:
  full_weight1=truncated_normal_var(name='full_mult1',shape=[reshaped_dim,1024],dtype=tf.float32)
  full_bias1=zero_var(name='full_bias1',shape=[1024],dtype=tf.float32)
  full_layer1=tf.nn.relu(tf.add(tf.matmul(reshaped_output,full_weight1),full_bias1))
  full_layer1=tf.nn.dropout(full_layer1,keep_prob)

with tf.variable_scope('full2') as scope:
  full_weight2=truncated_normal_var(name='full_mult2',shape=[1024, 256],dtype=tf.float32)
  full_bias2=zero_var(name='full_bias2',shape=[256],dtype=tf.float32)
  full_layer2=tf.nn.relu(tf.add(tf.matmul(full_layer1,full_weight2),full_bias2))
  full_layer2=tf.nn.dropout(full_layer2,keep_prob)

with tf.variable_scope('full3') as scope:
  full_weight3=truncated_normal_var(name='full_mult3',shape=[256,IMAGE_TO_DISPLAY],dtype=tf.float32)
  full_bias3=zero_var(name='full_bias3',shape=[IMAGE_TO_DISPLAY],dtype=tf.float32)
  final_output=tf.add(tf.matmul(full_layer2,full_weight3),full_bias3,name='final_output')

logits=tf.identity(final_output,name='logits')

def truncated_normal_var(name, shape, dtype):

return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.05)))

def zero_var(name, shape, dtype):

return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))

x=tf.placeholder(tf.float32,shape=[None,x_train.shape[1],x_train.shape[2],x_train.shape[3]],name='x')

labels=tf.placeholder(tf.float32,shape=[None,y_train.shape[1]],name='labels')

keep_prob=tf.placeholder(tf.float32,name='keep_prob')

with tf.variable_scope('conv1') as scope:

conv1_kernel=truncated_normal_var(name='conv1_kernel',shape=[5,5,3,64],dtype=tf.float32)

strides=[1,1,1,1]

conv1=tf.nn.conv2d(x,conv1_kernel,strides,padding='SAME')

conv1_bias=zero_var(name='conv1_bias',shape=[64],dtype=tf.float32)

conv1_add_bias=tf.nn.bias_add(conv1,conv1_bias)

relu_conv1=tf.nn.relu(conv1_add_bias)

pool_size=[1,3,3,1]

strides=[1,2,2,1]

pool1=tf.nn.max_pool(relu_conv1,ksize=pool_size,strides=strides,padding='SAME',name='pool_layer1')

norm1=tf.nn.lrn(pool1,depth_radius=4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm1')

with tf.variable_scope('conv2') as scope:

conv2_kernel=truncated_normal_var(name='conv2_kernel',shape=[5,5,64,64],dtype=tf.float32)

strides=[1,1,1,1]

conv2=tf.nn.conv2d(norm1,conv2_kernel,strides,padding='SAME')

conv2_bias=zero_var(name='conv2_bias',shape=[64],dtype=tf.float32)

conv2_add_bias=tf.nn.bias_add(conv2,conv2_bias)

relu_conv2=tf.nn.relu(conv2_add_bias)

pool_size=[1,3,3,1]

strides=[1,2,2,1]

pool2=tf.nn.max_pool(relu_conv2,ksize=pool_size,strides=strides,padding='SAME',name='pool_layer2')

norm2=tf.nn.lrn(pool2,depth_radius=4,bias=1.0,alpha=0.001/9.0,beta=0.75,name='norm2')

reshaped_output=tf.reshape(norm2, [-1, 8*8*64])

reshaped_dim=reshaped_output.get_shape()[1].value

with tf.variable_scope('full1') as scope:

full_weight1=truncated_normal_var(name='full_mult1',shape=[reshaped_dim,1024],dtype=tf.float32)

full_bias1=zero_var(name='full_bias1',shape=[1024],dtype=tf.float32)

full_layer1=tf.nn.relu(tf.add(tf.matmul(reshaped_output,full_weight1),full_bias1))

full_layer1=tf.nn.dropout(full_layer1,keep_prob)

with tf.variable_scope('full2') as scope:

full_weight2=truncated_normal_var(name='full_mult2',shape=[1024, 256],dtype=tf.float32)

full_bias2=zero_var(name='full_bias2',shape=[256],dtype=tf.float32)

full_layer2=tf.nn.relu(tf.add(tf.matmul(full_layer1,full_weight2),full_bias2))

full_layer2=tf.nn.dropout(full_layer2,keep_prob)

with tf.variable_scope('full3') as scope:

full_weight3=truncated_normal_var(name='full_mult3',shape=[256,IMAGE_TO_DISPLAY],dtype=tf.float32)

full_bias3=zero_var(name='full_bias3',shape=[IMAGE_TO_DISPLAY],dtype=tf.float32)

final_output=tf.add(tf.matmul(full_layer2,full_weight3),full_bias3,name='final_output')

logits=tf.identity(final_output,name='logits')

Now we define our cross entropy and optimization function. If you want to use the AdamOptomizer, uncomment that line, comment the generation_run, model_learning_rate and train_step lines and adjust the learning rate to something lower like 0.0001. Otherwise the model will not converge.

cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels),name='cross_entropy')
#train_step=tf.train.AdamOptimizer(LEARNING_RATE).minimize(cross_entropy)
generation_run = tf.Variable(0, trainable=False,name='generation_run')
model_learning_rate=tf.train.exponential_decay(LEARNING_RATE,generation_run,NUM_GENS_TO_WAIT,LEARNING_RATE_DECAY,staircase=True,name='model_learning_rate')
train_step=tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(cross_entropy)
correct_prediction=tf.equal(tf.argmax(final_output,1),tf.argmax(labels,1),name='correct_prediction')
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name='accuracy')

cross_entropy=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels),name='cross_entropy')

#train_step=tf.train.AdamOptimizer(LEARNING_RATE).minimize(cross_entropy)

generation_run = tf.Variable(0, trainable=False,name='generation_run')

model_learning_rate=tf.train.exponential_decay(LEARNING_RATE,generation_run,NUM_GENS_TO_WAIT,LEARNING_RATE_DECAY,staircase=True,name='model_learning_rate')

train_step=tf.train.GradientDescentOptimizer(LEARNING_RATE).minimize(cross_entropy)

correct_prediction=tf.equal(tf.argmax(final_output,1),tf.argmax(labels,1),name='correct_prediction')

accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name='accuracy')

Now we define some functions to run through our batch. For large networks memory tends to be a big constraint. We run through our training data in batches. One epoch is one run through our complete training set (in multiple batches). After each epoch we randomly shuffle our data. This helps improve how our algorithm learns. We run through each batch of data and train our algorithm. We also check for accuracy every 1st, 2nd,…,10th, 20th,…, 100th,… step. Lastly we calculate the final accuracy of the model and save it so we can use the calculated weights on test data without having to re-run it.

epochs_completed=0
index_in_epoch = 0
num_examples=x_train.shape[0]

init=tf.global_variables_initializer()
with tf.Session() as sess:
  sess.run(init)

  def next_batch(batch_size):
    global x_train
    global y_train
    global index_in_epoch
    global epochs_completed
    start = index_in_epoch
    index_in_epoch += batch_size

    if index_in_epoch > num_examples:
      # finished epoch
      epochs_completed += 1
      # shuffle the data
      perm = np.arange(num_examples)
      np.random.shuffle(perm)
      x_train=x_train[perm]
      y_train=y_train[perm]
      # start next epoch
      start = 0
      index_in_epoch = batch_size
      assert batch_size <= num_examples end = index_in_epoch #return x_train[start:end], y_train[start:end] x_output=updateImage(x_train[start:end],True) return x_output,y_train[start:end] # visualisation variables train_accuracies = [] validation_accuracies = [] x_range = [] display_step=1 for i in range(TRAINING_ITERATIONS): #get new batch batch_xs, batch_ys = next_batch(BATCH_SIZE) # check progress on every 1st,2nd,...,10th,20th,...,100th... step if i%display_step == 0 or (i+1) == TRAINING_ITERATIONS: train_accuracy = accuracy.eval(feed_dict={x:batch_xs,labels: batch_ys,keep_prob: 1.0}) validation_accuracy=0.0 for j in range(0,x_test.shape[0]//BATCH_SIZE): validation_accuracy+=accuracy.eval(feed_dict={ x: x_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],labels: y_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],keep_prob: 1.0}) validation_accuracy/=(j+1.0) print('training_accuracy / validation_accuracy => %.2f / %.2f for step %d'%(train_accuracy, validation_accuracy, i))
      validation_accuracies.append(validation_accuracy)
      train_accuracies.append(train_accuracy)
      x_range.append(i)
      # increase display_step
      if i%(display_step*10) == 0 and i:
        display_step *= 10
    # train on batch
    sess.run(train_step, feed_dict={x: batch_xs, labels: batch_ys, keep_prob: DROPOUT})

  validation_accuracy=0.0
  for j in range(0,x_test.shape[0]//BATCH_SIZE):
    validation_accuracy+=accuracy.eval(feed_dict={ x: x_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],labels: y_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],keep_prob: 1.0})
  validation_accuracy/=(j+1.0)
  print('validation_accuracy => %.4f'%validation_accuracy)
  saver=tf.train.Saver()
  save_path=saver.save(sess,'./CIFAR10_model')
  sess.close()

epochs_completed=0

index_in_epoch = 0

num_examples=x_train.shape[0]

init=tf.global_variables_initializer()

with tf.Session() as sess:

sess.run(init)

def next_batch(batch_size):

global x_train

global y_train

global index_in_epoch

global epochs_completed

start = index_in_epoch

index_in_epoch += batch_size

if index_in_epoch > num_examples:

# finished epoch

epochs_completed += 1

# shuffle the data

perm = np.arange(num_examples)

np.random.shuffle(perm)

x_train=x_train[perm]

y_train=y_train[perm]

# start next epoch

start = 0

index_in_epoch = batch_size

assert batch_size <= num_examples end = index_in_epoch #return x_train[start:end], y_train[start:end] x_output=updateImage(x_train[start:end],True) return x_output,y_train[start:end] # visualisation variables train_accuracies = [] validation_accuracies = [] x_range = [] display_step=1 for i in range(TRAINING_ITERATIONS): #get new batch batch_xs, batch_ys = next_batch(BATCH_SIZE) # check progress on every 1st,2nd,...,10th,20th,...,100th... step if i%display_step == 0 or (i+1) == TRAINING_ITERATIONS: train_accuracy = accuracy.eval(feed_dict={x:batch_xs,labels: batch_ys,keep_prob: 1.0}) validation_accuracy=0.0 for j in range(0,x_test.shape[0]//BATCH_SIZE): validation_accuracy+=accuracy.eval(feed_dict={ x: x_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],labels: y_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],keep_prob: 1.0}) validation_accuracy/=(j+1.0) print('training_accuracy / validation_accuracy => %.2f / %.2f for step %d'%(train_accuracy, validation_accuracy, i))

validation_accuracies.append(validation_accuracy)

train_accuracies.append(train_accuracy)

x_range.append(i)

# increase display_step

if i%(display_step*10) == 0 and i:

display_step *= 10

# train on batch

sess.run(train_step, feed_dict={x: batch_xs, labels: batch_ys, keep_prob: DROPOUT})

validation_accuracy=0.0

for j in range(0,x_test.shape[0]//BATCH_SIZE):

validation_accuracy+=accuracy.eval(feed_dict={ x: x_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],labels: y_test[j*BATCH_SIZE : (j+1)*BATCH_SIZE],keep_prob: 1.0})

validation_accuracy/=(j+1.0)

print('validation_accuracy => %.4f'%validation_accuracy)

saver=tf.train.Saver()

save_path=saver.save(sess,'./CIFAR10_model')

sess.close()

The model gives around 81% accuracy on the test set. I have an iPython notebook on my GitHub site that lets you load the saved model and run it on random samples on the test set. It outputs the image vs the softmax probabilities of the top n predictions.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30