Experimenting with JavaScript

I recently did Stanford’s CS221 course (Artificial Intelligence: Principles and Techniques). The course was extremely fast-paced but very rewarding.

We coded in Python, but I wanted to demo some techniques on my blog, so I decided to learn JavaScript. When I learn a new language, my first project is usually plotting the Mandelbrot set. This is my first attempt at JavaScript (and at embedding code into WordPress). If it works, I plan to tackle some interesting problems like a Sudoku solver and a Rubik’s Cube solver.

Clustering News Articles

I subscribe to multiple publications, including the Wall Street Journal, the Financial Times, CNN, and MSNBC. The problem with having so much information at my fingertips is that I usually just visit the front page of each website and follow a few links, so I almost always miss relevant articles buried deeper in each site. I wanted to aggregate the information so that articles on related topics were grouped together. That required two steps:

  • Scan and download all new articles on each site.
  • Cluster the articles automatically based on keywords.

I wrote a web crawler in Python that indexes each site and pulls in new articles I have not downloaded before. There is plenty of literature out there on scraping websites; the book I found useful was Web Scraping with Python (sold on Amazon).

Once we have the articles, we preprocess the data by:

  • Converting everything to lowercase.
  • Adding a part-of-speech tag to each word. This classifies the word as a noun, verb, adverb, etc.
  • Lemmatizing each word, i.e. grouping the inflected forms of a word into a single item. For example:
am, are, is ⇒ be
having, have, had ⇒ have
car, cars ⇒ car
  • Running TfIdf on the lemmatized documents. TfIdf stands for Term Frequency, Inverse Document Frequency. In a nutshell, a word is more important (and gets a higher weight) the more often it occurs in a document (the Term Frequency part), unless it also occurs commonly across the other documents in our corpus (the Inverse Document Frequency part).
  • Clustering. TfIdf gives us a matrix of documents weighted by their important words; we can then cluster the documents using a common clustering algorithm like KMeans.
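The TfIdf-plus-KMeans part of the pipeline can be sketched in a few lines with scikit-learn. This is a minimal illustration, with a toy four-document corpus standing in for the downloaded articles and a made-up cluster count of 2:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy corpus standing in for the downloaded (and lemmatized) articles.
docs = [
    "the economy grew in the first quarter",
    "economic growth rose last quarter",
    "the president and his administration met officials",
    "trump officials briefed the president",
]

# TfIdf turns the documents into a sparse matrix of weighted terms.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(docs)

# Cluster the TfIdf vectors with KMeans (2 clusters for this toy corpus).
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)
print(labels)  # the two "economy" docs and the two "president" docs pair up
```

In the real pipeline, the most heavily weighted terms of each cluster center give the keyword labels shown below.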

Here is what some of the clusters look like for all the WSJ articles from 30th April 2017 (along with the first 10 article titles in each cluster).

states, plan, house, cost, governments

  • Fugitive Mexican Ex-Governor Tomás Yarrington Had State Security While on the Run
  • Australia Considers Cross-Continent Pipeline to Beat Gas Shortages
  • GOP Health-Care Push Falls Short Again
  • America’s Most Anti-Reform Institution? The Media
  • Fugitive Mexican Governor Arrested in Guatemala
  • Pentagon Investigates Whether Army Rangers in Afghanistan Were Killed by Friendly Fire
  • New Plan, Same Hurdle in GOP’s Quest to Gut Obamacare
  • The Resurgent Threat of al Qaeda
  • Saudi Arabia Reinstates Perks for State Employees as Finances Improve
  • Trump Unveils Broad Tax-Cut Plan

growth, economy, economic, quarter, rose

  • Mexican Economy Maintains Growth in First Quarter
  • Outlook for Kenyan Economy Dimmed by Severe Drought
  • Economic Growth Lags Behind Rising Confidence Data
  • Economists See Growth Climbing in 2017, 2018, Then Dissipating
  • Economy Needs Consumers to Shop Again
  • U.K. Economy Slows Sharply Ahead of Election, Brexit Talks
  • From Diapers to Soda, Big Brands Feel Pinch as Consumers Pull Back
  • Stars Align for Emmanuel Macron—and France
  • Consumer Sentiment Remains High Despite GDP Report
  • South Korea’s Economy Grew 0.9% in First Quarter

trump, u.s., officials, president, administration

  • Pentagon Opens Probe Into Michael Flynn’s Foreign Payments
  • Two U.S. Service Members Killed in Afghanistan
  • Immigrant Crackdown Worries Food and Construction Industries
  • U.S. Launches Cruise Missiles at Syrian Air Base in Response to Chemical Attack
  • At NRA Meeting, Trump Warns of Challengers in 2020
  • Trump Backers in Phoenix Region Are Fine With His Learning Curve
  • Relative of Imprisoned Iranian-Americans Appeals to Trump for Help
  • Trump Issues New Warning to North Korea
  • U.S. Presses China on North Korea After Failed Missile Test
  • Trump’s Bipartisan War Coalition

share, billion, quarter, sales, company

  • What’s Keeping GM Going Strong Probably Won’t Last
  • Cardinal Health’s $6.1 Billion Deal for Some Medtronic Operations Raises Debt Concerns
  • UnitedHealth Profits Rise as it Exits Health-Care Exchanges
  • Lockheed Martin Hit By Middle East Charges
  • Johnson & Johnson Lifts Forecast on Actelion Tie-Up
  • Alphabet and Amazon Extend an Earnings Boom
  • How an ETF Gold Rush Rattled Mining Stocks
  • Ski-Park Operator Intrawest to Go Private in Latest Resort Deal
  • S&P’s Warning: Here Are 10 Public Retailers Most in Danger of Default
  • Advertising’s Biggest Threat Isn’t Digital Disruption

u.s., trade, company, trump, billion

  • Mexico Registers Small Trade Deficit in March
  • Coming to America: How Immigration Policy Has Changed the U.S.
  • Boeing Files Petition With Commerce Dept. Over Bombardier
  • Trump Administration Mulls More Trade Actions, Commerce Secretary Says
  • U.S. Hoteliers Go on Charm Offensive Amid Concerns Over Trump Policies
  • Today’s Top Supply Chain and Logistics News From WSJ
  • Dear Canada: It’s Not Personal, It’s Just Trade
  • Today’s Top Supply Chain and Logistics News From WSJ
  • Venezuela Creditor Seeks Asset Freeze on U.S. Refiner Citgo
  • Seoul Plays Down Possibility of Pre-Emptive U.S. Strike on North Korea


In the previous post we discussed using convolutional neural networks for image recognition. To better understand what these algorithms are doing, we need to explore convolutions in more detail.


A convolution of f and g, denoted f ∗ g, is the integral of the product of the two functions after one is reversed and shifted. Mathematically,

    \[ (f \ast g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t-\tau)d\tau = \int_{-\infty}^{\infty} f(t-\tau)g(\tau)d\tau \]

One way of thinking about a convolution is as an integral that expresses the amount of overlap of g as it is shifted over f: it “blends” one function with another. (An animation on Wikipedia demonstrates this nicely.)
Here are some more intuitive explanations of convolutions, more applicable to machine learning:

Think about a time series, say the fed funds rate over time. One way to smooth the data is to convolve it against a small list of weights. For example, to calculate a weekly moving average we convolve it with [1, 1, 1, 1, 1]/5 (five equal weights that sum to one). For a monthly moving average we would convolve it with 21 such weights (assuming about 21 business days in a month).
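A quick numpy sketch of this idea, with a toy series standing in for the fed funds data:

```python
import numpy as np

# Toy daily rate series standing in for the fed funds data.
rates = np.array([1.0, 1.2, 1.1, 1.3, 1.5, 1.4, 1.6, 1.8, 1.7, 1.9])

# A 5-day moving average: convolve with five equal weights that sum to 1.
kernel = np.ones(5) / 5
smoothed = np.convolve(rates, kernel, mode="valid")

print(smoothed)  # each output point is the mean of 5 consecutive days
```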

As another example, consider rolling two six-sided dice. The probability distribution of the total is the convolution of the two single-die distributions. For example, to get a total of 4 we can roll (1, 3), (2, 2) or (3, 1), so P(4) = P(1)P(3) + P(2)P(2) + P(3)P(1) = 3/36, which is exactly the convolution sum.
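The same convolution in numpy:

```python
import numpy as np

# A fair die: probability 1/6 for each face 1..6.
die = np.ones(6) / 6

# The distribution of the sum of two dice is the convolution of the two.
# two_dice[i] is the probability of rolling a total of i + 2.
two_dice = np.convolve(die, die)

print(two_dice[2])  # P(total = 4) = 3/36
```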

So what does the “convolution” in Convolutional Neural Networks do to images? The small matrices being convolved (called kernels, or filters, in image processing) act as feature detectors. Let’s look at a few kernels applied to two images below.

(The original post showed a table with each kernel’s matrix and its effect on the two sample images; the images are omitted here.) The kernels demonstrated were:

  • Original image (no convolution)
  • Edge detection (4 variants)
  • Box blur
  • Gaussian blur
  • Gradient detection (2 variants)
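As an illustration, here is one classic edge-detection kernel applied to a toy image, using a hand-rolled 2-D convolution in pure numpy ("valid" padding). This is a sketch, not the exact kernels from the table:

```python
import numpy as np

def convolve2d(image, kernel):
    """2-D convolution: flip the kernel, slide it over the image ('valid')."""
    k = np.flipud(np.fliplr(kernel))
    kh, kw = k.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# A classic Laplacian-style edge-detection kernel (rows sum to zero,
# so flat regions of the image produce zero response).
edge = np.array([[-1, -1, -1],
                 [-1,  8, -1],
                 [-1, -1, -1]])

# A toy image: a small bright square on a dark background.
img = np.zeros((8, 8))
img[3:5, 3:5] = 1.0

edges = convolve2d(img, edge)
print(edges)  # zero over flat regions, nonzero responses around the square
```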

The amazing part about deep learning is that we don’t have to hand-design the kernels. The network learns them as part of its weights during training!

Image recognition on the CIFAR-10 dataset using deep learning

CIFAR-10 is an established computer-vision dataset used for image recognition. It’s a labeled subset of the 80 million tiny images dataset, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton, and is available from the collectors’ website.

The CIFAR-10 dataset consists of 60,000 32×32 color images of 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.

As a first post, I wanted to write a deep-learning algorithm to identify images in the CIFAR-10 dataset. This topic has been very widely covered; in fact, Google’s TensorFlow tutorials cover it very well. However, they do a few things that made their code difficult for me to follow:

  1. They split the code across multiple files.
  2. They use a binary version of the data and a file stream to feed TensorFlow.

I downloaded the Python version of the data and loaded all the variables into memory. The TensorFlow tutorial does some image manipulation that I recreated directly on the numpy arrays; we will discuss it below.

Prerequisites for this tutorial, other than Python (obviously!):

  • numpy
  • pickle
  • sklearn
  • tensorflow

For TensorFlow I strongly recommend the GPU version if you have the set-up for it. The code takes 6 hours on my dual GTX Titan X machine, and running it on a CPU will probably take days or weeks!
Assuming you have everything working, let’s get started!

Start with our import statements

Declare some global variables we will use. In our code we are using GradientDescentOptimizer with learning-rate decay. I have tested the same code with AdamOptimizer; Adam runs faster but gives slightly worse results. If you do decide to use AdamOptimizer, drop the learning rate to 0.0001 (see Kingma and Ba’s paper on Adam optimization).
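For reference, the decay schedule follows the formula behind TensorFlow’s exponential decay. A quick numpy sketch (the base rate, decay factor, and step counts here are illustrative, not the values used in the post):

```python
import numpy as np

def exponential_decay(base_rate, global_step, decay_steps, decay_rate):
    """learning_rate = base_rate * decay_rate ** (global_step / decay_steps),
    mirroring tf.train.exponential_decay (non-staircase form)."""
    return base_rate * decay_rate ** (global_step / decay_steps)

# Illustrative values only: the rate shrinks by 10% every 250 steps.
for step in [0, 250, 500]:
    print(step, exponential_decay(0.1, step, 250, 0.9))
```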

Create the data directory and download the data if it doesn’t already exist; this step is skipped if we have already downloaded the data.
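A sketch of what that step can look like (the URL is the dataset’s standard hosting location; the directory names are assumptions, so adjust paths as needed):

```python
import os
import tarfile
import urllib.request

# Standard hosting location of the Python version of CIFAR-10.
CIFAR_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
DATA_DIR = "cifar-10-data"  # local directory name is an assumption

def maybe_download(data_dir=DATA_DIR, url=CIFAR_URL):
    """Create the data directory, then download and extract the archive,
    doing nothing if the extracted batches are already present."""
    if os.path.isdir(os.path.join(data_dir, "cifar-10-batches-py")):
        return  # already downloaded and extracted
    os.makedirs(data_dir, exist_ok=True)
    archive = os.path.join(data_dir, "cifar-10-python.tar.gz")
    if not os.path.exists(archive):
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(data_dir)
```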

Load the data into numpy arrays. The code below loads the labels from the batches.meta file, then the training and test data; the training data is split across 5 files. We also one-hot encode the labels.
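A sketch of the loading and one-hot encoding (the loader assumes the pickled dictionary layout of the Python version of the dataset; toy labels are used for the demo):

```python
import numpy as np
import pickle

def load_batch(path):
    """Load one CIFAR-10 batch file: a pickled dict with 'data' (uint8
    image rows) and 'labels' (class indices 0..9)."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="latin1")
    return np.array(batch["data"]), np.array(batch["labels"])

# Toy labels standing in for the CIFAR-10 class indices.
labels = np.array([3, 0, 9, 3])

# One-hot encode: row i gets a 1 in column labels[i], zeros elsewhere.
one_hot = np.zeros((labels.size, 10))
one_hot[np.arange(labels.size), labels] = 1

print(one_hot[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```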

Having more training data can improve our algorithm. Since we are confined to 50,000 training images (5,000 per category), we can “manufacture” more images using small image manipulations. We apply 3 transformations: flipping the image horizontally, randomly adjusting the brightness, and randomly adjusting the contrast. We also normalize the data. There are different ways to do this; standardization works best for images, though rescaling can be an option as well.
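A numpy sketch of those manipulations (the jitter ranges are illustrative; TensorFlow’s random_brightness and random_contrast use their own parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Flip horizontally, then randomly jitter brightness and contrast."""
    image = image[:, ::-1, :]                    # horizontal flip
    image = image + rng.uniform(-0.2, 0.2)       # brightness: add a shift
    mean = image.mean()
    image = (image - mean) * rng.uniform(0.8, 1.2) + mean  # contrast: scale
    return image

def standardize(image):
    """Per-image standardization: zero mean, unit variance."""
    return (image - image.mean()) / image.std()

# A random stand-in for a 32x32 RGB CIFAR-10 image.
img = rng.random((32, 32, 3))
aug = standardize(augment(img, rng))

print(aug.mean(), aug.std())  # approximately 0 and 1 after standardization
```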

Now comes the fun part. This is what our network looks like.

Let’s define the various layers of the network. The last line of code (logits = tf.identity(final_output, name='logits')) is there in case you want to view the model in TensorBoard.

Now we define our cross-entropy and optimization functions. If you want to use AdamOptimizer, uncomment that line, comment out the generation_run, model_learning_rate and train_step lines, and lower the learning rate to something like 0.0001. Otherwise the model will not converge.
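For intuition, softmax cross-entropy itself is easy to write out in numpy. This is a sketch of the quantity TensorFlow computes, not the post’s training code:

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Mean cross-entropy between softmax(logits) and one-hot labels,
    the quantity tf.nn.softmax_cross_entropy_with_logits computes."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1).mean()

# Two toy examples: both predict the correct class with high confidence,
# so the loss is small but positive.
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 3.0, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

loss = softmax_cross_entropy(logits, labels)
print(loss)
```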

Now we define some functions to run through our batches. For large networks, memory tends to be a big constraint, so we run through the training data in batches; one epoch is one complete pass through the training set (in multiple batches). At the start of each epoch we randomly shuffle the data, which helps the algorithm learn. We run through each batch and train the model, checking accuracy at steps 1, 2, …, 10, 20, …, 100, and so on. Lastly, we calculate the final accuracy of the model and save it so we can use the learned weights on test data without having to re-run the training.
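The epoch and batch bookkeeping can be sketched like this (toy arrays stand in for the real training set, and the inner loop is where the actual train_step call would go):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set standing in for the 50,000 CIFAR-10 images.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
batch_size = 4

seen = []
for epoch in range(2):
    # Reshuffle the data at the start of every epoch.
    order = rng.permutation(len(X))
    X_shuf, y_shuf = X[order], y[order]
    # One epoch = one full pass over the shuffled set, in batches.
    for start in range(0, len(X_shuf), batch_size):
        xb = X_shuf[start:start + batch_size]
        yb = y_shuf[start:start + batch_size]
        seen.append(len(xb))  # here we would run the training step on (xb, yb)

print(seen)  # [4, 4, 2, 4, 4, 2]: the last batch of each epoch is smaller
```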

The model achieves around 81% accuracy on the test set. I have an iPython notebook on my GitHub site that lets you load the saved model and run it on random samples from the test set. It outputs each image alongside the softmax probabilities of the top n predictions.