February 2018 – rohit apte

Sentiment Analysis of movie reviews part 2 (Convolutional Neural Networks)

In a previous post I looked at sentiment analysis of movie reviews using a Deep Neural Network. That involved using pretrained vectors (GLOVE in our case) as a bag of words and fine tuning them for our task.

We will try a different approach to the same problem – using Convolutional Neural Networks (aka Deep Learning). We will take the idea from the image recognition blog and apply it to text classification. The idea is to

Vectorize at a character level, using just the characters in our text. We don’t use any pretrained vectors for word embeddings.
Apply multiple convolutional and max pooling layers to the data.
Generate a final output layer with softmax
We’re assuming the Convolutional Neural Network will automatically detect the relationship between characters (pooling them into words and further understanding the relationships between words).

Our input data is just a vectorization of each character. We take all the unique characters in our data and the maximum sentence length, and transform each sentence into a maximum_sentence_length X character_count matrix. For sentences shorter than the maximum length, we pad the remaining rows with zeros.
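As a rough sketch of the vectorization described above, assuming a one-hot encoding per character (the function name vectorize_chars and the two sample sentences are made up for illustration, not taken from the actual source code):

```python
import numpy as np

def vectorize_chars(sentences):
    """One-hot encode each sentence at the character level.

    Returns an array of shape (num_sentences, max_len, num_chars) and the
    character-to-index mapping. Rows past the end of a sentence stay zero,
    which is the zero padding for shorter sentences.
    """
    chars = sorted(set("".join(sentences)))
    char_to_idx = {c: i for i, c in enumerate(chars)}
    max_len = max(len(s) for s in sentences)
    data = np.zeros((len(sentences), max_len, len(chars)), dtype=np.float32)
    for n, sentence in enumerate(sentences):
        for pos, ch in enumerate(sentence):
            data[n, pos, char_to_idx[ch]] = 1.0
    return data, char_to_idx

# Two toy "reviews": 10 unique characters, longest sentence has 9 characters
encoded, mapping = vectorize_chars(["good", "bad movie"])
print(encoded.shape)  # (2, 9, 10)
```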

I used two 1-dimensional convolutional layers with filter size=3, stride=1 and hidden size=64, with relu as the non-linear activation (see the Image Recognition blog for an explanation of this). I also added a max pooling layer of size 3 after each convolution.
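To get a feel for how quickly the sequence shrinks through these layers, here is a small shape calculation. The post does not state the padding mode or the pooling stride, so "same" padding for the convolutions and a pooling stride equal to the pool size are assumptions, and the input length of 100 characters is made up.

```python
import math

def conv1d_out_len(length, kernel=3, stride=1, padding="same"):
    """Output length of a 1-D convolution along the character axis."""
    if padding == "same":
        return math.ceil(length / stride)
    return (length - kernel) // stride + 1  # "valid" padding

def pool1d_out_len(length, pool=3, stride=None):
    """Output length of a 1-D max pool without padding."""
    stride = stride or pool  # assumption: stride equals the pool size
    return (length - pool) // stride + 1

def feature_map_shape(seq_len, n_layers=2, filters=64, padding="same"):
    """Shape (time steps, filters) after n_layers of conv + pool."""
    length = seq_len
    for _ in range(n_layers):
        length = conv1d_out_len(length, padding=padding)
        length = pool1d_out_len(length)
    return length, filters

print(feature_map_shape(100))                   # (11, 64)
print(feature_map_shape(100, padding="valid"))  # (10, 64)
```

The flattened size (time steps times filters) is what would feed the first fully connected layer.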

Finally, I used 2 fully connected layers of sizes 1024 and 256 with a dropout probability of 0.5 (that should help prevent overfitting). The final layer uses a softmax to generate the output probabilities and we use the standard cross entropy function for the loss. The learning is optimized using the Adam optimizer.

Overall the results are very close to the deep neural network. We get 59.2% accuracy using CNNs vs 62% with the neural network. I think this accuracy is close to the maximum we can extract from this data. What’s interesting is we used 2 completely different approaches – pretrained word vectors in the Neural Network case, and character level vectors in this Deep Learning case – and we got similar results.

Next post we will explore using LSTMs on the same problem.

Source code available on request.

Evaluation of Machine Learning Trading Strategies Using Recurrent Reinforcement Learning

A few months ago I did the Stanford CS221 course (Introduction to AI). The course was intense, covering a lot of advanced material. For the final project I worked with 2 teammates (Tesa Ho and Albert Lau) on evaluating Machine Learning Strategies using Recurrent Reinforcement Learning. This is our final project submission.

Introduction

There have been several studies that propose using Recurrent Reinforcement Learning to design profitable trading systems over longer time horizons [see Moody, David and Molina]. A common practice at trading shops today is to develop a Supervised Learning classification algorithm to predict whether or not there will be a move of +/- X bps in the next t time period. Depending on the trading strategy, the model selection may be based on maximizing the Precision, Accuracy, a mixture of both (i.e. F-score), or a measure of profit (e.g. Sharpe Ratio). In the case of the latter, the parameters of the trading strategy must also be optimized, which often requires brute force.
The direct reinforcement approach, on the other hand, differs from dynamic programming and reinforcement algorithms such as TD-learning and Q-learning, which attempt to estimate a value function for the control problem. For finance in particular, the presence of large amounts of noise and non-stationarity in the datasets can cause severe problems for the value function approach. The RRL direct reinforcement framework enables a simpler problem representation, avoids Bellman’s curse of dimensionality and offers compelling advantages in efficiency.
This project will apply the Recurrent Reinforcement Learning methodology to intraday trading on the Hong Kong futures exchange, specifically the Hang Seng futures. A gradient ascent of the Sortino Ratio (or Downside Deviation Ratio) was used to calculate the optimized weights to determine the trade signal. The results indicate that profitability is dependent on the maximum position allowed (the variable μ). We also develop a trading strategy using the Reinforcement Learning framework to adapt predictions from a Supervised Learning algorithm and compare the results to the Recurrent Reinforcement Learning results.

Model and Approach

Our model is based on the work of Molina and Moody. We use Recurrent Reinforcement Learning to maximize the Sharpe Ratio or Sortino Ratio for a financial asset (Hang Seng Futures in our case) over a selected training period, then apply the optimized weight parameter to a test period. The trades and profitability are saved and the process of training and testing is repeated for all data.
This report is based on 5 minute open, high, low, close prices for the Hang Seng front-month futures from November 1, 2016 to August 31, 2017. The close price was used as the price array, p_t, and was the basis of the log returns, r_t.

$r_t = \ln \frac{p_t}{p_{t-1}}$
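In code, the log return series is a one-liner with numpy. This is only a sketch; the three closing prices are made up.

```python
import numpy as np

def log_returns(prices):
    """r_t = ln(p_t / p_{t-1}) for a 1-D array of closing prices."""
    prices = np.asarray(prices, dtype=float)
    return np.log(prices[1:] / prices[:-1])

closes = [100.0, 101.0, 100.5]  # hypothetical 5 minute closes
r = log_returns(closes)
print(r)
```

A convenient property of log returns is that they add up over time: the sum of r equals ln(last price / first price).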

Other variables used in the model are:
M = the window size of returns used in the recurrent reinforcement learning
N = number of iterations for the RL algo
μ = max position size
δ = transaction costs in bps per trade
numTrainDays = the number of training days used
numTestDays = the number of test days used
The trader is assumed to take only long, neutral, or short positions with a maximum position of magnitude μ. The position F_t is established or maintained at the end of each time interval t and is reassessed at the end of period t+1. Where Moody used a trader function of:

$F_t=\tanh(w^Tx_t) \in \{1,0,-1\}$

We opt to use a risk adjusted trader function of:

$F_t=\tanh(w^Tx_t) \in [-1,1]$

where

$x_t=[1,r_{t-M},\dots,r_t,F_{t-1}]$
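A minimal sketch of this recurrent trader function is below. It assumes the feature vector holds a bias term, the M+1 returns r_{t-M} through r_t, and the previous position, so the weight vector here has M+3 entries. The function name positions and the random inputs are purely illustrative.

```python
import numpy as np

def positions(returns, w, M):
    """Compute F_t = tanh(w . x_t) with x_t = [1, r_{t-M}, ..., r_t, F_{t-1}].

    Positions before index M are left at zero (flat) because there is not
    yet a full window of returns.
    """
    returns = np.asarray(returns, dtype=float)
    F = np.zeros(len(returns))
    for t in range(M, len(returns)):
        x_t = np.concatenate(([1.0], returns[t - M:t + 1], [F[t - 1]]))
        F[t] = np.tanh(np.dot(w, x_t))
    return F

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=50)   # made-up return series
M = 5
w = rng.normal(0.0, 1.0, size=M + 3)  # bias + (M+1) returns + previous position
F = positions(r, w, M)
```

Because F_{t-1} feeds into F_t, the positions have to be computed in a loop rather than in one vectorized call.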

The trade cost, δ, associated with each trade is assumed to occur on the closing price at the end of each time period t. A non-zero trading cost in bps is used to account for slippage, bid-ask spread, and associated trading fees.
The trade return, R_t, is defined as the return obtained from trading:

$R_t=\mu (F_{t-1}r_t -\delta \vert F_t-F_{t-1} \vert)$

where
μ = maximum number of shares per transaction
δ = transaction cost in bps
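The trade return formula translates directly into numpy. In this sketch δ is passed as a fraction (so 10 bps would be 0.001), which is an assumption about units, and the positions and returns are made up.

```python
import numpy as np

def trade_returns(F, r, mu, delta):
    """R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|).

    F and r are aligned arrays; R[0] is zero because there is no
    previous position at the first time step.
    """
    F = np.asarray(F, dtype=float)
    r = np.asarray(r, dtype=float)
    R = np.zeros(len(r))
    R[1:] = mu * (F[:-1] * r[1:] - delta * np.abs(F[1:] - F[:-1]))
    return R

F_example = [0.0, 1.0, 1.0, -1.0]     # flat, long, long, short
r_example = [0.0, 0.01, 0.02, -0.01]  # hypothetical log returns
R_example = trade_returns(F_example, r_example, mu=1, delta=0.001)
print(R_example)
```

Note that flipping from long to short costs twice the transaction cost, since the position changes by 2.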
The reward function that is traditionally used to compare trading strategies is the Sharpe Ratio. The Sharpe Ratio takes the average of the trade returns divided by the standard deviation of the trade returns. This penalizes strategies with large variance in returns.

$\text{Sharpe Ratio}=\frac{\text{Avg}(R_t)}{\text{Std}(R_t)}$

However, variance in positive returns is acceptable, so the Sortino Ratio, or Downside Deviation Ratio, is a more accurate measure of a strategy. The Sortino Ratio penalizes only large variations in negative returns.

$\text{Sortino Ratio}=\frac{\text{Avg}(R_t)}{\text{Std}(R_t \mid R_t<0)}$
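A small sketch of both ratios as defined above, using the population standard deviation (numpy's default) and no annualization. Other Sortino definitions measure downside deviation against a target return; this one simply follows the formula in this post. The four returns are made up.

```python
import numpy as np

def sharpe_ratio(R):
    """Average return divided by the standard deviation of all returns."""
    R = np.asarray(R, dtype=float)
    return R.mean() / R.std()

def sortino_ratio(R):
    """Average return divided by the standard deviation of negative returns."""
    R = np.asarray(R, dtype=float)
    downside = R[R < 0]
    if downside.size < 2 or downside.std() == 0:
        return float("nan")  # not enough downside observations to measure
    return R.mean() / downside.std()

R_sample = [0.02, -0.01, 0.03, -0.03]
print(sharpe_ratio(R_sample), sortino_ratio(R_sample))
```

In this sample the positive returns inflate the Sharpe denominator but not the Sortino one, so the Sortino Ratio comes out higher.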

The reinforcement learning algorithm adjusts the parameters of the system to maximize the expected reward function. It can also be expressed as a function of profit or wealth, U(W_T), or in our case, a function of the sequence of trading returns, U(R₁, R₂, …, R_T). Given the trading system F_t(θ), we can then adjust the parameters θ to maximize U_T. The optimized variable is θ, an array of weights applied to the log returns r_{t−M}, …, r_t.

$\theta = \{w_1,\dots,w_M \}$

The gradient with respect to θ is:

$\frac{d U_T(\theta)}{d \theta} =\sum_{t=1}^T \frac{d U_t}{dR_t} \left\{ \frac{dR_t}{dF_t} \frac{dF_t}{d \theta} + \frac{d R_t}{d F_{t-1}} \frac{dF_{t-1}}{d \theta} \right\}$

where

$\frac{dR_t}{dF_t}=-\mu \delta \, \mathrm{sign}(F_t-F_{t-1})$

$\frac{dR_t}{dF_{t-1}}=\mu r_t + \mu \delta \, \mathrm{sign}(F_t-F_{t-1})$

$\frac{dF_t}{d \theta}=(1-\tanh(x_t \cdot \theta)^2)\left(x_t+w_M \frac{dF_{t-1}}{d \theta}\right)$

$\frac{dU_t}{dR_t}=\frac{\text{Avg}(R_t^2)-\text{Avg}(R_t)\,R_t}{\sqrt{\frac{T}{T-1}} \left(\text{Avg}(R_t^2)-\text{Avg}(R_t)^2\right)^{1.5}}$

We can then maximize the Sharpe Ratio using Gradient Ascent or Stochastic Gradient Ascent to find the optimal weights for θ .

$\theta_t=\theta_{t-1} + \eta \frac{dU_T(\theta)}{d \theta}$

where
η = learning rate
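One way to sanity check the derivative of the Sharpe Ratio with respect to a single return is to compare the analytic form against finite differences. The sketch below uses the population standard deviation and keeps the 1/T factor that comes from differentiating the averages, so it differs from the expression in this post by a constant scaling, which only rescales the learning rate. The random returns are made up.

```python
import numpy as np

def sharpe(R):
    """Sharpe Ratio written with A = Avg(R) and B = Avg(R^2)."""
    A = R.mean()
    B = (R ** 2).mean()
    return A / np.sqrt(B - A ** 2)

def dsharpe_dR(R):
    """Analytic derivative of the Sharpe Ratio w.r.t. each return R_t."""
    T = len(R)
    A = R.mean()
    B = (R ** 2).mean()
    return (B - A * R) / (T * (B - A ** 2) ** 1.5)

def numerical_grad(f, R, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(R)
    for i in range(len(R)):
        up, down = R.copy(), R.copy()
        up[i] += eps
        down[i] -= eps
        grad[i] = (f(up) - f(down)) / (2 * eps)
    return grad

rng = np.random.default_rng(1)
R_check = rng.normal(0.001, 0.01, size=20)
print(np.max(np.abs(dsharpe_dR(R_check) - numerical_grad(sharpe, R_check))))
```

The same finite difference check is a cheap way to validate the full gradient with respect to θ before trusting a gradient ascent run.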

Training and Testing Procedure

The overall algorithm utilized a rolling training and test period of 30 and 10 days.

Training period 1-30 days, test period 30-40 days
Run recurrent learning algorithm to maximize the Sortino Ratio by optimizing θ over the training period
Apply the optimized θ to the test period and evaluate the trades and pnl
Update the training period to 10-40 days, test period 40-50 days and repeat the process
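The rolling scheme above can be sketched as a generator of day-index windows. Days are 0-indexed here, and the defaults mirror the 30 and 10 day periods; the function name is made up for illustration.

```python
def rolling_windows(num_days, num_train_days=30, num_test_days=10):
    """Yield (train_days, test_days) index ranges, stepping by the test length."""
    start = 0
    while start + num_train_days + num_test_days <= num_days:
        train = range(start, start + num_train_days)
        test = range(start + num_train_days,
                     start + num_train_days + num_test_days)
        yield train, test
        start += num_test_days

windows = list(rolling_windows(60))
for train, test in windows:
    print(f"train days {train.start}-{train.stop - 1}, "
          f"test days {test.start}-{test.stop - 1}")
```

Each test window starts exactly where the previous one ended, so every test day is traded once and only with weights fitted on earlier data.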

Overall analysis was run over all the test periods with the positions and trades priced at the closing price for each time period, t. Cumulative pnl was used to evaluate the trading strategies rather than returns since geometric cumulative returns were skewed by negative and close to zero return periods.

Evaluation and Error Analysis

For the base case we have extracted features using the market data (order book depth, cancellations, trades, etc.) and run a random forest algorithm to classify +/- 10 point moves in the future over 60 second horizons. Probabilities were generated for -1, 0, +1 classes and the largest probability determined the predicted class. The predicted signal of -1, 0, +1 was then passed to the recurrent reinforcement algorithm to determine what the risk adjusted pnl would be.
For the oracle, we pass the actual target signals to the recurrent reinforcement learning algorithm to see what the maximum trading pnl would be.
We want to see if the recurrent reinforcement learning algorithm can generate better results.
We will additionally explore if we can use LSTMs to predict the next price and see if they perform better than our base case. We will also explore adding the predicted log return into our Reinforcement learning model as an additional parameter to compare results.
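Turning the class probabilities from the base case into a -1, 0, +1 signal is just an argmax over the three columns. A minimal sketch, assuming the columns are ordered (-1, 0, +1); the probabilities are made up.

```python
import numpy as np

def predicted_signal(probabilities):
    """Pick the class with the largest probability in each row."""
    classes = np.array([-1, 0, 1])  # assumed column order
    return classes[np.argmax(probabilities, axis=1)]

probs = np.array([
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])
signal = predicted_signal(probs)
print(signal)  # [ 0 -1  1]
```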

1. Oracle

The Oracle cumulative pnl over a 144 day test period is 17.6mm hkd. The average daily pnl is 122,321 hkd per day. The annualized Sharpe and Sortino ratio is 30.77 with an average of 25 trades a day.

2. Recurrent Reinforcement Learning

The recurrent reinforcement learning cumulative pnl is 1.35 mm hkd, with an average daily pnl of 9,476 hkd. The annualized Sharpe Ratio is 2.28 and Sortino Ratio is 5.08. The average number of trades per day is 106 with 8 contracts per trade.

3. Supervised Learning

The supervised learning cumulative pnl is -226k hkd, with an average daily pnl of -1,568 hkd. The supervised learning model did not fare well with this trading strategy and had a Sharpe Ratio of -0.63 and Sortino Ratio of -0.87. The supervised learning algorithm traded much more frequently, on average 1,379 times a day with an average of 8.74 contracts per trade.

4. LSTM for Prediction

We also explored using LSTMs to predict +/- 10 point moves in the future over 60 second horizons. We used 30 consecutive price points (i.e. 30 minutes of trading data) to generate probabilities for (-1, 0 and +1).
One of the challenges we faced is that the dataset is highly unbalanced, with approximately 94% of the cases being 0 (i.e. less than a 10-tick move) and just 3% of the cases each being -1 (-10 tick move) or +1 (+10 tick move). Initially the LSTM was just classifying all items as 0 and getting a low error rate. We had to adjust our cross_entropy function to factor in the weights of the distribution, which forced it to try to classify the -1 and +1 cases more correctly.
We used a 1 layer LSTM with 64 hidden cells and a dropout of 0.2. Overall the results were not great, but slightly better than the RandomForest.
The cumulative pnl is +203.65k hkd, with an average daily pnl of +1,414 hkd. The model had a Sharpe Ratio of 0.71 and Sortino Ratio of 1.07. The algorithm traded less frequently than supervised learning (on average 104 times a day) but traded larger contracts. This makes sense since it’s classifying only a small percentage of the +/- 1 correctly.
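The class weighting mentioned above can be illustrated with a weighted cross entropy in numpy. Inverse-frequency weights are an assumption here (the post does not say exactly how the weights were chosen), and the labels and predictions are made up to mimic the 94/3/3 split.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by total / (num_classes * class count)."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * np.maximum(counts, 1))

def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean of -log p(true class); probs is (n, k), labels is (n,)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    w = np.asarray(class_weights, dtype=float)[labels]
    picked = probs[np.arange(len(labels)), labels]
    losses = -w * np.log(np.clip(picked, 1e-12, 1.0))
    return losses.sum() / w.sum()

# Class ids 0, 1, 2 stand for the -1, 0, +1 moves
y = np.array([0] * 3 + [1] * 94 + [2] * 3)
# A lazy model that always predicts "no move" with 98% confidence
lazy = np.tile([0.01, 0.98, 0.01], (len(y), 1))

weights = inverse_frequency_weights(y, 3)
plain = weighted_cross_entropy(lazy, y, [1.0, 1.0, 1.0])
weighted = weighted_cross_entropy(lazy, y, weights)
print(plain, weighted)
```

With the weights applied, always predicting 0 is penalized much more heavily, which is what pushes the model to pay attention to the rare classes.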

5. Variable Sensitivity Analysis

A sensitivity analysis was run on the following variables: M, μ, numTrainingDays, and N.

The selection of the right window period, M, is very important to the accumulation of netPnl. While the normalized netPnl ends at almost the same value for every M, M=10 is the only value that has a stable increasing netPnl. M=20 and M=5 both have negative periods and M=50 has a significant amount of variance.

There is relatively little effect in normalized netPnl by adjusting mu. mu=1 has a slight drop in normalized pnl but still follows the same path as the other iterations.

The number of training days has a limited effect on normalized netPnl since the shape of the pnl is roughly the same for each simulation. The starting point difference indicates that the starting month may or may not have been good months for trading. In particular, trading in January 2017 looked to be positive while the month of December was slightly negative.

Likewise, the number of iterations, N, has very little effect on normalized netPnl. The normalized netPnl is virtually identical for all levels of N.

Conclusions

Recurrent Reinforcement Learning (RRL) shows promise in trading financial markets. While it lags behind the oracle, it is a significant improvement over the current business standard of supervised learning. The RRL approach is sensitive to the choice of window size, and its limited business adoption to date is plausibly explained by the reasons below.

We have tested the algorithms on 144 days of data. We need to validate the test on a larger set of historical data.
Trade price assumption is based on closing price and not next period open price. In live markets there would be some slippage from the time a signal was generated to when a trade was executed.
We assume our execution costs are static. For larger trades, there would be more slippage. We are also not making any assumptions about trading margin, both for new trades and drawdowns. In live trading these factors would affect position sizing and trades.
Incorporation of supervised prediction did not include positional risk adjustment
Gradient calculation for Sortino Ratio
Since 2008 markets have largely been in a low volatility regime. We need to test this algorithm under stressed markets to ensure it performs as expected and that drawdowns are reasonable.
In the past 12 months the market has seen some strong directional themes. A simple quantitative momentum strategy would likely yield similar results.
Further work has to be done to determine if RRL algorithms can outperform well established quantitative strategies.

Sentiment Analysis of movie reviews part 1 (Neural Network)

I’ve always been fascinated with Natural Language Processing and finally have a few tools under my belt to tackle this in a meaningful way. There is an old competition on Kaggle for sentiment analysis on movie reviews. The link to the competition can be found here.

As per the Kaggle website – the dataset consists of tab-separated files with phrases from Rotten Tomatoes. Each sentence has been parsed into many phrases by the Stanford parser. Our job is to learn on the training data and make a submission on the test data. This is what the data looks like.

Each review (Sentiment in the above image) can take on values of 0 (negative), 1 (somewhat negative), 2 (neutral), 3 (somewhat positive) and 4 (positive). Our task is to predict the review based on the review text.

I decided to try a few techniques. This post will cover using a vanilla Neural Network but there is some work with the preprocessing of the data that actually gives decent results. In a future post I will explore more complex tools like LSTMs and GRUs.

Preprocessing the data is key here. As a first step we tokenized each sentence into words and vectorized the word using word embeddings. I used the Stanford GLOVE vectors. I assume word2vec would give similar results but GLOVE is supposedly superior since it captures more information of the relationships between words. Initially I ran my tests using the 50 dimensional vectors which gave about 60% accuracy on the test set and 57.7% on Kaggle. Each word then becomes a 50-dimensional vector.

For a sentence, we take the average of the word vectors as inputs to our Neural Network. This approach has 2 issues

Some words don’t exist in the Glove database. We are ignoring them for now, but it may be useful to find some way to address this issue.
Averaging the word embeddings means we fail to capture the position of the word in the sentence. That can have an impact on some reviews. For example if we had the following review

Great plot, would have been entertaining if not for the horrible acting and directing.

This would be a bad review but by averaging the word vectors we may be losing this information.
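A toy illustration of the second issue: with averaging, two phrases that contain the same words in a different order get exactly the same input vector. The 2-dimensional vectors below are made up, not real GloVe values.

```python
import numpy as np

toy_vectors = {
    "great": np.array([1.0, 0.5]),
    "plot": np.array([0.0, 0.1]),
    "horrible": np.array([-1.0, -0.5]),
    "acting": np.array([0.0, 0.2]),
}

def average_vector(sentence, vectors):
    """Average the vectors of known words; unknown words are skipped."""
    dim = next(iter(vectors.values())).shape
    found = [vectors[w] for w in sentence.lower().split() if w in vectors]
    if not found:
        return np.zeros(dim)
    return np.mean(found, axis=0)

a = average_vector("great plot horrible acting", toy_vectors)
b = average_vector("horrible plot great acting", toy_vectors)
print(np.allclose(a, b))  # True
```

The two phrases say opposite things about the plot and the acting, yet the network sees identical inputs.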

For the neural network I used 2 hidden layers with 1024 and 512 neurons. The final output goes through a softmax layer and we use the standard cross-entropy loss since this is a classification problem.

Overall the results are quite good. Using 100 dimensional GLOVE vectors, we get 62% accuracy on the test set and 60.8% on the Kaggle website.

Pre-trained vectors seem to be a good starting point for tackling NLP problems like this. The network's weight matrices will automatically adapt them to the task at hand.

Next steps are to explore larger embedding vectors and deeper neural networks to see if the accuracy improves further. Also play with regularization, dropout, and try different activation functions.

The next post will explore using more sophisticated techniques like LSTMs and GRUs.

Source code below (assuming you get the data from Kaggle)

# SentimentData.py: loads the Kaggle data and averages GloVe vectors per phrase
import csv

import numpy as np
import pandas as pd
#from nltk.tokenize import sent_tokenize,word_tokenize
from nltk.tokenize import RegexpTokenizer
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split

glove_file='../glove/glove.6B.100d.txt'
# index the GloVe file by word; QUOTE_NONE stops quote characters being parsed as quoting
pretrained_vectors=pd.read_table(glove_file, sep=" ", index_col=0, header=None, quoting=csv.QUOTE_NONE)
# .values replaces .as_matrix(), which was removed in pandas 1.0
base_vector=pretrained_vectors.loc['this'].values
def vec(w):
    # return the embedding for word w, or None if it is not in the GloVe vocabulary
    try:
        location=pretrained_vectors.loc[w]
        return location.values
    except KeyError:
        return None

def get_average_vector(review):
    # start slightly above zero so a review with no known words does not divide by zero
    numwords=0.0001
    average=np.zeros(base_vector.shape)
    tokenizer = RegexpTokenizer(r'\w+')
    for word in tokenizer.tokenize(review):
    #sentences=sent_tokenize(review)
    #for sentence in sentences:
    #    for word in word_tokenize(sentence):
        value=vec(word.lower())
        if value is not None:
            average+=value
            numwords+=1
        #else:
        #    print("cant find "+word)

    average/=numwords
    return average.tolist()

class SentimentDataObject(object):
    def __init__(self,test_ratio=0.1):
        self.df_train_input=pd.read_csv('/home/rohitapte/Documents/movie_sentiment/data/train.tsv',sep='\t')
        self.df_test_input=pd.read_csv('/home/rohitapte/Documents/movie_sentiment/data/test.tsv',sep='\t')
        self.df_train_input['Vectorized_review']=self.df_train_input['Phrase'].apply(lambda x:get_average_vector(x))
        self.df_test_input['Vectorized_review'] = self.df_test_input['Phrase'].apply(lambda x: get_average_vector(x))
        self.train_data=np.array(self.df_train_input['Vectorized_review'].tolist())
        self.test_data=np.array(self.df_test_input['Vectorized_review'].tolist())
        train_labels=self.df_train_input['Sentiment'].tolist()
        unique_labels=list(set(train_labels))
        self.lb=LabelBinarizer()
        self.lb.fit(unique_labels)
        self.y_data=self.lb.transform(train_labels)
        self.X_train,self.X_cv,self.y_train,self.y_cv=train_test_split(self.train_data,self.y_data,test_size=test_ratio)

    def generate_one_epoch_for_neural(self,batch_size=100):
        num_batches=int(self.X_train.shape[0])//batch_size
        if batch_size*num_batches<self.X_train.shape[0]:
            num_batches+=1
        perm=np.arange(self.X_train.shape[0])
        np.random.shuffle(perm)
        self.X_train=self.X_train[perm]
        self.y_train=self.y_train[perm]
        for j in range(num_batches):
            batch_X=self.X_train[j*batch_size:(j+1)*batch_size]
            batch_y=self.y_train[j*batch_size:(j+1)*batch_size]
            yield batch_X,batch_y

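# Requires TensorFlow 1.x (graph-mode API: tf.placeholder, tf.Session)
import tensorflow as tf
import SentimentData  # the data-loading code above, saved as SentimentData.py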
#import numpy as np
import pandas as pd

sentimentData=SentimentData.SentimentDataObject()

INPUT_VECTOR_SIZE=sentimentData.X_train.shape[1]
HIDDEN_LAYER1_SIZE=1024
HIDDEN_LAYER2_SIZE=1024
OUTPUT_SIZE=sentimentData.y_train.shape[1]
LEARNING_RATE=0.001
NUM_EPOCHS=100
BATCH_SIZE=10000

def truncated_normal_var(name, shape, dtype):
    return (tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.05)))

def zero_var(name, shape, dtype):
    return (tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))

X=tf.placeholder(tf.float32,shape=[None,INPUT_VECTOR_SIZE],name='X')
labels=tf.placeholder(tf.float32,shape=[None,OUTPUT_SIZE],name='labels')

with tf.variable_scope('hidden_layer1') as scope:
    hidden_weight1=truncated_normal_var(name='hidden_weight1',shape=[INPUT_VECTOR_SIZE,HIDDEN_LAYER1_SIZE],dtype=tf.float32)
    hidden_bias1=zero_var(name='hidden_bias1',shape=[HIDDEN_LAYER1_SIZE],dtype=tf.float32)
    hidden_layer1=tf.nn.relu(tf.matmul(X,hidden_weight1)+hidden_bias1)

with tf.variable_scope('hidden_layer2') as scope:
    hidden_weight2=truncated_normal_var(name='hidden_weight2',shape=[HIDDEN_LAYER1_SIZE,HIDDEN_LAYER2_SIZE],dtype=tf.float32)
    hidden_bias2=zero_var(name='hidden_bias2',shape=[HIDDEN_LAYER2_SIZE],dtype=tf.float32)
    hidden_layer2=tf.nn.relu(tf.matmul(hidden_layer1,hidden_weight2)+hidden_bias2)

with tf.variable_scope('full_layer') as scope:
    full_weight1=truncated_normal_var(name='full_weight1',shape=[HIDDEN_LAYER2_SIZE,OUTPUT_SIZE],dtype=tf.float32)
    full_bias2 = zero_var(name='full_bias2', shape=[OUTPUT_SIZE], dtype=tf.float32)
    final_output=tf.matmul(hidden_layer2,full_weight1)+full_bias2

logits=tf.identity(final_output,name="logits")

cost=tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits,labels=labels))
train_step=tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cost)
correct_prediction=tf.equal(tf.argmax(final_output,1),tf.argmax(labels,1),name='correct_prediction')
accuracy=tf.reduce_mean(tf.cast(correct_prediction,tf.float32),name='accuracy')

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

test_data_feed = {
    X: sentimentData.X_cv,
    labels: sentimentData.y_cv,
}

for epoch in range(NUM_EPOCHS):
    for batch_X, batch_y in sentimentData.generate_one_epoch_for_neural(BATCH_SIZE):
        train_data_feed = {
            X: batch_X,
            labels: batch_y,
        }
        sess.run(train_step, feed_dict=train_data_feed)
    # check accuracy on the held-out split after every epoch
    validation_accuracy=sess.run([accuracy], test_data_feed)
    print('validation_accuracy => '+str(validation_accuracy))

validation_accuracy=sess.run([accuracy], test_data_feed)
print('Final validation_accuracy => ' +str(validation_accuracy))

#generate the submission file
num_batches=int(sentimentData.test_data.shape[0])//BATCH_SIZE
if BATCH_SIZE*num_batches<sentimentData.test_data.shape[0]:
    num_batches+=1
# build the argmax op once instead of adding a new op to the graph for every batch
predicted_class=tf.argmax(final_output,1)
output=[]
for j in range(num_batches):
    batch_X=sentimentData.test_data[j*BATCH_SIZE:(j + 1)*BATCH_SIZE]
    test_output=sess.run(predicted_class,feed_dict={X:batch_X})
    output.extend(test_output.tolist())
    #print(len(output))


sentimentData.df_test_input['Classification']=pd.Series(output)
#print(sentimentData.df_test_input.head())
#sentimentData.df_test_input['Sentiment']=sentimentData.df_test_input['Classification'].apply(lambda x:sentimentData.lb.inverse_transform(x))
sentimentData.df_test_input['Sentiment']=sentimentData.df_test_input['Classification'].apply(lambda x:x)
#print(sentimentData.df_test_input.head())
submission=sentimentData.df_test_input[['PhraseId','Sentiment']]
submission.to_csv('submission.csv',index=False)
