
In the previous post, we discussed the various steps of text processing involved in Natural Language Processing (NLP) and also implemented a basic Sentiment Analyzer using some of the classical ML techniques.

Deep learning has demonstrated superior performance on a wide variety of tasks including NLP, Computer Vision, and Games. To explore further, we will discuss and use some of the advanced NLP techniques, based on Deep Learning, to create an improved Sentiment Classifier.

 
(Image courtesy: KDnuggets)

Sentiment Classification Problem

Sentiment classification is the task of looking at a piece of text and telling if someone likes or dislikes the thing they’re talking about.

The input X is a piece of text and the output Y is the sentiment which we want to predict, such as the star rating of a movie review.

 

If we can train a system to map from X to Y based on a labelled dataset like the one above, then such a system can be used to predict a reviewer's sentiment after they watch a movie.

In this post we will focus on the following tasks:

  • Build a Deep Neural Network for Sentiment Classification.
  • Learn word embeddings: both while training the network and separately using Word2Vec.

Architecture

Deep learning text classification model architectures generally consist of the following components connected in sequence:

 
Deep Learning Architecture
  • Embedding Layer
Word embedding is a representation of text where words that have the same meaning have a similar representation. In other words, it represents words in a coordinate system where related words, based on a corpus of relationships, are placed closer together. In deep learning frameworks such as TensorFlow and Keras, this part is usually handled by an embedding layer, which stores a lookup table that maps words represented by numeric indexes to their dense vector representations.
  • Deep Network
The deep network takes the sequence of embedding vectors as input and converts it to a compressed representation. The compressed representation effectively captures all the information in the sequence of words in the text. The deep network part is usually an RNN or one of its variants such as LSTM/GRU. Dropout is added to counter the tendency to overfit, a very common problem with RNN-based networks. Please refer here for a detailed discussion of LSTM and GRU.
  • Fully Connected Layer
The fully connected layer takes the deep representation from the RNN/LSTM/GRU and transforms it into the final output classes or class scores. This component consists of fully connected layers along with batch normalization and, optionally, dropout layers for regularization.
  • Output Layer
Based on the problem at hand, this layer uses either a sigmoid activation for binary classification or a softmax for multi-class classification (a softmax over two classes also covers the binary case).

Dataset

The IMDB movie review set can be downloaded from here. This dataset for binary sentiment classification contains a set of 25,000 highly polar movie reviews for training and 25,000 for testing. The dataset, after initial pre-processing, is saved to the movie_data.csv file. First we load the IMDb dataset; the text reviews are labelled as 1 or 0 for positive and negative sentiment respectively.
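A minimal sketch of loading the pre-processed file is shown below; the column names 'review' and 'sentiment' are assumptions about how the CSV was saved.

import pandas as pd

df = pd.read_csv('movie_data.csv')   # pre-processed IMDb reviews
print(df.shape)                      # expected: (50000, 2)
print(df.head(3))                    # a few labelled reviews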

 
IMDb movie review dataset

Learn Word Embedding

The word embeddings of our dataset can be learned while training a neural network on the classification problem. Before it can be presented to the network, the text data is first encoded so that each word is represented by a unique integer. This data preparation step can be performed using the Tokenizer API provided with Keras. We add padding to make all the vectors the same length (max_length). The code below converts the text to integer indexes, ready to be used in the Keras embedding layer.
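A sketch of this encoding step, assuming the DataFrame and column names from the loading step above:

from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

reviews = df['review'].values
labels = df['sentiment'].values

tokenizer_obj = Tokenizer()
tokenizer_obj.fit_on_texts(reviews)                    # build the word -> integer index
sequences = tokenizer_obj.texts_to_sequences(reviews)

max_length = max(len(r.split()) for r in reviews)      # pad to the longest review
vocab_size = len(tokenizer_obj.word_index) + 1         # +1 for the reserved index 0

review_pad = pad_sequences(sequences, maxlen=max_length, padding='post')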

 

The Embedding layer requires the specification of the vocabulary size (vocab_size), the size of the real-valued vector space (EMBEDDING_DIM = 100), and the maximum length of the input documents (max_length).

 

Build Model

We are now ready to define our neural network model. The model will use an Embedding layer as the first hidden layer. The Embedding layer is initialized with random weights and will learn an embedding for all of the words in the training dataset during training of the model.
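A minimal sketch of such a model follows: Embedding, then a recurrent layer, then a sigmoid output. The GRU layer with 32 units is an assumption, chosen because it is consistent with the trainable-parameter counts quoted later in this post.

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

EMBEDDING_DIM = 100

model = Sequential()
model.add(Embedding(vocab_size, EMBEDDING_DIM, input_length=max_length))
model.add(GRU(units=32, dropout=0.2, recurrent_dropout=0.2))  # assumed recurrent layer size
model.add(Dense(1, activation='sigmoid'))                     # binary sentiment output

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())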

 

The summary of the model is:

 

We have used a simple deep network configuration for demonstration purposes. You can try out different configurations of the network and compare their performance. The embedding parameter count is 12,560,200 = vocab_size * EMBEDDING_DIM, and the maximum input length is max_length = 2678. During training the model learns the word embeddings from the input text. The total number of trainable parameters is 12,573,001.

Train Model

Now let us train the model on the training set and validate it on the test set. We can see from the training epochs below that the model improves its accuracy after each epoch. After a few epochs we reach a validation accuracy of around 84%. Not bad :)
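A sketch of the training run; the 75/25 split, batch size, and number of epochs are assumptions.

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    review_pad, labels, test_size=0.25, random_state=42)

model.fit(X_train, y_train,
          batch_size=128,
          epochs=5,
          validation_data=(X_test, y_test),
          verbose=2)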

 

Test Model

We can test our model with some sample reviews to check how it predicts the sentiment of each review. First we have to convert the text reviews to tokens and then use the model to predict, as shown below.
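A sketch of scoring a couple of hand-written reviews; the sample texts here are hypothetical.

test_samples = [
    "This movie is fantastic, I really liked it",
    "Absolutely terrible, a complete waste of two hours",
]

test_sequences = tokenizer_obj.texts_to_sequences(test_samples)
test_pad = pad_sequences(test_sequences, maxlen=max_length, padding='post')

print(model.predict(test_pad))   # one score per review, between 0 and 1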

 

The output gives a prediction for each review, close to either 1 (positive sentiment) or 0 (negative sentiment).

 

A value closer to 1 indicates strong positive sentiment and a value closer to 0 indicates strong negative sentiment. We can clearly see that the model's prediction is wrong for test_sample_7 and reasonably good for the rest of the samples.

In the above approach we learned the word embedding as part of fitting the neural network model.

Train word2vec Embedding

There is another approach to building the sentiment classification model. Instead of training the embedding layer, we can first learn the word embeddings separately and then pass them to the embedding layer. This approach also allows us to use any pre-trained word embedding, and it saves time when training the classification model.

We will use the Gensim implementation of Word2Vec. The first step is to prepare the text corpus for learning the embedding by creating word tokens, removing punctuation, removing stop words etc. The word2vec algorithm processes documents sentence by sentence.
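A sketch of this preparation step, assuming NLTK for word tokenization and stop-word removal:

import string
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
table = str.maketrans('', '', string.punctuation)

review_lines = []
for review in df['review'].values:
    tokens = word_tokenize(review.lower())           # split the review into word tokens
    tokens = [w.translate(table) for w in tokens]    # strip punctuation
    tokens = [w for w in tokens if w.isalpha() and w not in stop_words]
    review_lines.append(tokens)

print(len(review_lines))   # 50000 review lines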

 

We have 50,000 review lines in our text corpus. Gensim's Word2Vec API requires some parameters for initialization, listed below; a short training sketch follows the list.

 

i. sentences – List of sentences; here we pass the list of review sentences.

ii. size – The number of dimensions in which we wish to represent our word. This is the size of the word vector.

iii. min_count – Only words with a frequency greater than min_count are included in the model. Usually, the bigger and more extensive your text, the higher this number can be.

iv. window – Only terms that occur within a window-neighborhood of a term, in a sentence, are associated with it during training. The usual value is 4 or 5.

v. workers – Number of threads used to parallelize and speed up training.
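A sketch of training the Word2Vec model with these parameters. Note that in Gensim 4.x the 'size' argument is called 'vector_size' and the vocabulary is exposed as wv.key_to_index rather than wv.vocab; the min_count and window values below are assumptions.

import gensim

EMBEDDING_DIM = 100

word2vec_model = gensim.models.Word2Vec(
    sentences=review_lines,
    size=EMBEDDING_DIM,   # dimensionality of the word vectors
    window=5,             # context window around each word
    min_count=1,          # keep even rare words (an assumption for this corpus)
    workers=4)            # number of training threads

print('Vocabulary size:', len(word2vec_model.wv.vocab))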

Test Word2Vec Model

After we train the model on our IMDb dataset, it builds a vocabulary of 134,156 words. Let us inspect some of the word embeddings the model learned from the movie review dataset.

The most similar words for the word horrible are:
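A sketch of the query:

print(word2vec_model.wv.most_similar('horrible'))   # nearest neighbours in embedding space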

 
most similar words

Try some math on the word vectors — woman+king-man=?
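A sketch of the vector arithmetic:

print(word2vec_model.wv.most_similar(positive=['woman', 'king'], negative=['man'], topn=3))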

 

Let us find the odd word out among woman, king, queen, movie:
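A sketch of the query:

print(word2vec_model.wv.doesnt_match(['woman', 'king', 'queen', 'movie']))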

 

It is very interesting to see the word embeddings learned by our word2vec model from the text corpus. The next step is to use these word embeddings directly in the embedding layer of our sentiment classification model. We can save the model to be used later.
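A sketch of saving the learned vectors in word2vec text format, matching the file name used in the next section:

word2vec_model.wv.save_word2vec_format('imdb_embedding_word2vec.txt', binary=False)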

 

Use Pre-trained Embedding

Since we have already trained a word2vec model on the IMDb dataset, we have the word embeddings ready to use. The next step is to load the word embedding as a dictionary mapping words to vectors. The word embedding was saved in the file imdb_embedding_word2vec.txt. Let us extract the word embeddings from the stored file.
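A sketch of reading the saved embeddings back into a word-to-vector dictionary:

import numpy as np

embeddings_index = {}
with open('imdb_embedding_word2vec.txt', encoding='utf-8') as f:
    next(f)                                   # skip the header line (vocab size, dimensions)
    for line in f:
        values = line.split()
        word = values[0]
        embeddings_index[word] = np.asarray(values[1:], dtype='float32')

print('Loaded %d word vectors.' % len(embeddings_index))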

 

The next step is to arrange the word embeddings according to the tokenized vocabulary. Recall that the review documents are integer encoded prior to passing them to the Embedding layer. Each integer maps to the index of a specific vector in the embedding layer. Therefore, it is important that we lay the vectors out in the Embedding layer such that the encoded words map to the correct vectors.

 

Now we will map the embeddings from the loaded word2vec model onto the tokenizer_obj.word_index vocabulary and create a matrix of word vectors.
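A sketch of building the embedding matrix: row i holds the vector for the word that the tokenizer encodes as integer i; words without a learned vector stay all zeros.

num_words = len(tokenizer_obj.word_index) + 1
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))

for word, i in tokenizer_obj.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector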

 

The trained embedding vectors are now ready to be used directly in the embedding layer. In the code below, the only change from the previous model is using the embedding_matrix as input to the Embedding layer and setting trainable = False, since the embedding is already learned.
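A sketch of the same network with a frozen, pre-trained Embedding layer; the GRU size of 32 units is again an assumption consistent with the parameter counts quoted below.

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dense

model = Sequential()
model.add(Embedding(num_words, EMBEDDING_DIM,
                    weights=[embedding_matrix],
                    input_length=max_length,
                    trainable=False))          # keep the pre-trained vectors fixed
model.add(GRU(units=32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())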

 
 
Model summary with pre-trained embedding

Looking closely, you can see that the model's total params = 13,428,501, but its trainable params = 12,801. Since the model uses a pre-trained word embedding, it has very few trainable params and hence should train faster.

To train the sentiment classification model, we use VALIDATION_SPLIT = 0.2; you can vary this to see its effect on the accuracy of the model.
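A sketch of the training call with a 20% validation split; the batch size and number of epochs are assumptions.

VALIDATION_SPLIT = 0.2

model.fit(review_pad, labels,
          batch_size=128,
          epochs=5,
          validation_split=VALIDATION_SPLIT,
          verbose=2)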

 

Finally, training the classification model on the training and validation sets, we see the accuracy improve with each epoch. We reach 88% validation accuracy in just around 5 epochs.

 

You can try to improve the accuracy of the model by changing hyper-parameters, running more epochs, etc. You can also use other pre-trained embeddings, prepared on very large corpora of text data, that you can download directly.

Conclusion

In this post we discussed in detail the architecture of a deep learning model for sentiment classification. We also trained a word2vec model and used it as a pre-trained embedding for sentiment classification.

Thanks for reading. If you liked it, please give it a clap.

Further Reading

http://ruder.io/deep-learning-nlp-best-practices

Hands-On NLP with Python, By Rajesh Arumugam, Rajalingappaa Shanmugamani July 2018