Using Recurrent Neural Networks in Movie Classifications

Deep Learning Project

Authors: Jiaru (Katherine) Fu, Yuting Yan, Hangyu Kang




Abstract

As the movie industry grows, viewers increasingly rely on the plot summaries provided by producers to choose a genre to watch, and a model that predicts genre from plot text makes this selection easier. In this project, we train such a model in a supervised manner, since the data set provides both the plot and the labels for each movie. After data pre-processing, we use the word2vec algorithm to create word embeddings, that is, distributed representations of words in a vector space, over the full set of movie plots. We train a KNN classifier on three different plot embeddings, generated by word2vec, TF-IDF weighted word2vec, and doc2vec. The word2vec embeddings are also used as the weights of the embedding layer in the deep learning model. The RNN architecture consists of a bidirectional LSTM with an attention mechanism for processing sequential text data. We also apply other approaches, including a CNN, an RNN-CNN, an ensemble of the three deep learning models, and machine learning methods, and compare their results with those of the RNN on several evaluation metrics. The results show that the RNN performs best in accuracy, F1 score, and recall. For training, we use the SGD optimizer with step learning rate decay, and binary cross-entropy to compute the loss between predicted and true values. Given the text of a plot as an input sequence, our model is robust and capable of classifying movies into different categories.
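
To make the training terms in the abstract concrete, the following is a minimal sketch of binary cross-entropy and step learning rate decay in plain Python. It is not the project's implementation. It assumes that each genre is treated as an independent binary label (which the use of binary cross-entropy suggests but the abstract does not state), and the function names, the `step_size` and `gamma` values, the initial learning rate, and the toy labels and predictions are all hypothetical, chosen only for illustration.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a list of labels and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        # Clip the probability so that log(0) can never occur.
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


def step_decay_lr(initial_lr, epoch, step_size=10, gamma=0.5):
    """Multiply the learning rate by gamma once every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Hypothetical example: one movie, three candidate genres.
y_true = [1, 0, 1]          # illustrative ground-truth labels
y_pred = [0.9, 0.2, 0.6]    # illustrative predicted probabilities

loss = binary_cross_entropy(y_true, y_pred)

# Learning rate at a few epochs, with an assumed initial rate of 0.1.
schedule = [step_decay_lr(0.1, epoch) for epoch in (0, 9, 10, 25)]

print(f"loss = {loss:.4f}")
print("learning rates:", schedule)
```

In this sketch the learning rate stays at its initial value for the first ten epochs, halves at epoch 10, and halves again at epoch 20. Deep learning frameworks provide equivalent built-in loss functions and schedulers, which would normally be used in place of hand-written versions.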