Over the past few weeks, I have been looking into sentiment analysis techniques and have implemented some of them in Python. I used the NLTK library for the SentiWordNet data set, sentence tokenization, and part-of-speech tagging. The various algorithms were tested on a set of comments from the Android app store, and the output was compared with each user's actual star rating. Using the sentiment scores of only verbs and adverbs seemed to give the best performance on this data set.
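As a rough illustration of scoring a comment from selected parts of speech only, here is a minimal sketch. It assumes the comment has already been tokenized and tagged with Penn Treebank tags (for example by NLTK's tagger), and it uses a tiny made-up lexicon in place of SentiWordNet so that it runs on its own. The names TOY_LEXICON, coarse_pos, and comment_score are hypothetical and not taken from the project code.

```python
# Toy stand-in for SentiWordNet: (word, coarse POS) -> (positive, negative).
# These entries are invented for illustration, not real SentiWordNet scores.
TOY_LEXICON = {
    ("love", "v"): (0.625, 0.0),
    ("crashes", "v"): (0.0, 0.5),
    ("constantly", "r"): (0.0, 0.125),
    ("great", "a"): (0.75, 0.0),
}

# Penn Treebank tag prefixes mapped to coarse classes:
# v = verb, r = adverb, a = adjective, n = noun.
PENN_PREFIX_TO_POS = {"VB": "v", "RB": "r", "JJ": "a", "NN": "n"}


def coarse_pos(penn_tag):
    """Map a Penn Treebank tag such as 'VBZ' to 'v'; return None if unmapped."""
    return PENN_PREFIX_TO_POS.get(penn_tag[:2])


def comment_score(tagged_tokens, keep=("v", "r"), lexicon=TOY_LEXICON):
    """Sum (positive - negative) over tokens whose coarse POS is in `keep`."""
    total = 0.0
    for word, tag in tagged_tokens:
        pos = coarse_pos(tag)
        if pos in keep:
            positive, negative = lexicon.get((word.lower(), pos), (0.0, 0.0))
            total += positive - negative
    return total


# Hand-tagged example comment: "This great app crashes constantly"
tagged = [("This", "DT"), ("great", "JJ"), ("app", "NN"),
          ("crashes", "VBZ"), ("constantly", "RB")]

print(comment_score(tagged))                        # verbs and adverbs only
print(comment_score(tagged, keep=("a", "v", "r")))  # adjectives included too
```

In this toy example, the adjective "great" pulls the total toward positive even though the comment is a complaint. That is one way restricting the score to verbs and adverbs could help, though it is only an illustration and not a result from the data set.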
I also implemented a technique for handling negation that is described in the "Handbook of Natural Language Processing" (http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf). Whenever a negating adverb such as 'not' appears, the sentiment scores of its siblings in the parse tree are multiplied by -1. However, this did not seem to have much effect on the performance of the sentiment analyzer.
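The sibling-flipping idea can be sketched roughly as follows. This is an assumption-laden toy, not the project's implementation: the parse tree is written by hand as nested (label, children) tuples instead of coming from a parser, and NEGATIONS, TOY_SCORES, is_negation, and tree_score are hypothetical names with made-up values.

```python
# Words treated as negation markers (an illustrative, incomplete list).
NEGATIONS = {"not", "n't", "never", "no"}

# Toy word scores, invented for illustration.
TOY_SCORES = {"like": 0.5, "good": 0.75, "slow": -0.5}


def is_negation(node):
    """True for a negation word, or a preterminal such as ('RB', ['not'])."""
    if isinstance(node, str):
        return node.lower() in NEGATIONS
    _label, children = node
    return (len(children) == 1
            and isinstance(children[0], str)
            and children[0].lower() in NEGATIONS)


def tree_score(node, scores=TOY_SCORES):
    """Score a tree; the siblings of a negation node are multiplied by -1."""
    if isinstance(node, str):
        return scores.get(node.lower(), 0.0)
    _label, children = node
    negated = any(is_negation(child) for child in children)
    # The negation node itself contributes no score.
    total = sum(tree_score(child, scores)
                for child in children if not is_negation(child))
    return -total if negated else total


# Hand-written parse of "I do not like this app".
negated_tree = ("S", [
    ("NP", [("PRP", ["I"])]),
    ("VP", [
        ("VBP", ["do"]),
        ("RB", ["not"]),
        ("VP", [("VB", ["like"]),
                ("NP", [("DT", ["this"]), ("NN", ["app"])])]),
    ]),
])

# The same sentence without the negation: "I do like this app".
plain_tree = ("S", [
    ("NP", [("PRP", ["I"])]),
    ("VP", [
        ("VBP", ["do"]),
        ("VP", [("VB", ["like"]),
                ("NP", [("DT", ["this"]), ("NN", ["app"])])]),
    ]),
])

print(tree_score(negated_tree))
print(tree_score(plain_tree))
```

Only subtrees that sit next to the negation word are flipped, so a 'not' deep inside one clause does not invert the score of the whole sentence.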
I will continue to look into sentiment analysis techniques in the coming weeks, but we have a good start so far.
Friday, December 5, 2014
We have begun implementing sentiment analysis of app comments. We are using Python's NLTK library for tasks such as tokenization and part-of-speech tagging, and SentiWordNet as the data set that supplies sentiment scores for English words. We are looking at several algorithms that assign a positive or negative sentiment score to a comment, and have implemented some simple ones. To test how suitable each algorithm is, we compare the 1-5 star rating associated with each comment to the output of the sentiment analyzer.
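One possible way to make that comparison concrete is sketched below. The cut-offs are assumptions made for illustration (4-5 stars counted as positive, 1-2 stars as negative, 3-star comments skipped as neutral) and are not necessarily the ones used in the project. The function names and the sample data are likewise hypothetical.

```python
def rating_polarity(stars):
    """Map a 1-5 star rating to +1, -1, or 0 (assumed cut-offs)."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return -1
    return 0


def score_polarity(score, threshold=0.0):
    """Map an analyzer score to +1, -1, or 0 using a symmetric threshold."""
    if score > threshold:
        return 1
    if score < -threshold:
        return -1
    return 0


def agreement(pairs):
    """Fraction of non-neutral ratings whose polarity matches the score."""
    hits = 0
    total = 0
    for stars, score in pairs:
        expected = rating_polarity(stars)
        if expected == 0:
            continue  # skip 3-star comments
        total += 1
        if expected == score_polarity(score):
            hits += 1
    return hits / total if total else 0.0


# Made-up (stars, analyzer score) pairs, not real results.
sample = [(5, 0.625), (1, -0.5), (4, -0.125), (3, 0.25), (2, -0.25)]
print(agreement(sample))
```

A single agreement figure like this makes it easy to rank candidate algorithms against each other on the same set of comments.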