Over the past few weeks, I have been looking into sentiment analysis techniques and have implemented some of them in Python. I used the NLTK library for the SentiWordNet data set, sentence tokenization, and part-of-speech tagging. The algorithms were tested on a set of comments from the Android app store, and their output was compared with each user's actual rating. Using the sentiment scores of only verbs and adverbs seemed to perform best on this data set.
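As a rough illustration of the verbs-and-adverbs idea, here is a minimal sketch. It assumes the comment has already been tagged into (word, tag) pairs with Penn Treebank tags, which is the shape NLTK's tagger returns. The tiny `TOY_LEXICON` and the function name `verb_adverb_score` are made up for this example and stand in for SentiWordNet scores; this is not the exact code we ran.

```python
# Hypothetical mini-lexicon standing in for SentiWordNet:
# word -> (positive score - negative score). Values are invented.
TOY_LEXICON = {
    "love": 0.625,
    "really": 0.25,
    "great": 0.75,
    "crashes": -0.5,
}


def verb_adverb_score(tagged_tokens, lexicon=TOY_LEXICON):
    """Sum lexicon scores for verbs (VB*) and adverbs (RB*) only."""
    total = 0.0
    for word, tag in tagged_tokens:
        # Penn Treebank verb tags start with VB, adverb tags with RB.
        if tag.startswith(("VB", "RB")):
            total += lexicon.get(word.lower(), 0.0)
    return total


comment = [
    ("I", "PRP"), ("really", "RB"), ("love", "VBP"),
    ("this", "DT"), ("great", "JJ"), ("app", "NN"),
]
print(verb_adverb_score(comment))  # 0.875 ("great" is an adjective, so it is skipped)
```

A score above zero would be read as a positive comment and a score below zero as a negative one.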
A technique to account for negation, described in "Handbook of Natural Language Processing", was implemented (http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf). Whenever a negative adverb such as 'not' appears, the sentiment scores of its siblings in the parse tree are multiplied by -1. However, this did not seem to have much effect on the performance of the sentiment analyzer.
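The negation rule can be sketched roughly as follows. To keep the example self-contained, a parse tree is represented as nested Python lists (a leaf is a string, an internal node is a list of children) instead of an NLTK tree, and the `NEGATORS` set, the lexicon values, and the function name `score_tree` are assumptions made for this illustration.

```python
# Assumed set of negation words; the real list may differ.
NEGATORS = {"not", "n't", "never", "no"}


def score_tree(node, lexicon):
    """Score a parse (sub)tree, flipping the sign of a negator's siblings.

    A leaf is a string; an internal node is a list of child nodes.
    """
    if isinstance(node, str):
        return lexicon.get(node.lower(), 0.0)
    negated = False
    total = 0.0
    for child in node:
        if isinstance(child, str) and child.lower() in NEGATORS:
            negated = True  # the negator itself contributes no score
        else:
            total += score_tree(child, lexicon)
    # Multiply the combined sibling score by -1 when a negator is present.
    return -total if negated else total


lexicon = {"good": 0.75}  # invented score
tree = ["the", "app", ["is", ["not", "good"]]]
print(score_tree(tree, lexicon))  # -0.75
```

Only the subtree that directly contains the negator is flipped, so words elsewhere in the sentence keep their original sign.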
I will continue to look into sentiment analysis techniques in the coming weeks, but we have a good start so far.
-Theresa
Thursday, December 18, 2014
Friday, December 5, 2014
Sentiment Analysis
Implementation of sentiment analysis of app comments has begun. We are using Python's NLTK library for tasks such as tokenization and part-of-speech tagging. SentiWordNet is used as the data set for classifying the sentiment of English words. We are evaluating several algorithms that assign a positive or negative sentiment score to a comment, and some simple ones have been implemented. To test how suitable each algorithm is, we compare the 1-5 star rating associated with a comment to the output of the sentiment analyzer.
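One simple way to carry out this comparison is sketched below. The mapping from stars to labels (4-5 stars positive, 1-2 stars negative, 3 stars left out as neutral) and the helper names are assumptions for illustration; the post does not say which mapping is used.

```python
def rating_to_label(stars):
    """Map a 1-5 star rating to +1, -1, or 0 (assumed cut-offs)."""
    if stars >= 4:
        return 1
    if stars <= 2:
        return -1
    return 0


def score_to_label(score):
    """Map an analyzer score to +1, -1, or 0 by its sign."""
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0


def agreement(pairs):
    """Fraction of non-neutral ratings whose label matches the analyzer's."""
    compared = [(stars, score) for stars, score in pairs
                if rating_to_label(stars) != 0]
    if not compared:
        return 0.0
    hits = sum(1 for stars, score in compared
               if rating_to_label(stars) == score_to_label(score))
    return hits / len(compared)


# (star rating, analyzer score) pairs; the numbers are invented.
sample = [(5, 0.6), (1, -0.4), (4, -0.1), (3, 0.2), (2, -0.3)]
print(agreement(sample))  # 0.75
```

Running each candidate algorithm over the same comments and comparing these fractions gives a quick way to rank them.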
Wednesday, November 19, 2014
Python NLTK
Python NLTK looks promising as a possible toolkit for the project. The homepage can be found here:
http://www.nltk.org/
The students will begin testing the toolkit to see if it is powerful enough for the project at hand.
Monday, November 10, 2014
Update November 10
After our last meeting, the entire team agreed on a direction for the metric. The students agreed that the most interesting approach would be to go the way of natural language processing for analyzing the reviews and descriptions of applications in the Google Play store.
At this step, the team is tasked with finding the tools to perform the natural language processing as well as writing a script to crawl the relevant websites and gather data for processing. We will meet again in the next week or so to discuss our progress on these tasks.
Tuesday, October 28, 2014
Update October 28, 2014
After a good few weeks of research, the team has started to discuss the nature and qualities we would like a metric to have. Dr. Zheng has given us more papers to read and discuss. These take a more modern approach to analyzing the data, building on traditional statistical methods.
The students have expressed much interest in the approaches. We will be meeting again this Friday to discuss the new papers and hopefully the direction for the rest of the semester.
Monday, October 13, 2014
Meeting Minutes
Last week, the group got together to discuss the project's progress and details. The students were given papers to read. At this stage we are considering how to aggregate existing reputation metrics and how to develop our own metrics that can be put into practice.
Our goal is to meet every two weeks as we discuss research and implementation.