Sentiment strength detection in short informal text
A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.