Sentiment analysis research often attempts to assign a positive or negative score (e.g. on a Likert scale) to text at the sentence level. Different sentiment analysis systems use different scales and approaches to the problem. One common approach is a five-point scale running from 1 (very negative) to 5 (very positive). This approach also often overlaps with opinion mining and the analysis of product reviews.
For TDM Studio, however, we attempt to assign an affective state or emotion to each sentence in a document. This approach is slightly different because it focuses on assigning emotions instead of a positive or negative score to text.
We use BERT-based sentence embeddings to represent each sentence in a dataset. We then train a model on the sentence embeddings to predict the probability of each sentence being assigned to each affective state (i.e. ‘Anger’, ‘Disgust’, ‘Fear’, ‘Sadness’, ‘Happiness’, ‘Love’, ‘Surprise’, ‘Neutral’, ‘Other’). One thing to note is that the emotions expressed vary between domains: the emotions that are important in a teaching and learning context are different from those in a research context. For TDM Studio, we chose primary emotions based on research by Ekman and others. The initial work for this classification system was developed as part of an exploratory pilot research project with the University of Michigan.
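The prediction step described above can be illustrated with a minimal sketch. The text does not specify the model architecture, so this example assumes a single linear layer followed by a softmax, which is one simple way to turn a sentence embedding into a probability for each affective state. The names `predict_affect`, `AFFECT_LABELS`, and `EMBEDDING_DIM` are hypothetical, the weights are random rather than trained, and the embedding is a random vector standing in for a real BERT-based sentence embedding.

```python
import math
import random

# The nine affective states listed in the text.
AFFECT_LABELS = ["Anger", "Disgust", "Fear", "Sadness", "Happiness",
                 "Love", "Surprise", "Neutral", "Other"]

# Toy size for illustration; real sentence embeddings are much larger.
EMBEDDING_DIM = 8


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_affect(embedding, weights, biases):
    """Return {label: probability} for one sentence embedding.

    Assumption: a linear layer plus softmax stands in for the trained model.
    """
    scores = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return dict(zip(AFFECT_LABELS, softmax(scores)))


# Hypothetical stand-ins: random weights instead of a trained model,
# and a random vector instead of a real sentence embedding.
rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)] for _ in AFFECT_LABELS]
biases = [0.0] * len(AFFECT_LABELS)
embedding = [rng.gauss(0, 1) for _ in range(EMBEDDING_DIM)]

probabilities = predict_affect(embedding, weights, biases)
top_label = max(probabilities, key=probabilities.get)
```

Because the output is a full probability distribution rather than a single label, a sentence can carry meaningful weight on more than one affective state; taking the highest-probability label, as `top_label` does here, is only one way to summarize it.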
To train the classification model, we use a combination of newspaper and literary data. The training data used for affect classification will affect the classifier's results, and how much it does so depends on the task and on the time period of the documents being analyzed.
In the first ten days of September 2001, we can see that the most common emotions expressed in The New York Times are Neutral, Happiness, and Surprise. We can also see that the less common emotions are Fear, Love, and Anger. Looking at different newspapers as well as different time periods, do the most common emotions change over time? Were the most common emotions for the first ten days of September 2020 the same as the most common emotions for the previous one hundred Septembers?
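One way to explore these questions is to count how often each emotion is assigned within a time window and then compare the rankings across windows or newspapers. The sketch below assumes each sentence has already been given a single emotion label; the function name `rank_emotions` is hypothetical, and the labels in `sample_labels` are made up for illustration and are not results from any newspaper.

```python
from collections import Counter


def rank_emotions(sentence_labels):
    """Return (label, share) pairs sorted from most to least common."""
    counts = Counter(sentence_labels)
    total = sum(counts.values())
    return [(label, count / total) for label, count in counts.most_common()]


# Made-up labels for illustration only; not real classifier output.
sample_labels = ["Neutral", "Neutral", "Happiness", "Surprise",
                 "Neutral", "Happiness", "Fear", "Neutral"]

ranking = rank_emotions(sample_labels)
most_common_label, most_common_share = ranking[0]
```

Running the same function on the labels from two different periods gives two rankings that can be compared side by side, for example to check whether the top three emotions stay the same.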