While opinion mining (the study of opinions as positive, negative, or neutral in free texts) has received great attention over the past few years,1
less work has been devoted to emotion mining, which aims at identifying emotion labels such as “anger”, “love” or “hate”. The lack of consensus on emotion models, the difficulty of annotating datasets, and the complexity of analyzing emotion expressions in free texts all contribute to this gap. The success of opinion mining can be explained by the availability of Internet user ratings as well as the simplicity of opinion representations: opinion classification is often tackled as a classical binary classification task.
I2B2’s challenge track 2 consists of learning to discriminate emotion labels in free texts.2
To this aim, participants are provided with a training set made of 600 suicide notes annotated at the sentence level according to M = 15 predefined emotion labels (see the figure below for a complete list). Sentences in the training set are associated with zero or more emotion labels; the figure below gives the distribution of the labels over the whole dataset. We observe that sentences labeled with more than one emotion represent approximately 7% of the whole dataset, with at most 5 emotions per sentence. The micro-averaged F1 score is employed to evaluate submitted systems over a testing set composed of 300 notes. To our knowledge, this is the first challenge on emotion classification particularly focused on machine learning; SemEval 2007 proposed a track (task 14)3
consisting of classifying news headlines according to several emotions, but due to the small size of its training set, purely linguistic approaches were strongly favored.
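For reference, micro-averaging aggregates true positives, false positives, and false negatives over all emotion labels before computing precision and recall. The following is a minimal sketch of the metric over multi-label annotations; the function and variable names are ours, not those of the challenge’s official scorer:

def micro_f1(gold, pred):
    """Micro-averaged F1; gold and pred are lists of label sets, one per sentence."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # correctly predicted labels
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # spurious labels
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # missed labels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Example: two sentences, one partially correct prediction -> 0.5
print(micro_f1([{"guilt", "love"}, set()], [{"guilt"}, {"anger"}]))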
Figure: Number of occurrences of each emotion in the training set, in decreasing order.
We propose a system based on the early fusion of n-grams of increasing orders for representing sentences. Early fusion is the process of merging information from different sources directly in the input examples; in other words, features from different sources are taken into account at the vector level. Fusion performed at the classifier level is called late fusion; at the similarity function level, intermediate fusion.4
Here, each order, i.e., each value of n, defines a specific representation of a sentence; a decision surface is then learned in the space formed by the concatenation of these representations.
The motivation behind the use of higher-order grams is to mix features of increasing lengths for representing expressions of emotions. While unigrams are widely employed for representing documents in classical text classification, they do not seem to provide a rich enough description in the case of sentiment analysis. By fusing grams of increasing orders, one is able to make use of richer features to describe naturally complex and subtle expressions of emotions. An interesting example is negation, which plays an important role in the detection of emotion patterns. For instance, given the unigram “bad”, the change in polarity carried by the expression “not bad” is captured by bigrams. More subtle constructs like “not really bad” are represented by trigrams, and higher orders can capture even more complex and subtle expressions. A minimal sketch of this fused representation is given below.
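The sketch below illustrates the idea on a toy example; the dictionaries and the (lemmatized) sentence are ours, chosen to show how bigrams capture “not bad” where unigrams cannot. The actual dictionaries are built from the training set as described next:

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fused_vector(tokens, dictionaries):
    """Early fusion: concatenate one binary block per n-gram order."""
    vector = []
    for n, dictionary in enumerate(dictionaries, start=1):
        present = set(ngrams(tokens, n))
        vector += [1 if feature in present else 0 for feature in dictionary]
    return vector

# Toy dictionaries D1 (unigrams) and D2 (bigrams)
d1 = [("bad",), ("good",)]
d2 = [("not", "bad"), ("very", "good")]

print(fused_vector("this be not bad".split(), [d1, d2]))
# -> [1, 0, 1, 0]: the unigram "bad" alone looks negative, but the
#    bigram ("not", "bad") exposes the reversed polarity to the classifier.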
Given a specific gram order n, we refer to the set of all unique n-grams in the training set as a dictionary Dn. We must note that the higher the order, the more likely features are to appear only once in the dataset, and the larger the size of the resulting dictionary. When performing early fusion based on increasing gram orders, one must therefore consider a feature selection process in order to keep the different dictionaries at balanced sizes. In this paper we make use of two criteria: we extract frequent n-grams which occur more than a given threshold, and among these frequent n-grams we select emotion-specific features according to their Shannon entropy.
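As an illustration, here is a hedged sketch of this two-stage filtering. We assume that, for each candidate n-gram, the Shannon entropy is computed over the distribution of the labels of the sentences it occurs in, so that a low entropy means the feature concentrates on few emotions; the exact criterion is detailed in Section 2:

import math

def select_features(ngram_counts, ngram_label_counts, min_freq, max_entropy):
    """Keep frequent n-grams whose label distribution has low entropy."""
    selected = []
    for gram, count in ngram_counts.items():
        if count <= min_freq:                 # criterion 1: frequency threshold
            continue
        label_counts = ngram_label_counts[gram].values()
        total = sum(label_counts)
        entropy = -sum((c / total) * math.log2(c / total) for c in label_counts)
        if entropy <= max_entropy:            # criterion 2: emotion-specific
            selected.append(gram)
    return selected

# A frequent, label-concentrated bigram passes; a scattered one does not.
counts = {("not", "bad"): 12, ("i", "be"): 40}
labels = {("not", "bad"): {"hopelessness": 11, "love": 1},
          ("i", "be"): {"love": 10, "guilt": 10, "anger": 10, "hopelessness": 10}}
print(select_features(counts, labels, min_freq=5, max_entropy=1.0))
# -> [('not', 'bad')]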
The rest of this paper is organized as follows. Related work is presented in Section 1. We then describe our system: sentences are first lemmatized (to this aim we employ TreeTagger)5
then represented as binary feature vectors made of the fusion of increasing gram orders (in the vector, 1 indicates the presence of a feature, 0 its absence). In Section 2 we introduce a method for filtering frequent n-grams based on the Shannon entropy measure, leading to dictionaries specific to each emotion label and each gram order. The learning of the models is described in Section 3. The decision process is implemented as a 2-step algorithm: a neutral vs. emotion classifier is applied to the pre-processed sentences, and sentences recognized as bearing emotions are further run through M different classifiers, one for each emotion (we adopt the classical one-vs-all strategy); a sketch is given below. Finally, we present the results obtained on the testing set composed of 300 notes in Section 4. Conclusions and perspectives of this work are given in Section 5.
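To make the decision process concrete, here is a minimal sketch of the 2-step algorithm; the classifier objects and their predict interface are placeholders, not our actual implementation (the models themselves are described in Section 3):

def classify_sentence(vector, neutral_clf, emotion_clfs):
    """2-step decision: first neutral vs. emotion, then one vs. all.

    neutral_clf:  binary classifier, True if the sentence bears an emotion
    emotion_clfs: {label: binary classifier}, one per emotion (M = 15)
    """
    if not neutral_clf.predict(vector):      # step 1: neutral sentences get no label
        return set()
    return {label for label, clf in emotion_clfs.items()  # step 2: one vs. all
            if clf.predict(vector)}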