

Biomed Inform Insights. 2012; 5(Suppl 1): 165–174.
Published online 2012 January 30. doi:  10.4137/BII.S8981
PMCID: PMC3409484

A Hybrid System for Emotion Extraction from Suicide Notes


The reasons that drive someone to commit suicide are complex, and their study has attracted the attention of scientists in different domains. Analyzing this phenomenon could significantly improve preventive efforts. In this paper we present a method for sentiment analysis of suicide notes submitted to the i2b2/VA/Cincinnati Shared Task 2011. In this task the sentences of 900 suicide notes were labeled with the emotions that they may reflect. To label the sentences with emotions, we propose a hybrid approach that uses both rule-based and machine learning techniques. To solve the multi-class problem, a rule-based engine and an SVM model are used for each category. A set of syntactic and semantic features is selected for each sentence to build the rules and train the classifier. The rules are generated manually based on a set of lexical and emotional clues. We propose a new approach to extract a sentence’s clauses and constitutive grammatical elements and to use them in syntactic and semantic feature generation. The system uses a novel method to measure the polarity of the sentence based on the extracted grammatical elements, reaching a precision of 41.79% with a recall of 55.03% for an f-measure of 47.50%. The overall mean f-measure of all submissions was 48.75% with a standard deviation of 7%.

Keywords: NLP, sentiment analysis, emotion classification, polarity measurement, machine learning


Suicide is a complex phenomenon that for decades has attracted the attention of scientists in different domains such as psychology, sociology and philosophy. About one million people commit suicide worldwide each year.1 Suicide is reported as the 11th leading cause of death in the United States.2 It is a serious public health problem that demands attention and prevention. Although prevention is not an easy task, monitoring people’s mental health and taking early action can reduce the number of suicides. A person at risk of suicide is likely to talk or write about his or her feelings, often on social network sites or via email, which highlights the importance of natural language processing for automated monitoring and preventive purposes.

The i2b2/VA/Cincinnati sentiment classification challenge is a shared task that required automated identification of emotions in suicide notes. The participants were asked to find emotions in the notes at the sentence level. This is a multi-class classification problem in which each sentence can be assigned any of the emotion categories. The emotions include: hopelessness, guilt, sorrow, blame, anger, abuse, fear, forgiveness, thankfulness, love, pride, hopefulness, and happiness/peacefulness. There are also two objective categories: information and instruction. A total of 900 annotated suicide notes were used for this task; 600 of them were used for training and 300 were kept for testing. More detailed information about the task and the annotated data is published separately.3

In this paper we present our approach to the sentiment classification problem defined for the shared task. The proposed method is a hybrid approach that combines machine learning and rule-based techniques. We designed a rule-based engine and trained a Support Vector Machine (SVM) classifier for each possible emotion. A set of syntactic and semantic features is extracted from the sentences to build the rules and train the classifier. To generate the sentence features, we propose a new approach that identifies a sentence’s clauses and their constitutive grammatical elements and uses them to measure the polarity of the sentence (a quantitative measure of the positive or negative feelings reflected in it).


Recently, the natural language research community has demonstrated an increased interest in the analysis of “sentiment” or emotions in text documents in different domains. Several rule-based4–6 and machine learning based approaches7,8 have been developed for emotion identification in text. Lu et al6 developed a system that classified a sentence into 4 emotion categories; they applied a rule-based emotion recognizer based on keyword spotting and event extraction from text. A set of rules was defined by considering the relation between the verb, the subject and the object of the sentence. The common actions between users of a chat room and real-life objects such as “book” or “jewelry” were extracted from the web, and the objects were classified into affective categories. Manual rules were then used to classify a sentence based on the relation between the verb, the subject and the categorized objects. Andreevskaia et al4 compared two approaches to news headline sentiment detection: a knowledge-based, unsupervised approach and a supervised machine learning approach. The knowledge-based approach uses a list of subjective words and, through a set of rules, considers the impact of polarity shifters on a word’s polarity score. Their study shows that the knowledge-based approach can produce high quality results with good precision, while the supervised approach generated results with good recall and low precision. Cambria et al10 used ConceptNet11 and WordNet-Affect9 to define emotion vectors; they used clustering techniques to find the emotion vector most similar to a sentence vector and assigned that emotion to the sentence. Neviarouskaya et al8 developed a machine learning based tool that extracts emotions from text; the extracted emotion is then used to create a 3D virtual model. They created a lexicon containing a set of trigger words for each emotion, mainly taken from WordNet-Affect.9


Our proposed method is a combination of rule-based and machine learning techniques. To handle the multi-class classification problem we implemented an emotion detector component for each of the 15 emotion categories (Fig. 1A). Each emotion detector component consists of a rule engine and an SVM classifier (illustrated in Fig. 1B).

Figure 1.
The overall system architecture (A) includes pre-processing steps and an Emotion Detector for each emotion (Emotion Detector 1… Emotion Detector 15), with an output of 1 (present) or 0 (not present) for each emotion. Each detector (B) consists ...

The rule engine applies a three-valued logic: its output can be 0 (the emotion is not present), 1 (the emotion is present), or 2 (the presence of the emotion cannot be determined by the rules). An output of 0 or 1 is the final decision for the related emotion. Otherwise (output is 2), the sentence becomes a candidate for the SVM training set; that is, for a given emotion the classifier is trained on the sentences that were not covered by any of the rules. Thus, the rules are applied to a sentence first, and if the result of the rule engine is 2, the final decision is based on the classifier’s result. This approach was taken because empirical testing showed that applying the rules ahead of the classifier resulted in a higher f-measure. The main components of the system are: preprocessing, rule engine and SVM classifier. Each component is explained in detail in the following sections.
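The control flow described above can be sketched in a few lines. This is only an illustrative sketch, not the system's code: the function names, the toy "thank" rule, the stub classifier and the "first decisive rule wins" policy are all assumptions, since the text does not say how several rules for one emotion are combined.

```python
from typing import Callable, Iterable, List

# The three rule-engine outputs described in the text.
ABSENT, PRESENT, UNDECIDED = 0, 1, 2

# Hypothetical rule signature: a rule maps a sentence to 0, 1 or 2.
Rule = Callable[[str], int]


def run_rules(sentence: str, rules: Iterable[Rule]) -> int:
    """Return the first decisive rule output, or UNDECIDED if no rule fires.

    Taking the first decisive rule is an assumption made for this sketch.
    """
    for rule in rules:
        outcome = rule(sentence)
        if outcome != UNDECIDED:
            return outcome
    return UNDECIDED


def detect_emotion(sentence: str, rules: Iterable[Rule],
                   classifier: Callable[[str], int]) -> int:
    """Apply the rules first; consult the classifier only when they abstain."""
    outcome = run_rules(sentence, rules)
    if outcome != UNDECIDED:
        return outcome
    return classifier(sentence)


def training_candidates(sentences: List[str], rules: List[Rule]) -> List[str]:
    """Sentences not covered by any rule become classifier training instances."""
    return [s for s in sentences if run_rules(s, rules) == UNDECIDED]


def thank_rule(sentence: str) -> int:
    """Toy positive rule for "thankfulness" (hypothetical)."""
    return PRESENT if "thank" in sentence.lower() else UNDECIDED


def stub_classifier(sentence: str) -> int:
    """Stand-in for a trained SVM; always answers 'not present'."""
    return ABSENT
```

In this sketch one such detector would be instantiated per emotion, each with its own rule list and classifier.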


Spelling and structural error correction. The suicide notes in the shared task dataset were typed from scanned versions of the original notes. This process introduced many spelling and syntactic mistakes, such as symbols or spaces in the wrong place; for example, in “don*t” an asterisk is used instead of an apostrophe, and “I ‘ve” contains an unexpected space. For spelling correction we used Text::SpellChecker, a Perl CPAN module that corrects misspellings in a block of text; it uses GNU Aspell, a free and open source spell checker whose main feature is that it suggests possible replacements for a misspelled word. Based on the common mistakes in the training data, we prepared a script that performs the required replacements automatically. For the remaining misspelled words, the system requires the user to select manually from the list of suggestions.
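The automatic replacement step can be illustrated with two regular expressions. This is a minimal sketch in Python rather than the Perl tooling named above; the patterns and the function name are assumptions, and only the two error types quoted in the text (“don*t” and “I ‘ve”) are covered.

```python
import re

# Hypothetical replacement patterns for the two error types quoted in the text.
_FIXES = [
    # An asterisk typed between two word characters in place of an apostrophe.
    (re.compile(r"(?<=\w)\*(?=\w)"), "'"),
    # A stray space before a quote mark that starts a clitic such as 've or 'll.
    (re.compile(r"(?<=\w)\s+(?=['’‘](?:ve|ll|re|m|d|s)\b)"), ""),
]


def normalize(text: str) -> str:
    """Apply each replacement pattern in turn and return the cleaned text."""
    for pattern, replacement in _FIXES:
        text = pattern.sub(replacement, text)
    return text


print(normalize("I don*t know"))  # I don't know
print(normalize("I 've tried"))   # I've tried
```

Words that such patterns cannot repair would still go through the spell checker's suggestion list.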

Parsing and POS tagging. We used the Stanford parser12 to parse the sentences and the Stanford tagger13 for part-of-speech tagging. Stanford dependencies were used to extract the syntactic elements of the sentence (subject, verb, object, and others) in the next steps.

Named entity recognition. Detecting determinative entities such as persons, locations, phone numbers and others in a sentence is necessary in our method for defining the rules. We used ConceptNet11 as a knowledge base for named entity recognition; for example, it lets us determine that “daughter” is a human and “desk” is an object. In addition, we used regular expressions to detect addresses, phone numbers, and names.

Rule engine

The rule engine consists of two sets of rules for each emotion: positive rules (where if the rule premise is satisfied the emotion is likely present in the sentence); and negative rules (where if the rule premise is satisfied the emotion is likely to not exist in the sentence). If a sentence is not covered by either group of the rules it is passed to the SVM classifier for the final decision. The rules for each emotion are based on lexical and emotional clues in the sentences. The simplest lexical clues are based on the presence of common vocabulary or language expressions which people use for expressing an emotion (eg, “thank” for “thankfulness” or “forgive” for “forgiveness”). More complex lexical clues consider additional features other than keywords and will be discussed in the following sections. Emotional clues are real-life conditions that a person experiences that trigger or indicate a specific emotion. Both lexical and emotional clues were manually extracted by analyzing the training data. A few examples of the emotional clues are listed in Table 1.

Table 1.
Examples of emotional clues.

In order to find the emotions, we defined a rule for each clue. A set of syntactic and semantic features of the sentence were extracted. The syntactic features include: sentence clauses, verb, subjects, objects, indirect objects, complements, adjectives, adverbs, verb auxiliaries and other grammatical elements. The semantic features include: subject or object type (eg, first, second or third person), verb tense, verb polarity, and verb argument’s polarity (eg, object polarity).

In defining the rules we considered the relation of the verb and the semantic roles in the sentence. Consider “forgive” as the main verb of a clause. If the subject of “forgive” is a first person then the emotion label will be different from when the subject is second or third person and the object is first person; the former will usually be labeled as “forgiveness” while the latter will be labeled as “guilt”. Examples of translating an emotional clue to a rule are shown in Table 2.

Table 2.
Examples of “hopelessness” clues and the corresponding rule.
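The “forgive” example above can be written as a small rule over an extracted clause. This is a sketch only: the clause dictionary layout, the pronoun list and the function names are hypothetical, and the real rules combine more features than the person of the subject and object.

```python
from typing import Optional

# Hypothetical list of first-person pronouns used to type subjects and objects.
FIRST_PERSON = {"i", "we", "me", "us"}


def person_of(word: Optional[str]) -> str:
    """Classify a subject or object as 'first' person or 'other'."""
    return "first" if (word or "").lower() in FIRST_PERSON else "other"


def forgive_rule(clause: dict) -> Optional[str]:
    """Label a clause whose main verb is 'forgive'.

    `clause` is a dict with hypothetical keys 'verb' (lemma), 'subject'
    and 'object'. Returns None when the rule does not apply.
    """
    if clause.get("verb") != "forgive":
        return None
    if person_of(clause.get("subject")) == "first":
        return "forgiveness"  # the writer forgives someone
    if person_of(clause.get("object")) == "first":
        return "guilt"        # the writer asks to be forgiven
    return None


print(forgive_rule({"verb": "forgive", "subject": "I", "object": "you"}))  # forgiveness
print(forgive_rule({"verb": "forgive", "subject": "you", "object": "me"}))  # guilt
```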

Feature extraction

In order to define the rules based on the clues, syntactic and semantic features should be extracted from the sentence. In some rules only lexical features are included, while some other rules need polarity features of the whole sentence or the sentence elements. For example for category “thankfulness”, if the value of the verb belongs to “thank, appreciate, apprise ...” then the verb condition is satisfied; while in detecting “hopelessness”, when the person describes himself/herself with a negative adjective, the polarity of the adjective is considered to satisfy part of the rule condition rather than the exact value of the adjective. The calculation of sentence polarity is explained in the Polarity Measurement section. In the following sections we briefly explain how we extracted grammatical elements followed by an illustration of our approach in finding negations in the sentence. Then our proposed method for polarity measurement is elaborated. Finally our method for building a bag of trigger words for each emotion is described.

Finding the grammatical relations

Each sentence was analyzed at the clause level. A clause is a part of a sentence that has only one main verb. We used Stanford dependencies to extract the grammatical relations. Each dependency represents a grammatical relation whose arguments are words of the sentence, and the offset of each argument in the sentence is attached to it. The sentence clauses were built from the dependencies. For example, consider the dependencies generated for the sentence “I hope you will forgive me.”:

  • nsubj(hope-2, I-1)
  • nsubj(forgive-5, you-3)
  • aux(forgive-5, will-4)
  • ccomp(hope-2, forgive-5)
  • dobj(forgive-5, me-6)

where “nsubj” is the name of the relation and indicates that “hope” (hope-2), the second word in the sentence, is the verb and “I” (I-1), the first word, is its subject. “dobj” is another relation, which shows that “me” is the object of “forgive”. Stanford typed dependencies are explained in detail by de Marneffe et al.14

To build the clauses, first we consider the relations that include a verb as part of the relation, such as (nsubj, dobj, cop) and build the clause by adding the verb and the corresponding element in the relation. In order to find other elements of the clauses such as subjects, objects, indirect objects, complements and their modifiers (adjective or adverbs) we loop through all dependency lines and modify the existing clauses. For example by analyzing the dependency aux(forgive-5, will-4), we add “will” to the list of auxiliaries of the verb “forgive”. From the dependency line ccomp(hope-2, forgive-5) which shows a clausal complement relation, we add the clause that has the verb “forgive” to the list of dependent clauses of the clause with the main verb “hope”. Therefore we convert each sentence to a list of nested clauses where each clause can have various grammatical elements. We consider the relations between different clauses in a sentence by analyzing relations such as “ccomp”.
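The two passes described above can be sketched as follows for the example sentence. This is an illustrative simplification, not the authors' implementation: only the “nsubj”, “dobj”, “aux” and “ccomp” relations are handled, and the clause dictionary layout and function names are assumptions.

```python
import re

# Matches lines such as "nsubj(hope-2, I-1)".
DEP = re.compile(r"(\w+)\((.+?)-(\d+), (.+?)-(\d+)\)")


def parse_dep(line: str):
    """Split a dependency line into (relation, (governor, i), (dependent, j))."""
    match = DEP.match(line.strip())
    if match is None:
        raise ValueError(f"not a dependency line: {line!r}")
    rel, gov, gov_i, dep, dep_i = match.groups()
    return rel, (gov, int(gov_i)), (dep, int(dep_i))


def build_clauses(lines):
    """Build clauses keyed by (verb, offset) from typed-dependency lines."""
    deps = [parse_dep(line) for line in lines]
    clauses = {}

    def clause_for(verb):
        return clauses.setdefault(verb, {
            "verb": verb[0], "subjects": [], "objects": [],
            "auxiliaries": [], "dependents": [],
        })

    # Pass 1: relations that contain a verb open a clause.
    for rel, gov, dep in deps:
        if rel == "nsubj":
            clause_for(gov)["subjects"].append(dep[0])
        elif rel == "dobj":
            clause_for(gov)["objects"].append(dep[0])

    # Pass 2: attach auxiliaries and nest dependent clauses.
    for rel, gov, dep in deps:
        if rel == "aux" and gov in clauses:
            clauses[gov]["auxiliaries"].append(dep[0])
        elif rel == "ccomp" and gov in clauses and dep in clauses:
            clauses[gov]["dependents"].append(clauses[dep])
    return clauses


example = [
    "nsubj(hope-2, I-1)",
    "nsubj(forgive-5, you-3)",
    "aux(forgive-5, will-4)",
    "ccomp(hope-2, forgive-5)",
    "dobj(forgive-5, me-6)",
]
clauses = build_clauses(example)
```

Here `clauses[("hope", 2)]` holds the main clause, and its "dependents" list contains the nested “forgive” clause.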

Negation detection

Negation in this context is more complex than in more direct genres, such as biomedical literature or clinical records, where a lot of work has been done on processing negations. We determine the negated words by first considering special relations in the dependencies, such as “neg”, which links a negation word to its target word. In the sentence “I don’t know ...”, the negated verb can easily be detected by processing the dependency “neg(know-4, n’t-3)”. However, sometimes a concept is semantically negated although there is no direct negation relation in the dependencies. For example, in the sentence “I don’t want to leave you alone”, “leave” is semantically negated, and this can be determined by processing the governing clause (“I don’t want ...”).

In addition to dependency analysis, we consider the presence of semantically negative words (eg, “no one”, “nobody”, “without”) to detect negated words. The presence of conjunctions such as “but” and “except” can also negate the meaning. In the sentence “no one is to blame except me.”, the phrase “no one” negates “blame” and the conjunction “except” negates it again; overall, therefore, the verb “blame” is not considered negated and the sentence carries the concept of blame.
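One simple way to model cues that cancel each other, as in the “no one ... except me” example, is to count them and test the parity. The sketch below rests on strong assumptions: it treats every cue in the clause as scoping over the target word, the cue list is hypothetical beyond the words quoted in the text, and the text does not state that the system uses a parity count.

```python
import re

# Hypothetical cue patterns; "no one", "nobody", "without", "but" and "except"
# are quoted in the text, the remaining entries are illustrative additions.
CUE_PATTERNS = [
    r"\bno one\b", r"\bnobody\b", r"\bwithout\b",
    r"\bnot\b", r"n['’]t\b",
    r"\bbut\b", r"\bexcept\b",
]


def count_negation_cues(clause_text: str) -> int:
    """Count negation cues in a clause, ignoring their exact scope."""
    text = clause_text.lower()
    return sum(len(re.findall(pattern, text)) for pattern in CUE_PATTERNS)


def is_negated(clause_text: str) -> bool:
    """An odd number of cues leaves the target negated; an even number cancels."""
    return count_negation_cues(clause_text) % 2 == 1


print(is_negated("no one is to blame except me."))  # False
print(is_negated("I don’t know"))                   # True
```

A scope-aware version would count only the cues attached to the target word through dependencies such as “neg” or through the governing clause.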

Polarity measurement

Effective sentiment analysis of the sentences has a positive impact on the accuracy of our proposed method. The possibility that we can find “happiness_peacefulness” in a sentence with a negative tone (polarity) is very low. On the other hand a sentence with positive polarity is not likely to reflect “blame” or “guilt”.

Here we propose a novel approach to measuring sentence polarity. In our approach, sentence polarity is calculated as an integer: positive sentences have positive polarity, negative sentences have negative polarity, and the polarity of neutral sentences is 0. If we consider a sentence (S) as a set of clauses (cls) and phrases, sentence polarity is calculated according to Equation 1:


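$$\mathrm{Polarity}(S) = \sum_{i} \mathrm{Polarity}(cls_i) + \sum_{j} \mathrm{Polarity}(phrase_j)$$
Equation 1: Sentence polarity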

where cls_i is a clause and phrase_j is a phrase in the sentence, such as a noun phrase or adjective phrase, that is not included in any of the sentence clauses with a verb. The polarity of each clause is calculated as the sum of the polarities of the verb, the objects and the complements (Equation 2):


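$$\mathrm{Polarity}(cls) = \mathrm{Polarity}(verb) + \sum_{k} \mathrm{Polarity}(obj_k) + \sum_{l} \mathrm{Polarity}(compl_l)$$
Equation 2: Clause polarity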

where “verb” is the lemma of the verb of the clause (which can also be a phrasal verb), “obj” can be the direct or the indirect object, and “compl” is the complement of the clause.
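The two polarity sums can be sketched directly. This assumes that sentence polarity is the plain sum of clause and phrase polarities, as the definitions above suggest; the clause dictionary layout and the toy lexicon values are hypothetical and are not taken from the Subjectivity Lexicon.

```python
# Toy word polarities in the range -2..+2 (illustrative values only).
TOY_LEXICON = {"love": 2, "hate": -2, "sorry": -2}


def polarity_of(word: str) -> int:
    """Look up a word's polarity; unknown words are neutral (0)."""
    return TOY_LEXICON.get(word.lower(), 0)


def clause_polarity(clause: dict) -> int:
    """Sum the polarities of the verb, the objects and the complements."""
    return (polarity_of(clause["verb"])
            + sum(polarity_of(obj) for obj in clause.get("objects", []))
            + sum(polarity_of(c) for c in clause.get("complements", [])))


def sentence_polarity(clauses: list, phrases: list) -> int:
    """Sum clause polarities and the polarities of phrases outside any clause."""
    return (sum(clause_polarity(clause) for clause in clauses)
            + sum(polarity_of(phrase) for phrase in phrases))


# "I hate this life": one clause, no free-standing phrases.
hate_clause = {"verb": "hate", "objects": ["life"]}
print(sentence_polarity([hate_clause], []))  # -2
```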

The polarity of individual words is taken from the Subjectivity Lexicon, which contains the polarity of approximately 8000 English words. More explanation of the lexicon can be found in Wilson et al.15 In the lexicon the polarity can be +1, −1 or 0, and the intensity can be weak or strong. Taking the intensity into account, we define the initial polarity of each word as an integer between −2 and +2. If a word is negated, its polarity is also negated. In addition to negations, we incorporate the impact of other modifiers such as adjectives or adverbs (eg, “good”, “terribly”) on the polarity of a word. Consider the phrase “poor children”: although the polarity of “children” is 0, when incorporating its modifier the polarity will be −1. Based on the proposed algorithm, the polarity of the phrase “truly sorry” is −4, since the polarity of “sorry” is −2 and the polarity of “truly” is +2. The algorithm by which we calculate the polarity of a single word considering its modifiers is presented in Table 3.

Table 3.
Word polarity calculation.
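Since the contents of Table 3 are not reproduced here, the sketch below is only one combination rule that is consistent with the two worked examples in the text (“poor children” gives −1, “truly sorry” gives −4). The rule itself (a neutral word takes its modifier's polarity, otherwise the modifier's magnitude scales the word's polarity), the mapping of strong intensity to 2, and all function names are assumptions.

```python
def initial_polarity(sign: int, strong: bool) -> int:
    """Map a lexicon entry (sign in {-1, 0, +1}, weak/strong) to -2..+2.

    Mapping strong to 2 and weak to 1 is an assumption of this sketch.
    """
    return sign * (2 if strong else 1)


def word_polarity(base: int, modifier_polarities=(), negated: bool = False) -> int:
    """Combine a word's polarity with its modifiers and negation (assumed rule)."""
    polarity = base
    for modifier in modifier_polarities:
        if modifier == 0:
            continue                    # neutral modifiers change nothing
        if polarity == 0:
            polarity = modifier         # a neutral word takes the modifier's polarity
        else:
            polarity *= abs(modifier)   # otherwise the modifier acts as an intensifier
    return -polarity if negated else polarity


print(word_polarity(0, [-1]))   # "poor children" -> -1
print(word_polarity(-2, [2]))   # "truly sorry"   -> -4
```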

Building the list of keywords for emotions

In order to build the rules based on lexical clues, we need to utilize a list of possible trigger words for each category. For example, one of the rules to detect “happiness_peacefulness” is to look for adjectives describing happiness or joy within the complements of a clause with a first person subject (eg, “I am happy”). In addition, while processing the emotional clues aside from measuring the polarities, for some rules we need to consider the base values (lemma) of the sentence elements and limit the range of acceptable values. For example, to satisfy this emotional clue: “If the continuation of life appears to be impossible for the person.”, the verb of the clause should belong to the following set of verbs and their synonyms: {“go on”, “continue”, ”stand”, ....}.

We generated the list of triggers by collecting words from different resources. For some emotions we prepared a list of seed words from the words with the highest TF-IDF (Term Frequency-Inverse Document Frequency) for that emotion in the training data and expanded the list using WordNet16 synonyms. A large number of keywords also come from WordNet-Affect.9 For some other emotions we collected the possible triggers by selecting all the sentences with the target emotion and extracting the verbs and complements. The collected words are stored in a database together with their part of speech.
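The seed-word step can be illustrated with a small TF-IDF ranking. This sketch assumes that all training sentences labeled with one emotion are pooled into a single “document”; the text does not specify the weighting variant, so the plain tf × log(N/df) form, the whitespace tokenizer and the toy data are all assumptions.

```python
import math
from collections import Counter


def top_seed_words(sentences_by_emotion: dict, k: int = 3) -> dict:
    """Rank the words of each emotion by TF-IDF and keep the top k as seeds."""
    # One pooled bag of words per emotion.
    docs = {
        emotion: Counter(w for s in sentences for w in s.lower().split())
        for emotion, sentences in sentences_by_emotion.items()
    }
    n_docs = len(docs)
    # Document frequency: in how many emotions does each word occur?
    df = Counter(word for counts in docs.values() for word in counts)

    seeds = {}
    for emotion, counts in docs.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        seeds[emotion] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return seeds


toy_training = {
    "thankfulness": ["thank you", "thank you so much"],
    "love": ["i love you", "love you always"],
}
print(top_seed_words(toy_training, k=1))
```

Words shared by every emotion (here “you”) receive a weight of zero and drop to the bottom of each ranking.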

SVM classifier

In each Emotion Detector component shown in Figure 1, a trained SVM classifier is used. First the rules are applied to the sentences in the training set, and those that are not covered by any rule are used as instances for training the classifier. We used the SVMLight17 library with a polynomial kernel to train 15 SVM models, one for each emotion. For each sentence, the following attributes are calculated:

  • TF-IDF features: the TF-IDF vector of the sentence is used as a set of features. Each keyword in the sentence is a feature whose value equals the TF-IDF weight of the word. We also included the TF-IDF vectors of the next and previous sentences.
  • Syntactical features: the number of sentences in the document, the number of words in the sentence, and the sentence offset in the note.
  • Clausal features: the sentence is divided into clauses and each clause element is used as a clause feature.
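Features such as those listed above must be written in the sparse input format that SVMLight reads, where each example is one line of the form `<target> <index>:<value> ...` with feature indices in increasing order. The helper below is a hypothetical sketch of that serialization; how the system mapped words and clause elements to feature indices is not described in the text.

```python
def to_svmlight_line(present: bool, features: dict) -> str:
    """Serialize one sentence as an SVMLight training line.

    `features` maps a positive integer feature index (hypothetical numbering)
    to its value, for example a TF-IDF weight. Zero values are omitted
    because the format is sparse.
    """
    target = "+1" if present else "-1"
    pairs = " ".join(
        f"{index}:{value:g}"
        for index, value in sorted(features.items())
        if value != 0
    )
    return f"{target} {pairs}".rstrip()


print(to_svmlight_line(True, {7: 0.25, 3: 0.5}))  # +1 3:0.5 7:0.25
```

One such file would be written per emotion, with the target set according to whether the sentence carries that emotion.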


We evaluated the performance of the system against the gold standard released by the challenge organizers. There were 600 training notes and 300 test notes. System performance was measured using the micro-average of three standard measures: recall (R), precision (P) and F-measure (F). We present the results of the different experiments in Table 4. We compared the performance of the system when only rule-based or only machine learning methods were applied with experiments in which both were combined. In one experiment (ML only), we applied machine learning to the limited number of emotions for which the classifier generated acceptable results (love, guilt, hopelessness, information). Then, in order to incorporate the other emotions, rules were applied first and classifiers were used only for the 4 emotions with acceptable classification results (Rule + ML1). In another experiment (Rule + ML2), the rules and the classifier were used for all the emotions.

Table 4.
System results on the test set for the different experiments; the best micro-average f-measure was achieved when combining rule-based and machine learning methods for a limited set of categories (Rule + ML1).

For a given sentence, true positives are the number of emotions that are both assigned by the system and exist in the gold standard. False positives are the number of emotions that are assigned by the system but do not exist in the gold standard. The emotions that are assigned to the sentence just in the gold standard but not by the system are considered as false negatives.
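The micro-averaged measures follow from these counts by pooling true positives, false positives and false negatives over all sentences before computing the ratios. The sketch below uses hypothetical function names and toy labels; the last line only checks that the reported precision and recall are consistent with the reported f-measure.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(system: list, gold: list):
    """Micro-averaged precision, recall and F over per-sentence label sets."""
    tp = sum(len(s & g) for s, g in zip(system, gold))  # assigned and in gold
    fp = sum(len(s - g) for s, g in zip(system, gold))  # assigned, not in gold
    fn = sum(len(g - s) for s, g in zip(system, gold))  # in gold, not assigned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Toy example with three sentences (labels are illustrative only).
system_labels = [{"love"}, {"guilt", "sorrow"}, set()]
gold_labels = [{"love"}, {"guilt"}, {"blame"}]
print(micro_prf(system_labels, gold_labels))

# Consistency check on the reported scores: P = 41.79, R = 55.03.
print(round(f_measure(41.79, 55.03), 2))  # 47.5
```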

Using machine learning without rules resulted in a micro-average f-measure of 41.96%, while using rules alone resulted in an f-measure of 45.95%. Applying the combination of machine learning and rules to all emotions (Rule + ML2) increased the f-measure to 47.36%. In the Rule + ML1 experiment, where the classifier was not used for emotions with few training instances (eg, “abuse”), performance increased by a further 0.14 percentage points to an f-measure of 47.50%.


As shown in Table 4, the f-measure increased by just 1.55 percentage points when we applied the classifier over the rules’ results. This may be because most of the sentences with obvious lexical clues are handled by the rules, and the classifier could not handle the more complicated sentences well enough to improve the results significantly. We observed that using the SVM for some emotions, such as “blame”, generated more false positives than true positives, which reduced overall performance and forced us to use only rules for those emotions (all emotions except love, guilt, hopelessness, information). We also limited the number of rules for those emotions, since the rule engine also generated a high number of false positives. Although the system could find true positives in the training data, the rules were not enough to cover the test cases for some emotion categories (hopefulness, blame, anger and abuse), which caused their results to be zero. However, we could eliminate the zero results for some categories by further tuning the SVM parameters and applying the classifier on top of the rules for all the emotions (Rule + ML2).


In general, dealing with subjective data that contains ambiguity is a challenging task. For a given sentence, two different people may find different emotions even though there is no obvious clue or difference in the context of the sentence. In this task the sentence-level inter-annotator agreement is reported as 54.6%. This ambiguity was the source of a large part of the system’s false positives and false negatives. For example, the sentence “please forgive me.” was tagged as reflecting “guilt” in many of the notes, while in some other notes no emotion was assigned to it.

Some of the rules in our system are based on lexical clues, mainly the presence of trigger words or phrases. However, in the training data some emotions share common triggers; for example, “sorry” is common in sentences with “sorrow”, “guilt” and “hopelessness”. In addition, some of the emotional clues are also shared between different emotions. Delicate semantic differences in the context of such sentences lead them to reflect different emotions; many of these differences are handled in the defined rules, and the unhandled cases caused part of the system errors.

As we explained in the Methods section, the rules are defined based on the semantic roles extracted from the sentences, which in turn are derived from the output of the Stanford parser. Part of the system errors are related to erroneous dependencies in the parser output, which are partly caused by the many grammatical errors in the sentences.

Furthermore, a large number of false negatives are caused by using just a limited number of rules that were based on the most obvious clues. By extracting more emotional clues for each category and defining the corresponding rules, many of the uncovered sentences will be handled and the performance can be improved.


We presented our approach to sentiment analysis of suicide notes, which was submitted to the i2b2/VA/Cincinnati shared task 2011. The task required finding the possible emotions in the sentences of suicide notes. We proposed a hybrid system that uses a set of defined rules and a trained classifier for each emotion. A set of syntactic and semantic features was extracted from the sentences and used both as classifier features and in defining the rules. We proposed a new approach for measuring the polarity of a sentence by considering the relationships between the grammatical elements of the sentence. In addition, an algorithm was proposed to extract the sentence clauses and their constitutive grammatical elements.

We have reached an f-measure of 47.50% with precision of 41.79% and recall of 55.03%. As we discussed, there are delicate semantic differences between some of the sentences reflecting different emotions; we handled part of them by defining the semantic features. Adding more syntactic and semantic features for training the classifier can improve the performance of the SVM classifier. In addition we plan to improve our proposed method in semantic role extraction from sentences. Utilizing logic and reasoning and generally automating the process of the rule generation are other future plans to explore.

Figure 2.
System performance per emotion comparing different experiments (Rule, Rule + ML1 and Rule + ML2).






Author(s) have provided signed confirmations to the publisher of their compliance with all applicable legal and ethical obligations in respect to declaration of conflicts of interest, funding, authorship and contributorship, and compliance with ethical requirements in respect to treatment of human and animal test subjects. If this article contains identifiable human subject(s) author(s) were required to supply signed patient consent prior to publication. Author(s) have confirmed that the published article is unique and not under consideration nor published by any other publication and that they have consent to reproduce any copyrighted material. The peer reviewers declared no conflicts of interest.


1. Fleischmann A. World Suicide Prevention Day 2008
2. Anon Suicide in the USA. Based on Current (2007) Statistics. 2007. pp. 2–5.
3. Pestian JP, Pawel M, Michelle L-G, et al. Sentiment Analysis of Suicide Notes: A Shared Task. Biomedical Informatics Insights. 2012;5(Suppl.1):3–16.
4. Andreevskaia A, Bergler S. CLaC and CLaC-NB: Knowledge-based and corpus-based approaches to sentiment tagging. Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007); 2007. pp. 119–20.
5. Chaumartin F-R. UPAR7: A knowledge-based system for headline sentiment tagging. Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007); 2007. pp. 422–5.
6. Lu CY, Hsu WWY, Peng HT, Chung JM, Ho JM. Emotion Sensing for Internet Chatting: A Web Mining Approach for Affective Categorization of Events. 2010 13th IEEE International Conference on Computational Science and Engineering; 2010. pp. 295–301.
7. Das D. Analysis and tracking of emotions in English and Bengali texts: a computational approach. Proceedings of the 20th International Conference Companion on World Wide Web. ACM; 2011. pp. 343–8.
8. Neviarouskaya A, Prendinger H, Ishizuka M. EmoHeart: Conveying Emotions in Second Life Based on Affect Sensing from Text. Advances in Human-Computer Interaction. 2010;(2):1–13.
9. Strapparava C, Valitutti A. WordNet-Affect: an affective extension of WordNet. Proceedings of LREC Vol 4. Citeseer. 2004:1083–6.
10. Cambria E, Hussain A, Havasi C, Eckl C. Affectivespace: Blending common sense and affective knowledge to perform emotive reasoning. WOMSA at CAEPIA, Seville. 2009:32–41.
11. Liu H, Singh P. ConceptNet—A Practical Commonsense Reasoning Tool-Kit. BT technology journal. 2004;22(4):211–26.
12. Manning CD, Klein D. Accurate Unlexicalized Parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics. 2003. pp. 423–30.
13. Toutanova K, Klein D, Manning CD, Singer Y. Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology—NAACL ’03. 2003;1:173–80.
14. de Marneffe MC, Manning CD. Stanford typed dependencies manual. 2010;(September 2008):1–22.
15. Wilson T, Wiebe J, Hoffmann P. Recognizing contextual polarity in phrase-level sentiment analysis. Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Morristown, NJ, USA: Association for Computational Linguistics; 2005. pp. 347–54.
16. Miller GA. WordNet: a lexical database for English. Communications of the ACM. 1995;38(11):39–41.
17. Joachims T. Making large scale SVM learning practical. Advances in Kernel Methods—Support Vector Learning. 1999

Articles from Biomedical Informatics Insights are provided here courtesy of SAGE Publications