The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features were based on lexico-semantic properties of individual words, together with regular expressions that capture patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data, which consisted of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern-matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. The system achieved an F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
natural language processing; sentiment analysis; topic classification; naïve Bayes classifier
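The combination of a statistical classifier with pattern-matching rules described above can be sketched as follows. This is a minimal illustration only: the regular expressions, the `stub_model` stand-in for a trained naïve Bayes classifier, and the choice to merge the two outputs by set union are all assumptions, since the abstract does not give the actual rules or the merging procedure.

```python
import re
from typing import Callable, Set

# Hypothetical patterns for illustration; the system's actual rules are not given.
RULES = {
    "instructions": re.compile(r"\bplease (call|notify|give|send)\b", re.IGNORECASE),
    "love": re.compile(r"\bi love (you|him|her|them)\b", re.IGNORECASE),
    "guilt": re.compile(r"\b(forgive me|i am sorry)\b", re.IGNORECASE),
}


def rule_labels(sentence: str) -> Set[str]:
    """Labels whose pattern matches anywhere in the sentence."""
    return {label for label, pattern in RULES.items() if pattern.search(sentence)}


def classify(sentence: str, model: Callable[[str], Set[str]]) -> Set[str]:
    """Union of the statistical model's labels and the rule-based labels (assumed merge)."""
    return model(sentence) | rule_labels(sentence)


def stub_model(sentence: str) -> Set[str]:
    # Stand-in for a trained classifier such as naive Bayes.
    return {"hopelessness"} if "cannot go on" in sentence.lower() else set()


labels = classify("I cannot go on. Please call my sister, I love you all.", stub_model)
```

A union favours recall, because either component can add a label; an intersection or a rule-priority scheme would trade recall for precision.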
This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 I2B2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotion to each sentence in suicide notes. We implemented three systems trained on suicide notes provided by the I2B2 challenge organizer: a machine learning system, a rule-based system, and a combination of the two. Our machine learning system was trained on re-annotated data in which apparently inconsistent emotion assignments were adjusted. We then tested machine learning with RIPPER and multinomial naïve Bayes classifiers, manual pattern-matching rules, and the combination of the two approaches to determine the emotions within sentences. The combination of the machine learning and rule-based systems performed best and produced a micro-averaged F-score of 0.5640.
sentiment classification; suicidal emotion; natural language processing; machine learning
We describe the Open University team’s submission to the 2011 i2b2/VA/Cincinnati Medical Natural Language Processing Challenge, Track 2 Shared Task for sentiment analysis in suicide notes. This Shared Task focused on the development of automatic systems that identify, at the sentence level, text expressing any of 15 specific emotions in suicide notes. We propose a hybrid model that incorporates a number of natural language processing techniques, including lexicon-based keyword spotting, CRF-based emotion cue identification, and machine learning-based emotion classification. The results generated by the different techniques are integrated using several vote-based merging strategies. The automated system performed well against the manually annotated gold standard and achieved a micro-averaged F-measure of 61.39% in textual emotion recognition, ranking first among the 24 participating teams in this challenge. The results demonstrate that effective emotion recognition by an automated system is possible when a large annotated corpus is available.
emotion recognition; keyword-based model; machine-learning-based model; hybrid model; result integration
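Vote-based merging of the outputs of several component models, as mentioned in the abstract above, might look like the following sketch. The abstract does not specify the strategies used, so the threshold-vote scheme, the function name `merge_by_vote`, and the example labels are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Set


def merge_by_vote(predictions: Iterable[Set[str]], min_votes: int) -> Set[str]:
    """Keep each label proposed by at least `min_votes` component systems."""
    votes = Counter(label for labels in predictions for label in labels)
    return {label for label, count in votes.items() if count >= min_votes}


# Hypothetical outputs of three components for one sentence.
keyword_model = {"love", "guilt"}
cue_model = {"love"}
classifier = {"love", "hopelessness", "guilt"}
outputs = [keyword_model, cue_model, classifier]

union = merge_by_vote(outputs, min_votes=1)      # any component suffices
majority = merge_by_vote(outputs, min_votes=2)   # at least two agree
unanimous = merge_by_vote(outputs, min_votes=3)  # all three agree
```

Raising `min_votes` moves the merged output from a recall-oriented union towards a precision-oriented intersection.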
To create a sentiment classification system for the Fifth i2b2/VA Challenge Track 2, which can identify thirteen subjective categories and two objective categories.
We developed a hybrid system using Support Vector Machine (SVM) classifiers with augmented training data from the Internet. Our system consists of three types of classification-based systems: the first uses spanning n-gram features for subjective categories, the second uses bag-of-n-gram features for objective categories, and the third uses pattern matching for infrequent or subtle emotion categories. The spanning n-gram features are selected by a feature selection algorithm that leverages an emotional corpus from weblogs. Special normalization of objective sentences is generalized with shallow parsing and external web knowledge. We utilize three sources of web data: LiveJournal weblogs, which help to improve feature selection; the eBay List, which assists in the special normalization of the information and instructions categories; and the suicide project web, which provides unlabeled data with properties similar to those of suicide notes.
The performance is evaluated by the overall micro-averaged precision, recall and F-measure.
Our system achieved an overall micro-averaged F-measure of 0.59. Happiness_peacefulness had the highest F-measure of 0.81. We were ranked as the second best out of 26 competing teams.
Our results indicate that classifying fine-grained sentiments at the sentence level is a non-trivial task. It is effective to divide categories into different groups according to their semantic properties. In addition, our system's performance benefits from external knowledge extracted from publicly available web data collected for other purposes; performance can be further enhanced when more training data is available.
sentiment analysis; suicide note; spanning n-gram; web data; supervised approach
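The micro-averaged precision, recall and F-measure used to evaluate the systems above pool true positives, false positives and false negatives over all sentences and labels before computing the ratios. A minimal sketch, assuming each sentence carries a set of gold labels and a set of predicted labels (the function name and the toy labels are hypothetical, not taken from the challenge data):

```python
from typing import List, Set, Tuple


def micro_prf(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over multi-label sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and present in gold
        fp += len(p - g)   # labels predicted but absent from gold
        fn += len(g - p)   # gold labels the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example with hypothetical labels.
gold = [{"love"}, {"guilt", "hopelessness"}, set(), {"instructions"}]
pred = [{"love"}, {"guilt"}, {"anger"}, {"instructions", "love"}]
p, r, f = micro_prf(gold, pred)
```

Because counts are pooled, frequent categories dominate the micro-average; a macro-average would instead weight every category equally.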
The reasons that drive someone to commit suicide are complex, and their study has attracted the attention of scientists in different domains. Analyzing this phenomenon could significantly improve preventive efforts. In this paper we present a method for sentiment analysis of suicide notes submitted to the i2b2/VA/Cincinnati Shared Task 2011. In this task the sentences of 900 suicide notes were labeled with the possible emotions that they reflect. To label each sentence with emotions, we propose a hybrid approach which utilizes both rule-based and machine learning techniques. To solve the multi-class problem, a rule-based engine and an SVM model are used for each category. A set of syntactic and semantic features is selected for each sentence to build the rules and train the classifier. The rules are generated manually based on a set of lexical and emotional clues. We propose a new approach to extract the sentence’s clauses and constitutive grammatical elements and to use them in syntactic and semantic feature generation. The approach also introduces a novel way to measure the polarity of the sentence based on the extracted grammatical elements, reaching a precision of 41.79% with a recall of 55.03% for an F-measure of 47.50%. The overall mean F-measure of all submissions was 48.75% with a standard deviation of 7%.
NLP; sentiment analysis; emotion classification; polarity measurement; machine learning
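As a consistency check on the figures reported above, and assuming the F-measure is the balanced harmonic mean of precision P and recall R (the usual F1, which the abstract does not state explicitly), the reported score follows from the reported precision and recall:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 41.79 \times 55.03}{41.79 + 55.03}
    = \frac{4599.41}{96.82}
    \approx 47.50
```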
This paper describes a system for automatic emotion classification, developed for the 2011 i2b2 Natural Language Processing Challenge, Track 2. The objective of the shared task was to label suicide notes with 15 relevant emotions on the sentence level. Our system uses 15 SVM models (one for each emotion) using the combination of features that was found to perform best on a given emotion. Features included lemmas and trigram bag of words, and information from semantic resources such as WordNet, SentiWordNet and subjectivity clues. The best-performing system labeled 7 of the 15 emotions and achieved an F-score of 53.31% on the test data.
emotion classification; topic classification; suicide; suicide notes; machine learning
In 2007, suicide was the tenth leading cause of death in the U.S. Given the significance of this problem, suicide was the focus of the 2011 Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing (NLP) shared task competition (track two). Specifically, the challenge concentrated on sentiment analysis, predicting the presence or absence of 15 emotions (labels) simultaneously in a collection of suicide notes spanning over 70 years. Our team explored multiple approaches combining regular expression-based rules, statistical text mining (STM), and an approach that applies weights to text while accounting for multiple labels. Our best submission used an ensemble of both rules and STM models to achieve a micro-averaged F1 score of 0.5023, slightly above the mean from the 26 teams that competed (0.4875).
sentiment analysis; machine learning; text analysis; i2b2 competition
In this paper we report on the approaches that we developed for the 2011 i2b2 Shared Task on Sentiment Analysis of Suicide Notes. We have cast the problem of detecting emotions in suicide notes as a supervised multi-label classification problem. Our classifiers use a variety of features based on (a) lexical indicators, (b) topic scores, and (c) similarity measures. Our best submission has a precision of 0.551, a recall of 0.485, and an F-measure of 0.516.
similarity method; statistical method; sentiment classification; suicide notes
Operative notes contain rich information about techniques, instruments, and materials used in procedures. To assist development of effective information extraction (IE) techniques for operative notes, we investigated the sublanguage used to describe actions within the operative report ‘procedure description’ section. Deep parsing results of 362,310 operative notes with an expanded Stanford parser using the SPECIALIST Lexicon resulted in 200 verbs (92% coverage) including 147 action verbs. Nominal action predicates for each action verb were gathered from WordNet, SPECIALIST Lexicon, New Oxford American Dictionary and Stedman’s Medical Dictionary. Coverage gaps were seen in existing lexical, domain, and semantic resources (Unified Medical Language System (UMLS) Metathesaurus, SPECIALIST Lexicon, WordNet and FrameNet). Our findings demonstrate the need to construct surgical domain-specific semantic resources for IE from operative notes.
This paper describes the Duluth systems that participated in the Sentiment Analysis track of the i2b2/VA/Cincinnati Children’s 2011 Challenge. The top Duluth system was a rule-based approach derived through manual corpus analysis and the use of measures of association to identify significant ngrams. This performed in the median range of systems, attaining an F-measure of 0.45. The second system was automatically derived from the most frequent bigrams unique to one or two emotions. It achieved an F-measure of 0.36. The third system was the union of the first two, and reached an F-measure of 0.44.
rule-based; sentiment classification; suicide notes
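The second system described above, derived from frequent bigrams that occur with only one or two emotions, can be sketched roughly as follows. The whitespace tokenisation, the frequency threshold `min_count`, and the toy sentences are assumptions; the abstract does not give the actual thresholds or preprocessing.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Bigram = Tuple[str, str]


def bigrams(sentence: str) -> List[Bigram]:
    """Adjacent token pairs after lowercasing and whitespace splitting."""
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))


def distinctive_bigrams(
    labelled: List[Tuple[str, str]], min_count: int = 2, max_emotions: int = 2
) -> Dict[Bigram, Set[str]]:
    """Map each frequent bigram seen with at most `max_emotions` emotions to those emotions."""
    counts: Counter = Counter()
    emotions: Dict[Bigram, Set[str]] = defaultdict(set)
    for sentence, emotion in labelled:
        for bg in bigrams(sentence):
            counts[bg] += 1
            emotions[bg].add(emotion)
    return {
        bg: emotions[bg]
        for bg, n in counts.items()
        if n >= min_count and len(emotions[bg]) <= max_emotions
    }


# Invented toy sentences, not drawn from the challenge corpus.
train = [
    ("i love you", "love"),
    ("i love you all", "love"),
    ("please forgive me", "guilt"),
    ("forgive me dear", "guilt"),
    ("i am sorry", "guilt"),
    ("i am tired", "hopelessness"),
    ("i am happy", "happiness"),
]
rules = distinctive_bigrams(train)
```

In this toy data the bigram `("i", "am")` is frequent but occurs with three emotions, so it is dropped as uninformative, while `("forgive", "me")` is kept as a cue for a single emotion.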
This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams.
sentiment analysis; emotion identification; suicide note
In this paper, we present the system we have developed for participating in the second task of the i2b2/VA 2011 challenge dedicated to emotion detection in clinical records. On the official evaluation, we ranked 6th out of 26 participants. Our best configuration, based upon a combination of both a machine-learning based approach and manually-defined transducers, obtained a 0.5383 global F-measure, while the distribution of the other 26 participants’ results is characterized by mean = 0.4875, stdev = 0.0742, min = 0.2967, max = 0.6139, and median = 0.5027. Combination of machine learning and transducer is achieved by computing the union of results from both approaches, each using a hierarchy of sentiment specific classifiers.
emotion detection; machine-learning; SVM classifier; transducers
An ensemble of supervised maximum entropy classifiers can accurately detect and identify sentiments expressed in suicide notes. Using lexical and syntactic features extracted from a training set of externally annotated suicide notes, we trained separate classifiers for each of fifteen pre-specified emotions. This formed part of the 2011 i2b2 NLP Shared Task, Track 2. The precision and recall of these classifiers were strongly related to the number of occurrences of each emotion in the training data. Evaluating on previously unseen test data, our best system achieved an F1 score of 0.534.
natural language processing; text analysis; emotion classification; suicide notes
We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we have considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach considers the task as single label multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a Precision of 60.1% whereas our best Recall (43.6%) was obtained using the third system.
emotion classification; hybrid; suicide; sentence classification
In this study, we analyzed the compatibility between an ontology of the biomedical domain (the UMLS Semantic Network) and two other ontologies: the Upper Cyc Ontology (UCO) and WordNet. 1) We manually mapped UMLS Semantic Types to UCO. One fifth of the UMLS Semantic Types had an exact mapping to UCO types. UCO provides generic concepts and a structure that relies on a larger number of categories, despite its lack of depth in the biomedical domain. 2) We compared semantic classes in the UMLS and WordNet. 2% of the UMLS concepts from the Health Disorder class were present in WordNet, and compatibility between classes was 48%. WordNet, as a general language-oriented ontology, is a source of lay knowledge, which is particularly important for consumer health applications.
Each biomedical system has its own way of naming the pieces of information it contains, i.e., of defining its data elements (DEs). Integrating DEs facilitates the integration of biomedical resources. However, the mapping of DEs to the UMLS is ambiguous in many cases, when any correspondence is found at all. We propose to evaluate the potential contribution of a more general terminology: WordNet. Our method is based on synonyms, definitions, and structural properties of the terminologies. We applied it to a set of 474 DEs extracted from eleven biomedical sources. We show that WordNet can improve the direct mapping of DEs to UMLS when used to validate and disambiguate UMLS direct mappings. WordNet can also help identify indirect mappings of DEs to the UMLS.
Almost all suicidal persons who consult physicians wish to live. Generally they fall into one of two groups. Interpersonal suiciders manifest frequent threats and attempts, are emotionally labile, have ill-defined suicide plans, and clear ideas as to how their crises might be resolved. Intrapersonal suiciders are less open in manifestations of suicidal drive, withdrawn rather than emotional, often have clearly-formulated suicide plans and do not have ideas (other than suicide) as to how their crises might end. The suicidal situation results from two factors: (1) the loss of some valuable person or commodity, and (2) the loss of self-esteem. What ensues is temporary character disorganization—crisis. Treatment is based on restoration or replacement of lost objects and building up of self-esteem.
This paper reports on the results of an initiative to create and annotate a corpus of suicide notes that can be used for machine learning. Ultimately, the corpus included 1,278 notes that were written by someone who died by suicide. Each note was reviewed by at least three annotators who mapped words or sentences to a schema of emotions. This corpus has already been used for extensive scientific research.
natural language processing; computational linguistics; corpus; suicide
While several sources of biomedical knowledge are available, these resources are often highly specialized and usually not suitable for a lay audience. This paper evaluates whether concepts needed for molecular biology and genetic diseases are present in WordNet, the electronic lexical database.
Terms for four broad categories of concepts (phenotype, molecular function, biological process, and cellular component) were extracted from LocusLink and mapped to WordNet. All terms from the Gene Ontology database (gene products and ontology concepts) were also mapped to WordNet in order to evaluate its global coverage of the domain. Additionally, we tested two methods for improving the mapping of genetic disease names to WordNet.
The coverage of concepts ranged from 0% (gene product symbols) to 2.8% (cellular components). Removing specialization markers from the terms and using synonyms significantly increased the rate of mapping of genetic disease names to WordNet.
Many of the most common single gene disorders are present in WordNet, as well as many high-level concepts in Gene Ontology. Therefore, WordNet is likely to be a useful source of lay knowledge in the framework of a consumer health information system on genetic diseases.
WordNet; Genetic diseases; Molecular biology; Consumer health information
This paper describes the National Research Council of Canada’s submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
natural language processing; text analysis; emotion classification; suicide notes; support vector machines; latent variable modeling
Cross-linguistic differences in emotionality of autobiographical memories were examined by eliciting memories of immigration from bilingual speakers. Forty-seven Russian-English bilinguals were asked to recount their immigration experiences in either Russian or English. Bilinguals used more emotion words when describing their immigration experiences in the second language (English) than in the first language (Russian). Bilinguals' immigration narratives contained more negative emotion words than positive emotion words. In addition, language preference (but not language proficiency) influenced results, with emotional expression amplified when speaking in the preferred language. These findings carry implications for organization of the bilingual lexicon and the special status of emotion words within it. We suggest that bilinguals' expression of emotion may vary across languages and that the linguistic and affective systems are interconnected in the bilingual cognitive architecture.
Poets and philosophers have long acknowledged moral sentiments as key motivators of human social behavior. Prosocial sentiments, which include guilt, pity and embarrassment, enable us to care about others and to be concerned about our mistakes. Functional imaging studies have implicated frontopolar, ventromedial frontal and basal forebrain regions in the experience of prosocial sentiments. Patients with lesions of the frontopolar and ventromedial frontal areas were observed to behave inappropriately and less prosocially, which could be attributed to a generalized emotional blunting. Direct experimental evidence for brain regions distinctively associated with moral sentiment impairments is lacking, however. We investigated this issue in patients with the behavioral variant of frontotemporal dementia, a disorder in which early and selective impairments of social conduct are consistently observed. Using a novel moral sentiment task, we show that the degree of impairment of prosocial sentiments is associated with the degree of damage to frontopolar cortex and septal area, as assessed with 18-Fluoro-Deoxy-Glucose-Positron Emission Tomography, an established measure of neurodegenerative damage. This effect was dissociable from impairment of other-critical feelings (anger and disgust), which was in turn associated with dorsomedial prefrontal and amygdala dysfunction. Our findings suggest a critical role of the frontopolar cortex and septal region in enabling prosocial sentiments, a fundamental component of moral conscience.
frontopolar cortex; prefrontal cortex; subgenual; septal area; amygdala; orbitofrontal cortex; moral sentiment; emotion
A long-standing challenge for scientific and clinical work on suicidal behavior is that people often are motivated to deny or conceal suicidal thoughts. We proposed that people considering suicide would possess an objectively measurable attentional bias toward suicide-related stimuli, and that this bias would predict future suicidal behavior. Participants were 124 adults presenting to a psychiatric emergency department who were administered a modified emotional Stroop task and followed for six months. Suicide attempters showed an attentional bias toward suicide-related words relative to neutral words, and this bias was strongest among those who had made a more recent attempt. Importantly, this suicide-specific attentional bias predicted which people made a suicide attempt over the next six months, above and beyond other clinical predictors. Attentional bias toward more general negatively-valenced words did not predict any suicide-related outcomes, supporting the specificity of the observed effect. These results suggest that suicide-specific attentional bias can serve as a behavioral marker for suicidal risk, and ultimately improve scientific and clinical work on suicide-related outcomes.
suicide; attentional bias; Stroop task; prediction
We describe the submission entered by SRI International and UC Davis for the I2B2 NLP Challenge Track 2. Our system is based on a machine learning approach and employs a combination of lexical, syntactic, and psycholinguistic features. In addition, we model the sequence and locations of occurrence of emotions found in the notes. We discuss the effect of these features on the emotion annotation task, as well as the nature of the notes themselves. We also explore the use of bootstrapping to help account for what appeared to be annotator fatigue in the data. We conclude with a discussion of future avenues for improving the approach for this task, and also discuss how annotations at the word span level may be more appropriate for this task than annotations at the sentence level.
emotion detection; natural language processing; suicide note; psycholinguistic resources
This paper reports on a shared task involving the assignment of emotions to suicide notes. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in a corpus of fully anonymized clinical text and annotated suicide notes. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching the inter-coder agreement, suggesting that human-like performance on this task is within the reach of currently available technologies.
Sentiment analysis; suicide; suicide notes; natural language processing; computational linguistics; shared task; challenge 2011