Results 1-25 (31)
 

1.  Decomposing Phenotype Descriptions for the Human Skeletal Phenome 
Over the last few years, a significant amount of research has addressed the ontology-based formalization of phenotype descriptions. The intrinsic value and knowledge captured within such descriptions can only be expressed by taking advantage of their inner structure, which implicitly combines qualities and anatomical entities. We present a meta-model (the Phenotype Fragment Ontology) and a processing pipeline that together enable the automatic decomposition and conceptualization of phenotype descriptions for the human skeletal phenome. We use this approach to showcase the usefulness of the generic concept of phenotype decomposition by performing an experimental study on all skeletal phenotype concepts defined in the Human Phenotype Ontology.
doi:10.4137/BII.S10729
PMCID: PMC3572876  PMID: 23440304
human skeletal phenome; phenotype decomposition; phenotype segmentation; ontologies
2.  What’s In a Note: Construction of a Suicide Note Corpus 
This paper reports on the results of an initiative to create and annotate a corpus of suicide notes that can be used for machine learning. Ultimately, the corpus included 1,278 notes that were written by someone who died by suicide. Each note was reviewed by at least three annotators who mapped words or sentences to a schema of emotions. This corpus has already been used for extensive scientific research.
doi:10.4137/BII.S10213
PMCID: PMC3500150  PMID: 23170067
natural language processing; computational linguistics; corpus; suicide
4.  Sentiment Analysis of Suicide Notes: A Shared Task 
Biomedical Informatics Insights  2012;5(Suppl. 1):3-16.
This paper reports on a shared task involving the assignment of emotions to suicide notes. Two features distinguished this task from previous shared tasks in the biomedical domain. One is that it resulted in a corpus of fully anonymized clinical text and annotated suicide notes. This resource is permanently available and will (we hope) facilitate future research. The other key feature of the task is that it required categorization with respect to a large set of labels. The number of participants was larger than in any previous biomedical challenge task. We describe the data production process and the evaluation measures, and give a preliminary analysis of the results. Many systems performed at levels approaching the inter-coder agreement, suggesting that human-like performance on this task is within the reach of currently available technologies.
doi:10.4137/BII.S9042
PMCID: PMC3299408  PMID: 22419877
Sentiment analysis; suicide; suicide notes; natural language processing; computational linguistics; shared task; challenge 2011
5.  Introductory Editorial 
Biomedical Informatics Insights  2012;5(Suppl. 1):1.
doi:10.4137/BII.S9297
PMCID: PMC3409250  PMID: 22879756
6.  Using Ensemble Models to Classify the Sentiment Expressed in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):77-85.
In 2007, suicide was the tenth leading cause of death in the U.S. Given the significance of this problem, suicide was the focus of the 2011 Informatics for Integrating Biology and the Bedside (i2b2) Natural Language Processing (NLP) shared task competition (track two). Specifically, the challenge concentrated on sentiment analysis, predicting the presence or absence of 15 emotions (labels) simultaneously in a collection of suicide notes spanning over 70 years. Our team explored multiple approaches combining regular expression-based rules, statistical text mining (STM), and an approach that applies weights to text while accounting for multiple labels. Our best submission used an ensemble of both rules and STM models to achieve a micro-averaged F1 score of 0.5023, slightly above the mean from the 26 teams that competed (0.4875).
doi:10.4137/BII.S8931
PMCID: PMC3409473  PMID: 22879763
sentiment analysis; machine learning; text analysis; i2b2 competition
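The micro-averaged F1 score reported in this and several other entries can be illustrated with a minimal sketch. It assumes the usual definition, in which true positives, false positives, and false negatives are pooled over all sentence/label pairs before precision and recall are computed; the function name `micro_f1`, the sentence identifiers, and the toy labels are hypothetical and not taken from the challenge data or its official scorer.

```python
from typing import Dict, Set, Tuple


def micro_f1(gold: Dict[str, Set[str]],
             pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) for multi-label output.

    Counts are pooled over every sentence and label before the ratios
    are taken, so frequent labels dominate the score.
    """
    tp = fp = fn = 0
    for sent_id in set(gold) | set(pred):
        g = gold.get(sent_id, set())
        p = pred.get(sent_id, set())
        tp += len(g & p)   # labels predicted and present in the gold standard
        fp += len(p - g)   # labels predicted but absent from the gold standard
        fn += len(g - p)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data (hypothetical): each sentence carries zero or more labels.
gold = {"s1": {"love", "thankfulness"}, "s2": set(), "s3": {"instructions"}}
pred = {"s1": {"love"}, "s2": {"information"}, "s3": {"instructions"}}
precision, recall, f1 = micro_f1(gold, pred)
```

Because the counts are pooled, a sentence with no gold labels contributes only through false positives, which is why predicting a label for a neutral sentence lowers precision.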
7.  Three Hybrid Classifiers for the Detection of Emotions in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):175-184.
We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we have considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach considers the task as single label multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a Precision of 60.1% whereas our best Recall (43.6%) was obtained using the third system.
doi:10.4137/BII.S8967
PMCID: PMC3409474  PMID: 22879774
emotion classification; hybrid; suicide; sentence classification
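The second approach in this entry, which returns the union of per-category binary classifiers, can be sketched as follows. This is only an illustration under stated assumptions: the keyword matchers below are hypothetical stand-ins for the trained SVM and CRF classifiers, and the names `keyword_classifier` and `union_labels` are invented for the example.

```python
from typing import Callable, Dict, List, Set

# A binary classifier maps a sentence to True/False for one category.
BinaryClassifier = Callable[[str], bool]


def keyword_classifier(keywords: Set[str]) -> BinaryClassifier:
    """Hypothetical stand-in for a trained binary classifier."""
    def classify(sentence: str) -> bool:
        tokens = set(sentence.lower().split())
        return bool(tokens & keywords)
    return classify


def union_labels(sentence: str,
                 classifiers: Dict[str, List[BinaryClassifier]]) -> Set[str]:
    """Assign a category if ANY of its binary classifiers fires."""
    return {label
            for label, clfs in classifiers.items()
            if any(clf(sentence) for clf in clfs)}


# Two classifiers per category, mimicking an SVM/CRF pair (toy rules only).
classifiers = {
    "love": [keyword_classifier({"love"}),
             keyword_classifier({"dear", "darling"})],
    "thankfulness": [keyword_classifier({"thank", "thanks"}),
                     keyword_classifier({"grateful"})],
}
labels = union_labels("thank you I love you all", classifiers)
```

Taking the union favours recall over precision, since a single firing classifier is enough to assign a category.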
8.  Rule-Based and Lightly Supervised Methods to Predict Emotions in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):185-193.
This paper describes the Duluth systems that participated in the Sentiment Analysis track of the i2b2/VA/Cincinnati Children’s 2011 Challenge. The top Duluth system was a rule-based approach derived through manual corpus analysis and the use of measures of association to identify significant ngrams. This performed in the median range of systems, attaining an F-measure of 0.45. The second system was automatically derived from the most frequent bigrams unique to one or two emotions. It achieved an F-measure of 0.36. The third system was the union of the first two, and reached an F-measure of 0.44.
doi:10.4137/BII.S8953
PMCID: PMC3409475  PMID: 22879775
rule-based; sentiment classification; suicide notes
9.  Statistical and Similarity Methods for Classifying Emotion in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):195-204.
In this paper we report on the approaches that we developed for the 2011 i2b2 Shared Task on Sentiment Analysis of Suicide Notes. We have cast the problem of detecting emotions in suicide notes as a supervised multi-label classification problem. Our classifiers use a variety of features based on (a) lexical indicators, (b) topic scores, and (c) similarity measures. Our best submission has a precision of 0.551, a recall of 0.485, and an F-measure of 0.516.
doi:10.4137/BII.S8958
PMCID: PMC3409476  PMID: 22879776
similarity method; statistical method; sentiment classification; suicide notes
10.  A Hybrid Model for Automatic Emotion Recognition in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):17-30.
We describe the Open University team’s submission to the 2011 i2b2/VA/Cincinnati Medical Natural Language Processing Challenge, Track 2 Shared Task for sentiment analysis in suicide notes. This Shared Task focused on the development of automatic systems that identify, at the sentence level, affective text of 15 specific emotions from suicide notes. We propose a hybrid model that incorporates a number of natural language processing techniques, including lexicon-based keyword spotting, CRF-based emotion cue identification, and machine learning-based emotion classification. The results generated by different techniques are integrated using different vote-based merging strategies. The automated system performed well against the manually-annotated gold standard, and achieved encouraging results with a micro-averaged F-measure score of 61.39% in textual emotion recognition, which was ranked 1st place out of 24 participant teams in this challenge. The results demonstrate that effective emotion recognition by an automated system is possible when a large annotated corpus is available.
doi:10.4137/BII.S8948
PMCID: PMC3409477  PMID: 22879757
emotion recognition; keyword-based model; machine-learning-based model; hybrid model; result integration
11.  Combining Lexico-semantic Features for Emotion Classification in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):125-128.
This paper describes a system for automatic emotion classification, developed for the 2011 i2b2 Natural Language Processing Challenge, Track 2. The objective of the shared task was to label suicide notes with 15 relevant emotions on the sentence level. Our system uses 15 SVM models (one for each emotion) using the combination of features that was found to perform best on a given emotion. Features included lemmas and trigram bag of words, and information from semantic resources such as WordNet, SentiWordNet and subjectivity clues. The best-performing system labeled 7 of the 15 emotions and achieved an F-score of 53.31% on the test data.
doi:10.4137/BII.S8960
PMCID: PMC3409478  PMID: 22879768
emotion classification; topic classification; suicide; suicide notes; machine learning
12.  A Combined Approach to Emotion Detection in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):105-114.
In this paper, we present the system we have developed for participating in the second task of the i2b2/VA 2011 challenge dedicated to emotion detection in clinical records. On the official evaluation, we ranked 6th out of 26 participants. Our best configuration, based upon a combination of both a machine-learning based approach and manually-defined transducers, obtained a 0.5383 global F-measure, while the distribution of the other 26 participants’ results is characterized by mean = 0.4875, stdev = 0.0742, min = 0.2967, max = 0.6139, and median = 0.5027. Combination of machine learning and transducer is achieved by computing the union of results from both approaches, each using a hierarchy of sentiment specific classifiers.
doi:10.4137/BII.S8969
PMCID: PMC3409479  PMID: 22879766
emotion detection; machine-learning; SVM classifier; transducers
13.  Binary Classifiers and Latent Sequence Models for Emotion Detection in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):147-154.
This paper describes the National Research Council of Canada’s submission to the 2011 i2b2 NLP challenge on the detection of emotions in suicide notes. In this task, each sentence of a suicide note is annotated with zero or more emotions, making it a multi-label sentence classification task. We employ two distinct large-margin models capable of handling multiple labels. The first uses one classifier per emotion, and is built to simplify label balance issues and to allow extremely fast development. This approach is very effective, scoring an F-measure of 55.22 and placing fourth in the competition, making it the best system that does not use web-derived statistics or re-annotated training data. Second, we present a latent sequence model, which learns to segment the sentence into a number of emotion regions. This model is intended to gracefully handle sentences that convey multiple thoughts and emotions. Preliminary work with the latent sequence model shows promise, resulting in comparable performance using fewer features.
doi:10.4137/BII.S8933
PMCID: PMC3409480  PMID: 22879771
natural language processing; text analysis; emotion classification; suicide notes; support vector machines; latent variable modeling
14.  Early Fusion of Low Level Features for Emotion Mining 
Biomedical Informatics Insights  2012;5(Suppl. 1):129-136.
We study the discrimination of emotions annotated in free texts at the sentence level: a sentence can either be associated with no emotion (neutral) or with multiple emotion labels. The proposed system relies on three characteristics. First, we implement an early fusion of n-grams of increasing order, transposing an approach successfully employed in the related task of opinion mining. Second, we apply a filtering process that extracts frequent n-grams and uses Shannon’s entropy measure, respectively, to keep dictionaries at balanced sizes and to retain emotion-specific features. Finally, the overall system is implemented as a two-step decision process: a first classifier discriminates between neutral and emotion-bearing sentences, then one classifier per emotion is applied to emotion-bearing sentences. The final decision is given by the classifier holding the maximum confidence. Results obtained on the testing set are promising.
doi:10.4137/BII.S8973
PMCID: PMC3409481  PMID: 22879769
emotion mining; text analysis; n-grams; fusion; entropy
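The filtering step described in this entry (keep frequent n-grams, then use Shannon's entropy to retain emotion-specific ones) might look roughly like the sketch below. It is a simplification under stated assumptions: unigrams stand in for n-grams of several orders, each training sentence carries a single label, and the thresholds `min_count` and `max_entropy` are hypothetical values rather than the authors' settings.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def label_entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of an n-gram's distribution over labels."""
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c)
               for c in counts.values() if c)


def select_ngrams(data: Iterable[Tuple[List[str], str]],
                  min_count: int,
                  max_entropy: float) -> Dict[str, float]:
    """Keep n-grams that are frequent AND concentrated on few labels."""
    per_ngram: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in data:
        for tok in set(tokens):          # count each n-gram once per sentence
            per_ngram[tok][label] += 1
    selected = {}
    for ngram, counts in per_ngram.items():
        entropy = label_entropy(counts)
        if sum(counts.values()) >= min_count and entropy <= max_entropy:
            selected[ngram] = entropy
    return selected


# Toy training data (hypothetical): (tokens, emotion label).
data = [
    (["thank", "you"], "thankfulness"),
    (["thank", "them"], "thankfulness"),
    (["you", "hurt"], "anger"),
    (["you", "love"], "love"),
]
selected = select_ngrams(data, min_count=2, max_entropy=0.5)
```

In this toy data, "you" is frequent but spread evenly over three labels, so its entropy is high and it is dropped, while "thank" occurs with only one label and is kept.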
15.  Discovering Fine-grained Sentiment in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):137-145.
This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams.
doi:10.4137/BII.S8963
PMCID: PMC3409482  PMID: 22879770
sentiment analysis; emotion identification; suicide note
16.  Labeling Emotions in Suicide Notes: Cost-Sensitive Learning with Heterogeneous Features 
Biomedical Informatics Insights  2012;5(Suppl. 1):99-103.
This paper describes a system developed for Track 2 of the 2011 Medical NLP Challenge on identifying emotions in suicide notes. Our approach involves learning a collection of one-versus-all classifiers, each deciding whether or not a particular label should be assigned to a given sentence. We explore a variety of feature types: syntactic, semantic and surface-oriented. Cost-sensitive learning is used for dealing with the issue of class imbalance in the data.
doi:10.4137/BII.S8930
PMCID: PMC3409483  PMID: 22879765
emotion classification; suicidology; support vector machines; cost-sensitive learning
17.  A Hybrid System for Emotion Extraction from Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):165-174.
The reasons that drive someone to commit suicide are complex and their study has attracted the attention of scientists in different domains. Analyzing this phenomenon could significantly improve preventive efforts. In this paper we present a method for sentiment analysis of suicide notes submitted to the i2b2/VA/Cincinnati Shared Task 2011. In this task the sentences of 900 suicide notes were labeled with the possible emotions that they reflect. To label each sentence with emotions, we propose a hybrid approach that uses both rule-based and machine learning techniques. To solve the multi-class problem, a rule-based engine and an SVM model are used for each category. A set of syntactic and semantic features is selected for each sentence to build the rules and train the classifier. The rules are generated manually based on a set of lexical and emotional clues. We propose a new approach to extract the sentence’s clauses and constitutive grammatical elements and to use them in syntactic and semantic feature generation. The approach uses a novel method to measure the polarity of the sentence based on the extracted grammatical elements, reaching a precision of 41.79% with a recall of 55.03% for an F-measure of 47.50%. The overall mean F-measure of all submissions was 48.75% with a standard deviation of 7%.
doi:10.4137/BII.S8981
PMCID: PMC3409484  PMID: 22879773
NLP; sentiment analysis; emotion classification; polarity measurement; machine learning
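The reported F-measure can be checked against the reported precision and recall. The worked calculation below assumes the balanced F-measure (the harmonic mean of precision P and recall R, with both values read as percentages), which is the usual definition but is not stated explicitly in the abstract.

```latex
\[
F = \frac{2PR}{P + R}
  = \frac{2 \times 41.79 \times 55.03}{41.79 + 55.03}
  = \frac{4599.41}{96.82}
  \approx 47.50
\]
```

The result agrees with the figure given in the abstract, about 1.25 points below the overall mean of 48.75%.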
18.  A Naïve Bayes Approach to Classifying Topics in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):87-97.
The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico–semantic properties of individual words in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data consisting of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier as well as a set of pattern–matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. Our system achieved an F-measure of 53% (with 55% precision and 52% recall), which was significantly better than the average performance of 48.75% achieved by the 26 participating teams.
doi:10.4137/BII.S8945
PMCID: PMC3409485  PMID: 22879764
natural language processing; sentiment analysis; topic classification; naïve Bayes classifier
19.  Fine-Grained Emotion Detection in Suicide Notes: A Thresholding Approach to Multi-Label Classification 
Biomedical Informatics Insights  2012;5(Suppl. 1):61-69.
We present a system to automatically identify emotion-carrying sentences in suicide notes and to detect the specific fine-grained emotion conveyed. With this system, we competed in Track 2 of the 2011 Medical NLP Challenge, where the task was to distinguish between fifteen emotion labels, from guilt, sorrow, and hopelessness to hopefulness and happiness.
Since a sentence can be annotated with multiple emotions, we designed a thresholding approach that enables assigning multiple labels to a single instance. We rely on the probability estimates returned by an SVM classifier and experimentally set thresholds on these probabilities. Emotion labels are assigned only if their probability exceeds a certain threshold and if the probability of the sentence being emotion-free is low enough. We show the advantages of this thresholding approach by comparing it to a naïve system that assigns only the most probable label to each test sentence, and to a system trained on emotion-carrying sentences only.
doi:10.4137/BII.S8966
PMCID: PMC3409486  PMID: 22879761
emotion detection; multi-label classification; thresholds; probability estimates
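The thresholding rule described in this entry can be sketched as below, alongside the naïve most-probable-label baseline it is compared against. This is an illustration only: the probability values, the two threshold parameters, and the class name `"none"` for emotion-free sentences are assumptions, and in the paper the probabilities come from an SVM classifier rather than being supplied by hand.

```python
from typing import Dict, Set

NO_EMOTION = "none"  # hypothetical name for the emotion-free class


def assign_labels(probs: Dict[str, float],
                  label_threshold: float,
                  none_threshold: float) -> Set[str]:
    """Assign every emotion whose probability exceeds label_threshold,
    but only if the emotion-free probability is low enough."""
    if probs.get(NO_EMOTION, 0.0) > none_threshold:
        return set()
    return {label for label, p in probs.items()
            if label != NO_EMOTION and p > label_threshold}


def most_probable(probs: Dict[str, float]) -> str:
    """Naive baseline: a single label, the one with the highest probability."""
    return max(probs, key=probs.get)


# Toy probability estimates for one sentence (hypothetical values).
probs = {"none": 0.10, "love": 0.45, "thankfulness": 0.35, "hopefulness": 0.10}
multi = assign_labels(probs, label_threshold=0.3, none_threshold=0.5)
single = most_probable(probs)
```

Here the thresholding rule returns two labels for the sentence, whereas the baseline can only ever return one, which is the advantage the abstract points to for sentences annotated with multiple emotions.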
20.  Leveraging Psycholinguistic Resources and Emotional Sequence Models for Suicide Note Emotion Annotation 
Biomedical Informatics Insights  2012;5(Suppl. 1):155-163.
We describe the submission entered by SRI International and UC Davis for the I2B2 NLP Challenge Track 2. Our system is based on a machine learning approach and employs a combination of lexical, syntactic, and psycholinguistic features. In addition, we model the sequence and locations of occurrence of emotions found in the notes. We discuss the effect of these features on the emotion annotation task, as well as the nature of the notes themselves. We also explore the use of bootstrapping to help account for what appeared to be annotator fatigue in the data. We conclude with a discussion of future avenues for improving the approach for this task, and also discuss how annotations at the word span level may be more appropriate for this task than annotations at the sentence level.
doi:10.4137/BII.S8979
PMCID: PMC3409487  PMID: 22879772
emotion detection; natural language processing; suicide note; psycholinguistic resources
21.  A Hybrid Approach to Sentiment Sentence Classification in Suicide Notes 
Biomedical Informatics Insights  2012;5(Suppl. 1):43-50.
This paper describes the sentiment classification system developed by the Mayo Clinic team for the 2011 I2B2/VA/Cincinnati Natural Language Processing (NLP) Challenge. The sentiment classification task is to assign any pertinent emotion to each sentence in suicide notes. We have implemented three systems that have been trained on suicide notes provided by the I2B2 challenge organizer: a machine learning system, a rule-based system, and a system consisting of a combination of both. Our machine learning system was trained on re-annotated data in which apparently inconsistent emotion assignment was adjusted. Then, the machine learning methods (RIPPER and multinomial Naïve Bayes classifiers), manual pattern-matching rules, and the combination of the two systems were tested to determine the emotions within sentences. The combination of the machine learning and rule-based system performed best and produced a micro-average F-score of 0.5640.
doi:10.4137/BII.S8961
PMCID: PMC3409488  PMID: 22879759
sentiment classification; suicidal emotion; natural language processing; machine learning
22.  Emotion Detection in Suicide Notes using Maximum Entropy Classification 
Biomedical Informatics Insights  2012;5(Suppl. 1):51-60.
An ensemble of supervised maximum entropy classifiers can accurately detect and identify sentiments expressed in suicide notes. Using lexical and syntactic features extracted from a training set of externally annotated suicide notes, we trained separate classifiers for each of fifteen pre-specified emotions. This formed part of the 2011 i2b2 NLP Shared Task, Track 2. The precision and recall of these classifiers were strongly related to the number of occurrences of each emotion in the training data. Evaluating on previously unseen test data, our best system achieved an F1 score of 0.534.
doi:10.4137/BII.S8972
PMCID: PMC3409489  PMID: 22879760
natural language processing; text analysis; emotion classification; suicide notes
23.  LASSA: Emotion Detection via Information Fusion 
Biomedical Informatics Insights  2012;5(Suppl. 1):71-76.
Due to the complexity of emotions in suicide notes and the subtle nature of sentiments, this study proposes a fusion approach to tackle the challenge of sentiment classification in suicide notes: leveraging WordNet-based lexicons, manually created rules, character-based n-grams, and other linguistic features. Although our results are not satisfying, some valuable lessons are learned and promising future directions are identified.
doi:10.4137/BII.S8949
PMCID: PMC3409490  PMID: 22879762
fusion; dependency parsing; character n-grams
24.  Topic Categorisation of Statements in Suicide Notes with Integrated Rules and Machine Learning 
Biomedical Informatics Insights  2012;5(Suppl. 1):115-124.
We describe and evaluate an automated approach used as part of the i2b2 2011 challenge to identify and categorise statements in suicide notes into one of 15 topics, including Love, Guilt, Thankfulness, Hopelessness and Instructions. The approach combines a set of lexico-syntactic rules with a set of models derived by machine learning from a training dataset. The machine learning models rely on named entities, lexical, lexico-semantic and presentation features, as well as the rules that are applicable to a given statement. On a testing set of 300 suicide notes, the approach showed the overall best micro F-measure of up to 53.36%. The best precision achieved was 67.17% when only rules are used, whereas best recall of 50.57% was with integrated rules and machine learning. While some topics (eg, Sorrow, Anger, Blame) prove challenging, the performance for relatively frequent (eg, Love) and well-scoped categories (eg, Thankfulness) was comparatively higher (precision between 68% and 79%), suggesting that automated text mining approaches can be effective in topic categorisation of suicide notes.
doi:10.4137/BII.S8978
PMCID: PMC3409492  PMID: 22879767
text mining; text classification; suicide notes; sentiment mining
25.  Suicide Note Sentiment Classification: A Supervised Approach Augmented by Web Data 
Biomedical Informatics Insights  2012;5(Suppl. 1):31-41.
Objective:
To create a sentiment classification system for the Fifth i2b2/VA Challenge Track 2, which can identify thirteen subjective categories and two objective categories.
Design:
We developed a hybrid system using Support Vector Machine (SVM) classifiers with augmented training data from the Internet. Our system consists of three types of classification-based systems: the first system uses spanning n-gram features for subjective categories, the second one uses bag-of-n-gram features for objective categories, and the third one uses pattern matching for infrequent or subtle emotion categories. The spanning n-gram features are selected by a feature selection algorithm that leverages emotional corpus from weblogs. Special normalization of objective sentences is generalized with shallow parsing and external web knowledge. We utilize three sources of web data: the weblog of LiveJournal which helps to improve the feature selection, the eBay List which assists in special normalization of information and instructions categories, and the suicide project web which provides unlabeled data with similar properties as suicide notes.
Measurements:
The performance is evaluated by the overall micro-averaged precision, recall and F-measure.
Result:
Our system achieved an overall micro-averaged F-measure of 0.59. Happiness_peacefulness had the highest F-measure of 0.81. We were ranked as the second best out of 26 competing teams.
Conclusion:
Our results indicated that classifying fine-grained sentiments at sentence level is a non-trivial task. It is effective to divide categories into different groups according to their semantic properties. In addition, our system performance benefits from external knowledge extracted from publicly available web data intended for other purposes; performance can be further enhanced when more training data is available.
doi:10.4137/BII.S8956
PMCID: PMC3409493  PMID: 22879758
sentiment analysis; suicide note; spanning n-gram; web data; supervised approach
