Results 1-2 (2)
 

1.  Suicide Note Classification Using Natural Language Processing: A Content Analysis 
Biomedical informatics insights  2010;2010(3):19-28.
Suicide is the second leading cause of death among 25–34-year-olds and the third leading cause of death among 15–25-year-olds in the United States. In the Emergency Department, where suicidal patients often present, estimating the risk of repeated attempts is generally left to clinical judgment. This paper presents our second attempt to determine the role of computational algorithms in understanding a suicidal patient’s thoughts, as represented by suicide notes. We focus on developing natural language processing methods that distinguish between genuine and elicited suicide notes. We hypothesize that machine learning algorithms can categorize suicide notes as well as mental health professionals and psychiatric physician trainees do. The data comprise suicide notes from 33 suicide completers, matched to 33 elicited notes from healthy control group members. Eleven mental health professionals and 31 psychiatric trainees were asked to decide whether a note was genuine or elicited. Their decisions were compared with those of nine different machine learning algorithms. The results indicate that trainees accurately classified notes 49% of the time, mental health professionals accurately classified notes 63% of the time, and the best machine learning algorithm accurately classified the notes 78% of the time. This is an important step in developing an evidence-based predictor of repeated suicide attempts because it shows that natural language processing can aid in distinguishing between classes of suicide notes.
PMCID: PMC3107011  PMID: 21643548
suicide; suicide prediction; suicide notes; machine learning
2.  A Method to Detect Differential Gene Expression in Cross-Species Hybridization Experiments at Gene and Probe Level 
Biomedical informatics insights  2010;2010(3):1-10.
Motivation
Whole genome microarrays are increasingly becoming the method of choice for studying the responses of model organisms to disease, stressors, or other stimuli. However, whole genome sequences are available for only some model organisms, and many species still lack a sequenced genome. Cross-species studies, in which arrays developed for one species are used to study gene expression in a closely related species, have been used to address this gap, with some promising results. Current analytical methods have included filtering out probes or genes that show low hybridization activity, but no consensus filtration scheme is available yet.
Results
We propose a novel masking procedure, based on currently available target species sequences, to filter out probes, and we study a cross-species data set using this masking procedure together with gene-set analysis. Gene-set analysis evaluates the association of a priori defined gene groups with a phenotype of interest. Two methods, Gene Set Enrichment Analysis (GSEA) and Test of Test Statistics (ToTS), were investigated. The results showed that the masking procedure combined with the ToTS method worked well on our data set. We also present results from an alternative way to study cross-species hybridization experiments without masking. We hypothesize that the multi-probe structure of Affymetrix microarrays makes it possible to aggregate the effects of both well-hybridized and poorly hybridized probes to study a group of genes. The principles of gene-set analysis were applied to probe-level data instead of gene-level data. The results showed that ToTS can give valuable information and can therefore be used as a powerful technique for analyzing cross-species hybridization experiments.
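As a rough illustration of the general idea described above (mask out probes, then score a probe set by aggregating per-probe test statistics), the following Python sketch may help. It is not the authors' ToTS implementation, which is distributed as R code. The set score used here (the mean of per-probe Welch t statistics), the permutation scheme, the function names, the probe identifiers, and the toy expression values are all assumptions made for illustration only.

```python
import random
import statistics


def t_statistic(a, b):
    """Welch two-sample t statistic for two lists of expression values."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def set_score(expr, labels, probe_ids):
    """Mean per-probe t statistic over the probes in one set (assumed score)."""
    stats = []
    for p in probe_ids:
        case = [x for x, g in zip(expr[p], labels) if g == 1]
        ctrl = [x for x, g in zip(expr[p], labels) if g == 0]
        stats.append(t_statistic(case, ctrl))
    return statistics.mean(stats)


def permutation_p_value(expr, labels, probe_ids, mask=None, n_perm=500, seed=0):
    """Two-sided permutation p-value for a probe set, with optional masking.

    mask maps probe id -> bool; probes mapped to False (or absent) are dropped.
    """
    if mask is not None:
        probe_ids = [p for p in probe_ids if mask.get(p, False)]
    if not probe_ids:
        raise ValueError("no probes left after masking")
    rng = random.Random(seed)
    observed = set_score(expr, labels, probe_ids)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute phenotype labels, keep group sizes
        if abs(set_score(expr, shuffled, probe_ids)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 4 treated (1) and 4 control (0) arrays, 3 probes.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "p1": [9.1, 8.7, 9.4, 9.0, 5.2, 5.5, 4.9, 5.1],
    "p2": [8.2, 8.9, 8.5, 8.8, 5.9, 6.1, 5.6, 6.0],
    "p3": [5.0, 6.2, 5.4, 5.9, 5.7, 5.1, 6.0, 5.3],  # stands in for a poorly hybridized probe
}
mask = {"p1": True, "p2": True, "p3": False}  # assumed sequence-based mask

masked_score, masked_p = permutation_p_value(expr, labels, ["p1", "p2", "p3"], mask=mask)
unmasked_score, unmasked_p = permutation_p_value(expr, labels, ["p1", "p2", "p3"])
print(masked_score, masked_p)
print(unmasked_score, unmasked_p)
```

Running the function once with the mask and once without it mirrors the two analysis routes the abstract describes: filtering probes first, or letting the set-level statistic absorb both well-hybridized and poorly hybridized probes.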
Availability
Software in the form of R code is available at http://anson.ucdavis.edu/~ychen/cross-species.html
PMCID: PMC2928260  PMID: 20798791
gene expression; cross-species; probes; genes; hybridization
