A growing body of evidence has identified clinical risk factors associated with suicide attempts in patients with mood disorders. However, it is not known how to integrate these clinical variables into a clinically useful tool that estimates the probability of an individual patient attempting suicide.
A total of 144 patients with mood disorders were included. Clinical variables associated with suicide attempts among patients with mood disorders and demographic variables were used to ‘train’ a machine learning algorithm. The resulting algorithm was utilized in identifying novel or ‘unseen’ individual subjects as either suicide attempters or non-attempters. Three machine learning algorithms were implemented and evaluated.
All algorithms distinguished individual suicide attempters from non-attempters with prediction accuracy ranging from 65% to 72% (p<0.05). In particular, the relevance vector machine (RVM) algorithm correctly predicted 103 out of 144 subjects, translating into 72% accuracy (72.1% sensitivity and 71.3% specificity) and an area under the curve of 0.77 (p<0.0001). The most relevant predictor variables in distinguishing attempters from non-attempters included previous hospitalizations for depression, a history of psychosis, cocaine dependence and post-traumatic stress disorder (PTSD) comorbidity.
Risk for suicide attempt among patients with mood disorders can be estimated at an individual subject level by incorporating both demographic and clinical variables. Future studies should examine the performance of this model in other populations and its subsequent utility in facilitating selection of interventions to prevent suicide.
When a patient with a depressive episode and no known previous history of suicide attempts first presents for treatment, what is the likelihood that this individual patient will attempt suicide? This question is a major unmet challenge in mood disorder treatment and also an important question for society at large, as suicide is a tragic but highly preventable event (Mann et al., 2005). Suicide accounted for 4.8% and 5.7% of total global deaths in female and male subjects aged 15-49 years, respectively, in 2010 (Lozano et al., 2012). In the same year, suicide was reported as the sixth leading cause of life years lost in North America (Lozano et al., 2012). Furthermore, it has been reported that approximately 90% of individuals who die by suicide are diagnosed with a mental disorder prior to their death (Arsenault-Lapierre et al., 2004). However, there is little awareness in medical practice of objective stratification of suicide risk, which has led suicide to be referred to as “the quiet epidemic” (Turecki, 2014). The current situation is particularly worrisome in major depressive disorder (MDD) and bipolar disorder (BD), given the high prevalence of these disorders and the strong association between suicide and depressive symptoms.
Suicide is usually viewed as an extreme response to a catastrophic event, such as loss of a close relative (Oquendo et al., 2014). However, many individuals, including patients with mood disorders, go through these kinds of stressors and yet do not attempt suicide (Oquendo et al., 2014). Consequently, a growing body of knowledge has put forward several risk factors associated with patients who attempt suicide (Brundin et al., 2015; Mann and Currier, 2010; Oquendo et al., 2014; Turecki, 2014). These are developmental and distal factors (as opposed to precipitating factors, such as loss of a close relative) associated with suicidal behavior (Turecki, 2014; Turecki et al., 2012). These efforts have largely reported average group-level differences between suicide attempters and non-attempters. However, it is not known how to integrate these variables to build a signature of suicide attempt and estimate the probability that an individual patient with a mood disorder will attempt suicide when faced with a stressful event.
In this study we set out to investigate whether a predictive clinical signature derived from easily accessible clinical and demographic variables can objectively identify individual patients likely to attempt suicide. Predictor variables were selected using a priori knowledge. Notably, the majority of clinical variables selected were related to psychiatric comorbidities (Almeida et al., 2012; Bhui et al., 2012; Blackmore et al., 2008; Blanco et al., 2012; Bostwick and Pankratz, 2000; Bottlender et al., 2000; de Araújo et al., 2015; Foley et al., 2006; Galfalvy et al., 2006; Goes et al., 2012; Goldstein et al., 2012; Gonzalez-Pinto et al., 2006; Holma et al., 2014; Isometsä et al., 2014; Johnson et al., 1991; Katz et al., 2011; Lenze et al., 2000; Morina et al., 2013; Neves et al., 2009; Oquendo et al., 2010; Schaffer et al., 2014a, 2014b; Simon et al., 2004; Soloff et al., 2000; Stein, 2006; Torres et al., 2011; Webb et al., 2014), given the recent findings that effects of mental disorders on the risk of suicide attempt were exerted almost exclusively through a general psychopathology factor representing the shared effect across all mental disorders (Hoertel et al., 2015). As a result, we implemented a set of algorithms that are able to integrate the information from multiple variables to subsequently identify an individual patient's probability or risk of being a suicide attempter.
Machine learning in psychiatric research is an emerging field with great potential for innovation and paradigm shift, as these algorithms facilitate integration of multiple measurements and allow objective predictions of previously ‘unseen’ observations. Typically, these algorithms are implemented in three key stages. First, subjects' data are separated into algorithm ‘training’ and ‘testing’ sets, with the former being used to ‘train’ a machine learning model. Second, the predictive accuracy of the ‘learnt’ model is evaluated using the ‘testing’ set, which consists of observations previously ‘unseen’ by the model. Third, the algorithm's accuracy, sensitivity and specificity are reported. In this study, we implemented three machine learning algorithms, namely least absolute shrinkage and selection operator (LASSO) (Tibshirani, 2011), support vector machines (SVM) (Vapnik, 1998) and relevance vector machine (RVM) (Tipping, 2001). The LASSO technique was implemented using a generalized linear model (GLM) with a LASSO regularization penalty. A detailed discussion of these algorithms in the context of psychiatric research is given elsewhere (Mwangi et al., 2012).
In summary, the main objective of the current study was to establish a clinically useful predictive signature able to accurately identify individual patients likely to attempt suicide. We used easily accessible clinical variables to achieve our aim. Of note, this is a proof-of-concept study due to the small sample size.
The Institutional Review Board of the University of Texas Health Science Center at Houston approved the study. All subjects signed informed consent before any study-related procedures with ample time for questions. Participants included in the current study were recruited from January 2006 to June 2010.
A total of 144 subjects were recruited from the community and psychiatric clinics through flyers, radio, and newspaper advertisements. Inclusion criteria were subjects with MDD or BD types I or II (DSM-IV), and aged between 18 and 65. Exclusion criteria were head trauma with residual effects, neurological disorder, and uncontrolled major medical conditions.
Subjects were evaluated using a socio-demographic history form to assess age, gender, years of education, and occupational status. Axis-I diagnoses and clinical characteristics were assessed with the Structured Clinical Interview for DSM-IV axis-I Disorders (SCID-I), which was administered by fully trained staff. The criterion for suicidality was one or more documented actual suicide attempts assessed by the suicide history form from the Conte Center for the Neuroscience of Mental Disorders (New York State Psychiatric Institute, Columbia University). Current dimensional mood and anxiety symptoms were assessed with the Hamilton Depression Rating Scale (HDRS), the Young Mania Rating Scale (YMRS), and Hamilton Anxiety Rating Scale (HARS). Data from all instruments were collected regardless of treatment status.
Selection of predictor variables to be utilized in ‘training’ an algorithm is a challenge in machine learning. However, a recommended method of selecting relevant predictor variables is the use of expert domain knowledge, largely from previously published literature (Mwangi et al., 2013; Perlis, 2013). We performed a structured search on PubMed to find relevant studies that report an association between suicide attempts and the clinical and demographic variables included in this study (Table 1). Of note, we chose patients with mood disorders because a) the developmental variables associated with suicide in both BD and MDD are somewhat similar (see Table 1); and b) our goal was to create a tool that could be used with the large number of patients with mood disorders treated in primary care settings.
Three machine learning algorithms (LASSO, SVM, and RVM) were implemented in MATLAB (The Mathworks, Inc., Natick, Massachusetts, United States). Specifically, predictor variables (see Table 1) were normalized by z-scoring and, together with the corresponding categorical labels (1 = suicide attempt, 0 = non-attempt), input into the machine learning algorithm. Here, we briefly describe these machine learning algorithms; detailed and technical discussions are given elsewhere (Tibshirani, 2011; Tipping, 2001; Vapnik, 1998). SVM and RVM are ‘kernel-learning’ algorithms, which transform a set of input variables (feature vectors) into a ‘kernel’ or similarity matrix through a ‘kernel function’; this matrix is subsequently used in the algorithm ‘learning’ process. These algorithms can utilize multiple types of kernel functions (e.g. linear, polynomial, Gaussian), but in the current study a linear function was utilized in training both SVM and RVM algorithms as described elsewhere (Mwangi et al., 2012). In contrast, the LASSO utilizes a typical linear regression formulation but introduces a ‘penalization’ procedure that assigns some predictor variables zero coefficients or weighting factors, a process also referred to as regularization. The penalization or regularization process reportedly improves the prediction accuracy and generalization ability of the linear regression algorithm but entails identifying an optimal regularization parameter (Tibshirani, 2011). In the current study, the regularization parameter was identified through a 10-fold cross-validation process using training data only.
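The normalization and linear kernel steps described above can be illustrated with a minimal sketch. It is not the study's MATLAB code: the function names `zscore_columns` and `linear_kernel`, the toy data, and the choice of the sample standard deviation (n - 1 denominator) are assumptions made for illustration only.

```python
import math


def zscore_columns(rows):
    """Standardize each predictor column to mean 0 and sample SD 1.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the source does not state which variant was applied.
    """
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(n_cols)
    ]
    # A constant column carries no information; map it to zeros
    # instead of dividing by zero.
    return [
        [(r[j] - means[j]) / sds[j] if sds[j] > 0 else 0.0 for j in range(n_cols)]
        for r in rows
    ]


def linear_kernel(rows):
    """Similarity (Gram) matrix with K[i][k] = dot product of subjects i and k."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


# Hypothetical toy data: 4 subjects x 2 predictors
# (for example age and number of previous hospitalizations).
X = [[25.0, 0.0], [40.0, 2.0], [33.0, 1.0], [58.0, 5.0]]
Z = zscore_columns(X)
K = linear_kernel(Z)  # 4 x 4 symmetric matrix passed to a kernel learner
```

With a linear kernel, each entry of `K` is simply the dot product of two subjects' standardized predictor vectors, so the kernel matrix has one row and one column per subject regardless of the number of predictors.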
Class imbalance is a common problem in machine learning where observations (subjects) in one class (e.g. suicide non-attempters) exceed observations in the other class (e.g. suicide attempters). A typical machine learning algorithm trained using an imbalanced data set tends to assign new observations to the majority class (e.g. suicide non-attempters) (Dubey et al., 2014). In this study, the class imbalance problem was circumvented by ‘under-sampling’ the majority class (suicide non-attempters) followed by training an algorithm with a balanced sample. This process was repeated until all observations in the majority class were selected at least once, and the predictions were aggregated as shown in Figure 1.
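A minimal sketch of this kind of repeated under-sampling follows. The helper names `balanced_subsets` and `aggregate`, the subject identifiers, and the use of a simple mean to aggregate predictions are all assumptions; the source does not specify how the subsets were drawn or how predictions were combined beyond Figure 1.

```python
import random


def balanced_subsets(minority_ids, majority_ids, seed=0):
    """Build balanced training subsets until every majority subject is used.

    Each subset holds all minority subjects plus an equally sized draw
    from the majority class. Assumes the majority class is the larger one.
    """
    rng = random.Random(seed)
    shuffled = list(majority_ids)
    rng.shuffle(shuffled)
    k = len(minority_ids)
    subsets = []
    for start in range(0, len(shuffled), k):
        chunk = shuffled[start:start + k]
        if len(chunk) < k:
            # Top up the last, shorter chunk with majority subjects
            # that are not already in it.
            pool = [m for m in shuffled if m not in chunk]
            chunk = chunk + rng.sample(pool, k - len(chunk))
        subsets.append(list(minority_ids) + chunk)
    return subsets


def aggregate(probabilities):
    """Combine one subject's predictions across models (assumed: plain mean)."""
    return sum(probabilities) / len(probabilities)


# Hypothetical identifiers: 3 attempters (minority), 7 non-attempters (majority).
minority = ["a1", "a2", "a3"]
majority = ["n1", "n2", "n3", "n4", "n5", "n6", "n7"]
subsets = balanced_subsets(minority, majority)
```

One model would then be trained per subset, and a held-out subject's final score would be the aggregate of the per-model predictions.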
A unique characteristic of machine learning algorithms is the ability to utilize a cross-validation approach to separate data into model ‘training’ and ‘testing’ sets. In this study, a leave-one-out cross-validation (LOOCV) approach was used. LOOCV involves training a machine learning algorithm with all subjects except one, while the ‘left out’ subject is used for algorithm testing. This iterative process is repeated until every subject has been left out for algorithm testing, as shown in Figure 1 and described elsewhere (Johnston et al., 2014). Cross-validation is used in machine learning to establish the generalization ability of an algorithm to new or previously ‘unseen’ subjects. The validity of the algorithms in predicting individual suicide attempters from non-attempters was evaluated using prediction accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). In this study, sensitivity and specificity represented the proportion of actual suicide attempters who were correctly predicted (true positives) and the proportion of actual non-attempters who were correctly predicted (true negatives), respectively. A ‘classical’ prediction accuracy was calculated as the sum of true positives and true negatives divided by the total sample. In contrast, to account for the class imbalance problem, a ‘balanced’ accuracy was calculated as the average of sensitivity and specificity as reported elsewhere (Feis et al., 2013). In addition, PPV was calculated as the proportion of subjects predicted as suicide attempters who were actual attempters, and NPV as the proportion of subjects predicted as non-attempters who were actual non-attempters. Receiver operating characteristic (ROC) curves and the corresponding area under the curve were computed and reported. Lastly, a chi-square test between actual and machine learning predicted labels was calculated and the results reported. The outcome of a chi-square test was considered significant at p<0.05.
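The evaluation scheme above can be sketched as follows. The functions `loocv_splits` and `classification_metrics` are illustrative names, and the labels are hypothetical toy values rather than study data; the code only restates the definitions of the metrics given in the text.

```python
def loocv_splits(n_subjects):
    """Yield (training indices, held-out index) for leave-one-out CV."""
    for held_out in range(n_subjects):
        yield [i for i in range(n_subjects) if i != held_out], held_out


def classification_metrics(y_true, y_pred):
    """Compute the reported metrics from labels (1 = attempter, 0 = non-attempter).

    Assumes both classes occur in y_true and in y_pred, so that no
    denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(pairs),  # 'classical' accuracy
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical labels for 10 subjects, as if collected one per LOOCV fold.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(y_true, y_pred)
splits = list(loocv_splits(3))
```

In a full pipeline, each fold produced by `loocv_splits` would train a model on the training indices and contribute one predicted label for the held-out subject; the metrics are then computed once over all held-out predictions.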
A total of 144 subjects with BD or MDD were included in the study. Table 2 summarizes subjects' clinical and demographic characteristics. All algorithms (RVM, SVM and LASSO) distinguished individual suicide attempters from non-attempters with prediction accuracy ranging from 64.7% to 72%, and all models were significant (chi-square p<0.05) as detailed in Table 3. In particular, the RVM algorithm correctly identified 103 out of 144 subjects as either attempters or non-attempters. This translated into 72% accuracy (72.1% sensitivity and 71.3% specificity), which was significant at χ2 p<0.0001. The RVM receiver operating characteristic (ROC) curve and the ‘confusion matrix’, which were used in calculating the sensitivity, specificity and area under ROC curve values, are shown in Figure 2. To calculate ‘concentration of risk’, the RVM predicted probabilities were divided into 20 groups of equal size (ventiles) and the number of subjects correctly predicted within the top two ventiles was calculated. In each of the top two ventiles, 5 out of 7 subjects were correctly predicted, translating into 71.4% accuracy in each ventile. Individual subjects' predicted probabilities are shown in Table S1 of supplementary materials.
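The ‘concentration of risk’ calculation can be sketched as below. The boundary rule used to form the groups, the function names `quantile_groups` and `attempter_fraction`, and all scores and labels are assumptions for illustration; the source does not state how 144 subjects were divided into 20 groups.

```python
def quantile_groups(scores, n_groups=20):
    """Rank subjects by predicted probability (highest first) and split the
    ranking into n_groups nearly equal groups of subject indices.

    Assumption: group g covers ranks floor(g*n/n_groups) up to
    floor((g+1)*n/n_groups); other boundary rules are possible.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    return [order[g * n // n_groups:(g + 1) * n // n_groups]
            for g in range(n_groups)]


def attempter_fraction(group, y_true):
    """Fraction of subjects in a group who are actual attempters (label 1)."""
    return sum(y_true[i] for i in group) / len(group)


# Hypothetical predicted probabilities for 144 subjects, strictly decreasing
# so that the ranking is unambiguous.
scores = [1.0 - i / 144 for i in range(144)]
# Hypothetical labels: 5 attempters among the 7 highest-scored subjects.
y_true = [1, 1, 0, 1, 1, 0, 1] + [0] * 137

groups = quantile_groups(scores)
top_fraction = attempter_fraction(groups[0], y_true)  # 5 of 7 in this toy data
```

Under this particular boundary rule, 144 subjects give 7 subjects in each of the top two groups, which matches the group size of 7 reported above, although the rule actually used in the study may differ.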
The most relevant predictor variables in distinguishing suicide attempters from non-attempters that were assigned positive coefficients or weighting factors by RVM, indicating that higher values were associated with being a suicide attempter, included: 1) a high number of previous hospitalizations for depression, 2) a history of psychosis, 3) cocaine dependence, and 4) PTSD. In contrast, patients’ age and mood diagnosis (0 = bipolar disorder, 1 = unipolar depression) were assigned negative weighting factors, indicating that lower values of these variables were relevant in identifying suicide attempters. Figure 3 shows predictor variables with corresponding weighting factors.
An additional RVM algorithm was evaluated after removing the number of previous hospitalizations from the analysis; this algorithm identified individual subjects with accuracy = 68.9%, sensitivity = 74%, specificity = 63%, p<0.05 and area under ROC curve = 0.71. This indicates that the past hospitalization variable was not the only contributor to the algorithm's performance.
Although risk factors for suicide are widely known, synthesizing easily accessible clinical information to optimize suicide risk identification in subjects with mood disorders has been an elusive goal up to now. This is the first study to evaluate the feasibility of using a clinical tool developed with advanced machine learning algorithms to identify an individual's risk of being a suicide attempter among patients with mood disorders. All algorithms achieved greater than chance (>50%) accuracy in distinguishing attempters from non-attempters. Most notably, the RVM algorithm achieved the best prediction accuracy of 72% and an area under the receiver operating characteristic curve of 0.77. Importantly, the prediction accuracy, sensitivity and specificity of the machine learning algorithm in identifying previously ‘unseen’ suicide attempters were determined through a robust cross-validation approach. The main outcome of the machine learning algorithm was a probability score able to quantify the risk of a subject being a suicide attempter.
Most relevant predictor variables distinguishing suicide attempters from non-attempters by RVM included number of prior hospitalizations for depression, lifetime psychotic symptoms, and comorbidity with cocaine dependence or PTSD. Previous average group level studies showed that these variables are associated with suicide attempts in patients with BD. In this sense, a prospective study with 72 bipolar I disorder patients showed that suicidal risk was higher with more hospitalization (Gonzalez-Pinto et al., 2006), while two studies showed that psychotic symptoms were associated with suicidality in patients with MDD and BD (Bottlender et al., 2000; Johnson et al., 1991). In addition, a cross-sectional study with 500 patients with BD reported that patients with BD and comorbid PTSD presented more suicide attempts compared against patients with BD and without PTSD (Simon et al., 2004). Finally, a review suggested that there is cross-sensitization between cocaine use and number of mood episodes in patients with bipolar disorder leading to faster illness progression, which may potentially increase the number of suicide attempts (Post and Kalivas, 2013).
In contrast with other branches of medicine (e.g. cardiology, oncology), the lack of risk stratification approaches in psychiatric clinical practice is notable (Kapczinski and Passos, 2015). Noteworthy risk stratification applications include prediction of cardiac arrest risk, cancer screening, and prediction of mortality risk in intensive care units (Anothaisintawee et al., 2012; Lee et al., 1999). Indeed, in personalized medicine, risk stratification plays an important role in treatment planning from the time of diagnosis. For instance, in acute care of myocardial infarction, efforts to develop treatment algorithms on the basis of clinical presentation are increasingly integrated in clinical practice (5).
One important question is how this suicide risk stratification tool can be applied in clinical practice to optimize individual patient treatment. For example, it is possible that individual patients with a high risk of attempting suicide as identified by the algorithm would benefit from preventive strategies or antisuicidal drugs. For instance, a randomized clinical trial of suicidal patients reported that a selective serotonin-reuptake inhibitor led to a greater acute improvement in suicidal ideation compared to a noradrenergic-dopaminergic drug. Notably, this therapeutic advantage was greater in patients with the most severe suicidal ideation (Grunebaum et al., 2012). Furthermore, evidence from meta-analysis reported that lithium reduces suicide attempts (Cipriani et al., 2013). In addition, clozapine recently emerged as one option for treatment-resistant patients with BD and also showed a protective effect against suicidal behavior compared with olanzapine (Meltzer et al., 2003). These studies further highlight the need to develop novel suicide stratification algorithms as presented in this study and allow individualized treatments and care for patients predicted with a high risk of attempting suicide.
The current study has some potential limitations. The study sample is small and from the Texas region, which may not fully represent the entire United States population. This may explain the reason why comorbid generalized anxiety disorder (GAD) and obsessive-compulsive disorder (OCD) findings were not fully in agreement with previous studies. However, multivariate machine learning techniques may have found unexpected interactions among predictive variables and therefore unexpected results on GAD and OCD. Therefore, future studies should confirm or reject these results. In this sense, the current study serves as a proof-of-concept and future longitudinal studies should collect larger samples from multiple centers to replicate current findings. In line with our hypothesis, our findings are related to patients with mood disorders and may not be generalized to patients with other mental disorders. However, future studies may utilize a similar approach to accommodate other mental disorders. Of note, borderline personality disorder, which has also been shown to be associated with suicide attempts in patients with mood disorders (Schaffer et al., 2014a), was not included in the algorithm due to missing data in our database. Notwithstanding these limitations, this study is the first to describe and implement a risk stratification tool using machine learning algorithms to identify an individual's probability of attempting suicide among patients with mood disorders. This algorithm could be used in a primary care setting by general practitioners to identify the risk of a mood disorder patient attempting suicide. Future work will focus on validating the algorithm using a larger sample as well as developing a web-based suicide risk calculator. Importantly, the strengths of our study include the application and testing of a multivariate machine learning algorithm in suicide stratification leading to a high area under the ROC curve of 0.77. Notably, a recent study reported that most breast cancer prediction algorithms, including the widely-studied ‘Gail score’, report areas under the ROC curve below 0.70 (Anothaisintawee et al., 2012).
Indeed, machine learning is a fascinating field that may change the traditional doctor-patient relationship in psychiatry. We hypothesize that highly accurate predictive models will support important clinical decisions such as selection of treatment options, preventive strategies, and prognosis orientations. For instance, several studies recently utilized machine learning techniques to predict treatment response using functional brain scans and neurocognitive data (Hahn et al., 2015; Johnston et al., 2015; van Waarde et al., 2015). Moreover, a recent study reported a prediction model able to estimate the risk of renal failure in patients with BD treated with lithium (Castro et al., 2015). Therefore, future studies should develop web-based calculators and medical devices based on validated models to allow translation of these predictive models into clinical practice.
In summary, we report a highly accurate algorithm able to identify suicide attempts in patients with mood disorders using clinical and demographic data. These results suggest that it is possible to utilize clinical measures in identifying individual patients at greater risk of attempting suicide. Lastly, we hypothesize that the combination of genetics, blood markers, neuroimaging, and the clinical signature together with a machine learning algorithm could generate an even more accurate multi-modal signature.
Our study was supported in part by NIH grant MH68766, NIMH grant R01 085667, NIH grant RR2057, and the John S. Dunn Foundation (United States). Dr Jair C. Soares received these grants. ICP was supported by a scholarship from Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior (CAPES), and JQ, MKS, and FK were supported by Conselho Nacional de Desenvolvimento (CNPq).
Role of Funding Source: The funders had no role in the design, data collection, data management, data analysis and data interpretation.
Contributors: Dr Passos participated in the study design, data collection, data analysis, interpretation of findings, literature search, writing, implementation and approval of final manuscript. Dr Mwangi participated in the study design, writing, figures, tables, and approval of final manuscript. Dr Cao participated in the study design, literature search, figures, tables, and approval of final manuscript. Dr Hamilton participated in the literature search, writing, and approval of final manuscript. Dr Wu participated in literature search, figures, tables, and approval of final manuscript. Dr Zhang participated in the literature search, writing and approval of final manuscript. Dr Zunta-Soares participated in literature search, writing, and approval of final manuscript. Dr Quevedo participated in the study design, interpretation of findings, writing, and approval of final manuscript. Dr Kauer-Sant'Anna participated in the study design, interpretation of findings, writing, and approval of final manuscript. Dr Kapczinski participated in the study design, data analysis, interpretation of findings, literature search, writing, and approval of final manuscript. Dr Soares participated in the study design, data analysis, interpretation of findings, literature search, writing, and approval of final manuscript.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.