The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds the nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses k-means clustering and the triangle inequality to accelerate the search for nearest neighbors in a high-dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-trees, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds the nearest training objects starting from the cluster nearest to the query object and uses the triangle inequality to reduce the number of distance calculations. Experiments show that kMkNN performs surprisingly well compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10^6 records and 10^4 dimensions, kMkNN shows a 2- to 80-fold reduction in distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significantly better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high-dimensional spaces.
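The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build`, `search`), the plain Lloyd-style k-means with a fixed iteration count, and the choice of cluster count are all assumptions made for illustration. The pruning rule uses only the triangle inequality: for a query q, a training point p and its cluster center c, dist(q, p) >= dist(q, c) - dist(p, c), so a point whose lower bound already reaches the current k-th best distance cannot improve the result.

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, n_clusters, n_iter=10, seed=0):
    """Plain Lloyd iterations (illustrative; any k-means variant would do)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    assign = [0] * len(points)
    for _ in range(n_iter):
        assign = [min(range(n_clusters), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(n_clusters):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign


def build(points, n_clusters):
    """Buildup stage: cluster, then store each point's distance to its center,
    sorted from farthest to nearest."""
    centers, assign = kmeans(points, n_clusters)
    clusters = [[] for _ in centers]
    for idx, (p, c) in enumerate(zip(points, assign)):
        clusters[c].append((dist(p, centers[c]), idx))
    for members in clusters:
        members.sort(reverse=True)
    return centers, clusters


def search(points, centers, clusters, query, k):
    """Searching stage: visit clusters from nearest center outward and prune
    with the bound dist(q, p) >= dist(q, c) - dist(p, c)."""
    best = []  # max-heap of the k best so far, stored as (-distance, index)
    order = sorted((dist(query, c), ci) for ci, c in enumerate(centers))
    n_dist = len(centers)  # distance computations performed so far
    for d_qc, ci in order:
        for d_pc, idx in clusters[ci]:
            # Points are sorted by decreasing d_pc, so the lower bound only
            # grows: once one point is pruned, the rest of the cluster is too.
            if len(best) == k and d_qc - d_pc >= -best[0][0]:
                break
            d = dist(query, points[idx])
            n_dist += 1
            if len(best) < k:
                heapq.heappush(best, (-d, idx))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, idx))
    return sorted((-nd, idx) for nd, idx in best), n_dist
```

`search` returns the k nearest neighbors as `(distance, index)` pairs together with the number of distance computations, which can be compared against the `len(points)` computations of a brute-force scan. Because only the triangle inequality is used, the result is exact for any metric, up to floating-point ties.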
Diagnostic information for psychiatric research often depends on both clinical interviews and medical records. Although discrepancies between these two sources are well known, there have been few studies into the degree and origins of inconsistencies.
We compared data from structured interviews and medical records on 1,970 Han Chinese women with recurrent DSM-IV major depression (MD). Correlations were high for age at onset of MD (0.93) and number of episodes (0.70), intermediate for family history (+0.62) and duration of longest episode (+0.43), and variable but generally more modest for individual depressive symptoms (mean kappa = 0.32). Four factors were identified for twelve symptoms from medical records, and the same four factors emerged from analysis of structured interviews. Factor congruencies were high, but the correlations of factors between interviews and records were modest (i.e. +0.2 to +0.4).
Structured interviews and medical records are highly concordant for age of onset, and the number and length of episodes, but agree more modestly for individual symptoms and symptom factors. The modesty of these correlations probably arises from multiple factors including i) inconsistency in the definition of the worst episode, ii) inaccuracies in self-report and iii) difficulties in coding medical records where symptoms were recorded solely for clinical purposes.
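The symptom-level agreement above is reported as a kappa statistic. As a reminder of what that number measures, here is a minimal sketch of Cohen's kappa for two sources coding the same cases, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the toy data are hypothetical and are not taken from the study.

```python
def cohen_kappa(source_a, source_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(source_a)
    labels = set(source_a) | set(source_b)
    # Observed agreement: fraction of cases coded identically.
    p_o = sum(a == b for a, b in zip(source_a, source_b)) / n
    # Chance agreement: product of the marginal label frequencies.
    p_e = sum((source_a.count(l) / n) * (source_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        # Both sources use one identical label throughout; kappa is undefined,
        # and returning 1.0 here is a convention assumed for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Made-up example: 1 = symptom recorded as present, 0 = absent.
interview = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
record = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
kappa = cohen_kappa(interview, record)
```

In this toy example the two sources agree on 7 of 10 cases, but much of that agreement is expected by chance, which is why kappa is considerably lower than the raw agreement rate.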
Oxidative stress plays a critical role in the etiology and pathogenesis of neurodegenerative disorders, and the molecular mechanisms that control the neuronal response to ROS have been extensively studied. However, the effect of oxidative stress on miRNA expression in hippocampal neurons has not been investigated, and little is known about the effect of ROS-modulated miRNAs on cell function. In this study, H2O2 was used to stimulate mouse primary hippocampal neurons to develop an oxidative stress cell model. Alterations in miRNA expression were detected by microarray analysis, and five miRNAs were validated by real-time RT-PCR. Bioinformatic analysis of the deregulated miRNAs was performed to determine their potential roles in the pathogenesis of neurological disorders. We found that H2O2 induced the deregulation of a total of 101 miRNAs, which mainly took part in the regulation of the MAPK pathway. Among them, miR-135b and miR-708 were significantly up-regulated, and their targets were predicted to be involved in DNA recombination, protein ubiquitination, protein autophosphorylation and the development of neurons. These results demonstrate that oxidative stress alters the miRNA expression profile of hippocampal neurons, and that the deregulated miRNAs might play a role in the pathogenesis of neurodegenerative diseases such as Alzheimer’s disease (AD).
oxidative stress; microRNA; hippocampal neurons; array analysis; Alzheimer disease
The relationship between major depressive disorder (MDD) and dysthymia, a form of chronic depression, is complex. The two conditions are highly comorbid and it is unclear whether they are two separate disease entities. We investigated the extent to which patients with dysthymia superimposed on major depression can be distinguished from those with recurrent MDD.
We examined the clinical features in 1970 Han Chinese women with MDD (DSM-IV) between 30 and 60 years of age across China. Logistic regression was used to determine the association between clinical features of MDD and dysthymia and between dysthymia and disorders comorbid with major depression.
The 354 cases with dysthymia had more severe MDD than those without, with more episodes of MDD and greater co-morbidity for anxiety disorders. Patients with dysthymia had higher neuroticism scores and were more likely to have a family history of MDD. They were also more likely to have suffered serious life events.
Results were obtained in a clinically ascertained sample of Chinese women and may not generalize to community-acquired samples or to other populations. It is not possible to determine whether the associations represent causal relationships.
The additional diagnosis of dysthymia in Chinese women with recurrent MDD defines a meaningful and potentially important subtype. We conclude that in some circumstances it is possible to distinguish double depression from recurrent MDD.
Major depressive disorder; Dysthymia; Symptom; Comorbidity
Individuals with early-onset depression may be a clinically distinct group with particular symptom patterns, illness course, comorbidity and family history. This question has not been previously investigated in a Han Chinese population.
We examined the clinical features of 1970 Han Chinese women with DSM-IV major depressive disorder (MDD) between 30 and 60 years of age across China. Linear, logistic and multiple logistic regression models were used to determine the association between age at onset (AAO) and continuous, binary and discrete clinical features of MDD.
Earlier AAO was associated with more suicidal ideation and attempts and higher neuroticism, but fewer sleep, appetite and weight changes. Patients with an earlier AAO were more likely to suffer a chronic course (longer illness duration, more MDD episodes and longer index episode), had increased rates of MDD in their parents and had a lower likelihood of marriage. They tended to have higher comorbidity with anxiety disorders (generalized anxiety disorder, social phobia and agoraphobia) and dysthymia.
Early AAO in MDD may be an index of a more severe, highly comorbid and familial disorder. Our findings indicate that the features of MDD in China are similar to those reported elsewhere in the world.
Major depressive disorder; Age at onset; Symptom; Comorbidity
Ensemble methods have been widely used to improve prediction accuracy over individual classifiers. In this paper, we present several results on the prediction accuracy of ensemble methods for binary classification that have been missed or misinterpreted in the previous literature. First, we show the upper and lower bounds of the prediction accuracy (i.e. the best and worst possible prediction accuracies) of ensemble methods. Next, we show that an ensemble method can achieve a prediction accuracy > 0.5 while the individual classifiers have prediction accuracies < 0.5. Furthermore, for individual classifiers with different prediction accuracies, the average of the individual accuracies determines the upper and lower bounds. We perform two experiments to verify the results and show that it is hard to reach the upper and lower bounds with random individual classifiers, and that better algorithms need to be developed.
ensemble methods; binary classification; prediction accuracy; upper bound; lower bound
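The claim that an ensemble can exceed 0.5 accuracy while each member stays below 0.5 can be checked on a small constructed case. The sketch below is an illustration only, not one of the paper's experiments: the labels, the three classifiers "A", "B" and "C", and their correctness patterns are made up so that each classifier is right on exactly 2 of 5 samples, with the correct votes arranged to overlap pairwise.

```python
from collections import Counter

# Made-up ground truth for five samples (binary labels).
truth = [1, 0, 1, 0, 1]

# Which samples each hypothetical classifier gets right (2 of 5 each).
correct = {
    "A": [True, True, False, False, False],
    "B": [True, False, True, False, False],
    "C": [False, True, True, False, False],
}


def predictions(mask):
    """Turn a correctness mask into binary predictions (flip when wrong)."""
    return [t if ok else 1 - t for t, ok in zip(truth, mask)]


def accuracy(pred):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def majority_vote(votes):
    """Most common label among an odd number of binary votes."""
    return Counter(votes).most_common(1)[0][0]


preds = {name: predictions(mask) for name, mask in correct.items()}
ensemble = [majority_vote(votes) for votes in zip(*preds.values())]

individual_acc = {name: accuracy(p) for name, p in preds.items()}
ensemble_acc = accuracy(ensemble)
```

Each classifier has accuracy 0.4, yet the majority vote is correct on three of the five samples (accuracy 0.6). With three voters, a majority decision is correct only when at least two votes are correct, so the 6 correct votes available here can produce at most 3 correct ensemble decisions; this arrangement therefore attains the best case for this toy setting, and spreading the same correct votes one per sample would give the worst case.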
As a growing number of protein structures are resolved without known functions, using computational methods to help predict protein functions from the structures becomes increasingly important. Some computational methods predict protein functions by aligning to homologous proteins with known functions, but they fail if such homology cannot be identified. In this paper we classify enzymes/non-enzymes using non-alignment features. We propose a new ensemble method that includes three support vector machines (SVM) and two k-nearest neighbor algorithms (k-NN) and uses a simple majority voting rule. A test on a data set of 697 enzymes and 480 non-enzymes adapted from Dobson and Doig shows 85.59% accuracy in a 10-fold cross validation and 86.49% accuracy in a leave-one-out validation. The prediction accuracy is much better than that of other methods based on non-alignment features and even slightly better than that of methods based on alignment features. To our knowledge, our method is the first to use ensemble methods to classify enzymes/non-enzymes, and it is superior to a single classifier.
enzyme/non-enzyme classification; ensemble methods; support vector machine; k-nearest neighbour algorithm
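The two validation schemes mentioned above differ only in how many folds the data set is split into: leave-one-out is k-fold cross validation with one fold per sample. The following sketch shows the index bookkeeping for the 697 + 480 = 1177 proteins. The function names are hypothetical, and the contiguous, unshuffled folds are a simplifying assumption; the abstract does not say how the folds were drawn, and in practice the data would normally be shuffled or stratified first.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for f in range(n_folds):
        size = base + (1 if f < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_test_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) for each fold in turn."""
    for test in k_fold_indices(n_samples, n_folds):
        held_out = set(test)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, test


n = 697 + 480                      # enzymes + non-enzymes
ten_fold = k_fold_indices(n, 10)   # 10-fold cross validation
leave_one_out = k_fold_indices(n, n)  # one held-out protein per fold
```

Under 10-fold cross validation each model is trained on roughly 90% of the proteins, while leave-one-out trains 1177 models on all but one protein each, which is consistent with the slightly higher accuracy reported for leave-one-out.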
Although accurate details in RNA structure are of great importance for understanding RNA function, the backbone conformation is difficult to determine, and most existing RNA structures show serious steric clashes (≥ 0.4Å overlap) when hydrogen atoms are taken into account. We have developed a program called RNABC (RNA Backbone Correction) that performs local perturbations to search for alternative conformations that avoid those steric clashes or other local geometry problems. Its input is an all-atom coordinate file for an RNA crystal structure (usually from the MolProbity web service), with problem areas specified. RNABC rebuilds a suite (the unit from sugar to sugar) by anchoring the phosphorus and base positions, which are clearest in crystallographic electron density, and reconstructing the other atoms using forward kinematics. Geometric parameters are constrained within user-specified tolerance of canonical or original values, and torsion angles are constrained to ranges defined through empirical database analyses. Several optimizations reduce the time required to search the many possible conformations. The output results are clustered and presented to the user, who can choose whether to accept one of the alternative conformations.
Two test evaluations show the effectiveness of RNABC, first on the S-motifs from 42 RNA structures, and second on the worst problem suites (clusters of bad clashes, or serious sugar pucker outliers) in 25 unrelated RNA structures. Among the 101 S-motifs, 88 had diagnosed problems, and RNABC produced clash-free conformations with acceptable geometry for 71 of those (about 80%). For the 154 worst problem suites, RNABC proposed alternative conformations for 72. All but 8 of those were judged acceptable after examining electron density (where available) and local conformation. Thus, even for these worst cases, nearly half the time RNABC suggested corrections suitable to initiate further crystallographic refinement. The program is available from http://kinemage.biochem.duke.edu.
kinematic chain; RNA backbone conformation; RNA backbone adjustment; RNA crystallography; automated rebuilding; steric clash; S-motifs; all-atom contacts; structure validation
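The forward-kinematics reconstruction mentioned in the RNABC abstract above builds atom positions from internal coordinates (bond length, bond angle, torsion angle) along a chain. The sketch below shows one common way to place a single atom from three predecessors, often called the natural extension reference frame construction. It is an assumption that this resembles what RNABC does internally; the function names and the test geometry are illustrative only.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(dot(u, u))
    return tuple(a / norm for a in u)


def place_atom(a, b, c, bond, angle, torsion):
    """Place atom d so that |c-d| = bond, angle(b, c, d) = angle and
    dihedral(a, b, c, d) = torsion (angles in radians)."""
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))  # normal of the a-b-c plane
    m = cross(n, bc)                # completes the local orthonormal frame
    local = (-bond * math.cos(angle),
             bond * math.sin(angle) * math.cos(torsion),
             bond * math.sin(angle) * math.sin(torsion))
    return tuple(c[i] + local[0] * bc[i] + local[1] * m[i] + local[2] * n[i]
                 for i in range(3))


def dihedral(a, b, c, d):
    """Signed torsion angle a-b-c-d in radians, used here as a check."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2), dot(n1, n2))
```

Applying `place_atom` repeatedly, each time shifting the window of three predecessor atoms by one, rebuilds a whole chain from a list of torsions. A search over backbone conformations can then vary the torsions within allowed ranges and keep the candidates that satisfy the geometric constraints.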
MolProbity is a general-purpose web server offering quality validation for 3D structures of proteins, nucleic acids and complexes. It provides detailed all-atom contact analysis of any steric problems within the molecules as well as updated dihedral-angle diagnostics, and it can calculate and display the H-bond and van der Waals contacts in the interfaces between components. An integral step in the process is the addition and full optimization of all hydrogen atoms, both polar and nonpolar. New analysis functions have been added for RNA, for interfaces, and for NMR ensembles. Additionally, both the web site and major component programs have been rewritten to improve speed, convenience, clarity and integration with other resources. MolProbity results are reported in multiple forms: as overall numeric scores, as lists or charts of local problems, as downloadable PDB and graphics files, and most notably as informative, manipulable 3D kinemage graphics shown online in the KiNG viewer. This service is available free to all users at http://molprobity.biochem.duke.edu.
Studies conducted in Europe and the USA have shown that co-morbidity between major depressive disorder (MDD) and anxiety disorders is associated with various MDD-related features, including clinical symptoms, degree of familial aggregation and socio-economic status. However, few studies have investigated whether these patterns of association vary across different co-morbid anxiety disorders. Here, using a large cohort of Chinese women with recurrent MDD, we examine the prevalence and associated clinical features of co-morbid anxiety disorders.
A total of 1970 female Chinese MDD patients with or without seven co-morbid anxiety disorders [including generalized anxiety disorder (GAD), panic disorder, and five phobia subtypes] were ascertained in the CONVERGE study. Generalized linear models were used to model association between co-morbid anxiety disorders and various MDD features.
The lifetime prevalence rate for any type of co-morbid anxiety disorder is 60.2%. Panic and social phobia significantly predict an increased family history of MDD. GAD and animal phobia predict an earlier onset of MDD and a higher number of MDD episodes, respectively. Panic and GAD predict a higher number of DSM-IV diagnostic criteria. GAD and blood-injury phobia are both significantly associated with suicide attempts, but with opposite effects. All seven co-morbid anxiety disorders predict higher neuroticism.
Patterns of co-morbidity between MDD and anxiety are consistent with findings from the US and European studies; the seven co-morbid anxiety disorders are heterogeneous when tested for association with various MDD features.
Co-morbid anxiety disorders; major depression
A number of clinical features potentially reflect an individual's familial vulnerability to major depression (MD), including early age at onset, recurrence, impairment, episode duration, and the number and pattern of depressive symptoms. However, these results are drawn from studies that have exclusively examined individuals from a European ethnic background. We investigated which clinical features of depressive illness index familial vulnerability in Han Chinese females with MD.
We used lifetime MD and associated clinical features assessed at personal interview in 1,970 Han Chinese women with DSM-IV MD between 30 and 60 years of age. Odds ratios were calculated by logistic regression.
Individuals with a high familial risk for MD are characterized by severe episodes of MD without known precipitants (such as stressful life events) and are less likely to feel irritable/angry or anxious/nervous.
The association between family history of MD and the lack of a precipitating stressor, traditionally a characteristic of endogenous or biological depression, may reflect the association seen in other samples between recurrent MD and a positive family history. The symptomatic associations we have seen may reflect a familial predisposition to other dimensions of psychopathology, such as externalizing disorders or anxiety states. Depression and Anxiety 0:1–6, 2011. © 2011 Wiley-Liss, Inc.
major depression; family history; symptom; life events