Results 1-10 (10)

1.  Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures 
Briefings in Bioinformatics  2011;12(4):369-373.
A recent study examined the stability of rankings from random forests using two variable importance measures (mean decrease accuracy (MDA) and mean decrease Gini (MDG)) and concluded that rankings based on the MDG were more robust than those based on the MDA. However, few studies have examined how data-specific characteristics affect ranking stability. Rankings based on the MDG measure showed sensitivity to within-predictor correlation and differences in category frequencies, even when the number of categories was held constant, and thus may produce spurious results. The MDA measure was robust to these data characteristics. Further, under strong within-predictor correlation, MDG rankings were less stable than those using MDA.
PMCID: PMC3137934  PMID: 21498552
Random forest; variable importance measures; stability; ranking; correlation; linkage disequilibrium
3.  Linkage disequilibrium and age of HLA region SNPs in relation to classic HLA gene alleles within Europe 
The HLA region on chromosome 6 is gene-rich and under selective pressure because of the high proportion of immunity-related genes. Linkage disequilibrium (LD) patterns and allele frequencies in this region are highly differentiated across broad geographical populations, making it a region of interest for population genetics and immunity-related disease studies. We examined LD in this important region of the genome in six European populations using 166 putatively neutral SNPs and the classical HLA-A, -B and -C gene alleles. We found that the pattern of association between classic HLA gene alleles and SNPs implied that most of the SNPs predated the origin of classic HLA gene alleles. The SNPs most strongly associated with HLA gene alleles were in some cases highly predictive of the HLA allele carrier status (misclassification rates ranged from <1 to 27%) in independent populations using five or fewer SNPs, a much smaller number than tagSNP panels previously proposed and often with similar accuracy, showing that our approach may be a viable solution to designing new HLA prediction panels. To describe the LD within this region, we developed a new haplotype clustering method/software based on r2, which may be more appropriate for use within regions of strong LD. Haplotype blocks created using this proposed method, as well as classic HLA gene alleles and SNPs, were predictive of a northern versus southern European population membership (misclassification error rates ranged from 0 to 23%, depending on which independent population was used for prediction), indicating that this region may be a rich source of ancestry informative markers.
PMCID: PMC2987379  PMID: 20354563
HLA; population genetics; Europe; LD; haplotype
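The abstract above bases its haplotype clustering on r2, the squared correlation between alleles at two loci. As a minimal sketch of that quantity only (not the authors' clustering software), the following computes r2 for two biallelic loci from phased haplotypes. The function name `ld_r2` and the 0/1 input coding are assumptions made for illustration.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, one per phased chromosome,
    with each allele coded 0 or 1. This input format is an assumption
    made for the sketch, not something taken from the paper.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom
```

Two loci whose alleles always travel together give r2 = 1, and loci whose alleles combine at random give r2 = 0.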
4.  The behaviour of random forest permutation-based variable importance measures under predictor correlation 
BMC Bioinformatics  2010;11:110.
Random forests (RF) have been increasingly used in applications such as genome-wide association and microarray studies where predictor correlation is frequently observed. Recent works on permutation-based variable importance measures (VIMs) used in RF have come to apparently contradictory conclusions. We present an extended simulation study to synthesize results.
When predictor correlation was present and predictors were associated with the outcome (HA), the unconditional RF VIM attributed a higher share of importance to correlated predictors, while under the null hypothesis that no predictors are associated with the outcome (H0) the unconditional RF VIM was unbiased. Conditional VIMs showed a decrease in VIM values for correlated predictors versus the unconditional VIMs under HA and were unbiased under H0. Scaled VIMs were clearly biased under HA and H0.
Unconditional unscaled VIMs are a computationally tractable choice for large datasets and are unbiased under the null hypothesis. Whether the observed increased VIMs for correlated predictors may be considered a "bias" - because they do not directly reflect the coefficients in the generating model - or if it is a beneficial attribute of these VIMs is dependent on the application. For example, in genetic association studies, where correlation between markers may help to localize the functionally relevant variant, the increased importance of correlated predictors may be an advantage. On the other hand, we show examples where this increased importance may result in spurious signals.
PMCID: PMC2848005  PMID: 20187966
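The unconditional, unscaled permutation importance discussed in the abstract above can be illustrated roughly as follows. This is a simplified sketch, assuming an already fitted classifier exposed as a `predict(row)` callable and a held-out data set. A random forest would instead use each tree's out-of-bag samples, which is not reproduced here. All names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)

def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one predictor column.

    The column is permuted across all rows at once (unconditional), and
    the result is left unscaled (not divided by a standard error).
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        permuted = []
        for row, value in zip(rows, shuffled):
            new_row = list(row)        # copy so the input data is not modified
            new_row[column] = value
            permuted.append(new_row)
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats
```

A predictor the model never uses gets an importance of exactly zero under this scheme, since shuffling it cannot change any prediction.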
5.  A novel, primate-specific, brain isoform of KCNH2 impacts cortical physiology, cognition, neuronal repolarization and risk for schizophrenia 
Nature medicine  2009;15(5):509-518.
Organized neuronal firing is critical for cortical processing and is disrupted in schizophrenia. Using 5’ RACE in human brain, we identified a primate-specific isoform (3.1) of the K+-channel KCNH2 that modulates neuronal firing. KCNH2-3.1 mRNA levels are comparable to KCNH2-1A in brain, but 1000-fold lower in heart. In schizophrenic hippocampus, KCNH2-3.1 expression is 2.5-fold greater than KCNH2-1A. A meta-analysis of 5 clinical samples (367 families, 1158 unrelated cases, 1704 controls) shows association of SNPs in KCNH2 with schizophrenia. Risk-associated alleles predict lower IQ scores and speed of cognitive processing, altered memory-linked fMRI signals, and increased KCNH2-3.1 expression in post-mortem hippocampus. KCNH2-3.1 lacks a domain critical for slow channel deactivation. Overexpression of KCNH2-3.1 in primary cortical neurons induces a rapidly deactivating K+ current and a high-frequency, non-adapting firing pattern. These results identify a novel KCNH2 channel involved in cortical physiology, cognition, and psychosis, providing a potential new psychotherapeutic drug target.
PMCID: PMC2756110  PMID: 19412172
6.  Functional Polymorphisms in PRODH Are Associated with Risk and Protection for Schizophrenia and Fronto-Striatal Structure and Function 
PLoS Genetics  2008;4(11):e1000252.
PRODH, encoding proline oxidase (POX), has been associated with schizophrenia through linkage, association, and the 22q11 deletion syndrome (Velo-Cardio-Facial syndrome). Here, we show in a family-based sample that functional polymorphisms in PRODH are associated with schizophrenia, with protective and risk alleles having opposite effects on POX activity. Using a multimodal imaging genetics approach, we demonstrate that haplotypes constructed from these risk and protective functional polymorphisms have dissociable correlations with structure, function, and connectivity of striatum and prefrontal cortex, impacting critical circuitry implicated in the pathophysiology of schizophrenia. Specifically, the schizophrenia risk haplotype was associated with decreased striatal volume and increased striatal-frontal functional connectivity, while the protective haplotype was associated with decreased striatal-frontal functional connectivity. Our findings suggest a role for functional genetic variation in POX on neostriatal-frontal circuits mediating risk and protection for schizophrenia.
Author Summary
Schizophrenia is a major mental illness affecting 1% of the population. It is known that genetics plays a role in the disease susceptibility, and it is thought that the illness is a complex disorder involving multiple genes. We show that the schizophrenia susceptibility gene, PRODH, conveys its risk through a variation that increases its enzyme activity. We further show that protection is associated with variations that decrease enzyme activity and that these protective variations are enriched in the unaffected siblings of patients. We then used brain imaging of structure and memory function to dissect the risk and protective haplotypes' differential effects, and found that the schizophrenia risk haplotype was associated with decreased striatal gray matter volume and increased subcortical to frontal lobe functional connectivity, while the schizophrenia protective haplotype was associated with trend-level increase of frontal lobe volume and decreased subcortical to frontal lobe connectivity. These findings indicate a new target for treating schizophrenia and characterize associated structural and functional deficits.
PMCID: PMC2573019  PMID: 18989458
7.  Genetic variation in AKT1 is linked to dopamine-associated prefrontal cortical structure and function in humans  
The Journal of Clinical Investigation  2008;118(6):2200-2208.
AKT1-dependent molecular pathways control diverse aspects of cellular development and adaptation, including interactions with neuronal dopaminergic signaling. If AKT1 has an impact on dopaminergic signaling, then genetic variation in AKT1 would be associated with brain phenotypes related to cortical dopaminergic function. Here, we provide evidence that a coding variation in AKT1 that affects protein expression in human B lymphoblasts influenced several brain measures related to dopaminergic function. Cognitive performance linked to frontostriatal circuitry, prefrontal physiology during executive function, and frontostriatal gray-matter volume on MRI were altered in subjects with the AKT1 variation. Moreover, on neuroimaging measures with a main effect of the AKT1 genotype, there was significant epistasis with a functional polymorphism (Val158Met) in catechol-O-methyltransferase [COMT], a gene that indexes cortical synaptic dopamine. This genetic interaction was consistent with the putative role of AKT1 in dopaminergic signaling. Supportive of an earlier tentative association of AKT1 with schizophrenia, we also found that this AKT1 variant was associated with risk for schizophrenia. These data implicate AKT1 in modulating human prefrontal-striatal structure and function and suggest that the mechanism of this effect may be coupled to dopaminergic signaling and relevant to the expression of psychosis.
PMCID: PMC2391279  PMID: 18497887
8.  catmap: Case-control And TDT Meta-Analysis Package 
BMC Bioinformatics  2008;9:130.
Risk for complex disease is thought to be controlled by multiple genetic risk factors, each with small individual effects. Meta-analyses of several independent studies may be helpful to increase the ability to detect association when effect sizes are modest. Although many software options are available for meta-analysis of genetic case-control data, no currently available software implements the method described by Kazeem and Farrall (2005), which combines data from independent family-based and case-control studies.
I introduce the package catmap for the R statistical computing environment that implements fixed- and random-effects pooled estimates for case-control and transmission disequilibrium methods, allowing for the use of genetic association data across study types. In addition, catmap may be used to create forest and funnel plots and to perform sensitivity analysis and cumulative meta-analysis. catmap is available from the Comprehensive R Archive Network.
catmap allows researchers to synthesize data to assess evidence for association in studies of genetic polymorphisms, facilitating the use of pooled data analyses which may increase power to detect moderate genetic associations.
PMCID: PMC2291045  PMID: 18307795
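To illustrate what fixed- and random-effects pooled estimates involve, here is a generic sketch in Python. It is not catmap's code or its R interface. It assumes each study, whether case-control or family-based, has already been reduced to a log odds ratio and its variance, and it uses inverse-variance weighting with the DerSimonian-Laird estimate of between-study variance as one common choice. The function and key names are hypothetical.

```python
import math

def pool_log_odds_ratios(log_ors, variances):
    """Pool per-study log odds ratios under fixed and random effects."""
    # Fixed effect: weight each study by the inverse of its variance.
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, log_ors)) / w_sum

    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    k = len(log_ors)
    c = w_sum - sum(w * w for w in weights) / w_sum
    # DerSimonian-Laird between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random effects: add tau2 to every study variance before weighting.
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_sum = sum(re_weights)
    random_est = sum(w * y for w, y in zip(re_weights, log_ors)) / re_sum

    return {
        "fixed": fixed,
        "fixed_se": math.sqrt(1.0 / w_sum),
        "tau2": tau2,
        "random": random_est,
        "random_se": math.sqrt(1.0 / re_sum),
    }
```

When the studies agree closely, the estimated between-study variance is zero and the two pooled estimates coincide. Exponentiating a pooled value gives the pooled odds ratio.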
9.  Stability of variable importance scores and rankings using statistical learning tools on single-nucleotide polymorphisms and risk factors involved in gene × gene and gene × environment interactions 
BMC Proceedings  2007;1(Suppl 1):S58.
Risk of complex disorders is thought to be multifactorial, involving interactions between risk factors. However, many genetic studies assess association between disease status and markers one single-nucleotide polymorphism (SNP) at a time, due to the high-dimensional nature of the search space of all possible interactions. Three ensemble methods have been recently proposed for use in high-dimensional data (Monte Carlo logic regression, random forests, and generalized boosted regression). An intuitive way to detect an association between genetic markers and disease status is to use variable importance measures, even though the stability of these measures in the context of a whole-genome association study is unknown. For the simulated data of Problem 3 in the Genetic Analysis Workshop 15 (GAW15), we examined the variability of both rankings and magnitude of variable importance measures using 10 variables simulated to participate in gene × gene and gene × environment interactions. We conducted 500 analyses per method on one randomly selected replicate, tallying the rankings and importance measures for each of the 10 variables of interest. When the simulated effect size was strong, all three methods showed stable rankings and estimates of variable importance. However, under conditions more commonly expected to be encountered in complex diseases, random forests and generalized boosted regression showed more stable estimates of variable importance and variable rankings. Individuals endeavoring to apply statistical learning methods to detect interaction in complex disease studies should perform repeated analyses in order to assure variable importance measures and rankings do not vary greatly, even for statistical learning algorithms that are thought to be stable.
PMCID: PMC2367584  PMID: 18466558
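The recommendation above to repeat analyses and check that rankings do not vary greatly could be carried out along these lines. This is a sketch only, assuming each repeated run has already produced a dictionary mapping variable names to importance scores. It works for any learning method and does not reproduce the study's analysis. Names are hypothetical.

```python
from statistics import mean

def rank_variables(importances):
    """Map each variable to its rank, where 1 is the most important.

    Ties are broken by the order in which variables appear in the dict.
    """
    ordered = sorted(importances, key=importances.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}

def rank_stability(runs):
    """Summarize how each variable's rank varies across repeated runs."""
    ranks_per_run = [rank_variables(run) for run in runs]
    summary = {}
    for name in runs[0]:
        ranks = [ranks[name] for ranks in ranks_per_run]
        summary[name] = {
            "mean_rank": mean(ranks),
            "min_rank": min(ranks),
            "max_rank": max(ranks),
        }
    return summary
```

A variable whose minimum and maximum ranks are far apart across runs has an unstable ranking, which is the situation the abstract warns about.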
10.  Comparison of type I error for multiple test corrections in large single-nucleotide polymorphism studies using principal components versus haplotype blocking algorithms 
BMC Genetics  2005;6(Suppl 1):S78.
Although permutation testing has been the gold standard for assessing significance levels in studies using multiple markers, it is time-consuming. A Bonferroni correction to the nominal p-value that uses the underlying pair-wise linkage disequilibrium (LD) structure among the markers to determine the number of effectively independent tests has recently been proposed. We propose using the number of independent LD blocks plus the number of independent single-nucleotide polymorphisms for correction. Using the Collaborative Study on the Genetics of Alcoholism LD data for chromosome 21, we simulated 1,000 replicates of parent-child trio data under the null hypothesis with two levels of LD: moderate and high. Assuming haplotype blocks were independent, we calculated the number of independent statistical tests using three haplotype blocking algorithms. We then compared the type I error rates using a principal components-based method, the three blocking methods, a traditional Bonferroni correction, and the unadjusted p-values obtained from FBAT. Under high LD conditions, the principal components-based method and one of the blocking methods were slightly conservative, whereas the two other blocking methods exceeded the target type I error rate. Under conditions of moderate LD, we show that the blocking algorithm corrections are closest to the desired type I error, although still slightly conservative, with the principal components-based method being almost as conservative as the traditional Bonferroni correction.
PMCID: PMC1866703  PMID: 16451692
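The proposed correction divides the nominal significance level by the number of LD blocks plus the number of SNPs that fall outside any block. A minimal sketch of that arithmetic follows. It assumes the block and singleton counts have already been obtained from a haplotype blocking algorithm, which is not shown, and the function names are hypothetical.

```python
def corrected_threshold(n_blocks, n_singleton_snps, alpha=0.05):
    """Bonferroni threshold using blocks plus unblocked SNPs as the test count."""
    n_effective = n_blocks + n_singleton_snps
    if n_effective <= 0:
        raise ValueError("need at least one block or SNP")
    return alpha / n_effective

def significant(p_values, n_blocks, n_singleton_snps, alpha=0.05):
    """Flag each p-value that falls below the corrected threshold."""
    threshold = corrected_threshold(n_blocks, n_singleton_snps, alpha)
    return [p < threshold for p in p_values]
```

For example, 8 blocks and 2 unblocked SNPs give 10 effective tests, so a nominal level of 0.05 becomes a per-test threshold of 0.005, however many SNPs the blocks contain.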
