To consider recent findings from quantitative genetic research in the context of molecular genetic research, especially genome-wide association studies. We focus on findings that go beyond merely estimating heritability. We use learning abilities and disabilities as examples.
Recent twin research in the area of learning abilities and disabilities was reviewed.
Three findings from quantitative genetic research stand out for their far-reaching implications for child and adolescent psychiatry. First, common disorders such as learning difficulties are the quantitative extreme of the same genetic factors responsible for genetic influence throughout the normal distribution (the Common Disorders are Quantitative Traits Hypothesis). Second, the same set of genes is largely responsible for genetic influence across diverse learning and cognitive abilities and disabilities (the Generalist Genes Hypothesis). Third, experiences are just as influenced genetically as are behaviors and genetic factors mediate associations between widely used measures of the environment and behavioural outcomes (the Nature of Nurture Hypothesis).
Quantitative genetics can go far beyond the rudimentary ‘how much’ question about nature versus nurture, and can continue to provide important findings in the era of molecular genetics.
Quantitative genetics; molecular genetics; twin studies; learning abilities; disabilities
We review models of the Baldwin effect, i.e., the hypothesis that adaptive learning (i.e., learning to improve fitness) accelerates genetic evolution of the phenotype. Numerous theoretical studies scrutinized the hypothesis that a non-evolving ability of adaptive learning accelerates evolution of genetically determined behavior. However, their results are conflicting in that some studies predict an accelerating effect of learning on evolution, whereas others show a decelerating effect. We begin by describing the arguments underlying the hypothesis on the Baldwin effect and identify the core argument: adaptive learning influences the rate of evolution because it changes relative fitness of phenotypes. Then we analyze the theoretical studies of the Baldwin effect with respect to their model of adaptive learning and discuss how their contrasting results can be explained from differences in (1) the ways in which the effect of adaptive learning on the phenotype is modeled, (2) the assumptions underlying the function used to quantify fitness and (3) the time scale at which the evolutionary rate is measured. We finish by reviewing the specific assumptions used by the theoretical studies of the Baldwin effect and discuss the evolutionary implications for cases where these assumptions do not hold.
The Baldwin effect; Fitness landscape; Evolution of phenotype; Adaptive learning; Innate behavior; Phenotypic variation; Genetic variation
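The core argument above, that adaptive learning changes the relative fitness of phenotypes, can be illustrated with a toy calculation. The sketch below is not taken from any of the reviewed models: the learning rule (move a fixed fraction of the way to the optimum), both fitness functions, and all parameter values are assumptions chosen only to show how the choice of fitness function can reverse the direction of the effect.

```python
import math


def learned_phenotype(innate, learning_rate, optimum=0.0):
    """Assumed learning rule: move the innate phenotype a fixed
    fraction of the way toward the optimum."""
    return innate + learning_rate * (optimum - innate)


def gaussian_fitness(z, width=1.0):
    """Smooth fitness peak centred on the optimum at z = 0 (assumption)."""
    return math.exp(-z * z / (2.0 * width * width))


def threshold_fitness(z, tolerance=1.0, bonus=1.0):
    """'Needle-like' fitness: a bonus only when close to the optimum (assumption)."""
    return 1.0 + bonus if abs(z) < tolerance else 1.0


def relative_fitness(fitness, z_near, z_far, learning_rate):
    """Fitness of the genotype nearer the optimum divided by that of the
    genotype farther away, after both have learned."""
    near = fitness(learned_phenotype(z_near, learning_rate))
    far = fitness(learned_phenotype(z_far, learning_rate))
    return near / far


# Two hypothetical genotypes with innate phenotypes 2.0 and 3.0.
for name, fitness in (("gaussian", gaussian_fitness), ("threshold", threshold_fitness)):
    without = relative_fitness(fitness, 2.0, 3.0, learning_rate=0.0)
    with_learning = relative_fitness(fitness, 2.0, 3.0, learning_rate=0.6)
    print(f"{name}: no learning {without:.2f}, with learning {with_learning:.2f}")
```

Under the smooth peak, learning shrinks the fitness advantage of the better genotype (weaker selection), whereas under the threshold function it creates an advantage that did not exist without learning (stronger selection). This mirrors point (2) above only qualitatively.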
The new view of cognitive neuropsychology that considers not just case studies of rare severe disorders but also common disorders, as well as normal variation and quantitative traits, is more amenable to recent advances in molecular genetics, such as genome-wide association studies, and advances in quantitative genetics, such as multivariate genetic analysis. A surprising finding emerging from multivariate quantitative genetic studies across diverse learning abilities is that most genetic influences are shared: they are ‘generalist’, rather than ‘specialist’.
We exploited widespread access to inexpensive and fast Internet connections in the United Kingdom to assess over 5000 pairs of 12-year-old twins from the Twins Early Development Study (TEDS) on four distinct batteries: reading, mathematics, general cognitive ability (g) and, for the first time, language.
Genetic correlations remain high among all of the measured abilities, with language as highly correlated genetically with g as reading and mathematics.
Despite developmental upheaval, generalist genes remain important into early adolescence, suggesting optimal strategies for molecular genetic studies seeking to identify the genes of small effect that influence learning abilities and disabilities.
Learning Ability; Intelligence; Reading; Mathematics; Language; Development; Adolescence; Genetics; Twins
Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases.
We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases.
ProDiGe implements a new machine learning paradigm for gene prioritization, which could help the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.
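To make the idea of learning from positive and unlabeled examples concrete, here is a minimal sketch of one generic approach: repeatedly treat a random subset of the unlabeled items as pseudo-negatives and average the out-of-bag scores. This is not the ProDiGe implementation; the centroid-based scorer, the function names, and the toy feature vectors are all assumptions made for illustration.

```python
import random


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, rounds=50, seed=0):
    """Score each unlabeled item by averaging, over random rounds in which
    it was NOT drawn as a pseudo-negative, how much closer it lies to the
    positive centroid than to the pseudo-negative centroid."""
    if not positives or len(unlabeled) < 2:
        raise ValueError("need at least one positive and two unlabeled items")
    rng = random.Random(seed)
    bag_size = min(len(positives), len(unlabeled) - 1)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(rounds):
        bag = set(rng.sample(range(len(unlabeled)), bag_size))
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in bag:
                continue  # only score items left out of this round's bag
            totals[i] += sq_dist(x, neg_c) - sq_dist(x, pos_c)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Hypothetical two-feature descriptions of known disease genes (positives)
# and candidate genes (unlabeled); the first candidate resembles the positives.
known = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]
candidates = [[1.1, 1.0], [-1.0, -1.0], [-1.2, -0.8], [-0.9, -1.1], [-1.1, -1.0]]
scores = pu_bagging_scores(known, candidates)
ranking = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Sorting candidates by score yields a prioritized list; a real system would replace the centroid scorer with a stronger classifier and richer gene features.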
Bird song is unusual as a sexually selected trait because its expression depends on learning as well as genetic and other environmental factors. Prior work has demonstrated that males who are deprived of the opportunity to learn produce songs that function little if at all in male-female interactions. We asked whether more subtle variation in male song-learning abilities influences female response to song. Using a copulation solicitation assay, we measured the response of female song sparrows (Melospiza melodia) to songs of laboratory-reared males that differed in the amount of learned versus invented material that they included and in the degree to which learned material accurately matched the model from which it was copied. Females responded significantly more to songs that had been learned better, by either measure. Females did not discriminate between the best-learned songs of laboratory-reared males and songs of wild males used as models during learning. These results provide, to our knowledge, a first experimental demonstration that variation in learning abilities among males plays a functionally important part in the expression of a sexually selected trait, and further provide support for the hypothesis that song functions as an indicator of male quality because it reflects variation in response to early developmental stress.
Despite several decades of research suggesting the importance of both genetic and environmental factors, these findings are not well integrated into the larger educational literature. Following a discussion of quantitative and molecular genetic methods, this article reviews behavioral genetic findings related to cognitive and academic skills. This literature suggests that (a) the relative importance of genes and environments varies developmentally; (b) genetics, and to a lesser extent the environment, account for a substantial portion of the covariance within and across academic domains; and (c) some forms of disability are qualitatively different from the population, whereas others constitute the lower end of a continuum of ability. Following a discussion of the strengths and limitations of current behavioral genetic research and intervention research, we then discuss the ways in which understanding gene–environment interplay can be used to develop better definitions of learning impairment and better explain the substantial variability in response to intervention.
Neurofibromatosis Type I (NF1) is a single-gene disorder characterized by a high incidence of complex cognitive symptoms, including learning disabilities, attention deficit disorder, executive function deficits, and motor coordination problems. Since the underlying genetic cause of this disorder is known, study of NF1 from a molecular, cellular, and systems perspective has provided mechanistic insights into the etiology of higher-order cognitive symptoms associated with the disease. In particular, studies of animal models of NF1 indicated that disruption of Ras regulation of inhibitory networks is critical to the etiology of cognitive deficits associated with NF1. Animal models of Nf1 identified mechanisms and pathways that are required for cognition, and represent an important complement to the complex neuropsychological literature on learning disabilities associated with this condition. Here, we review findings from NF1 animal models and human populations affected by NF1, highlighting areas of potential translation and discussing the implications and limitations of generalizing findings from this single-gene disease to idiopathic learning disabilities.
Ras; GABA; LTP; animal model; neurodevelopmental disorder; ADHD
Age-related declines in human cognition are well known, and there are correlative changes in the function of neocortical and hippocampal neurons. Similarly, age-related declines in learning have been observed in rodents, including deficits in a hippocampal-dependent learning paradigm, the Morris water maze. Furthermore, there are correlative deficits in specific signaling pathways, including protein kinase C (PKC) pathways, in cerebellar, hippocampal, or neocortical neurons. PKC pathways are strong candidates for mediating the molecular changes that underlie spatial learning, as they play critical roles in neurotransmitter release and synaptic plasticity, including long-term potentiation (LTP) and long-term depression (LTD), and deletion of specific PKC genes results in deficits in learning. Conversely, genetic activation of PKC pathways in small groups of hippocampal or cortical neurons enhances learning in specific paradigms. In this study, we delivered a constitutively active PKC into small groups of hippocampal dentate granule neurons in aged rats (using a Herpes Simplex Virus-1 vector). Aged two-year-old rats that received the constitutively active PKC displayed improved performance in the Morris water maze relative to controls on three different measures. These results indicate that PKC pathways play an important role in mediating spatial learning in aged rats. Additionally, these results represent a system for studying the neural mechanisms underlying aging-related learning deficits, and potentially developing gene therapies for cognitive and age-related deficits.
spatial discrimination; aged rats; protein kinase C; dentate granule neurons; Herpes Simplex Virus vector
Pluripotent stem cells are able to self-renew, and to differentiate into all adult cell types. Many studies report data describing these cells, and characterize them in molecular terms. Machine learning yields classifiers that can accurately identify pluripotent stem cells, but there is a lack of studies yielding minimal sets of best biomarkers (genes/features). We assembled gene expression data of pluripotent stem cells and non-pluripotent cells from the mouse. After normalization and filtering, we applied machine learning, classifying samples into pluripotent and non-pluripotent with high cross-validated accuracy. Furthermore, to identify minimal sets of best biomarkers, we used three methods: information gain, random forests and a wrapper of genetic algorithm and support vector machine (GA/SVM). We demonstrate that the GA/SVM biomarkers work best in combination with each other; pathway and enrichment analyses show that they cover the widest variety of processes implicated in pluripotency. The GA/SVM wrapper yields best biomarkers, no matter which classification method is used. The consensus best biomarker based on the three methods is Tet1, implicated in pluripotency just recently. The best biomarker based on the GA/SVM wrapper approach alone is Fam134b, possibly a missing link between pluripotency and some standard surface markers of unknown function processed by the Golgi apparatus.
pluripotency; machine learning; feature selection; genetic algorithm; support vector machine
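Of the three feature-selection methods named above, information gain is the simplest to show in a few lines. The sketch below computes the entropy reduction obtained by splitting samples on a single expression threshold. The gene names, expression values, and threshold are hypothetical and are not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples into
    'expression > threshold' and 'expression <= threshold'."""
    high = [y for x, y in zip(values, labels) if x > threshold]
    low = [y for x, y in zip(values, labels) if x <= threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (high, low) if part)
    return entropy(labels) - remainder


# Hypothetical expression values for four samples.
labels = ["pluripotent", "pluripotent", "non-pluripotent", "non-pluripotent"]
expression = {
    "gene_a": [5.0, 6.0, 1.0, 2.0],  # separates the two classes
    "gene_b": [5.0, 1.0, 6.0, 2.0],  # unrelated to the class label
}
gains = {gene: information_gain(vals, labels, threshold=3.0)
         for gene, vals in expression.items()}
print(sorted(gains, key=gains.get, reverse=True))
```

Ranking genes by this score treats each feature in isolation, which is one plausible reason a wrapper method that evaluates genes in combination could select a different set.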
Large lecture classes and standardized laboratory exercises are characteristic of introductory biology courses. Previous research has found that these courses do not adequately convey the process of scientific research and the excitement of discovery. Here we propose a model that provides beginning biology students with an inquiry-based, active learning laboratory experience. The Dynamic Genome course replicates a modern research laboratory focused on eukaryotic transposable elements where beginning undergraduates learn key genetics concepts, experimental design, and molecular biological skills. Here we report on two key features of the course, a didactic module and the capstone original research project. The module is a modified version of a published experiment where students experience how virtual transposable elements from rice (Oryza sativa) are assayed for function in transgenic Arabidopsis thaliana. As part of the module, students analyze the phenotypes and genotypes of transgenic plants to determine the requirements for transposition. After mastering the skills and concepts, students participate in an authentic research project where they use computational analysis and PCR to detect transposable element insertion site polymorphism in a panel of diverse maize strains. As a consequence of their engagement in this course, students report large gains in their ability to understand the nature of research and demonstrate that they can apply that knowledge to independent research projects.
Genetics education; transposable elements; scientific teaching; undergraduates
The development of effective frameworks that permit an accurate diagnosis of tumors, especially in their early stages, remains a grand challenge in the field of bioinformatics. Our approach applies statistical learning techniques to multiple tumor antigen markers, using the immune system as a very sensitive marker of molecular pathological processes. For validation purposes, we chose intracranial meningioma as a model system, since these tumors occur very frequently, are mostly benign, and are genetically stable.
A total of 183 blood samples from 93 meningioma patients (WHO stages I-III) and 90 healthy controls were screened for seroreactivity with a set of 57 meningioma-associated antigens. We tested several established statistical learning methods on the resulting reactivity patterns using 10-fold cross validation. The best performance was achieved by Naïve Bayes Classifiers. With this classification method, our framework, called the Minimally Invasive Multiple Marker (MIMM) approach, yielded a specificity of 96.2%, a sensitivity of 84.5%, and an accuracy of 90.3%; the corresponding area under the ROC curve was 0.957. Detailed analysis revealed that prediction performs particularly well on low-grade (WHO I) tumors, consistent with our goal of early stage tumor detection. For these tumors, the best classification result, with a specificity of 97.5%, a sensitivity of 91.3%, an accuracy of 95.6%, and an area under the ROC curve of 0.971, was achieved using a set of only 12 antigen markers. This antigen set was detected by a subset selection method based on Mutual Information. Remarkably, our study proves that the inclusion of non-specific antigens, detected not only in tumor but also in normal sera, increases the performance significantly, since non-specific antigens contribute additional diagnostic information.
Our approach makes it possible to screen members of risk groups routinely, so that tumors can hopefully be diagnosed immediately after their genesis. Early detection will ultimately result in a higher cure rate and a lower morbidity rate.
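For readers less familiar with the performance measures reported above, the following sketch shows how sensitivity, specificity, and accuracy follow from the four confusion-matrix counts. The counts in the usage line are made up for illustration; they are not the study's data and are not meant to reproduce its cross-validated figures.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic measures from confusion-matrix counts.

    tp: patients correctly classified as tumor
    fn: patients missed by the classifier
    tn: controls correctly classified as healthy
    fp: controls wrongly classified as tumor
    """
    return {
        "sensitivity": tp / (tp + fn),                 # fraction of patients detected
        "specificity": tn / (tn + fp),                 # fraction of controls cleared
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # fraction of all calls correct
    }


# Hypothetical counts: 10 patients and 10 controls.
example = diagnostic_metrics(tp=8, fn=2, tn=9, fp=1)
print(example)
```

Because accuracy pools both groups, it depends on the ratio of patients to controls, whereas sensitivity and specificity do not.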
Identifying the catalytic residues in enzymes can aid in understanding the molecular basis of an enzyme's function and has significant implications for designing new drugs, identifying genetic disorders, and engineering proteins with novel functions. Since experimentally determining catalytic sites is expensive, better computational methods for identifying catalytic residues are needed.
We propose ResBoost, a new computational method to learn characteristics of catalytic residues. The method effectively selects and combines rules of thumb into a simple, easily interpretable logical expression that can be used for prediction. We formally define the rules of thumb that are often used to narrow the list of candidate residues, including residue evolutionary conservation, 3D clustering, solvent accessibility, and hydrophilicity. ResBoost builds on two methods from machine learning, the AdaBoost algorithm and Alternating Decision Trees, and provides precise control over the inherent trade-off between sensitivity and specificity. We evaluated ResBoost using cross-validation on a dataset of 100 enzymes from the hand-curated Catalytic Site Atlas (CSA).
ResBoost achieved 85% sensitivity for a 9.8% false positive rate and 73% sensitivity for a 5.7% false positive rate. ResBoost reduces the number of false positives by up to 56% compared to the use of evolutionary conservation scoring alone. We also illustrate the ability of ResBoost to identify recently validated catalytic residues not listed in the CSA.
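The idea of combining simple rules of thumb into a weighted vote can be sketched with plain AdaBoost over one-feature threshold rules. This is not the ResBoost code and it omits Alternating Decision Trees entirely; the feature layout (conservation, solvent accessibility), the toy residues, and the `bias` parameter used to shift the decision threshold are assumptions made for illustration.

```python
import math


def stump_predict(stump, x):
    """A rule of thumb: vote `sign` if the feature exceeds the threshold."""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def best_stump(X, y, weights):
    """Exhaustively pick the threshold rule with the lowest weighted error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, rounds=5):
    """Plain AdaBoost; labels must be +1 (catalytic) or -1 (non-catalytic)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples, then renormalise.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, x):
    """Weighted vote of all rules."""
    return sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)


def predict(ensemble, x, bias=0.0):
    """Raising `bias` trades sensitivity for specificity (an assumed knob)."""
    return 1 if score(ensemble, x) > bias else -1


# Hypothetical residues: [conservation, solvent accessibility].
X = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.8], [0.3, 0.6], [0.1, 0.9]]
y = [1, 1, 1, -1, -1, -1]
model = adaboost(X, y)
print([predict(model, x) for x in X])
```

Shifting the decision threshold is only one simple way to move along the sensitivity/specificity trade-off; the abstract does not say how ResBoost exposes that control.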
It is a great challenge of modern biology to determine the functional roles of non-synonymous Single Nucleotide Polymorphisms (nsSNPs) on complex phenotypes. Statistical and machine learning techniques establish correlations between genotype and phenotype, but may fail to infer the biologically relevant mechanisms. The emerging paradigm of Network-based Association Studies aims to address this problem of statistical analysis. However, a mechanistic understanding of how individual molecular components work together in a system requires knowledge of molecular structures, and their interactions.
To address the challenge of understanding the genetic, molecular, and cellular basis of complex phenotypes, we have, for the first time, developed a structural systems biology approach for genome-wide multiscale modeling of nsSNPs - from the atomic details of molecular interactions to the emergent properties of biological networks. We apply our approach to determine the functional roles of nsSNPs associated with hypoxia tolerance in Drosophila melanogaster. The integrated view of the functional roles of nsSNP at both molecular and network levels allows us to identify driver mutations and their interactions (epistasis) in H, Rad51D, Ulp1, Wnt5, HDAC4, Sol, Dys, GalNAc-T2, and CG33714 genes, all of which are involved in the up-regulation of Notch and Gurken/EGFR signaling pathways. Moreover, we find that a large fraction of the driver mutations are neither located in conserved functional sites, nor responsible for structural stability, but rather regulate protein activity through allosteric transitions, protein-protein interactions, or protein-nucleic acid interactions. This finding should impact future Genome-Wide Association Studies.
Our studies demonstrate that the consolidation of statistical, structural, and network views of biomolecules and their interactions can provide new insight into the functional role of nsSNPs in Genome-Wide Association Studies, in a way that neither the knowledge of molecular structures nor biological networks alone could achieve. Thus, multiscale modeling of nsSNPs may prove to be a powerful tool for establishing the functional roles of sequence variants in a wide array of applications.
Genetics contributes importantly to learning abilities and disabilities—not just to reading, the target of most genetic research, but also to mathematics and other academic areas as well. One of the most important recent findings from quantitative genetic research such as twin studies is that the same set of genes is largely responsible for genetic influence across these domains. We call these “generalist genes” to highlight their pervasive influence. In other words, most genes found to be associated with a particular learning ability or disability (such as reading) will also be associated with other learning abilities and disabilities (such as mathematics). Moreover, some generalist genes for learning abilities and disabilities are even more general in their effect, encompassing other cognitive abilities such as memory and spatial ability. When these generalist genes are identified, they will greatly accelerate research on general mechanisms at all levels of analysis from genes to brain to behavior.
The IMMEX (Interactive Multi-Media Exercises) Web-based problem set platform enables the online delivery of complex, multimedia simulations and the rapid collection of student performance data, and has already been used in several genetic simulations. The next step is the use of these data to understand and improve student learning in a formative manner. This article describes the development of probabilistic models of undergraduate student problem solving in molecular genetics that detailed the spectrum of strategies students used when problem solving, and how the strategic approaches evolved with experience. The actions of 776 university sophomore biology majors from three molecular biology lecture courses were recorded and analyzed. Performances on each of six simulations were first grouped by artificial neural network clustering to provide individual performance measures, and then sequences of these performances were probabilistically modeled by hidden Markov modeling to provide measures of progress. The models showed that students with different initial problem-solving abilities chose different strategies. Initial and final strategies varied across different sections of the same course and were not strongly correlated with other achievement measures. In contrast to previous studies, we observed no significant gender differences. We suggest that instructor interventions based on early student performances with these simulations may help students to recognize effective and efficient problem-solving strategies and enhance learning.
scientific problem-solving strategies; hidden Markov models; learning trajectory; neural networks
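The second modeling stage described above, hidden Markov modeling of performance sequences, rests on computing how likely an observed sequence is under a given model. The sketch below implements the standard forward algorithm for that purpose. The two hidden strategy states, the two observable performance clusters, and every probability are hypothetical and are not parameters from the study.

```python
def forward_likelihood(observations, start, transition, emission):
    """Probability of an observation sequence under a discrete HMM,
    computed with the forward algorithm."""
    states = list(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emission[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: emission[s][obs] * sum(alpha[r] * transition[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical model: students start mostly in an 'exploring' strategy state
# and tend to stay 'efficient' once they get there.
start = {"exploring": 0.8, "efficient": 0.2}
transition = {
    "exploring": {"exploring": 0.7, "efficient": 0.3},
    "efficient": {"exploring": 0.1, "efficient": 0.9},
}
# Observations are performance clusters, e.g. from a clustering step.
emission = {
    "exploring": {"broad": 0.9, "focused": 0.1},
    "efficient": {"broad": 0.2, "focused": 0.8},
}

sequence = ["broad", "broad", "focused", "focused"]
print(forward_likelihood(sequence, start, transition, emission))
```

Comparing such likelihoods across sequences, or inspecting the fitted transition probabilities, is one way a model of this kind could provide a measure of progress.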
Oligodendrogliomas are rare primary brain tumors. Significant attention has recently been focused on these interesting neoplasms because of their unique chemosensitivity and the durability of some of these responses. Surgery and radiation continue to play important roles in the treatment of oligodendrogliomas. Molecular genetic analyses have given new insight into the allelic deletions that distinguish these tumors and their progression from indolent to more aggressive forms. In the future, molecular genetic analysis may guide therapeutic decisions concerning patients with oligodendroglioma and may help us learn more about how to best treat other malignant brain neoplasms.
This paper has two primary goals. First, a brief tutorial on behavioral and molecular genetic methods is provided for readers without extensive training in these areas. To illustrate the application of these approaches to developmental disorders, etiologically-informative studies of reading disability (RD), math disability (MD), and attention-deficit/hyperactivity disorder (ADHD) are then reviewed. Implications of the results for these specific disorders and for developmental disabilities as a whole are discussed, and novel directions for future research are highlighted.
Previous family and twin studies of RD, MD, and ADHD are reviewed systematically, and the extensive molecular genetic literatures on each disorder are summarized. To illustrate four novel extensions of these etiologically-informative approaches, new data are presented from the Colorado Learning Disabilities Research Center, an ongoing twin study of the etiology of RD, ADHD, MD, and related disorders.
RD, MD, and ADHD are familial and heritable, and co-occur more frequently than expected by chance. Molecular genetic studies suggest that all three disorders have complex etiologies, with multiple genetic and environmental risk factors each contributing to overall risk for each disorder. Neuropsychological analyses indicate that the three disorders are each associated with multiple neuropsychological weaknesses, and initial evidence suggests that comorbidity between the three disorders is due to common genetic risk factors that lead to slow processing speed.
Reading; math; ADHD; genetics; twins
As research into the neurobiology of language has focused primarily on the systems level, fewer studies have examined the link between molecular genetics and normal variations in language functions. Because the ability to learn a language varies in adults and our genetic codes also vary, research linking the two provides a unique window into the molecular neurobiology of language. We consider a candidate association between the dopamine receptor D2 gene (DRD2) and linguistic grammar learning. DRD2-TAQ-IA polymorphism (rs1800497) is associated with dopamine receptor D2 distribution and dopamine impact in the human striatum, such that A1 allele carriers show reduction in D2 receptor binding relative to carriers who are homozygous for the A2 allele. The individual differences in grammatical rule learning that are particularly prevalent in adulthood are also associated with striatal function and its role in domain-general procedural memory. Therefore, we reasoned that procedurally-based grammar learning could be associated with DRD2-TAQ-IA polymorphism. Here, English-speaking adults learned artificial concatenative and analogical grammars, which have been respectively associated with procedural and declarative memory. Language learning capabilities were tested while learners’ neural hemodynamic responses were simultaneously measured by fMRI. Behavioral learning and brain activation data were subsequently compared with the learners’ DRD2 (rs1800497) genotype. Learners who were homozygous for the A2 allele were better at concatenative (but not analogical) grammar learning and had higher striatal responses relative to those who have at least one A1 allele. These results provide preliminary evidence for the neurogenetic basis of normal variations in linguistic grammar learning and its link to domain-general functions.
Many accounts of memory suggest that an initial learning experience initiates a cascade of cellular and molecular events that are required for the consolidation of memory from a labile into a more permanent state. Studies of memory in many species have routinely found that altered gene activity and new protein synthesis are the critical components of this memory consolidation process. During extinction, when organisms learn that previously established relations between stimuli have been severed, new memories are formed and consolidated. However, the nature of the learning that underlies extinction remains unclear and there are many processes that may contribute to the weakening of behavior that occurs during extinction. In this review, we suggest that the molecular mechanisms that underlie extinction may differ depending on the learning process that is engaged by extinction. We review evidence that extinction, like initial learning, requires transcription and translation, as well as evidence that extinction occurs when protein synthesis is inhibited. We suggest that extinction occurs through the interaction of multiple behavioral and molecular mechanisms.
Extinction; memory; Lymnaea; molecular mechanisms; behavioral mechanisms
Mental retardation—known more commonly nowadays as intellectual disability—is a severe neurological condition affecting up to 3% of the general population. As a result of the analysis of familial cases and recent advances in clinical genetic testing, great strides have been made in our understanding of the genetic etiologies of mental retardation. Nonetheless, no treatment is currently clinically available to patients suffering from intellectual disability. Several animal models have been used in the study of memory and cognition. Established paradigms in Drosophila have recently captured cognitive defects in fly mutants for orthologs of genes involved in human intellectual disability. We review here three protocols designed to understand the molecular genetic basis of learning and memory in Drosophila and the genes identified so far with relation to mental retardation. In addition, we explore the mental retardation genes for which evidence of neuronal dysfunction other than memory has been established in Drosophila. Finally, we summarize the findings in Drosophila for mental retardation genes for which no neuronal information is yet available. All in all, this review illustrates the impressive overlap between genes identified in human mental retardation and genes involved in physiological learning and memory.
Drosophila; mental retardation; neurological disorders; genetics; development; treatment
Advances in DNA sequencing, high-throughput technologies, and genetic manipulation systems have enabled empirical studies of the molecular and genomic bases of adaptive evolution. This review discusses key insights learned from direct observation of the evolution process.
Laboratory evolution studies provide fundamental biological insight through direct observation of the evolution process. They not only enable testing of evolutionary theory and principles, but also have applications to metabolic engineering and human health. Genome-scale tools are revolutionizing studies of laboratory evolution by providing complete determination of the genetic basis of adaptation and the changes in the organism's gene expression state. Here, we review studies centered on four central themes of laboratory evolution studies: (1) the genetic basis of adaptation; (2) the importance of mutations to genes that encode regulatory hubs; (3) the view of adaptive evolution as an optimization process; and (4) the dynamics with which laboratory populations evolve.
epistasis; flux-balance analysis; metabolic engineering; mutation; regulatory hub
High-grade astrocytoma remains a significant challenge to the clinician and researcher alike. Intense study of the molecular pathogenesis of these tumors has allowed identification of frequent genetic alterations and critical core pathways in this disease. The use of novel mouse genetic tools to study the consequence of specific mutations in brain has led to the development of multiple representative genetically engineered mouse models that provided novel insights into gliomagenesis. As we learn more about the biology of high-grade astrocytoma from the study of these models, we anticipate that our improved understanding will eventually lead to greater success in clinical trials and improved outcome for patients.
glioblastoma; astrocytoma; mouse model; PI3K pathway
A growing list of common and rare genetic risk variants is being implicated in schizophrenia susceptibility. As with other complex genetic disorders, most of the variance in genetic risk is still to be attributed. What can be learned from progress to date? The available data challenge how we conceptualize schizophrenia and suggest strong aetiological links with other psychiatric and developmental disorders. With the identification of rare copy number risk variants implicating specific genes (e.g. VIPR2 and NRXN1), it is increasingly possible to investigate molecular aetiology in patient subgroups to establish whether schizophrenia represents one or many different disease processes. This review summarizes recent research progress and suggests how the tools of modern genomics and neuroscience can be applied to best understand this devastating disorder.
copy number variation; DNA variants; neurodevelopmental disorders; psychosis; schizophrenia
Understanding the root molecular and genetic causes driving complex traits is a fundamental challenge in genomics and genetics. Numerous studies have used variation in gene expression to understand complex traits, but the underlying genomic variation that contributes to these expression changes is not well understood. In this study, we developed a framework to integrate gene expression and genotype data to identify biological differences between samples from opposing complex trait classes that are driven by expression changes and genotypic variation. This framework utilizes pathway analysis and multi-task learning to build a predictive model and discover pathways relevant to the complex trait of interest. We simulated expression and genotype data to test the predictive ability of our framework and to measure how well it uncovered pathways with genes both differentially expressed and genetically associated with a complex trait. We found that the predictive performance of the multi-task model was comparable to other similar methods. Also, methods like multi-task learning that considered enrichment analysis scores from both data sets found pathways with both genetic and expression differences related to the phenotype. We used our framework to analyze differences between estrogen receptor (ER) positive and negative breast cancer samples. An analysis of the top 15 gene sets from the multi-task model showed they were all related to estrogen, steroids, cell signaling, or the cell cycle. Although our study suggests that multi-task learning does not enhance predictive accuracy, the models generated by our framework do provide valuable biological pathway knowledge for complex traits.
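The idea of scoring pathways on both data sets can be illustrated with a minimal sketch. The abstract does not state which enrichment statistic the framework uses, so the block below assumes a standard hypergeometric over-representation test; the function names (`hypergeom_enrichment_p`, `joint_pathway_scores`) and the rule of keeping the weaker of the two p-values are hypothetical choices made for illustration, not the authors' method.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_pathway, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value: P(overlap >= n_overlap).

    n_universe: number of genes measured in total
    n_pathway:  number of those genes annotated to the pathway
    n_selected: number of genes flagged as 'hits' in one data set
    n_overlap:  number of hits that fall inside the pathway
    """
    max_overlap = min(n_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def joint_pathway_scores(universe, pathways, expr_hits, geno_hits):
    """Score each pathway by the weaker (larger) of its two p-values.

    Assumption for illustration only: a pathway counts as interesting
    when it is enriched in BOTH the expression hits and the genotype hits,
    so the larger p-value is kept as the joint score.
    """
    scores = {}
    for name, genes in pathways.items():
        genes = genes & universe
        p_expr = hypergeom_enrichment_p(
            len(universe), len(genes), len(expr_hits), len(genes & expr_hits)
        )
        p_geno = hypergeom_enrichment_p(
            len(universe), len(genes), len(geno_hits), len(genes & geno_hits)
        )
        scores[name] = max(p_expr, p_geno)
    return scores


if __name__ == "__main__":
    # Toy data with placeholder gene identifiers (not from the study).
    universe = {f"g{i}" for i in range(20)}
    pathways = {
        "A": {"g0", "g1", "g2", "g3", "g4"},
        "B": {"g10", "g11", "g12", "g13", "g14"},
    }
    expr_hits = {"g0", "g1", "g2", "g3", "g15"}
    geno_hits = {"g0", "g1", "g2", "g10"}
    for name, p in sorted(joint_pathway_scores(universe, pathways, expr_hits, geno_hits).items()):
        print(name, p)
```

In this toy setting a pathway supported by only one data type receives a poor joint score, which mirrors the abstract's emphasis on pathways with both genetic and expression differences. A multi-task model would learn from the two score sets jointly rather than combine them with a fixed rule.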
With the advance of large-scale omics technologies, it is now feasible to reverse-engineer the underlying genetic networks that describe the complex interplay of molecular elements leading to complex diseases. Current networking approaches mainly focus on building genetic networks at large, without probing the interaction mechanisms specific to a physiological or disease condition. The aim of this study was thus to develop a novel networking approach based on the relevance concept, which is well suited to revealing the integrative effects of multiple genes in the genetic circuit underlying complex diseases.
The approach started with the identification of multiple disease pathways, called a gene forest, in which the genes were extracted from a decision forest constructed by supervised learning on genome-wide transcriptional profiles of patient and normal samples. Based on the newly identified disease mechanisms, a novel pair-wise relevance metric, the adjusted frequency value, was used to define the degree of genetic relationship between two molecular determinants. We applied the proposed method to a publicly available microarray dataset for colon cancer. The results demonstrated that the colon cancer-specific gene network captured the most important genetic interactions in several cellular processes, such as proliferation, apoptosis, differentiation, mitogenesis and immunity, which are known to be pivotal for tumourigenesis. Further analysis of the topological architecture of the network identified three known hub cancer genes [interleukin 8 (IL8) (p ≈ 0), desmin (DES) (p = 2.71 × 10⁻⁶) and enolase 1 (ENO1) (p = 4.19 × 10⁻⁵)], while two novel hub genes [RNA binding motif protein 9 (RBM9) (p = 1.50 × 10⁻⁴) and ribosomal protein L30 (RPL30) (p = 1.50 × 10⁻⁴)] may define new central elements in the gene network specific to colon cancer. Gene Ontology (GO) based analysis of the colon cancer-specific gene network, and of the sub-network consisting of three-way gene interactions, suggested that tumourigenesis in colon cancer resulted from dysfunction in protein biosynthesis and in categories associated with the ribonucleoprotein complex, a conclusion that is well supported by multiple lines of experimental evidence.
This study demonstrated that IL8, DES and ENO1 act as the central elements in colon cancer susceptibility, and that protein biosynthesis and the ribosome-associated function categories largely account for colon cancer tumourigenesis. Thus, the newly developed relevance-based networking approach offers a powerful means to reverse-engineer the disease-specific network and is a promising tool for the systematic dissection of complex diseases.
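The general idea of deriving a gene network from a decision forest and then looking for hubs can be sketched as follows. The abstract does not define the adjusted frequency value, so this block substitutes a plain co-occurrence frequency (the fraction of trees in which two genes appear together) as a stand-in; the function names, the threshold and the toy gene labels are all hypothetical and do not reproduce the study's metric or results.

```python
from collections import Counter
from itertools import combinations


def pairwise_cooccurrence(tree_gene_sets):
    """Fraction of trees in which each unordered gene pair appears together.

    tree_gene_sets: list of sets, one per decision tree, holding the genes
    that the tree uses as split variables. This is a simple stand-in for a
    pair-wise relevance metric, NOT the adjusted frequency value itself.
    """
    counts = Counter()
    for genes in tree_gene_sets:
        # Sort so each pair has one canonical key, e.g. ("A", "B").
        for pair in combinations(sorted(genes), 2):
            counts[pair] += 1
    n_trees = len(tree_gene_sets)
    return {pair: count / n_trees for pair, count in counts.items()}


def hub_degrees(edge_weights, threshold):
    """Count, per gene, the edges whose weight reaches the threshold."""
    degree = Counter()
    for (gene_a, gene_b), weight in edge_weights.items():
        if weight >= threshold:
            degree[gene_a] += 1
            degree[gene_b] += 1
    return degree


if __name__ == "__main__":
    # Placeholder gene labels; each set is one hypothetical tree.
    forest = [{"A", "B", "C"}, {"A", "B"}, {"A", "D"}, {"A", "C"}]
    weights = pairwise_cooccurrence(forest)
    print(hub_degrees(weights, threshold=0.5).most_common())
```

Genes with the highest degree in the thresholded network are hub candidates. Assigning significance to a hub, as the p-values in the abstract do, would additionally require a null model for the degree distribution, which this sketch omits.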