Autistic Spectrum Disorder (ASD) is a heterogeneous neurodevelopmental disorder, resulting from complex interactions among genetic, genomic and environmental factors. Here we have studied the expression of Human Endogenous Retroviruses (HERVs), non-coding DNA elements with potential regulatory functions, and have tested their possible implication in autism.
The presence of retroviral mRNAs from four HERV families (E, H, K and W), widely implicated in complex diseases, was evaluated in peripheral blood mononuclear cells (PBMCs) from ASD patients and healthy controls (HCs) by qualitative RT-PCR. We also analyzed the expression of the env sequence from HERV-H, HERV-W and HERV-K families in PBMCs at the time of sampling and after stimulation in culture, in both ASD and HC groups, by quantitative Real-time PCR. Differences between groups were evaluated using statistical methods.
The percentage of HERV-H and HERV-W positive samples was higher among ASD patients than among HCs, while HERV-K was similarly represented and HERV-E virtually absent in both groups. The quantitative evaluation shows that HERV-H and HERV-W are differentially expressed in the two groups: HERV-H is more abundantly expressed and, conversely, HERV-W less abundantly expressed in PBMCs from ASD patients than in those from healthy controls. PBMCs from ASD patients also showed an increased potential to up-regulate HERV-H expression upon stimulation in culture, unlike those from HCs. Furthermore, we report a negative correlation between HERV-H expression levels and age among ASD patients, and a statistically significant higher expression in ASD patients with a Severe score in Communication and Motor on the Psychoeducational Profile-3.
Specific HERV families have a distinctive expression profile in ASD patients compared to HCs. We propose that HERV-H expression be explored in larger samples of individuals with autism spectrum in order to determine its utility as a novel biological trait of this complex disorder.
Whole-genome and exome sequencing have already proven to be essential and powerful methods to identify genes responsible for simple Mendelian inherited disorders. These methods can be applied to complex disorders as well, and have been adopted as one of the current mainstream approaches in population genetics. These achievements have been made possible by next generation sequencing (NGS) technologies, which require substantial bioinformatics resources to analyze the dense and complex sequence data. The huge analytical burden of data from genome sequencing might be seen as a bottleneck slowing the publication of NGS papers at this time, especially in psychiatric genetics. We review the existing methods for processing NGS data, to place into context the rationale for the design of a computational resource. We describe our method, the Graphical Pipeline for Computational Genomics (GPCG), to perform the computational steps required to analyze NGS data. The GPCG implements flexible workflows for basic sequence alignment, sequence data quality control, single nucleotide polymorphism analysis, copy number variant identification, annotation, and visualization of results. These workflows cover all the analytical steps required for NGS data, from processing the raw reads to variant calling and annotation. The current version of the pipeline is freely available at http://pipeline.loni.ucla.edu. These applications of NGS analysis may gain clinical utility in the near future (e.g., identifying miRNA signatures in diseases) when the bioinformatics approach is made feasible. Taken together, the annotation tools and strategies that have been developed to retrieve information and test hypotheses about the functional role of variants present in the human genome will help to pinpoint the genetic risk factors for psychiatric disorders.
Next Generation Sequencing (NGS); LONI pipeline; SNPs; CNVs; workflow; bioinformatics
Schizophrenia (SZ) is a complex disorder resulting from both genetic and environmental causes, with a worldwide lifetime prevalence of 1%; however, there are no specific, sensitive and validated biomarkers for SZ. A general unifying hypothesis has been put forward that disease-associated single nucleotide polymorphisms (SNPs) from genome-wide association studies (GWAS) are more likely to be associated with gene expression quantitative trait loci (eQTL). We describe this hypothesis and review the primary methodology, with refinements, for testing this paradigmatic approach in SZ. We describe biomarker studies of SZ and tests for enrichment of SNPs that are associated both with eQTLs and with existing GWAS of SZ. SZ-associated SNPs that overlap with eQTLs can be placed into gene–gene expression, protein–protein and protein–DNA interaction networks. Further, those networks can be tested by reducing or silencing the gene expression levels of critical nodes. We present pilot data to support these methods of investigation, such as the use of eQTLs to annotate GWASs of SZ, which could be applied to the field of biomarker discovery. Networks that are associated with SNP markers, especially through cis-regulated expression, might lead to a clearer understanding of important candidate genes that predispose to disease and alter expression. This method has general application to many complex disorders.
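The enrichment question raised above (are GWAS-associated SNPs more often eQTLs than chance would predict?) can be illustrated with a hypergeometric upper-tail test. This is only a minimal sketch of one common way to test overlap; the abstract does not specify the statistic the authors used, and the function name and the counts in the usage note are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total, n_eqtl, n_gwas, overlap):
    """Probability of seeing at least `overlap` eQTL SNPs among `n_gwas`
    GWAS-associated SNPs, when the GWAS SNPs are treated as a random draw
    (without replacement) from `total` SNPs of which `n_eqtl` are eQTLs.

    Assumption: SNPs are exchangeable; linkage disequilibrium is ignored.
    """
    if not (0 <= overlap <= min(n_eqtl, n_gwas) <= total):
        raise ValueError("counts are inconsistent")
    denominator = comb(total, n_gwas)
    upper = min(n_eqtl, n_gwas)
    # Sum the hypergeometric probability mass from `overlap` to the maximum.
    numerator = sum(
        comb(n_eqtl, x) * comb(total - n_eqtl, n_gwas - x)
        for x in range(overlap, upper + 1)
    )
    return numerator / denominator
```

For example, with hypothetical counts, `hypergeom_upper_tail(10, 4, 3, 2)` gives the chance that at least 2 of 3 selected SNPs are eQTLs when 4 of 10 SNPs are eQTLs. A real analysis would also need to account for linkage disequilibrium and allele frequency, which this sketch does not.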
expression quantitative trait loci; cis-regulatory SNPs; GWAS; gene expression; lymphoblastoid cell lines
As biomedical technology becomes increasingly sophisticated, researchers can probe ever more subtle effects with the added requirement that the investigation of small effects often requires the acquisition of large amounts of data. In biomedicine, these data are often acquired at, and later shared between, multiple sites. There are both technological and sociological hurdles to be overcome for data to be passed between researchers and later made accessible to the larger scientific community. The goal of the Biomedical Informatics Research Network (BIRN) is to address the challenges inherent in biomedical data sharing.
Materials and methods
BIRN tools are grouped into ‘capabilities’ and are available in the areas of data management, data security, information integration, and knowledge engineering. BIRN has a user-driven focus and employs a layered architectural approach that promotes reuse of infrastructure. BIRN tools are designed to be modular and therefore can work with pre-existing tools. BIRN users can choose the capabilities most useful for their application, while not having to ensure that their project conforms to a monolithic architecture.
BIRN has implemented a new software-based data-sharing infrastructure that has been put to use in many different domains within biomedicine. BIRN is actively involved in outreach to the broader biomedical community to form working partnerships.
BIRN's mission is to provide capabilities and services related to data sharing to the biomedical research community. It does this by forming partnerships and solving specific, user-driven problems whose solutions are then available for use by other groups.
Genomics; statistical genetics; bioinformatics; complex traits; data; machine learning; data sharing; information integration; data mediation; data security; data management; knowledge engineering
Mitochondrial deficiencies with unknown causes have been observed in schizophrenia (SZ) and bipolar disorder (BD) in imaging and postmortem studies. Polymorphisms and somatic mutations in mitochondrial DNA (mtDNA) were investigated as potential causes with next generation sequencing of mtDNA (mtDNA-Seq) and genotyping arrays in subjects with SZ, BD, major depressive disorder (MDD), and controls. The common deletion of 4,977 bp in mtDNA was compared between SZ and controls in 11 different vulnerable brain regions and in blood samples, and in dorsolateral prefrontal cortex (DLPFC) of BD, SZ, and controls. In a separate analysis, association of mitochondrial SNPs (mtSNPs) with SZ and BD in European ancestry individuals (n = 6,040) was tested using Genetic Association Information Network (GAIN) and Wellcome Trust Case Control Consortium 2 (WTCCC2) datasets. The common deletion levels were highly variable across brain regions, with a 40-fold increase in some regions (nucleus accumbens, caudate nucleus and amygdala), increased with age, and showed little change in blood samples from the same subjects. The common deletion levels were increased in the DLPFC for BD compared to controls, but not in SZ. Full mtDNA genome resequencing of 23 subjects showed seven novel homoplasmic mutations, five of which were novel synonymous coding mutations. By logistic regression analysis, no mtSNPs were significantly associated with BD or SZ after genome-wide correction. However, nominal associations of mtSNPs (p < 0.05) with SZ and BD were found in the hypervariable region of mtDNA at T195C and T16519C. The results confirm prior reports that certain brain regions accumulate somatic mutations at higher levels than blood. The study of common polymorphisms, somatic mutations, and rare mutations of mtDNA in larger populations may lead to a better understanding of the pathophysiology of psychiatric disorders.
mitochondria; homoplasmy; common deletion; novel mutations; schizophrenia; bipolar disorder
Many reports in different populations have demonstrated linkage of the 10q24–q26 region to schizophrenia, thus encouraging further analysis of this locus for detection of specific schizophrenia genes. Our group previously reported linkage of the 10q24–q26 region to schizophrenia in a unique, homogeneous sample of Arab-Israeli families with multiple schizophrenia-affected individuals, under a dominant model of inheritance. To further explore this candidate region and identify specific susceptibility variants within it, we performed re-analysis of the 10q24–q26 genotype data, taken from our previous genome-wide association study (GWAS) (Alkelai et al., 2011). We analyzed 2089 SNPs in an extended sample of 57 Arab-Israeli families (189 genotyped individuals), under the dominant model of inheritance, which best fits this locus according to previously performed MOD score analysis. We found significant association with schizophrenia of the TCF7L2 gene intronic SNP, rs12573128 (p = 7.01 × 10^-6), and of the nearby intergenic SNP, rs1033772 (p = 6.59 × 10^-6), which is positioned between TCF7L2 and HABP2. TCF7L2 is one of the best confirmed susceptibility genes for type 2 diabetes (T2D) among different ethnic groups, has a role in pancreatic beta cell function and may contribute to the comorbidity of schizophrenia and T2D. These preliminary results independently support previous findings regarding a possible role of TCF7L2 in susceptibility to schizophrenia, and strengthen the importance of integrating linkage analysis models of inheritance while performing association analyses in regions of interest. Further validation studies in additional populations are required.
To address the statistical challenges associated with genome-wide association studies, we present an independent component analysis (ICA) with reference approach to target a specific genetic variation and associated brain networks. First, a small set of single nucleotide polymorphisms (SNPs) is empirically chosen to reflect a feature of interest, and these SNPs are used as a reference when applying ICA to a full genomic SNP array. After extracting the genetic component maximally representing the characteristics of the reference, we test its association with brain networks in functional magnetic resonance imaging (fMRI) data. The method was evaluated on both real and simulated datasets. Simulation demonstrates that ICA with reference can extract a specific genetic factor, even when the variance accounted for by such a factor is so small that a regular ICA fails. Our real data application, from 48 schizophrenia patients (SZs) and 40 healthy controls (HCs), includes 300K SNPs and fMRI images from an auditory oddball task. Using SNPs with an allelic frequency difference between the two groups as a reference, we extracted a genetic component that maximally differentiates patients from controls (p < 4 × 10^-17), and discovered a brain functional network that was significantly associated with this genetic component (p < 1 × 10^-4). The regions in the functional network are mainly located in the thalamus and the anterior and posterior cingulate gyri. The contributing SNPs in the genetic factor mainly fall into two clusters centered at chromosome 7q21 and chromosome 5q35. The findings from the schizophrenia application are in concordance with previous knowledge about brain regions and gene function. Altogether, the results suggest that ICA with reference can be particularly useful for exploring the whole genome to find a specific factor of interest and for further studying its effect on the brain.
independent component analysis with reference; genome-wide association study; brain network; schizophrenia; single nucleotide polymorphisms; functional magnetic resonance imaging
The imaging genetics approach to studying the genetic basis of disease leverages the individual strengths of both neuroimaging and genetic studies by visualizing and quantifying brain activation patterns in the context of genetic background. Brain imaging as an intermediate phenotype can help clarify the functional link among genes, the molecular networks in which they participate, and brain circuitry and function. Integrating genetic data from a genome-wide association study (GWAS) with brain imaging as a quantitative trait (QT) phenotype can increase the statistical power to identify risk genes. A QT analysis using brain imaging (DLPFC activation during a working memory task) as a quantitative trait has identified unanticipated risk genes for schizophrenia. Several of these genes (RSRC1, ARHGAP18, ROBO1-ROBO2, GPC1, CTXN3-SLC12A2) have functions related to progenitor cell proliferation, migration, and differentiation, axonal connectivity, and development of forebrain structures. These genes, however, do not function in isolation but rather through gene regulatory networks. To obtain a deeper understanding of how the GWAS-identified genes participate in larger gene regulatory networks, we measured correlations among transcript levels in mouse and human postmortem tissue, and performed a gene set enrichment analysis (GSEA). The results of such computational approaches can be further validated in animal experiments in which the networks are experimentally studied and perturbed with specific compounds. Glypican 1 and FGF17 mouse models can be used to study such gene regulatory networks. The model demonstrates epistatic interactions between FGF and glypican on brain development and may be a useful model of negative symptom schizophrenia.
A genome-wide association screen for primary biliary cirrhosis risk alleles was performed in an Italian cohort. The results from the Italian cohort replicated IL12A and IL12RB associations, and a combined meta-analysis using a Canadian dataset identified newly associated loci at SPIB (P = 7.9 × 10^-11, odds ratio (OR) = 1.46), IRF5-TNPO3 (P = 2.8 × 10^-10, OR = 1.63) and 17q12-21 (P = 1.7 × 10^-10, OR = 1.38).
Despite the central role of amyloid deposition in the development of Alzheimer's disease (AD), the pathogenesis of AD remains elusive at the molecular level. Increasing evidence suggests that compromised mitochondrial function contributes to the aging process and thus may increase the risk of AD. Dysfunctional mitochondria contribute to the production of reactive oxygen species (ROS), which can lead to extensive oxidative damage to macromolecules and the progression of amyloid pathology. Oxidative stress and amyloid toxicity leave neurons chemically vulnerable. Because the brain relies on aerobic metabolism, mitochondria are critical for cerebral function. Mitochondrial DNA sequence changes could shift cell dynamics and facilitate neuronal vulnerability. Therefore, we postulated that mitochondrial DNA sequence polymorphisms may increase the risk of AD. We evaluated the role of mitochondrial haplogroups derived from 138 mitochondrial polymorphisms in 358 Caucasian ADNI subjects. Our results indicate that the mitochondrial haplogroup UK may confer genetic susceptibility to AD independently of the APOE4 allele.
ADNI; Alzheimer's disease; mitochondrial polymorphism; mitochondrial haplogroups
Contemporary informatics and genomics research require efficient, flexible and robust management of large heterogeneous data, advanced computational tools, powerful visualization, reliable hardware infrastructure, interoperability of computational resources, and detailed data and analysis-protocol provenance. The Pipeline is a client-server distributed computational environment that facilitates the visual graphical construction, execution, monitoring, validation and dissemination of advanced data analysis protocols.
This paper reports on the applications of the LONI Pipeline environment to address two informatics challenges - graphical management of diverse genomics tools, and the interoperability of informatics software. Specifically, this manuscript presents the concrete details of deploying general informatics suites and individual software tools to new hardware infrastructures, the design, validation and execution of new visual analysis protocols via the Pipeline graphical interface, and integration of diverse informatics tools via the Pipeline eXtensible Markup Language syntax. We demonstrate each of these processes using several established informatics packages (e.g., miBLAST, EMBOSS, mrFAST, GWASS, MAQ, SAMtools, Bowtie) for basic local sequence alignment and search, molecular biology data analysis, and genome-wide association studies. These examples demonstrate the power of the Pipeline graphical workflow environment to enable integration of bioinformatics resources which provide a well-defined syntax for dynamic specification of the input/output parameters and the run-time execution controls.
The LONI Pipeline environment (http://pipeline.loni.ucla.edu) provides a flexible graphical infrastructure for efficient biomedical computing and distributed informatics research. The interactive Pipeline resource manager enables the utilization and interoperability of diverse types of informatics resources. The Pipeline client-server model provides computational power to a broad spectrum of informatics investigators - experienced developers and novice users, users with or without access to advanced computational resources (e.g., Grid, data), as well as basic and translational scientists. The open development, validation and dissemination of computational networks (pipeline workflows) facilitates the sharing of knowledge, tools, protocols and best practices, and enables the unbiased validation and replication of scientific findings by the entire community.
Recently we have witnessed a surge of interest in using genome-wide association studies (GWAS) to discover the genetic basis of complex diseases. Many genetic variations, mostly in the form of single nucleotide polymorphisms (SNPs), have been identified in a wide spectrum of diseases, including diabetes, cancer, and psychiatric diseases. A common theme arising from these studies is that the genetic variations discovered by GWAS can explain only a small fraction of the genetic risk associated with complex diseases. New strategies and statistical approaches are needed to address this lack of explanation. One such approach is pathway analysis, which considers the genetic variations underlying a biological pathway jointly, rather than one variant at a time as in traditional GWAS. A critical challenge in pathway analysis is how to combine evidence of association over multiple SNPs within a gene and multiple genes within a pathway. Most current methods choose the most significant SNP from each gene as a representative, ignoring the joint action of multiple SNPs within a gene. This approach leads to preferential identification of genes with a greater number of SNPs.
We describe a SNP-based pathway enrichment method for GWAS. The method consists of the following two main steps: 1) for a given pathway, using an adaptive truncated product statistic to identify all representative (potentially more than one) SNPs of each gene, calculating the average number of representative SNPs for the genes, then re-selecting the representative SNPs of genes in the pathway based on this number; and 2) ranking all selected SNPs by the significance of their statistical association with a trait of interest, and testing if the set of SNPs from a particular pathway is significantly enriched with high ranks using a weighted Kolmogorov-Smirnov test. We applied our method to two large genetically distinct GWAS data sets of schizophrenia, one from a European-American (EA) sample and the other from an African-American (AA) sample. In the EA data set, we found 22 pathways with nominal P-value less than or equal to 0.001 and corresponding false discovery rate (FDR) less than 5%. In the AA data set, we found 11 pathways using the same nominal P-value and FDR thresholds. Interestingly, 8 of these pathways overlap with those found in the EA sample. We have implemented our method in a Java software package, called SNP Set Enrichment Analysis (SSEA), which contains a user-friendly interface and is freely available at http://cbcl.ics.uci.edu/SSEA.
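The second step described above (rank the SNPs, then test a pathway's SNP set for enrichment near the top of the ranking) can be sketched with a GSEA-style weighted running-sum statistic and a label-permutation p-value. This is an illustrative sketch, not the SSEA implementation: the exact weighting used by SSEA is not given here, step 1 (the adaptive truncated product selection) is not reproduced, and all function and parameter names are hypothetical.

```python
import random


def enrichment_score(scores, in_set, weight=1.0):
    """One-sided weighted Kolmogorov-Smirnov-like enrichment score.

    scores: association scores per SNP, larger = more significant
            (for example -log10 of the p-value; an assumption of this sketch).
    in_set: booleans marking SNPs that belong to the pathway's SNP set.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hit_total = sum(abs(scores[i]) ** weight for i in order if in_set[i])
    n_miss = sum(1 for flag in in_set if not flag)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one scored hit and one miss")
    running = 0.0
    best = 0.0
    for i in order:
        if in_set[i]:
            # Step up by the SNP's weighted share of the set's total score.
            running += abs(scores[i]) ** weight / hit_total
        else:
            # Step down by a constant amount for each SNP outside the set.
            running -= 1.0 / n_miss
        best = max(best, running)
    return best


def permutation_p_value(scores, in_set, n_perm=1000, weight=1.0, seed=0):
    """Empirical p-value obtained by shuffling set membership across SNPs."""
    rng = random.Random(seed)
    observed = enrichment_score(scores, in_set, weight)
    labels = list(in_set)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if enrichment_score(scores, labels, weight) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Shuffling membership labels is the simplest null model; a phenotype-permutation scheme would preserve correlation between SNPs but needs the genotype data, which is outside the scope of this sketch.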
The SNP-based pathway enrichment method described here offers a new alternative approach for analysing GWAS data. By applying it to schizophrenia GWAS studies, we show that our method is able to identify statistically significant pathways, and importantly, pathways that can be replicated in large genetically distinct samples.
Genes play a well-documented role in determining normal cognitive function. This paper focuses on reviewing strategies for the identification of common genetic variation in genes that modulate normal and abnormal cognition with a genome-wide association scan (GWAS). GWASs make it possible to survey the entire genome to discover important but unanticipated genetic influences.
The use of a quantitative phenotype in combination with a GWAS provides many advantages over a case-control design, both in power and in physiological understanding of the underlying cognitive processes. We review the major features of this approach, and show how, using a General Linear Model method, the contribution of each Single Nucleotide Polymorphism (SNP) to the phenotype is determined, and adjustments then made for multiple tests. An example of the strategy is presented, in which fMRI measures of cortical inefficiency during a working memory task are used as the quantitative phenotype. We estimate power under different effect sizes (10 to 30%) and variations in allelic frequency for a quantitative trait (10 to 20%), and compare them to a case-control design with an Odds Ratio (OR) of 1.5, showing how a QT approach is superior to a traditional case-control design. In the presented example, this method identifies putative susceptibility genes for schizophrenia which affect prefrontal efficiency and have functions related to cell migration, forebrain development and stress response.
The use of Quantitative Traits (QT) as phenotypes provides increased statistical power over categorical association approaches and, when combined with a GWAS, creates a strategy for identification of unanticipated genes that modulate cognitive processes and cognitive disorders.
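A per-SNP quantitative-trait scan with a multiple-testing adjustment, as outlined in this abstract, might look like the sketch below. It is a deliberate simplification and not the authors' General Linear Model: it regresses the trait on minor-allele count alone (no covariates or diagnosis terms), uses a permutation p-value instead of a parametric test, and applies a Bonferroni threshold. All names are hypothetical.

```python
import random


def slope(genotypes, trait):
    """Least-squares slope of the trait on minor-allele count (0, 1 or 2)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_t = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation")
    sxy = sum((g - mean_g) * (t - mean_t) for g, t in zip(genotypes, trait))
    return sxy / sxx


def permutation_p(genotypes, trait, n_perm=2000, seed=0):
    """Two-sided empirical p-value from shuffling the trait across subjects."""
    rng = random.Random(seed)
    observed = abs(slope(genotypes, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(slope(genotypes, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def qt_scan(snps, trait, alpha=0.05, n_perm=2000, seed=0):
    """Return {snp_name: (slope, p_value, passes_bonferroni)}.

    snps: dict mapping a SNP name to its list of allele counts per subject.
    """
    threshold = alpha / len(snps)  # Bonferroni adjustment for the number of SNPs
    results = {}
    for name, genotypes in snps.items():
        p = permutation_p(genotypes, trait, n_perm=n_perm, seed=seed)
        results[name] = (slope(genotypes, trait), p, p <= threshold)
    return results
```

Note that the smallest p-value a permutation test can report is 1 / (n_perm + 1), so a genome-wide Bonferroni threshold requires either a very large number of permutations or a parametric test.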
cognition; imaging phenotype; endophenotype; quantitative trait; GWAS; power; permutation; replication
Background and aim
Inflammation has been extensively implicated in the pathogenesis of Alzheimer's disease (AD). Although there is evidence of a key role for cytokines in neuroinflammation processes, so far the proinflammatory cytokine interleukin (IL)‐18 has not been associated with AD. The aim of this study was to investigate the impact of two polymorphisms of the human IL‐18 gene promoter at positions −607 (C/A) and −137 (G/C) on both susceptibility to and progression of AD.
The results revealed that the genotype distribution of the −607 (C/A) polymorphism differed between patients with AD and control subjects (χ² = 7.99, df = 2, p = 0.0184). In particular, carriers of the CC genotype were at increased risk of developing AD (OR 2.33; 95% CI 1.29 to 4.22; p = 0.0052). For the −607 polymorphism, the observed genotypes were in Hardy–Weinberg equilibrium, whereas the −137 polymorphism was in Hardy–Weinberg disequilibrium in the patient group only (p = 0.0061). Finally, in a 2 year follow‐up study, the −137 CC genotype was strongly and specifically associated with a faster cognitive decline (F = 4.024; df = 4,192; p = 0.0037 for time by IL‐18 −137 G/C group interaction) with no interaction effect with the apolipoprotein E ε4/non‐ε4 allele presence.
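The two kinds of statistic reported above, an odds ratio with a 95% confidence interval and a Hardy–Weinberg equilibrium test, can be computed from genotype counts as in the following sketch. It uses the standard Woolf (log odds ratio) interval and a one-degree-of-freedom chi-square test; whether the authors used exactly these variants is an assumption, and the function names are hypothetical. No counts from the study are reproduced.

```python
from math import erfc, exp, log, sqrt


def odds_ratio_ci(case_carrier, case_other, ctrl_carrier, ctrl_other, z=1.96):
    """Odds ratio and Woolf confidence interval from a 2x2 table.

    All four cells must be non-zero; z = 1.96 gives a 95% interval.
    """
    odds_ratio = (case_carrier * ctrl_other) / (case_other * ctrl_carrier)
    se_log_or = sqrt(
        1 / case_carrier + 1 / case_other + 1 / ctrl_carrier + 1 / ctrl_other
    )
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def hwe_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi_square, p_value). Both alleles must be present.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value
```

With small expected counts an exact Hardy–Weinberg test is usually preferred over the chi-square approximation shown here.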
As IL‐18 cytokine promoter gene polymorphisms have been previously described to have functional consequences on IL‐18 expression, it is possible that individuals with a prevalent IL‐18 gene variant have a dysregulated immune response, suggesting that IL‐18 mediated immune mechanisms may play a crucial role in AD.
The cause of sporadic amyotrophic lateral sclerosis (ALS) is largely unknown, but genetic factors are thought to play a significant role in determining susceptibility to motor neuron degeneration. To identify genetic variants altering risk of ALS, we undertook a two-stage genome-wide association study (GWAS): we followed our initial GWAS of 545 066 SNPs in 553 individuals with ALS and 2338 controls by testing the 7600 most associated SNPs from the first stage in three independent cohorts consisting of 2160 cases and 3008 controls. None of the SNPs selected for replication exceeded the Bonferroni threshold for significance. The two most significantly associated SNPs, rs2708909 and rs2708851 [odds ratio (OR) = 1.17 and 1.18, and P-values = 6.98 × 10^-7 and 1.16 × 10^-6], were located on chromosome 7p13.3 within a 175 kb linkage disequilibrium block containing the SUNC1, HUS1 and C7orf57 genes. These associations did not achieve genome-wide significance in the original cohort and failed to replicate in an additional independent cohort of 989 US cases and 327 controls (OR = 1.18 and 1.19, P-values = 0.08 and 0.06, respectively). Thus, we chose to cautiously interpret our data as hypothesis-generating requiring additional confirmation, especially as all previously reported loci for ALS have failed to replicate successfully. Indeed, the three loci (FGGY, ITPR2 and DPP6) identified in previous GWAS of sporadic ALS were not significantly associated with disease in our study. Our findings suggest that ALS is more genetically and clinically heterogeneous than previously recognized. Genotype data from our study have been made available online to facilitate such future endeavors.
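As a worked illustration of the Bonferroni reasoning in this abstract, the sketch below computes a genome-wide threshold and compares the two reported P-values against it. The family-wise error rate of 0.05 and the use of the stage-one SNP count as the number of tests are assumptions; the abstract does not state which correction factor the authors applied.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that controls the family-wise error rate."""
    return alpha / n_tests


N_SNPS = 545_066  # stage-one SNP count quoted in the abstract
ALPHA = 0.05      # assumed family-wise error rate (not stated in the abstract)

threshold = bonferroni_threshold(ALPHA, N_SNPS)

# P-values quoted in the abstract for the two top SNPs.
reported = {"rs2708909": 6.98e-7, "rs2708851": 1.16e-6}
passes = {rsid: p <= threshold for rsid, p in reported.items()}

if __name__ == "__main__":
    print(f"threshold = {threshold:.3g}")
    for rsid, ok in passes.items():
        print(rsid, "passes" if ok else "does not pass")
```

Under these assumptions neither SNP passes, which is consistent with the abstract's statement that the associations did not achieve genome-wide significance.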
Background: Genome-wide association studies (GWASs) are increasingly used to identify risk genes for complex illnesses including schizophrenia. These studies may require thousands of subjects to obtain sufficient power. We present an alternative strategy with increased statistical power over a case-control study that uses brain imaging as a quantitative trait (QT) in the context of a GWAS in schizophrenia. Methods: Sixty-four subjects with chronic schizophrenia and 74 matched controls were recruited from the Functional Biomedical Informatics Research Network (FBIRN) consortium. Subjects were genotyped using the Illumina HumanHap300 BeadArray and were scanned while performing a Sternberg Item Recognition Paradigm in which they learned and then recognized target sets of digits in a functional magnetic resonance imaging protocol. The QT was the mean blood oxygen level–dependent signal in the dorsolateral prefrontal cortex during the probe condition for a memory load of 3 items. Results: Three genes or chromosomal regions were identified by having 2 single-nucleotide polymorphisms (SNPs) each significant at P < 10^-6 for the interaction between the imaging QT and the diagnosis (ROBO1-ROBO2, TNIK, and CTXN3-SLC12A2). Three other genes had a significant SNP at P < 10^-6 (POU3F2, TRAF, and GPC1). Together, these 6 genes/regions identified pathways involved in neurodevelopment and response to stress. Conclusion: Combining imaging and genetic data from a GWAS identified genes related to forebrain development and stress response, already implicated in schizophrenic dysfunction, as affecting prefrontal efficiency. Although the identified genes require confirmation in an independent sample, our approach is a screening method over the whole genome to identify novel SNPs related to risk for schizophrenia.
genome-wide scan; schizophrenia; working memory; genes; DLPFC; fMRI
Adducins are cytoskeletal actin-binding proteins (α, β, γ) that function as heterodimers and heterotetramers and are encoded by distinct genes. Experimental and clinical evidence implicates α- and β-adducin variants in hypertension and renal dysfunction. Here, we have addressed the role of α- and β-adducin in glomerular function and disease using β-adducin null mice, congenic substrains for α- and β-adducin from the Milan hypertensive (MHS) and Milan normotensive (MNS) rats and patients with IgA nephropathy. Targeted deletion of β-adducin in mice reduced urinary protein excretion, preceded by an increase of podocyte protein expression (phospho-nephrin, synaptopodin, α-actinin, ZO-1, Fyn). The introgression of the polymorphic MHS β-adducin locus into MNS (Add2, 529R) rats was associated with an early reduction of podocyte protein expression (nephrin, synaptopodin, α-actinin, ZO-1, podocin, Fyn), followed by severe glomerular and interstitial lesions and increased urinary protein excretion. These alterations were markedly attenuated when the polymorphic MHS α-adducin locus was also present (Add1, 316Y). In patients with IgA nephropathy, the rate of decline of renal function over time was associated with polymorphic β-adducin (ADD2, 1797T, rs4984), with a significant interaction with α-adducin (ADD1, 460W, rs4961). These findings suggest that adducin genetic variants participate in the development of glomerular lesions by modulating the expression of specific podocyte proteins.
Electronic supplementary material
The online version of this article (doi:10.1007/s00109-009-0549-x) contains supplementary material, which is available to authorized users.
Adducin; Genetic renal disease; Glomerular disease; IgA nephropathy; Podocytes; Proteinuria
With the exception of APOE ε4 allele, the common genetic risk factors for sporadic Alzheimer's Disease (AD) are unknown.
Methods and Findings
We completed a genome-wide association study on 381 participants in the ADNI (Alzheimer's Disease Neuroimaging Initiative) study. Samples were genotyped using the Illumina Human610-Quad BeadChip. 516,645 unique Single Nucleotide Polymorphisms (SNPs) were included in the analysis following quality control measures. The genotype data and raw genetic data are freely available for download (LONI, http://www.loni.ucla.edu/ADNI/Data/). Two analyses were completed: a standard case-control analysis, and a novel approach using hippocampal atrophy measured on MRI as an objectively defined, quantitative phenotype. A General Linear Model was applied to identify SNPs for which there was an interaction between the genotype and diagnosis on the quantitative trait. The case-control analysis identified APOE and a new risk gene, TOMM40 (translocase of outer mitochondrial membrane 40), at a genome-wide significance level of ≤ 10^-6 (10^-11 for a haplotype). TOMM40 risk alleles were approximately twice as frequent in AD subjects as in controls. The quantitative trait analysis identified 21 genes or chromosomal areas with at least one SNP with a p-value ≤ 10^-6, which can be considered potential “new” candidate loci to explore in the etiology of sporadic AD. These candidates included EFNA5, CAND1, MAGI2, ARSB, and PRUNE2, genes involved in the regulation of protein degradation, apoptosis, neuronal loss and neurodevelopment. Thus, we identified common genetic variants associated with an increased risk of developing AD in the ADNI cohort, and present publicly available genome-wide data. Supportive evidence based on case-control studies and biological plausibility by gene annotation is provided. Currently, no sample with both imaging and genetic data is available for replication.
Using hippocampal atrophy as a quantitative phenotype in a genome-wide scan, we have identified candidate risk genes for sporadic Alzheimer's disease that merit further investigation.
Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The availability of this large amount of genotypic data facilitates the whole-genome search for the genetic basis of diseases.
We need a suitable information management system to efficiently manage the data flow produced by whole genome genotyping and to make it available for further analyses.
We have developed an information system mainly devoted to the storage and management of SNP genotype data produced by the Illumina platform, loading the raw genotyping output into a relational database.
The relational database can be accessed in order to import any existing data and export user-defined formats compatible with many different genetic analysis programs.
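To make the idea of storing genotypes relationally and exporting them for downstream programs concrete, here is a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical and are not the SNPLims schema, which is not described in this abstract; the export format (comma-separated long format) is likewise only an example.

```python
import csv
import io
import sqlite3

# Hypothetical three-table layout: samples, SNPs, and one genotype call
# per (sample, SNP) pair.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    phenotype TEXT
);
CREATE TABLE snp (
    rsid       TEXT PRIMARY KEY,
    chromosome TEXT,
    position   INTEGER
);
CREATE TABLE genotype (
    sample_id     TEXT REFERENCES sample(sample_id),
    rsid          TEXT REFERENCES snp(rsid),
    genotype_call TEXT,
    PRIMARY KEY (sample_id, rsid)
);
"""


def build_db():
    """Create an in-memory database with the sketch schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def export_csv(con):
    """Export all genotype calls with phenotype as comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "phenotype", "rsid", "genotype_call"])
    rows = con.execute(
        "SELECT g.sample_id, s.phenotype, g.rsid, g.genotype_call "
        "FROM genotype AS g JOIN sample AS s ON s.sample_id = g.sample_id "
        "ORDER BY g.sample_id, g.rsid"
    )
    writer.writerows(rows)
    return buffer.getvalue()
```

A long (one row per call) layout keeps the schema independent of the number of markers, at the cost of very large tables for whole-genome arrays; a production system would need indexing and batching choices that this sketch does not address.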
After family-based or case-control association analyses have been run, the results can be imported into SNPLims. One of its main features is to allow the user to rapidly identify and annotate statistically relevant polymorphisms from the large volume of data analyzed. Results can be easily visualized either graphically or by creating ASCII comma-separated output files, which can be used as input to further analyses.
The proposed infrastructure makes it possible to manage a relatively large number of genotypes for each sample and an arbitrary number of samples and phenotypes. Moreover, it enables users to control the quality of the data, to perform the most common screening analyses, and to identify genes that become candidates for the disease under consideration.