Autism spectrum disorders (ASDs) represent a group of childhood neurodevelopmental and neuropsychiatric disorders characterized by deficits in verbal communication, impairment of social interaction, and restricted and repetitive patterns of interests and behaviour. To identify common genetic risk factors underlying ASDs, here we present the results of genome-wide association studies on a cohort of 780 families (3,101 subjects) with affected children, and a second cohort of 1,204 affected subjects and 6,491 control subjects, all of whom were of European ancestry. Six single nucleotide polymorphisms between cadherin 10 (CDH10) and cadherin 9 (CDH9)—two genes encoding neuronal cell-adhesion molecules—revealed strong association signals, with the most significant SNP being rs4307059 (P = 3.4 × 10−8, odds ratio = 1.19). These signals were replicated in two independent cohorts, with combined P values ranging from 7.4 × 10−8 to 2.1 × 10−10. Our results implicate neuronal cell-adhesion molecules in the pathogenesis of ASDs, and represent, to our knowledge, the first demonstration of genome-wide significant association of common variants with susceptibility to ASDs.
ASDs encompass a range of clinically defined conditions, including autism and pervasive developmental disorder not otherwise specified, which are more common and severe, as well as Asperger's syndrome, which appears less frequently and is milder1. ASDs are about four times more common in boys than girls, and at present around 1 in 150 children in the United States have a diagnosis of an ASD2. Several sources of evidence suggest that strong genetic components are involved in susceptibility to ASDs: there are much higher concordance rates of ASDs in monozygotic twins (92%) than dizygotic twins (10%)3, and a recent estimate of the sibling recurrence risk ratio (λs) is 22 for autism4. Despite being highly heritable, ASDs show heterogeneous clinical symptoms and genetic architecture, which have hindered the identification of common genetic susceptibility factors5. Although previous linkage studies, candidate gene association studies and cytogenetic studies have implicated several chromosomal regions for the presence of autism susceptibility loci6-9, they have failed to consistently identify and replicate common genetic variants that increase the risk of ASDs.
Besides well-known genetic conditions reported in ASDs, recent studies have identified a growing number of distinct and individually rare genetic causes, suggesting that the genetic architecture of ASDs may have a significant contribution from heterogeneous rare variants. For example, rare de novo copy number variants have been implicated in 7% of families with ASDs, but only in 1% of control families10. In addition, 16p11.2 microdeletions and microduplications have been found in approximately 1% of autism cases11,12. Several hundred rare structural variations have also been catalogued in families with ASDs13. Although these reported variants indicate a role for rare genomic variation in a proportion of families, no common variants have been previously associated with ASDs in genome-wide studies. The latter is consistent with reports from previous genome-wide association studies of other neuropsychiatric disorders, including bipolar disorder14,15, schizophrenia16 and attention deficit/hyperactivity disorder16, all of which have failed to identify common susceptibility loci with genome-wide significance, when individual data sets with small sample sizes were analysed. However, recent meta-analyses reported evidence for common variants in both schizophrenia17 and bipolar disorder18, suggesting that the search for common genetic variation that confers susceptibility to ASDs may benefit from the combined analysis of several studies.
To identify common genetic risk factors for ASDs, we carried out a genome-wide association study on 943 ASD families (4,444 subjects) from the Autism Genetic Resource Exchange (AGRE cohort, Table 1)19. The subjects with ASDs in the AGRE cohort were diagnosed using both the Autism Diagnostic Interview-Revised (ADI-R)20 and Autism Diagnostic Observation Schedule (ADOS)21 diagnostic tools, which are the gold standard diagnostic tools for individuals with ASDs. All subjects were genotyped using the Illumina HumanHap550 BeadChip with over 550,000 single nucleotide polymorphism (SNP) markers. We applied stringent quality control criteria (Supplementary Methods), including call rates, Mendelian inconsistencies and genetically inferred ancestry, to identify a set of 3,101 subjects of European ancestry in 780 AGRE families for association tests. We performed analysis with the Pedigree Disequilibrium Test (PDT)22 for autosomes, and with X-APL23 for the X chromosome, using genotypes from 486,864 markers. The complete sets of SNP genotype data and signal intensity data were released to the academic research community in April 2008 (http://www.agre.org).
We did not observe genome-wide significant association (P < 5 × 10−8) to ASDs in the AGRE cohort, but we proposed that meaningful associations were contained within the lowest P values. To boost power for identifying these associations, we examined an Autism Case-Control cohort (ACC cohort, Table 1), comprising 1,453 subjects with ASDs from several US sites, and 7,070 control subjects without ASDs from the Children's Hospital of Philadelphia, who were also genotyped on the same platform. The subjects with ASDs in this cohort were diagnosed using the ADI and ADOS tools. After conducting thorough quality control measures on the genotypes, association analyses were conducted on 1,204 subjects with ASDs and 6,491 control subjects of inferred European ancestry. We did not detect genome-wide significant association (P < 5 × 10−8) to ASDs in the ACC cohort either. Therefore, we subsequently performed a combined analysis of these two independent data sets using recommended meta-analysis approaches24. From examining autosomes and the X chromosome, one SNP located on 5p14.1 reached genome-wide significance (rs4307059, P = 3.4 × 10−8), and five further SNPs at the same locus had P values below 1 × 10−4 (Table 2 and Fig. 1a). Several other loci contain SNPs with suggestive association signals (Table 3), such as 13q33.3 (near MYO16 (myosin XVI)), 14q21.1 (between FBXO33 (F-box protein 33) and LRFN5 (leucine rich repeat and fibronectin type III domain containing 5)) and Xp22.32 (between PRKX (protein kinase, X-linked) and NLGN4X (neuroligin 4, X-linked)). We also analysed ten markers on the Y chromosome in the ACC cohort, with the most significant SNP being rs2032597 (P = 1.1 × 10−4) located within USP9Y (ubiquitin specific protease 9, Y-linked) (Supplementary Table 1). Furthermore, we have analysed 15 markers in pseudoautosomal regions of sex chromosomes in the two discovery cohorts, but no markers showed evidence of association (Supplementary Table 2).
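As an illustration of how evidence from two independent cohorts can be combined, the following minimal sketch implements a sample-size weighted Z-score meta-analysis, one widely used scheme. Whether this matches the exact procedure applied to the AGRE and ACC cohorts is an assumption; the function names, the tuple layout and the use of an effective sample size as the weight are hypothetical choices made for the example.

```python
import math
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def p_to_z(p_two_sided, direction):
    """Convert a two-sided P value and an effect direction (+1 or -1) to a signed Z score."""
    # Use the lower tail to keep precision for very small P values.
    return direction * -_NORM.inv_cdf(p_two_sided / 2.0)


def combine_weighted_z(studies):
    """Combine studies given as (two-sided P, direction, effective sample size).

    Each study's Z score is weighted by the square root of its effective
    sample size (an assumed weighting scheme). Returns (combined Z, two-sided P).
    """
    numerator = 0.0
    sum_sq_weights = 0.0
    for p, direction, n_eff in studies:
        weight = math.sqrt(n_eff)
        numerator += weight * p_to_z(p, direction)
        sum_sq_weights += weight * weight
    z = numerator / math.sqrt(sum_sq_weights)
    p_combined = 2.0 * _NORM.cdf(-abs(z))
    return z, p_combined


if __name__ == "__main__":
    # Illustrative numbers only, not values from the study.
    z, p = combine_weighted_z([(1e-4, +1, 3000), (1e-5, +1, 7000)])
    print(f"combined Z = {z:.3f}, combined P = {p:.3g}")
```

Because the Z scores are signed, two studies that disagree on the direction of effect partly cancel each other, which is why a consistent direction of association across cohorts matters for the combined P value.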
To identify other variants that associate with ASDs but were not captured by the SNP genotyping array, we analysed the discovery cohorts using whole-genome imputed genotypes on autosomes generated by the MACH software (Supplementary Methods). The most significant association signals were still those in the 5p14.1 region. However, several other genomic loci, such as 10q21.3 (within CTNNA3 (catenin, alpha 3)) and 16p13.2 (between A2BP1 (ataxin 2-binding protein 1) and C16orf68 (chromosome 16 open reading frame 68)), contain imputed SNPs with suggestive association signals (Table 3). Follow-up studies with larger sample sizes are required to determine whether these represent genuine ASD susceptibility loci.
To replicate our genome-wide significant association signals at the 5p14.1 locus, we examined the association statistics for these markers in a third independently generated and analysed cohort, including 1,390 subjects from 447 autism families genotyped with ~1 million markers on the Illumina HumanHap1M BeadChip (CAP cohort, Table 1). The association signals for all the aforementioned SNPs were replicated in this cohort with the same direction of association, with P values ranging from 0.01 to 2.8 × 10−5 (Table 2). To seek further evidence of replication, we examined association statistics from a fourth independent cohort of 108 ASD cases and 540 genetically matched control subjects, genotyped on the Illumina HumanCNV370 BeadChip, an array with over 300,000 SNP markers (CART cohort, Table 1). Because rs7704909 and rs10038113 were not present in this array platform, we analysed association on imputed genotypes. Most of the SNPs were replicated (P < 0.05) in the CART cohort with the same direction of association (Table 2). Combined analysis on all four data sets indicates that all six SNPs are associated with ASDs, with P values ranging from 7.4 × 10−8 to 2.1 × 10−10 (Table 2 and Supplementary Table 3). Taken together, several sources of converging evidence firmly establish that common genetic variants on 5p14.1 confer susceptibility to ASDs.
Closer examination of the 5p14.1 region indicated that all genotyped and imputed SNPs with P values below 1 × 10−7 reside within the same ~100 kilobase (kb) linkage disequilibrium block, suggesting that these SNPs are tagging the same variants (Supplementary Figs 1 and 2). The linkage disequilibrium block is located within a 2.2-megabase (Mb) intergenic region between CDH10 (cadherin 10) and CDH9 (cadherin 9) (Fig. 1b, c). Both CDH10 and CDH9 encode type II classical cadherins from the cadherin superfamily, which represent transmembrane proteins that mediate calcium-dependent cell-cell adhesion. To search for other types of variants, including copy number variations (CNVs), in the intergenic region, we used the PennCNV software25 on the signal intensity data and identified five CNV loci (Supplementary Fig. 3). All of these CNVs are present in control subjects in our study, and three of the five CNVs are also reported in the Database for Genomic Variants that annotates healthy individuals (Supplementary Fig. 4), suggesting that CNVs in the region are unlikely to be causal variants for ASDs.
We next focused on the ~100 kb linkage disequilibrium block containing the most significant SNPs, and determined whether other transcripts or functional elements are located in the block. By examining the UCSC Genome Browser annotations26, we did not identify predicted genes, predicted transcription start sites, spliced human expressed sequence tag (EST) sequences, known microRNA genes or predicted microRNA targets that overlap with the linkage disequilibrium block (Supplementary Fig. 5). However, we note that the linkage disequilibrium block contains several highly conserved genomic elements, including an 849-base pair (bp) element that ranks among the top 0.026% of most-conserved elements in the entire human genome (log odds (LOD) score = 3,480 by PhastCons27, Fig. 1b). Consistent with previous reports that large stable gene deserts typically contain regulatory elements for genes involved in development or transcription28, we hypothesized that these tagging SNPs were capturing the association of functional variants that regulate the expression and action of either CDH10 or CDH9.
Because CDH10 and CDH9 are expressed at low levels in non-neural tissues (Supplementary Figs 6 and 7), we evaluated their messenger RNA distribution in human fetal brain by in situ hybridization. Multiple sagittally sectioned human fetal brains, each between 19 and 20 weeks gestation, were hybridized with riboprobes against CDH10 or CDH9. Results for CDH9, showing uniformly low levels of expression at the time points evaluated, were largely uninformative. In contrast, a marked pattern of enrichment for CDH10 was observed in the frontal cortex (Fig. 2a), a region known to be important in ASDs. The expression pattern was similar to that for CNTNAP2 (contactin-associated protein-like 2)29, a molecule now well-established to be involved in ASDs1. These results are consistent with previous work showing high levels of CDH10 in the human fetal brain30 and a prominent enrichment of Cdh10 mRNA in the anterior cortical plate of the developing mouse brain31.
To examine whether the SNP genotypes associate with gene expression for CDH10 and CDH9, we next examined the SNPExpress database32 that profiles gene expression in 93 human cortical brain tissues from genotyped subjects. However, none of the SNPs in Table 2 was associated with expression levels for either CDH9 (P = 0.92 for rs4307059) or CDH10 (P = 0.86 for rs4307059) (Fig. 2b). Although the small sample size may not have sufficient power to detect subtle effect sizes, it is also possible that the causal variants regulate gene expression only in the developing brain, or that the causal variants target an unidentified functional element, similar to the variants reported in the intergenic region on 8q24, which have been implicated in various cancers33,34.
Recent genetic studies have identified several neuronal cell-adhesion genes, including NRXN1 (neurexin 1)35,36, CNTNAP2 (refs 37-39) and PCDH10 (protocadherin 10)40, as potentially disrupted in rare ASD cases. Cadherins represent a large group of transmembrane proteins that are involved in cell adhesion and the generation of synaptic complexity in the developing brain41. In light of the information described earlier, we note that several other cadherin genes were also tagged by the top 1,000 most significant SNPs of the combined discovery cohorts (Supplementary Table 4). In addition, SNPs surrounding several prominent ASD candidate loci1, including CACNA1C (L type voltage-gated calcium channel), CNTNAP2, GRIK2 (glutamate receptor, ionotropic, kainate 2), NRXN1 and NLGN4X, also show suggestive evidence of association (Supplementary Table 5). These sources of evidence indicate a potential role for cell-adhesion molecules in the pathogenesis of ASDs.
To examine if cell-adhesion molecules, as a gene family, associate with ASDs, we applied two pathway-based association approaches on the genotype data (Supplementary Methods). First, we assigned each SNP to the overlapping or the closest gene, summarized the significance of each gene using the Simes-adjusted P value42 from its SNPs, and then tested whether the distribution of P values differs between a group of genes and all other genes using a nonparametric rank sum test. Using the combined P values from the two discovery cohorts, we found that a group of 25 related cadherin genes shows more significant association with ASDs than all other genes (P = 0.02), whereas a stronger enrichment signal (P = 0.004) was obtained when the 25 cadherin genes were combined with eight neurexin family genes (NRXN1 to NRXN3, CNTNAP1 to CNTNAP5). Second, we analysed the ACC cohort using a formal pathway-association method for case-control data sets43. This method examines whether statistics for a group of genes have modest yet consistent deviation from what is expected by chance, through shuffling case/control labels many times, each time recalculating P values for all SNPs. We confirmed that the set of cadherin genes is associated with ASDs (permutation P = 0.02), whereas the combined cadherin/neurexin genes show more significant association (permutation P = 0.002). Therefore, our pathway analysis suggests that neuronal cell-adhesion molecules may be collectively associated with ASDs.
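The first pathway approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the Simes combination follows its standard definition, whereas the rank sum test is written here as a one-sided Wilcoxon test with a normal approximation and no tie correction in the variance, which may differ from the exact test that was applied. All function names and the toy values are hypothetical.

```python
import math
from statistics import NormalDist


def simes_p(snp_pvalues):
    """Simes-adjusted P value for one gene: min over i of n * p_(i) / i."""
    ordered = sorted(snp_pvalues)
    n = len(ordered)
    return min(1.0, min(n * p / rank for rank, p in enumerate(ordered, start=1)))


def rank_sum_p(gene_set, other_genes):
    """One-sided rank sum P value testing whether gene_set values tend to be smaller.

    Uses average ranks for ties and a normal approximation (assumed simplifications).
    """
    pooled = sorted([(v, 0) for v in gene_set] + [(v, 1) for v in other_genes])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(gene_set), len(other_genes)
    rank_sum = sum(r for r, (_, label) in zip(ranks, pooled) if label == 0)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return NormalDist().cdf((rank_sum - mean) / sd)


if __name__ == "__main__":
    # Toy per-SNP P values for one gene (illustrative only).
    print(simes_p([0.01, 0.04, 0.5]))
    # Toy gene-level P values: a candidate gene set versus all other genes.
    print(rank_sum_p([0.001, 0.002, 0.003], [0.1 * k for k in range(1, 11)]))
```

Summarizing each gene by a Simes-adjusted P value keeps genes with many SNPs from appearing significant merely because they were tested more often, and the rank-based comparison makes no assumption about the shape of the P value distribution.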
Besides recent genetic findings supporting the role of neuronal cell-adhesion molecules in the pathogenesis of autism, an increasing number of functional neuroimaging studies have suggested the presence of cortical underconnectivity in subjects with ASDs44,45. Furthermore, neuroanatomy studies have implicated abnormal brain development of the frontal lobes in autism46,47. The genetic findings, when coupled with anatomical and functional imaging studies, convergently indicate that ASDs may result from structural and functional disconnection of brain regions that are involved in higher-order associations48-50, suggesting that ASDs may represent a neuronal disconnection syndrome.
In the current study, we have completed a genetic analysis in a large number of ASD cases and families, with a combined sample set of more than 10,000 subjects of European ancestry. We have identified and replicated common genetic variants on 5p14.1 that are associated with susceptibility to ASDs. Besides the potential roles of the nearby CDH10 and CDH9 genes, pathway-based association analysis lends further support to neuronal cell-adhesion molecules in conferring susceptibility to ASDs, suggesting that specific genetic variants in this gene class may be involved in shaping the physical structure and functional connectivity of the brain, which in turn leads to the clinical manifestations of ASDs. Apart from highlighting the genetic complexity of ASDs and the need for large sample sizes in unveiling their genetic causes, our study represents a successful application of the genome-wide association approach in identifying common susceptibility alleles, as part of a larger effort to interrogate the complex genetic architecture of ASDs. Because the genetic aetiologies of ASDs may be linked to the neurobiological components that build and modify connectivity of the brain, by comprehensively identifying the relevant genes, genomic variants and genetic pathways, more focused analysis on gene expression, as well as structural and functional imaging, can be performed on subjects carrying specific genetic defects. Together with studies addressing epigenetic modifications and comprehensive analysis of environmental risk factors, these pieces of information can be better integrated to improve our understanding of the molecular basis of ASDs, and foster the development of early preventive and corrective strategies.
All genome-wide SNP genotyping for the discovery cohorts was performed using the Illumina HumanHap550 BeadChip at the Center for Applied Genomics at the Children's Hospital of Philadelphia. For family-based cohorts, the association tests for markers in autosomes and pseudoautosomal region of sex chromosomes were performed by PDT, whereas tests for markers in the X chromosome were performed by X-APL. For case-control cohorts, the association tests were performed by PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/). Pathway association analysis was performed by GenGen (http://www.openbioinformatics.org/gengen/), using the genotype data. The whole-genome genotype imputation was performed by MACH (http://www.sph.umich.edu/csg/abecasis/MaCH/) on the autosomal markers, on the basis of phased haplotypes (release 22) for the HapMap CEU population (http://ftp.hapmap.org/phasing/2007-08_rel22/). We removed all markers with MACH Rsq measure of less than 0.3, and zeroed out imputed genotypes with a posterior probability of less than 0.9. The case-control association tests for imputed genotypes were performed by SNPTEST (http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html), which can handle genotype imputation uncertainty. CNV calls were generated by PennCNV (http://www.openbioinformatics.org/penncnv/) on genotyping signal intensity data. For CNV validation by multiplex ligation-dependent probe amplification (MLPA), we used the Universal Probe Library system from Roche, and all reactions were performed in triplicate with an ABI Prism 7900HT Sequence Detection System (Applied Biosystems). For CNV validation by quantitative PCR (qPCR), TaqMan probes were custom-designed using Primer Express 3.0 (Applied Biosystems). For in situ hybridization, multiple sagittally sectioned human fetal brains were obtained from the Developmental Brain and Tissue Bank at the University of Maryland. Riboprobes against CDH9 or CDH10 were used for hybridization.
The SNPExpress database and software (http://people.genome.duke.edu/~dg48/SNPExpress/) were used to examine the genotype-expression relationships.
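The two imputation quality filters stated in the methods above (dropping markers with an Rsq below 0.3 and zeroing out genotypes with a posterior probability below 0.9) can be illustrated with a short sketch. The in-memory data layout, the function name and the reading of "zeroed out" as "set to missing" are assumptions made for this example; the sketch does not parse actual MACH output files.

```python
def filter_imputed(markers, rsq_min=0.3, posterior_min=0.9):
    """Apply marker-level and genotype-level imputation quality filters.

    markers: dict mapping a marker name to a record with
        "rsq":   the marker's imputation quality measure, and
        "calls": a list of (genotype, posterior probability) pairs, one per subject.
    This layout is hypothetical and chosen only for illustration.

    Returns a dict of retained markers whose low-confidence genotypes are
    replaced by None (treated as missing).
    """
    kept = {}
    for name, record in markers.items():
        if record["rsq"] < rsq_min:
            continue  # drop the whole marker: poorly imputed overall
        kept[name] = [
            genotype if posterior >= posterior_min else None
            for genotype, posterior in record["calls"]
        ]
    return kept


if __name__ == "__main__":
    # Hypothetical marker names and values, not data from the study.
    example = {
        "snpA": {"rsq": 0.20, "calls": [("AA", 0.99), ("AG", 0.97)]},
        "snpB": {"rsq": 0.85, "calls": [("AG", 0.95), ("GG", 0.60)]},
    }
    print(filter_imputed(example))
```

Filtering at both levels separates two questions: whether a marker is imputed well enough to be tested at all, and whether an individual genotype call is confident enough to be used.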
We gratefully thank all the children with ASDs and their families at the participating study sites who were enrolled in this study and all the control subjects who donated blood samples to Children's Hospital of Philadelphia (CHOP) for genetic research purposes. We also acknowledge the resources provided by the AGRE Consortium (D. H. Geschwind, M. Bucan, W. T. Brown, J. D. Buxbaum, R. M. Cantor, J. N. Constantino, T. C. Gilliam, C. M. Lajonchere, D. H. Ledbetter, C. Lese-Martin, J. Miller, S. F. Nelson, G. D. Schellenberg, C. A. Samango-Sprouse, S. Spence, M. State, R. E. Tanzi) and the participating families. AGRE is a program of Autism Speaks and is at present supported, in part, by grant 1U24MH081810 from the National Institute of Mental Health to C. M. Lajonchere (PI), and formerly by grant MH64547 to D. H. Geschwind (PI). We thank the technical staff at the Center for Applied Genomics at CHOP for producing the genotypes used for analyses, and the nursing, medical assistant and medical staff for their help with recruitment of patient and control subjects for the study. We thank R. Liu and I. Lindquist for helping with CNV validation. We thank D. J. Hedges, H. N. Cukier, J. L. McCauley, G. W. Beecham, H. H. Wright, R. K. Abramson, E. R. Martin and J. P. Hussman for their comments, advice and statistical support, and the laboratory core and the autism clinical personnel at the Miami Institute for Human Genomics and the autism clinical staff at the Vanderbilt Center for Human Genetics Research. A subset of the CAP participants was ascertained while M.A.P.-V. was a faculty member at Duke University. We thank the National Institutes of Health (NIH)-funded Developmental Brain and Tissue Bank at University of Maryland for access to the fetal brain tissues used in these studies (National Institute of Child Health and Human Development Contract no. NO1-HD-4-3368 and NO1-HD-4-3383). 
All genotyping of the AGRE and ACC cohort was supported by an Institutional Development Award to the Center for Applied Genomics (H.H.) at the Children's Hospital of Philadelphia. The study was supported in part by a Research Award from the Margaret Q. Landenberger Foundation (H.H.), a Research Development Award from the Cotswold Foundation (H.H. and S.F.A.G), UL1-RR024134-03 (H.H.), an Alavi-Dabiri fellowship from Mental Retardation and Developmental Disability Research Center at CHOP (K.W.), the Beatrice and Stanley A. Seaver Foundation (J.D.B.), the Department of Veterans Affairs (G.D.S.), NIH grants HD055782-01 (J.Munson, A.E., O.K., G.D. and G.D.S.), MH0666730 (J.D.B.), MH061009 and NS049261 (J.S.S.), HD055751 (E.H.C.), MH69359, M01-RR00064 and the Utah Autism Foundation (H.C., J.Miller and W.M.M.), MH64547, MH081754 (D.H.G.), HD055784 (D.H.G. and M.S.), NS26630, NS36768, MH080647 and a gift from the Hussman Foundation (M.A.P.-V.), the Autism Genome Project Consortium (B.S.A., J.P., C.W.B., D.H.G., T.H.W., W.M.M., H.C., J.I.N., J.S.S., E.H.C., J.Munson, A.E., O.K., J.D.B., B.D. and G.D.S.) funded by Autism Speaks, the Medical Research Council (UK) and the Health Research Board (Ireland). We also acknowledge the partial support to CAP cohort from the Autism Genome Project.