Exome sequencing studies of autism spectrum disorders (ASDs) have identified many de novo mutations, but few recurrently disrupted genes. We therefore developed a modified molecular inversion probe method enabling ultra-low-cost candidate gene resequencing in very large cohorts. To demonstrate the power of this approach, we captured and sequenced 44 candidate genes in 2,446 ASD probands. We discovered 27 de novo events in 16 genes, 59% of which are predicted to truncate proteins or disrupt splicing. We estimate that recurrent disruptive mutations in six genes—CHD8, DYRK1A, GRIN2B, TBR1, PTEN, and TBL1XR1—may contribute to 1% of sporadic ASDs. Our data support associations between specific genes and reciprocal subphenotypes (CHD8-macrocephaly, DYRK1A-microcephaly) and replicate the importance of a β-catenin/chromatin remodeling network to ASD etiology.
There is considerable interest in the contribution of rare variants and de novo mutations to the genetic basis of complex phenotypes such as autism spectrum disorders (ASD). However, because of extreme genetic heterogeneity, the sample sizes required to implicate any single gene in a complex phenotype are extremely large (1). Exome sequencing has identified hundreds of ASD candidate genes on the basis of de novo mutations observed in the affected offspring of unaffected parents (2–6). Yet, only a single mutation was observed in nearly all such genes, and sequencing of over 900 trios was insufficient to establish mutations at any single gene as definitive genetic risk factors (2–6).
To address this, we sought to evaluate candidate genes identified by exome sequencing (2, 3) for de novo mutations in a much larger ASD cohort. We developed a modified molecular inversion probe (MIP) strategy (Fig. 1A) (7–9) with novel algorithms for MIP design; an optimized, automatable workflow with robust performance and minimal DNA input; extensive multiplexing of samples while sequencing; and reagent costs of less than $1 per gene per sample. Extensive validation using several probe sets and sample collections demonstrated 99% sensitivity and 98% positive predictive value for single nucleotide variants at well-covered positions, i.e., 92% to 98% of targeted bases (figs. S1–S7 and tables S1–S9) (10).
We applied this method to 2,494 ASD probands from the Simons Simplex Collection (SSC) (11) using two probe sets [ASD1 (6 genes) and ASD2 (38 genes)] to target 44 ASD candidate genes (12). Preliminary results using ASD1 on a subset of the SSC implicated GRIN2B as a risk locus (3). The 44 genes were selected from 192 candidates (2, 3), focusing on genes with disruptive mutations, associations with syndromic autism (13), overlap with known or suspected neurodevelopmental CNV risk loci (13, 14), structural similarities, and/or neuronal expression (table S3). Although a few of the 44 genes have been reported disrupted in individuals with neurodevelopmental or neuropsychiatric disorders (often including concurrent dysmorphologies), their role in so-called idiopathic ASD has not been rigorously established. Twenty-three of the 44 genes intersect a 49-member β-catenin/chromatin remodeling protein-protein interaction (PPI) network (2) or an expanded 74-member network (figs. S8 and S9) (3, 4).
We required samples to successfully capture with both probe sets, yielding 2,446 ASD probands with MIP data, 2,364 of which had only MIP data and 82 of which we previously exome sequenced (2, 3). The high GC content of several candidates required considerable rebalancing to improve capture uniformity (12) (figs. S3A and S10). Nevertheless, the reproducible behavior of most MIPs allowed us to identify copy number variation at targeted genes, including several inherited duplications (figs. S11 and S12 and table S10).
To discover de novo mutations, we first identified candidate sites by filtering against variants observed in other cohorts, including non-ASD exomes and MIP-based resequencing of 762 healthy, non-ASD individuals (12). The remaining candidates were further tested by MIP-based resequencing of the proband’s parents and, if potentially de novo, confirmed by Sanger sequencing of the parent-child trio (10, 12). We discovered 27 de novo mutations that occurred in 16 of the 44 genes (Fig. 1, B–E; Table 1; and table S11). Consistent with an increased sensitivity for MIP-based resequencing, six of these were not reported in exome-sequenced individuals (Table 1, tables S5 and S11, and fig. S13) (3, 4, 6). Notably, the proportion of de novo events that are severely disruptive, i.e., coding indels, nonsense mutations, and splice-site disruptions (17/27 or 0.63), is fourfold greater than the expected proportion for random de novo mutations (0.16, binomial p = 4.9 × 10⁻⁸) (table S12) (15).
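The binomial comparison above can be checked with a short calculation. The following is a minimal sketch, assuming a one-sided upper-tail test (the text does not state the sidedness); the function name `binomial_upper_tail` is illustrative and not taken from the study.

```python
from math import comb

def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Values quoted in the text: 17 of 27 de novo events are severe,
# against an expected severe proportion of 0.16.
p_value = binomial_upper_tail(17, 27, 0.16)
print(f"{p_value:.2e}")
```

Under that one-sided assumption, the result agrees with the reported p = 4.9 × 10⁻⁸ after rounding.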
Given their extremely low frequency, accurately establishing expectation for de novo mutations in a locus-specific manner through the sequencing of control trios is impractical. We therefore developed a probabilistic model that incorporates the overall rate of mutation in coding sequences, estimates of relative locus-specific rates based on human-chimpanzee fixed differences (fig. S14 and table S13), and other factors that may influence the distribution of mutation classes, e.g., codon structure (12). We applied this model to estimate (by simulation) the probability of observing additional de novo mutations during MIP-based resequencing of the SSC cohort. To compare expectation and observation, we treated missense mutations as one class and severe disruptions as a second class. Thus, we could evaluate the probability at a given locus of observing at least X de novo mutations, of which at least Y belong to the severe class.
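The "at least X mutations, of which at least Y are severe" comparison can be illustrated with a simplified Monte Carlo sketch. It is not the model described above: it assumes that missense and severe de novo counts are independent Poisson variables with fixed expected values, and it ignores locus-specific rates and codon structure. The names `sample_poisson` and `simulate_p` are hypothetical.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_p(lam_missense: float, lam_severe: float,
               min_total: int, min_severe: int,
               n_sims: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated cohorts with at least min_total events,
    of which at least min_severe fall in the severe class."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        severe = sample_poisson(lam_severe, rng)
        missense = sample_poisson(lam_missense, rng)
        if severe + missense >= min_total and severe >= min_severe:
            hits += 1
    return hits / n_sims

# Assumption for illustration: the 44-gene expectations quoted in the text
# (5.6 total, 0.58 severe) are split as 5.02 missense + 0.58 severe.
estimate = simulate_p(lam_missense=5.02, lam_severe=0.58,
                      min_total=27, min_severe=17)
print(estimate)
```

An estimate of zero only bounds the p-value below roughly 1/n_sims, so resolving very small p-values requires correspondingly many simulations.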
We found evidence of mutation burden, a higher rate of de novo mutation than expected, in the overall set of 44 genes (observed n = 27 vs. mean expected n = 5.6, simulated p < 2 × 10⁻⁹) (Fig. 2A). The burden was driven by the severe class (observed n = 17 vs. mean expected n = 0.58, simulated p < 2 × 10⁻⁹). Most severe class mutations intersected the 74-member PPI network (16/17), although only 23/44 genes are in this network (binomial p = 0.0002) (12). Furthermore, 21/27 mutations occurred in network-associated genes (binomial p = 0.004). Of the six individual genes (CHD8, GRIN2B, DYRK1A, PTEN, TBR1, and TBL1XR1) with evidence of mutation burden [alpha of 0.05 after a Holm-Bonferroni correction for multiple testing (Fig. 2A); TBL1XR1 is not significant with a more conservative Bonferroni correction], five fall within the β-catenin/chromatin remodeling network. In our combined MIP and exome data, ~1% (24/2,573) of ASD probands harbor a de novo mutation in one of these six genes, with CHD8 representing 0.35% (9/2,573) (Fig. 1B and Table 1).
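The difference between the Holm-Bonferroni and Bonferroni corrections, which explains why one gene can pass the former and fail the latter, can be sketched as follows. The gene labels and p-values are hypothetical and chosen only to show the effect with 44 tests; they are not the study's per-gene results.

```python
def holm_reject(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Labels rejected by Holm's step-down procedure at level alpha."""
    m = len(p_values)
    rejected = []
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - k + 1).
        if p > alpha / (m - rank):
            break
        rejected.append(name)
    return rejected

def bonferroni_reject(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Labels rejected by the single-threshold Bonferroni correction."""
    m = len(p_values)
    return [name for name, p in p_values.items() if p <= alpha / m]

# Hypothetical p-values for 44 tests: six small values, 38 unremarkable ones.
p_values = {"GENE_1": 1e-9, "GENE_2": 1e-7, "GENE_3": 1e-6,
            "GENE_4": 1e-5, "GENE_5": 1e-4, "GENE_6": 0.0012}
p_values.update({f"OTHER_{i}": 0.5 for i in range(38)})

holm_hits = holm_reject(p_values)
bonferroni_hits = bonferroni_reject(p_values)
print(holm_hits)
print(bonferroni_hits)
```

In this example GENE_6 is below the Holm threshold for the sixth-ranked test (0.05/39) but above the Bonferroni threshold (0.05/44), so only the Holm procedure rejects it.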
For these analyses, we conservatively used the highest available empirical estimate of the overall mutation rate in coding sequences (3). With the exception of TBL1XR1, these results were robust to doubling the overall mutation rate, or to using the upper bound of the 95% confidence interval of the locus-specific rate estimate for each of these genes (10). Moreover, we obtained similar results regardless of whether parameters were estimated from rare, segregating variation or from de novo mutations in unaffected siblings (10), as well as with a sequence composition model based on genome-wide de novo mutation (16). Exome sequencing of non-ASD individuals (unaffected siblings or non-ASD cohorts) further supports these conclusions (table S14) (10).
We also validated 23 inherited, severely disruptive variants in the 44 genes (table S15). Two probands with such variants carry de novo 16p11.2 duplications (table S16). Combining de novo and inherited events, severe class variants were observed at twice the rate in MIP-sequenced probands as compared with MIP-sequenced healthy, non-ASD individuals (Fisher’s exact test, p = 0.083). Severe class variants were not transmitted to 14/20 unaffected siblings (binomial p = 0.058) (table S15). However, larger cohorts than currently exist will be needed to fully evaluate these modest trends.
We analyzed phenotypic data on probands with mutations in the six implicated genes. Each was diagnosed with autism on the basis of current, strict, gold-standard criteria. No obvious dysmorphologies or recurrent comorbidities were present. Probands tended to fall into the intellectual disability range for nonverbal IQ (NVIQ) (mean 58.3) (Table 1). However, for CHD8, probands were found to have NVIQ scores ranging from profoundly impaired to average (mean 62.2, range 19–98).
Given the previously observed microcephaly in our index DYRK1A mutation case, macrocephaly in both probands with CHD8 mutations (3), and the association of these traits with other syndromic loci (13, 17), we reexamined head circumference (HC) in the larger set of probands with protein-truncation or splice-site de novo events using age and sex normalized HC Z-scores (12) (Fig. 2B). For CHD8 (n = 8), we observed significantly larger head sizes relative to individuals screened without CHD8 mutations (two-sample permutation test, two-sided p = 0.0007). De novo CHD8 mutations are present in ~2% of macrocephalic (HC > 2.0) SSC probands (n = 366), suggesting a useful phenotype for patient subclassification. For DYRK1A (n = 3), we observed significantly smaller head sizes relative to individuals screened without DYRK1A mutations (two-sample permutation test, two-sided p = 0.0005). Comparison of head size in the context of the families (Fig. 2, C and D, and table S17) provides further support for this reciprocal trend (10). These findings are also consistent with case reports of patients with structural rearrangements and mouse transgenic models that implicate DYRK1A and CHD8 as regulators of brain growth (18–21). Macrocephaly was also observed in individuals with de novo and inherited PTEN mutations (22).
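A two-sample permutation test of the kind used for the head circumference comparisons can be sketched as below. The test statistic is not specified in the text, so this sketch assumes the absolute difference in group means. The Z-scores are synthetic placeholders, not SSC data, and `permutation_test` is an illustrative name.

```python
import random

def permutation_test(group_a, group_b, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = abs_mean_diff(pooled)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)

# Synthetic head circumference Z-scores (placeholders, not study data).
data_rng = random.Random(1)
carriers = [2.1, 2.6, 1.9, 3.0, 2.4, 2.8, 1.7, 2.2]
non_carriers = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

p_value = permutation_test(carriers, non_carriers)
print(p_value)
```

With n_perm permutations the smallest reportable p-value is 1/(n_perm + 1), so the number of permutations sets the resolution of the test.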
Our data support an important role for de novo mutations in six genes in ~1% of sporadic ASD. As the SSC was specifically established for simplex ASD and as its probands generally have higher cognitive functioning than has been reported in other ASD cohorts (11), it is unknown how our findings will translate into other cohorts. Furthermore, while implicating specific loci in ASD, our data are insufficient to evaluate whether the observed de novo mutations are sufficient to cause ASD (tables S16 and S18).
Exome sequencing and CNV studies suggest that there are hundreds of relevant genetic loci for ASD. Technologies and study designs directed at identifying de novo mutations, both for the discovery of ASD candidate genes as well as for their validation, provide sufficient power to implicate individual genes from a relatively small number of events. The analytical framework described here can be applied to any other disorder—simple or complex—for which de novo coding mutations are suspected to contribute to risk. Additionally, the experimental methods presented here are broadly useful for the rapid and economical resequencing of candidate genes in extremely large cohorts, as may be required for the definitive implication of rare variants or de novo mutations in any genetically complex disorder.
We thank the NHLBI GO Exome Sequencing Project and its ongoing studies which produced and provided exome variant calls for comparison: the Lung GO Sequencing Project (HL-102923), the WHI Sequencing Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing Project (HL-103010). We also thank Benjamin Vernot, Megan Dennis, Tonia Brown, and other members of the Eichler and Shendure labs for helpful discussions. We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren, E. Wijsman). We appreciate obtaining access to phenotypic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study (https://ordering.base.sfari.org/~browse_collection/archive[ssc_v13]/ui:view) by applying at https://base.sfari.org. This work was supported by a grant from the Simons Foundation (SFARI 137578, 191889 to E.E.E., J.S., and R.B.), NIH HD065285 (E.E.E. and J.S.), NIH NS069605 (H.C.M.), and R01 NS 064077 (D.D.). E.B. is an Alfred P. Sloan Research Fellow. E.E.E. is an Investigator of the Howard Hughes Medical Institute. Scientific advisory board or consulting affiliations: Ariosa Diagnostics (J.S.), Stratos Genomics (J.S.), Good Start Genetics (J.S.), Adaptive Biotechnologies (J.S.), Pacific Biosciences (E.E.E.), SynapDx (E.E.E.), DNAnexus (E.E.E.), and SFARI GENE (H.C.M.). B.J.O. is an inventor on patent PCT/US2009/30620: Mutations in Contactin Associated Protein 2 are Associated with Increased Risk for Idiopathic Autism. Raw sequencing data are available at the National Database for Autism Research, NDARCOL1878.