Research during the past decade has seen significant progress toward a model for the genetic architecture of autism spectrum disorders (ASD), with gene discovery accelerating as the characterization of genomic variation has become increasingly comprehensive. At the same time this research has highlighted ongoing challenges. Here we address the enormous impact of high throughput sequencing (HTS) on ASD gene discovery, outline a consensus view for leveraging this technology, and describe a large multi-site collaboration developed to accomplish these goals. Similar approaches could prove effective for severe neurodevelopmental disorders more broadly.
The past decade has seen tremendous advances in our understanding of the genetic contribution to autism spectrum disorders (ASD). Rapidly evolving genomic technologies, combined with the availability of increasingly large study cohorts, have led to a series of highly reproducible findings (Betancur, 2011; Devlin et al., 2011; Devlin and Scherer, 2012). These findings highlight the contribution of rare variation in both DNA sequence and chromosomal structure; place limits on the risk conferred by individual common genetic polymorphisms; underscore the role of de novo germline mutation; suggest a staggering degree of genetic heterogeneity involving at least hundreds of genes; demonstrate the highly pleiotropic effects of syndrome-associated mutations; and definitively identify an increasing number of specific genes and chromosomal intervals conferring risk. This progress marks the long-awaited emergence of the field from a period of tremendous uncertainty about viable approaches to gene discovery. At the same time, the findings underscore the scale of the challenges ahead.
Twin studies have consistently identified a significant genetic component of ASD risk (Hallmayer et al., 2011; Ronald and Hoekstra, 2011), and gene discovery dates back over a decade (Betancur, 2011; Devlin and Scherer, 2012). Recent analyses demonstrate that common polymorphisms carry risk for ASD and indeed appear to exert substantial influence (Anney et al., 2012; Klei et al., 2012). However, individual common risk alleles have so far proven difficult to identify and replicate, likely because the relative risk conferred by each locus is small, typically <1.2, and cohort sizes have not yet reached those found necessary to clarify risks in other complex psychiatric disorders (Devlin et al., 2011). In contrast, a focus on rare and de novo mutation has already proven highly productive, uncovering variants with larger biological effects that together account for an appreciable fraction of population risk.
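To make concrete why relative risks below about 1.2 demand very large cohorts, the sketch below applies the textbook normal-approximation sample-size formula for a two-sided allelic case-control test. The function names, the control allele frequency of 0.3, the genome-wide threshold of 5e-8 and the 80% power target are illustrative assumptions, not values taken from the studies cited, and the odds ratio is used as a stand-in for relative risk.

```python
import math
from statistics import NormalDist


def allele_freq_in_cases(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = p_control / (1.0 - p_control) * odds_ratio
    return odds / (1.0 + odds)


def cases_needed(p_control: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (with an equal number of controls) needed
    to detect a difference in allele frequency with a two-sided test.

    Hypothetical helper: standard normal approximation, ignores LD and
    genotyping error.
    """
    p_case = allele_freq_in_cases(p_control, odds_ratio)
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)               # quantile for the desired power
    variance = p_case * (1 - p_case) + p_control * (1 - p_control)
    alleles_per_group = (z_alpha + z_beta) ** 2 * variance / (p_case - p_control) ** 2
    # Each diploid individual contributes two alleles.
    return math.ceil(alleles_per_group / 2.0)


for odds_ratio in (1.1, 1.2, 1.5):
    print(odds_ratio, cases_needed(0.3, odds_ratio))
```

Under these assumptions an odds ratio of 1.1 calls for on the order of 20,000 cases plus as many controls, while an odds ratio of 1.5 needs only about a thousand of each, which is the gap the paragraph above describes.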
An example of the considerable traction provided by a focus on rare inherited and de novo variation can be found in the earliest successes in ASD genetics. The protein products of risk genes identified in patients ascertained for non-syndromic ASD, including NLGN4X, NRXN1 and SHANK3, co-localize with those encoded by genes first identified in syndromic subjects, including FMR1 (encoding FMRP), PTEN, TSC1 and TSC2, at the post-synaptic density of excitatory glutamatergic synapses (note, however, that as gene identification continues, "syndromic" genes are being identified in non-syndromic cases and vice versa). These results are cause for optimism about the prospects for identifying treatments whose efficacy extends well beyond the boundaries suggested by mutation-defined subgroups. Moreover, the themes highlighted in these studies presaged the current era of gene discovery, not only with regard to the contribution of rare alleles and sporadic germline mutations, but also by providing the first concrete evidence of the tremendous pleiotropy and variable penetrance now considered characteristic of ASD risk loci.
Analyses of chromosome microarrays have provided compelling evidence that submicroscopic variations in chromosomal structure, called copy number variation or CNV, contribute to ASD risk (Betancur, 2011; Cooper et al., 2011; Pinto et al., 2010; Sanders et al., 2011). Certain CNVs are recurrent, often due to either the presence of low copy repeats or subtelomeric deletions, and within some of these the attendant risk has been related to a single gene (e.g., NRXN1 in 2p16.3, SHANK3 in 22q13.3 deletions and MBD5 in 2q23.1) (Betancur, 2011). With the widespread use of chromosome microarrays in the clinical setting, accompanied by increasingly large-scale analyses of research cohorts, the field is beginning to consolidate population-level data for CNV with some clear findings: 1) 5–10% of previously unexplained cases will carry an ASD-CNV; 2) both de novo and transmitted CNVs confer risk; 3) rare CNVs generally confer larger risks than are typically associated with common variants; however, many of these high-risk regions appear to contribute to ASD through a complex pattern of inheritance; and 4) the majority of confirmed ASD loci show both variable expressivity and pleiotropic effects.
A recent analysis of structural variation in ASD families from the Simons Simplex Collection, focusing on comprehensively assessed quartets of mother, father, ASD proband and unaffected sibling (Sanders et al., 2011), serves as a useful illustration. Large, rare de novo CNVs showed a threefold increase in probands relative to their matched siblings, a highly significant difference (Sanders et al., 2011). Moreover, the de novo events in probands were found to carry about 10 more genes on average, even after accounting for CNV size. Among the many results from these data, one of special salience is that no matter how inherited CNVs were parsed for analysis, no significant difference between probands and siblings emerged, even though inherited CNVs far outnumbered de novo CNVs. A plausible interpretation is that de novo events that alter gene function have a much higher signal-to-noise ratio than inherited CNVs that also affect gene function; put another way, gene-rich de novo CNVs are highly likely to capture one or more ASD genes, whereas inherited gene-rich CNVs are, on average, less likely to harbor them. Finally, with regard to pursuing biological studies, a drawback of CNVs is the often-large set of genes they cover. Accordingly, if the genetic architecture of sequence variation in ASD mirrored that suggested by CNV, HTS, which can pinpoint individual genes, would represent an extremely important addition to the genomic armamentarium.
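One simple way to test a proband-versus-sibling difference in de novo event counts in matched quartets is an exact sign (binomial) test: if rates were equal, each observed event would be equally likely to fall in either group. The sketch below assumes equal numbers of probands and siblings and equal detection sensitivity in both; the function name and the 45-versus-15 counts are hypothetical, chosen only to mimic a threefold excess, and this is not necessarily the test used by Sanders et al.

```python
from math import comb


def sign_test_pvalue(proband_events: int, sibling_events: int) -> float:
    """One-sided exact binomial test for an excess of events in probands.

    Under the null of equal rates in matched probands and siblings, each of
    the n = proband_events + sibling_events events is equally likely to occur
    in either group, so proband_events ~ Binomial(n, 0.5).
    """
    n = proband_events + sibling_events
    # Sum the probabilities of outcomes at least as extreme as observed.
    tail = sum(comb(n, k) for k in range(proband_events, n + 1))
    return tail / 2 ** n


# Hypothetical counts: a threefold excess of de novo CNVs in probands.
print(sign_test_pvalue(45, 15))
# Equal counts give no evidence of excess.
print(sign_test_pvalue(10, 10))
```

Conditioning on the total number of events sidesteps the need to know the absolute mutation rate, which is one reason matched sibling designs are attractive.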
Against this backdrop, the Autism Sequencing Consortium (ASC) was formed in 2010, in anticipation both of the tremendous impact HTS would have on ASD genomics and of the many challenges the field would face as a consequence. Now including more than 20 research groups, the ASC aims to exploit sequencing approaches collectively to resolve a substantial fraction of the genetic factors involved in ASD. While there are likely many hundreds of undiscovered ASD loci, emerging data provide sufficient empirical evidence upon which to develop sound approaches to identifying them.
From the outset, the effort to constitute the group and define its objectives faced the challenge of balancing the obvious benefits of working cooperatively across groups against the strongly held conviction that a diversity of approaches and the presence of multiple competing efforts have played, and will continue to play, an indispensable role in the field's rapid progress. The participating investigators set out to address the range of related issues, including data sharing, prospectively, before HTS data became widely available.
In 2011, the ASC held an open meeting of investigators, funders and other stakeholders to refine and crystallize its plans and proposals. The meeting, which included more than 100 on-site participants (see Supplemental Table 1) as well as additional web participants, was organized around three working groups: (1) sequence technology, data harmonization and statistical inference (B Devlin and M Daly, Chairs); (2) samples and phenotypes (J Buxbaum and M Gill, Chairs); and (3) future directions (T Lehner and M State, Chairs). Working groups addressed a variety of issues including study designs, statistical approaches, sample availability and composition, data normalization, bioinformatics challenges, and the integration of gene discovery into broader efforts in translational neuroscience (Supplemental Table 2). The meeting was videocast and can be accessed at http://videocast.nih.gov/pastevents.asp. We present a synthesis and summary of that meeting, reflecting both a current view of the field and consensus recommendations for gene discovery.
In light of the high degree of genetic heterogeneity in ASD, it was apparent that HTS would provide a powerful platform for gene discovery. Whole genome sequencing (WGS) can detect structural variation of all types, ranging from gross chromosomal rearrangements to CNVs and small insertions and deletions (indels), while also providing highly sensitive single-base resolution. Similarly, whole exome sequencing (WES) can reliably detect single nucleotide variants (SNVs) in the coding segments of the genome, many indels and some CNVs. Of course, both technologies can identify rare alleles to a degree that is not possible on genotyping platforms, which rely on assaying known variation.
To date, four large-scale ASD WES studies have been carried out, both in trios (a proband with ASD and the biological parents) and in quads (a trio plus an unaffected sibling) (Iossifov et al., 2012; Neale et al., 2012; O'Roak et al., 2011; O'Roak et al., 2012; Sanders et al., 2012). Among the 1,000 families assessed by the four studies, the rate of de novo loss-of-function (LoF) variation was consistently found to be significantly higher in cases than in controls, allowing for the development of rigorous statistical approaches to identifying specific risk genes. Indeed, six ASD genes, CHD8, DYRK1A, GRIN2B, KATNAL2, POGZ, and SCN2A, were identified because they carried recurrent, highly damaging de novo events. Although SCN2A had previously been implicated in epilepsy, none of these genes had been known to carry ASD risk. Another key finding, one that will prove useful for gene discovery, was that roughly half of all de novo LoF mutations seen in ASD probands fall in ASD genes, with about 12% of ASD subjects carrying a de novo LoF mutation.
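Why recurrence is so informative can be seen with a simple Poisson calculation: for a gene that carries no risk, the expected number of de novo LoF hits among N probands is 2 × N × μ, where μ is the gene's per-haploid-genome LoF mutation rate. The sketch below assumes a hypothetical μ of 5e-6, 1,000 probands, and a Bonferroni correction over 20,000 genes; the function names are illustrative, and published analyses use gene-specific mutation-rate models rather than a single constant.

```python
import math


def poisson_upper_tail(k: int, lam: float, terms: int = 200) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed directly from the upper tail
    (log-space terms avoid cancellation when the tail is tiny)."""
    if k <= 0:
        return 1.0
    if lam == 0.0:
        return 0.0
    return sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k, k + terms)
    )


def recurrence_pvalue(observed_hits: int, n_probands: int, mu_lof: float) -> float:
    """Chance of at least `observed_hits` de novo LoF mutations in one gene
    among n_probands if the gene carries no risk (hypothetical helper).

    Each proband inherits two haploid genomes, each of which can carry a new
    mutation, so the expected count is 2 * n_probands * mu_lof.
    """
    expected = 2.0 * n_probands * mu_lof
    return poisson_upper_tail(observed_hits, expected)


genome_wide_alpha = 0.05 / 20_000  # assumed Bonferroni correction over genes
p_two = recurrence_pvalue(2, 1000, 5e-6)
p_three = recurrence_pvalue(3, 1000, 5e-6)
print(p_two, p_two < genome_wide_alpha)
print(p_three, p_three < genome_wide_alpha)
```

Under these assumptions two hits in a gene are suggestive but fall short of the genome-wide threshold, while three hits clear it, which illustrates why recurrent damaging de novo events can implicate individual genes in cohorts of this size.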
These WES studies found a background rate of de novo missense variation more than tenfold higher than that for LoF alleles. These missense changes should carry information about ASD genes. Still, there is only a 5–10% excess of such mutations in ASD cohorts compared with controls, and this difference does not reach significance. In addition, it is not yet possible to confidently assign risk to this broad category of mutation. Given the clear relevance of LoF alleles, this difficulty surely reflects the signal-to-noise problem created by neutral background variation and the difficulty of distinguishing the subset of truly functional missense variants.
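The signal-to-noise contrast between LoF and missense variation can be expressed as the expected fraction of de novo mutations in cases that are risk-conferring: if the control rate reflects neutral background, the excess in cases divided by the case rate equals 1 − 1/RR, where RR is the case:control rate ratio. In the sketch below, reading a "5–10% excess" as RR = 1.05–1.10 is an interpretation, and RR = 2 for LoF is an assumption chosen because it corresponds to the "roughly half" figure reported for de novo LoF mutations; the function name is hypothetical.

```python
def fraction_risk_conferring(rate_ratio: float) -> float:
    """Expected fraction of de novo mutations in cases that confer risk,
    assuming the control rate is pure neutral background.

    The excess over background, (case_rate - control_rate) / case_rate,
    simplifies to 1 - 1 / rate_ratio.
    """
    if rate_ratio <= 1.0:
        return 0.0  # no excess, so no detectable risk-conferring fraction
    return 1.0 - 1.0 / rate_ratio


for label, ratio in (("LoF (assumed)", 2.0),
                     ("missense, 5% excess", 1.05),
                     ("missense, 10% excess", 1.10)):
    print(label, round(fraction_risk_conferring(ratio), 3))
```

Under these assumptions only about 5–9% of de novo missense mutations in cases would be risk-conferring, compared with about half of de novo LoF mutations, so any single missense hit is far weaker evidence for a gene.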
The interpretation of case-control exome sequencing has also proven less straightforward than that of family studies evaluating de novo LoF events. For example, WES of 1000 cases and 1000 controls, followed by inspection of the six novel ASD genes just described, showed in hindsight only a slight excess of LoF mutations in KATNAL2 and CHD8 in cases, a difference that did not approach statistical significance (Neale et al., 2012). Indeed, across the entire genome no gene was found to harbor a sufficiently large excess of rare alleles in cases versus controls to support a significant association after accounting for multiple comparisons (Liu et al., personal communication). These results are consistent with the multiple lines of evidence supporting a large number of ASD risk genes scattered throughout the genome. Methods to extract signal from case-control studies, alone and in combination with de novo data, are rapidly evolving. Still, it seems reasonable to conclude that large studies, involving tens of thousands of subjects, will be necessary to identify risk loci using standard analyses of mutation burden in case-control samples.
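A minimal per-gene burden test compares the number of rare LoF carriers in cases and controls with a one-sided Fisher's exact test. The carrier counts below (6 of 1,000 cases versus 2 of 1,000 controls) are hypothetical, chosen only to show what "a slight excess" looks like statistically, and the 20,000-gene Bonferroni threshold is an assumption; the function name is illustrative.

```python
from math import comb


def fisher_one_sided(case_carriers: int, n_cases: int,
                     control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test p-value for an excess of carriers in cases.

    Conditions on the total number of carriers and sums the hypergeometric
    probabilities of tables at least as extreme as the one observed.
    """
    total_carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, total_carriers)
    upper = min(total_carriers, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_controls, total_carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return numer / denom


p = fisher_one_sided(6, 1000, 2, 1000)  # hypothetical carrier counts
print(p, p < 0.05 / 20_000)  # compare with an assumed genome-wide threshold
```

With so few carriers in total, even a threefold excess in cases leaves a p-value far from nominal significance, let alone a genome-wide threshold, which is why burden analyses of rare alleles need far larger samples.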
The path forward is either WES or WGS in large cohorts. Because of its higher signal-to-noise ratio, discovery of de novo mutations, especially LoF mutations that cluster in the same gene among unrelated individuals, is an immediately productive approach to gene discovery and will be emphasized. As the number of trios or quads sequenced grows linearly, the rate of gene identification is predicted to accelerate (Figure 1). Based on the first results from the ASC sites, the value of expanding efforts in search of recurrent de novo events is clear. If HTS were to be performed on 8,000 families, and even ignoring other sources of key information, the experiment should yield between 40–60 novel ASD genes and a large number of additional genes falling just short of significance that could readily be confirmed via targeted sequencing in additional large patient cohorts (Figure 1).
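The predicted acceleration follows from a simple property of recurrence: if the de novo risk-gene hits per gene accumulate as a Poisson count with mean λ proportional to the number of families, the chance that a gene has at least two hits is roughly λ²/2 for small λ, so discoveries grow faster than linearly. The toy model below assumes 1,000 equally sized risk genes and a 6% chance per proband of a de novo LoF in one of them; both values and the function name are hypothetical, and the model ignores significance thresholds and unequal gene sizes, so it is not intended to reproduce Figure 1.

```python
import math


def expected_genes_with_recurrent_hits(n_families: int, p_hit: float,
                                       n_risk_genes: int, min_hits: int = 2) -> float:
    """Toy model: expected number of risk genes with >= min_hits de novo LoF
    mutations after sequencing n_families.

    Assumes each proband carries a de novo LoF in some risk gene with
    probability p_hit, spread evenly over n_risk_genes, so hits per gene
    are approximately Poisson(lam).
    """
    lam = n_families * p_hit / n_risk_genes
    below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(min_hits))
    return n_risk_genes * (1.0 - below)


for n in (1000, 2000, 4000, 8000):
    print(n, round(expected_genes_with_recurrent_hits(n, 0.06, 1000), 1))
```

In this regime each doubling of the cohort roughly quadruples the number of genes with recurrent hits, which is the qualitative behavior behind the expectation that gene identification accelerates as sequencing grows linearly.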
Efforts of this scale are underway. For example, the Simons Foundation has committed to sequencing more than 2,600 quartets; the AASC has finished sequencing 400 families; Genome Canada is supporting the sequencing of 1000 trios and families; and the UK10K project is targeting ~800 ASD cases among the 10,000 individuals to be sequenced. Autism Speaks, in partnership with the Beijing Genomics Institute (BGI), has committed to whole genome sequencing of 60 families, with an ultimate target of 2,000 families.
A key question is whether an even higher yield of ASD genes can be gleaned simply by making more effective use of data generated in ongoing experiments. In fact, it is a near certainty that evaluating types of mutation beyond de novo LoF variants will gain significant traction. Ongoing research promises to refine the interpretation of various classes of mutation, including inherited variation from family and case-control analyses, for which the chief obstacle is the high frequency of apparently neutral rare variation in the genome. In addition, studies focusing on recessive LoF variation are already producing early successes. These efforts may be aided by sequence data from the unusual high-risk extended pedigrees that are also available. Thus, based on refined interpretation of sequence, we expect to identify additional ASD genes. Progress in this area will also require methods to combine data on inherited variation with data on de novo events.
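As one generic illustration of combining evidence for a gene, assuming the de novo and inherited analyses yield independent p-values, Fisher's method sums −2 ln p across tests and compares the result with a chi-square distribution with 2k degrees of freedom. This is a textbook technique shown for orientation only; it is not presented as the ASC's method, and model-based approaches that treat de novo and inherited counts jointly are more refined. The function name is hypothetical, and every p-value must lie in (0, 1].

```python
import math


def fisher_combined_pvalue(pvalues: list[float]) -> float:
    """Combine independent p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of
    freedom under the null. For even degrees of freedom the upper tail has the
    closed form exp(-x/2) * sum_{i<k} (x/2)**i / i!.
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals X / 2
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


# Hypothetical gene: modest evidence from de novo and from inherited variation.
print(fisher_combined_pvalue([0.01, 0.01]))
```

Two individually modest signals combine into stronger evidence than either alone, which is the motivation for joint analysis; the independence assumption matters, since overlapping samples or correlated tests would overstate the combined significance.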
The ASC recognizes that a focus on DNA sequence, by itself, would be inefficient. Additional sources of information can be brought to bear to identify novel ASD genes (Figure 2). RNA-seq and ChIP-seq studies of typical and ASD brains offer an increasingly accurate picture of gene co-expression and regulatory networks, thereby identifying processes altered in ASD, both on their own and through their overlap with genes found to be disrupted in ASD. RNA-seq studies of peripheral samples (blood or induced neural cells) also have the potential to survey thousands of individuals to identify ASD-related biological signatures. Bearing in mind that one out of every two de novo LoF events found in probands hits an ASD gene, these kinds of biological information can be exploited to separate the ASD "wheat from the chaff". Moreover, CNV studies have already identified many regions of the genome as harboring one or more ASD genes, so CNV and sequence information can be combined to identify additional ASD genes. If other sources of information prove as useful as we anticipate, the yield of ASD genes could easily grow well beyond that predicted by Figure 1, paving the way for systems biological and neurobiological follow-up. In addition, understanding gene-environment interaction and gene-environment correlation remains an important long-term goal in ASD, and such approaches will be enormously facilitated by this gene discovery.
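The "wheat from the chaff" idea can be framed as a Bayesian update: starting from the roughly 50% prior that a gene hit by a de novo LoF in a proband is an ASD gene, independent biological evidence contributes a likelihood ratio that multiplies the prior odds. The likelihood ratios below (4 for a gene in a co-expression module enriched for known ASD genes, 0.25 for a gene not expressed in brain) are hypothetical, as is the function name; several sources can be chained by multiplying their likelihood ratios only if they are conditionally independent.

```python
def posterior_prob(prior: float, likelihood_ratio: float) -> float:
    """Update the probability that a gene hit by a de novo LoF mutation is an
    ASD gene, given a likelihood ratio from independent biological evidence."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


prior = 0.5  # roughly half of proband de novo LoF mutations hit ASD genes
print(posterior_prob(prior, 4.0))   # hypothetical supportive expression evidence
print(posterior_prob(prior, 0.25))  # hypothetical contrary evidence
```

Because the prior for de novo LoF hits is already high, even moderately informative biological data can move a candidate gene decisively toward or away from follow-up.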
Beyond gene discovery, integration of information as depicted in Figure 2 holds promise for clarifying the etiology and biology of ASD. Eventually, we foresee using ASD-related biological signatures to define subgroups enriched for disruptions in specific pathways and, possibly, to identify subsets of patients amenable to specific treatments. For brain and blood samples, it is also now possible to interrogate epigenetic modifications, mechanisms that are likely to play a substantial role in ASD. Other potentially uncharacterized risks include rare disruptions of the mitochondrial genome and the composition of the microbiome. The microbiome, thought to contribute as much as 10% of the metabolites in the bloodstream, has recently been shown to affect behavior in model systems. If it is a mediator of ASD risk, it would be particularly amenable to intervention.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.