Autism spectrum disorders (ASD) are characterized by pervasive impairment in language and communication and in social reciprocity, and by restricted interests or stereotyped behaviors
1. Several new candidate loci for ASD have recently been identified using genome-wide approaches that discover individually rare events of major effect
2. A number of genetic syndromes with features of the ASD phenotype, collectively referred to as syndromic autism, have also been described
4. Despite this progress, the genetic basis for the vast majority of ASD cases remains unknown. Several observations support the hypothesis that the genetic basis for ASD in sporadic cases may differ from that of families with multiple affected individuals, with the former more likely to result from
de novo mutation events rather than inherited variants
1,5–7. In this study, we sequenced the protein-coding regions of the genome (the exome)
8 to test the hypothesis that
de novo protein-altering mutations substantially contribute to the genetic basis of sporadic ASD. In contrast with array-based analysis of large
de novo copy number variants (CNVs), this approach has greater potential to implicate single genes in ASD.
We selected 20 trios with idiopathic ASD, each consistent with sporadic ASD based on clinical evaluations (
Supplementary Table 1), pedigree structure, familial phenotypic evaluation, family history, and/or elevated parental age. Each family was initially screened by array comparative genomic hybridization (CGH) using a customized microarray
9. We identified no large (>250 kbp)
de novo CNVs but did identify a maternally inherited deletion (~350 kbp) at 15q11.2 in one family (
Supplementary Fig. 1). This deletion has been associated with increased risk for epilepsy
10 and schizophrenia
11,12 but has not been considered as causal for autism.
Similar to Vissers and colleagues
13, who reported exome sequencing on 10 parent-child trios with sporadic cases of moderate to severe intellectual disability (ID), we performed exome sequencing on each of the 60 individuals separately, by subjecting whole-blood derived genomic DNA to in-solution hybrid capture and Illumina sequencing (
Methods). We obtained sufficient coverage to call variants for ~90% of the primary target (26.4 Mb). Genotype concordance with SNP microarray data was high (99.7%) (
Supplementary Table 2) and on average 96% of proband variant sites were also called in both parents (
Supplementary Table 3). Given the expected rarity of true
de novo events in the targeted exome (<1/trio) (
Supplementary Table 4)
14, we reasoned that most apparently
de novo variants would result from undercalling in parents or systematic false positive calls in the proband. We therefore filtered variants previously observed in dbSNP, 1000 Genomes Pilot Project data
15, and 1490 other exomes sequenced at the University of Washington (
Supplementary Fig. 2). We performed Sanger sequencing on the remaining
de novo candidates (<5/trio), validating 18 events within coding sequence and three additional events mapping to 3′ untranslated regions. A list of predicted variant sites within these genes from the 1000 Genomes Pilot Project data
15 is provided for comparison (
Supplementary Table 5).
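The filtering logic described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the `Trio` class, the `de_novo_candidates` function, the variant key (chromosome, position, reference allele, alternate allele), and the toy variants are all hypothetical. It assumes that parental calls and previously observed sites (for example dbSNP, 1000 Genomes Pilot, and other exomes) have already been loaded into sets.

```python
from dataclasses import dataclass

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


@dataclass
class Trio:
    proband: set[Variant]
    mother: set[Variant]
    father: set[Variant]


def de_novo_candidates(trio: Trio, known_sites: set[Variant]) -> set[Variant]:
    """Return proband variants seen in neither parent nor in known_sites."""
    # Step 1: drop anything called in either parent (apparent inheritance).
    not_inherited = trio.proband - trio.mother - trio.father
    # Step 2: drop anything previously observed in reference call sets,
    # since such sites are more likely undercalled in a parent than truly de novo.
    return not_inherited - known_sites


# Toy data, invented for illustration only.
trio = Trio(
    proband={("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")},
    mother={("chr1", 100, "A", "G")},
    father=set(),
)
known = {("chr2", 200, "C", "T")}
candidates = de_novo_candidates(trio, known)
print(sorted(candidates))
```

Candidates that survive such a filter would still need orthogonal validation, which in this study was done by Sanger sequencing.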
| Table 1. Summary of the exome sequencing results from 20 sporadic ASD probands |
| Table 2. Summary of confirmed de novo mutation events |
We observed subtle differences with respect to mutation rate and characteristics when compared to Vissers and colleagues
13 (
Supplementary Note). The overall protein-coding
de novo rate (0.9 events/trio) was slightly higher than expected
14 (0.59 events/trio), suggesting that we are identifying the majority of
de novo events in these trios (
Supplementary Table 4). The transition to transversion ratio was highly skewed (18:2), with eight transitions mapping to hypermutable CpG dinucleotides
14. The proportion of synonymous events was higher than expected based on a neutral model and may reflect selection against embryonic lethal nonsynonymous variants. We successfully determined the parent of origin for seven events, six of which occurred on the paternal haplotype. Notably, the eight probands with two or more validated
de novo events corresponded to families with higher parental age (Mann–Whitney U, Combined Age, One-Sided P<0.004).
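The rate comparison above is simple arithmetic: 18 coding events across 20 trios gives 0.9 events/trio, against an expectation of 0.59 events/trio (11.8 events in total). As a rough illustration only, the sketch below also computes a Poisson upper-tail probability for seeing at least 18 events given that expectation. The Poisson model and the function name `poisson_upper_tail` are assumptions introduced here; this is not an analysis reported in the study.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    lower = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - lower


n_trios = 20
observed_events = 18   # validated coding de novo events (from the text)
expected_rate = 0.59   # expected events per trio (from the text)

observed_rate = observed_events / n_trios   # events per trio
expected_events = expected_rate * n_trios   # expected total across all trios

# Assumption: event counts are approximately Poisson distributed.
p_tail = poisson_upper_tail(observed_events, expected_events)
print(f"observed rate = {observed_rate:.2f}, expected total = {expected_events:.1f}")
print(f"P(X >= {observed_events}) under Poisson = {p_tail:.3f}")
```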
Eleven of the 18 coding
de novo events are predicted to alter protein function. Each of these mutations occurred at a different gene, precluding a statistical assessment for any specific locus despite their deleterious nature (e.g. PolyPhen-2
16). We assessed whether proband
de novo mutations were enriched in the aggregate for disruptive events by considering two independent quantitative measures: the nature of the amino-acid replacement (Grantham matrix score
17) and the degree of nucleotide-level evolutionary conservation (Genomic Evolutionary Rate Profiling (GERP)
18,19). For comparison, we sequenced 20 exomes from unrelated ethnically matched controls (HapMap) and applied the same filters to identify coding-sequence mutations that were common or private to each of the samples. These control DNA samples were isolated from immortalized lymphoblasts; however, the counts of private variants in the cases and controls were highly similar, suggesting that the contribution of novel somatic events is likely minimal (
Supplementary Fig. 3).
We determined by simulation the expected mean GERP and Grantham distributions for 10 randomly selected common or private control single nucleotide variants (SNVs) (Methods). When we compared the observed means of the 10 de novo protein-altering ASD proband variants to the distribution of common control SNVs, they corresponded to more highly conserved (GERP: p<0.001) and disruptive amino acid mutations (Grantham: p=0.015). If we limited the analysis to the private control SNVs, which serve as a proxy for evolutionarily young mutation events, we again found the de novo events were at the right tail of these distributions. Only the mean GERP score, however, remained significant (GERP: p=0.02, Grantham: p=0.115). In total, these results suggest that these de novo mutation sites are subjected to stronger selection and likely to have functional impact.
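A resampling comparison of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure in the Methods: the function name `empirical_p_value`, the number of draws, sampling without replacement, the add-one correction, and all score values are hypothetical choices made here. The idea is to repeatedly draw as many control SNVs as there are case variants, record the mean score of each draw, and ask how often a random draw is at least as extreme as the observed case mean.

```python
import random
from statistics import mean


def empirical_p_value(case_scores, control_scores, n_draws=10_000, seed=0):
    """One-sided empirical p-value for the case mean against resampled control means."""
    rng = random.Random(seed)
    observed = mean(case_scores)
    k = len(case_scores)
    hits = 0
    for _ in range(n_draws):
        # Draw k control variants without replacement (an assumption of this sketch).
        sample = rng.sample(control_scores, k)
        if mean(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero (also an assumption).
    return (hits + 1) / (n_draws + 1)


# Placeholder scores, invented for illustration; these are not study data.
control_scores = [0.1 * i for i in range(200)]
case_scores = [15.0, 16.5, 14.2, 18.0, 17.1, 15.8, 16.9, 19.0, 14.9, 17.7]
p = empirical_p_value(case_scores, control_scores, n_draws=2_000)
print(f"empirical one-sided p = {p:.4f}")
```

The same function could be applied separately to conservation scores and to amino acid replacement scores, and separately to common and private control variants, mirroring the four comparisons reported above.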
We identified a subset of trios (4/20) with disruptive
de novo mutations that are potentially causative, including genes previously associated with autism, ID, and epilepsy (
Supplementary Note). We examined the available clinical data for each of these four families and found they were among the most severely affected individuals in our study based on intelligence quotient (IQ) measures and on calibrated severity score
20 (CSS), which is largely independent of IQ and focuses specifically on autistic features, with a score of 10 being most severe. For example, in proband 12681 we identified a single-base substitution (IVS9-2A>G, CCDS8662.1) at the canonical 3′ splice site of exon 10 in
Glutamate receptor, ionotropic, N-methyl D-aspartate 2B (
GRIN2B) (
Supplementary Fig. 4a,b). She is severely affected (CSS 9), with evidence of early onset and possible regression, and has comorbid mild ID. Expression and association studies have suggested that glutamatergic neurotransmission may play a role in ASD
4. Recently, Endele and colleagues
21 described
GRIN2A and
GRIN2B as sites of recurrent
de novo mutations in individuals with mild to moderate ID and/or epilepsy, suggesting variable expressivity. Our data suggest that
de novo mutations in
GRIN2B may also lead to an ASD presentation.
Proband 12499 has a missense variant (p.P1894L, CCDS33316.1) predicted to be functionally deleterious and at a highly conserved position in
Sodium channel, voltage-gated, type I, alpha subunit (
SCN1A) (
Supplementary Fig. 4c). He is severely affected (CSS 8) with evidence of early onset, possible regression, language delay, a diagnosis of epilepsy and mild ID.
SCN1A was previously associated with epilepsy and suggested as an ASD candidate
22,23, although limited screening has been conducted in idiopathic ASD. Hundreds of disease-associated mutations have been described in epilepsy and typically patients with
de novo events show more severe phenotypes
24. The proband also carries the maternally inherited 15q11.2 deletion increasing the risk for epilepsy
10.
Proband 11666 has a missense variant (p.D399G, CCDS6938.1) predicted to be functionally deleterious and at a highly conserved position within the second laminin-type epidermal growth factor-like domain of
Laminin, gamma 3 (
LAMC3) (
Supplementary Fig. 4d). He is severely affected (CSS 10) with evidence of early onset and moderate ID.
LAMC3 is not known to be involved in neuronal development; however, human microarray data have shown expression in many areas of the cortex and limbic system
25. Additional study is warranted since laminins have structural similarities to the neurexin and contactin-associated families of proteins, both of which have been associated with ASD
2.
The fourth example of a potentially causative mutation is a single-base insertion in
Forkhead box P1 (
FOXP1), introducing a frameshift and premature stop codon (p.A339SfsX4, CCDS2914.1) in proband 12817. He is severely affected (CSS 8) with evidence for regression, language delay, and comorbidity for moderate ID and nonfebrile seizures. Recently, rare occurrences of large
de novo deletions and a nonsense variant disrupting
FOXP1 were reported in individuals with mild to moderate ID and language defects, with or without ASD features
26,27.
FOXP1 encodes a member of the forkhead-box family of transcription factors and is closely related to
FOXP2, a gene implicated in rare monogenic forms of speech and language disorder
28–31. Functional evidence of heterodimer formation and overlapping neural expression patterns suggests that FOXP1 and FOXP2 can co-regulate gene expression in the brain
32,33. We assessed relative levels of the mutant transcript in proband derived lymphoblasts finding strong evidence for nonsense-mediated decay (NMD) (
Supplementary Fig. 5a). HEK293T cell-based functional assays further demonstrated that, if translated, the protein would be truncated and mislocalized from the nucleus to the cytoplasm—similar to results obtained with FOXP2 mutations
31 (
Supplementary Fig. 5b,c).
Remarkably, in addition to the
FOXP1 mutation, proband 12817 also carried an inherited missense variant (p.H275A, CCDS5889.1) in
Contactin associated protein-like 2 (
CNTNAP2) predicted to be functionally deleterious and at a highly conserved position. This variant is likely to be extremely rare or private as it was not observed in 942 previously sequenced controls
34 or in 1490 other exomes.
CNTNAP2 is directly downregulated by FOXP2
35 and has been independently associated with ASD and specific language impairment
34–37. In HEK293T cells, we found that wild-type FOXP1 significantly reduced expression of
CNTNAP2 (p=0.0005), while the truncated protein was associated with a three-fold expression increase (p=0.0056) (
Supplementary Note, Fig. 5d). Overall, we hypothesize that FOXP1 haploinsufficiency (due to NMD), combined with dysfunction of FOXP1 mutant proteins that escape this process, may yield overexpression of CNTNAP2 proteins, amplifying any deleterious effects of p.H275A in the proband.
Among the ~110 (85 SNVs, 25 indels) novel inherited protein-altering variants in each proband, we identified several rare inherited variants in genes overlapping the SFARI Gene
38, a curated database of potential ASD candidate loci, but no excessive burden in cases relative to controls (
Supplementary Table 6). While the numbers from our pilot study are few, we do observe two cases with a significant
de novo event and a potential inherited risk variant (12817.p1:
FOXP1/
CNTNAP2 and 12499.p1:
SCN1A/15q11.2 deletion) highlighting that in some sporadic families a multihit model may be playing a role
3 (
Supplementary Table 7). In the future, this hypothesis could be further explored by comparing burden in a much larger number of affected/unaffected sibling pairs.
The probands with the four potentially causative
de novo events met strict criteria for a diagnosis of autistic disorder (
Supplementary Note). Our finding of
de novo events in genes that have also been disrupted in children with ID without ASD, ID with ASD features, and epilepsy provides further evidence that these genetic pathways may lead to a spectrum of neurodevelopmental outcomes depending on the genetic and environmental context
2,4. Recent data suggest that CNVs may also blur these lines with diverse conditions all showing association to the same loci
2,4. Distinguishing primary from secondary effects will require a better understanding of the underlying biology and identification of interacting genetic and environmental factors within the phenotypic context of the family. The identification of
de novo events along with disruptive inherited mutations underlying “sporadic” ASD has the potential to fundamentally transform our understanding of the genetic basis of ASD.