Autism spectrum disorders (ASDs) [Mendelian Inheritance in Man (MIM) 209850] are characterized by language impairments, social deficits, and repetitive behaviors. Symptoms appear by the age of 3, and affected individuals usually require extensive lifelong support. The prevalence of ASD is estimated to be 1 in 166 (1), making it a major burden to society.
Genetics plays a major role in the etiology of autism. The concordance rates in monozygotic twins are 70% for autism and 90% for ASD, whereas the concordance rates in dizygotic twins are 5% and 10%, respectively. Previous studies suggest autism displays a high degree of genetic heterogeneity. Efforts to map disease genes using linkage analysis have found evidence for autism loci on 20 different chromosomes. Regions implicated by multiple studies include 1p, 5q, 7q, 15q, 16p, 17q, 19p, and Xq (2). Moreover, microscopy studies have identified cytogenetic abnormalities in >5% of affected children, involving many different loci on all chromosomes (3). In some rare syndromic forms of autism, such as Rett syndrome (4) and tuberous sclerosis (5), mutations in a single gene have been identified. Otherwise, neither linkage nor cytogenetics has unambiguously identified specific genes involved.
Genetic heterogeneity poses a considerable challenge to traditional approaches for gene mapping (6). Some of these limitations are overcome by methods that rely on the direct detection of functional variants, which in most cases are de novo events. New array-based technologies can detect differences in DNA copy number at much higher resolution than cytogenetic methods (7) and, hence, might reveal spontaneous mutations that were previously unidentified. These techniques have shown an abundance of copy number variants (CNVs) in humans (8), and the same methods have been used to find de novo chromosome aberrations below the resolution of microscopy in children with mental retardation and dysmorphic features (10), including patients with syndromic forms of autism (15). Yet, the association of spontaneous CNVs with idiopathic autism has not been systematically investigated. Thus, a large-scale study of genome copy number variation in ASD was needed. We have performed high-resolution genomic microarray analysis on a sample of 264 families to determine the rate of de novo copy number mutation in unaffected and affected children.
Our study focused on a sample of 264 families, including 118 “simplex” families containing a single child with autism, 47 “multiplex” families with multiple affected siblings, and 99 control families with no diagnoses of autism. The majority of patients came from the Autism Genetic Resource Exchange (AGRE) and from the National Institute of Mental Health (NIMH) Center for Collaborative Genetic Studies of Mental Disorders. Additional families were obtained through the authors (T.C.G., J.S.S., J.B., and D.S.). Efforts were made at all of the collecting sites to exclude cases of syndromic autism (i.e., those with severe mental retardation or other congenital anomalies) and to exclude known cytogenetic abnormalities. Identities of all subjects and their parents were coded so that analysis could be done blind to affected status while maintaining knowledge of the parent-child relations. We performed whole-genome scans on all parents, patients, and unaffected children. Affected or unaffected siblings of many patients were included in the study as independent cases or controls, respectively; thus, the entire sample yielded a complete parent-child “trio” for each of 195 patients and 196 healthy individuals. (See supplementary methods and table S1 for more extensive details on the patient sample.)
We analyzed DNA samples, prepared from either whole blood or Epstein-Barr virus (EBV) immortalized B cells or both, collected from subjects and their biological parents. Genome scans were performed by ROMA, a form of comparative genomic hybridization described previously (8). We performed two-color assays by cohybridizing each sample to an oligonucleotide array, and we used a standard reference genome, SKN1, for comparison. Assays were performed in duplicate with dye-swap. The array consisted of 85,000 probes, providing a mean resolution of one probe every 35 kb. Log intensity ratios from duplicate scans were averaged, and normalized ratio data were segmented by a Hidden Markov Model to define CNVs relative to reference (8) (with minor modifications).
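The stated resolution can be checked with simple arithmetic. The sketch below is illustrative only: the genome length of 3.0 Gb is an assumed round figure, not a value from the text, and the helper names are hypothetical. It also shows the approximate span bracketed by a call of three consecutive probes, the minimum size used in the detection criteria.

```python
# Back-of-envelope check of array resolution (illustrative only).
GENOME_LENGTH_BP = 3.0e9  # assumption: approximate haploid human genome size
N_PROBES = 85_000         # probe count given in the text


def mean_probe_spacing_bp(genome_length_bp, n_probes):
    """Mean distance between adjacent probes if they were evenly spread."""
    return genome_length_bp / n_probes


def span_of_call_bp(spacing_bp, n_probes_in_call):
    """n consecutive probes bracket (n - 1) inter-probe intervals."""
    return spacing_bp * (n_probes_in_call - 1)


spacing = mean_probe_spacing_bp(GENOME_LENGTH_BP, N_PROBES)
print(f"mean spacing: {spacing / 1e3:.1f} kb")
print(f"span of a 3-probe call: {span_of_call_bp(spacing, 3) / 1e3:.1f} kb")
```

Real probe spacing is uneven, so these figures describe only the average case.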
Detecting copy number variation from array data is an error-prone process, and so procedures were followed to ensure that events we detected in subjects were in fact de novo: not false-positive in the subject, and not false-negative in either biological parent. A flow chart of our procedure for finding and testing de novo mutations is depicted in Fig. 1. CNV regions detected in subjects were considered only if they involved at least three consecutive probes and had an overall likelihood measure >0.9 (8). Then, CNVs were disregarded if they were at least 60% similar in probe content to a variant detected in the set of all parents, where similarity between two CNVs is defined as (the number of common probes)/(the total number of probes in either CNV). This step was done in order to simultaneously filter out any CNVs present in the biological parents and to eliminate common polymorphic loci that would incorrectly appear to be de novo. The latter can occur, for example, when the parents and the reference are all heterozygous for a deletion and 0 or 2 copies are transmitted to the child. These two procedures greatly reduce the total number of candidates requiring validation.
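The two filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `CNV` class, the representation of a call as a set of probe indices, and the function names are hypothetical, and treating the 60% threshold as inclusive is an assumption. The similarity measure defined in the text (common probes divided by total probes in either CNV) is the Jaccard index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    probes: frozenset   # indices of array probes covered by the call
    likelihood: float   # confidence measure reported by the segmentation


def similarity(a, b):
    """Common probes divided by total probes in either CNV (Jaccard index)."""
    union = a.probes | b.probes
    if not union:
        return 0.0
    return len(a.probes & b.probes) / len(union)


def candidate_de_novo(child_cnvs, parent_cnvs,
                      min_probes=3, min_likelihood=0.9, max_similarity=0.6):
    """Keep child calls that pass the size and likelihood cutoffs and do not
    resemble any call in the pooled set of parental CNVs.

    Assumes each call is already a run of consecutive probes.
    """
    kept = []
    for cnv in child_cnvs:
        if len(cnv.probes) < min_probes or cnv.likelihood <= min_likelihood:
            continue
        if any(similarity(cnv, p) >= max_similarity for p in parent_cnvs):
            continue
        kept.append(cnv)
    return kept


# Toy data (hypothetical): one parental call and four calls in a child.
parents = [CNV(frozenset(range(100, 110)), 0.99)]
children = [
    CNV(frozenset(range(100, 108)), 0.95),  # overlaps the parental call
    CNV(frozenset(range(500, 520)), 0.97),  # no parental counterpart
    CNV(frozenset(range(700, 702)), 0.99),  # only two probes
    CNV(frozenset(range(900, 910)), 0.50),  # low likelihood
]
kept = candidate_de_novo(children, parents)
print(len(kept), "candidate(s) remain for validation")
```

Comparing against the pooled set of all parents, rather than only the child's own parents, is what lets the same step remove common polymorphisms.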
Fig. 1 Procedure for the detection of de novo CNVs. The flow diagram describes the step-by-step procedure for identifying regions of altered copy number that are present in a child and not in the biological parents.
We then further examined each candidate variant by a more careful assessment of the parents for the presence of the CNV, using a relaxed set of criteria for its presence (see legend to Fig. 2), to rule out false-negatives. If at that point the variant in the subject still appeared to be de novo, that is, present in the child but not in either parent, we tested parentage using multiple informative genetic markers. We then conducted additional validation of the suspect de novo lesion in parents and subjects, including Dpn II–ROMA using 390K arrays, CGH using Agilent 244K arrays, cytogenetics, and microsatellite genotyping. When de novo mutations were detected in DNA derived from an EBV-immortalized cell line, we sought to repeat analysis on DNA derived from an independent blood sample and found confirmation in 11 out of 12 available cases.
Fig. 2 Detection and validation of a spontaneous deletion in a patient with Asperger syndrome. CNVs were detected in patient scans using the standard HMM. Parents were ascertained and determined to have no change in copy number using an algorithm with relaxed ...
One example of the detection and confirmation of a de novo CNV is illustrated in Fig. 2. We detected a 1.1-Mb deletion of 20p13 in a child with the diagnosis of Asperger syndrome. This deletion involves ~27 genes, including the oxytocin gene OXT, a particularly noteworthy candidate in light of studies in humans and rodents that find evidence for the role of oxytocin in regulating social cognition (17). All validated de novo subject variants are listed in Table 1 with a description of each type of mutation, its methods of validation, genomic location, gene content, and information on the subject’s affected status and family type (simplex, multiplex, or control). Additional details regarding these and other variants detected in this study are provided in table S2. Initially, we detected 19 de novo CNVs in 17 individuals. In one family, subsequent analysis of the parental chromosomes by fluorescence in situ hybridization (FISH) determined that the two de novo events detected (a duplication and deletion) were the result of an unbalanced translocation inherited from an unaffected father who carries the balanced reciprocal translocation. After excluding these two events, 17 CNVs were confirmed to be de novo in 16 individuals (Table 1), consisting of 14 patients and 2 controls. The majority of these mutations are novel, and only the largest of them (all CNVs >4 Mb in size) have been reported previously in the literature (19).
Table 1 Spontaneous CNVs detected by ROMA. A description of 17 de novo CNVs in 16 subjects is provided, along with the methods used for its validation. The number of unique RefSeq genes within each CNV region is indicated, and when the locus apparently encompasses ...
These data show that spontaneous copy number changes are more frequent in patients with ASD (14 out of 195) than in unaffected individuals (2 out of 196), with an association that is statistically significant (P = 0.0005). The frequency of spontaneous mutation was 10% (12 out of 118) in our sample of sporadic cases and 3% (2 out of 77) in our sample of cases from multiplex families (Table 2). The frequency of spontaneous mutation in unaffected individuals was 1% (2 out of 196). Most mutations in persons with autism were deletions (12 out of 15); however, the two mutations detected in controls were both duplications.
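The frequencies above follow directly from the counts, and a case-control comparison of this kind can be tested on a 2×2 table. The sketch below is an illustration, not the authors' analysis: this excerpt does not state which test produced the reported P values, so a one-sided Fisher exact test is assumed here, and the value it prints need not equal the reported figures. The function names are hypothetical.

```python
from math import comb


def rate(events, total):
    """Fraction of individuals carrying a de novo CNV."""
    return events / total


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the table [[a, b], [c, d]].

    Rows are groups (e.g., cases, controls); columns are
    (with de novo CNV, without de novo CNV).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    return sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, upper + 1)) / denom


# Counts taken from the text.
print(f"all cases: {rate(14, 195):.1%}")
print(f"sporadic:  {rate(12, 118):.1%}")
print(f"multiplex: {rate(2, 77):.1%}")
print(f"controls:  {rate(2, 196):.1%}")

p_all = fisher_one_sided(14, 181, 2, 194)        # all cases vs. controls
p_sporadic = fisher_one_sided(12, 106, 2, 75)    # sporadic vs. multiplex cases
print(f"cases vs. controls, one-sided Fisher exact: {p_all:.4f}")
print(f"sporadic vs. multiplex, one-sided Fisher exact: {p_sporadic:.4f}")
```

Exact integer arithmetic with `math.comb` avoids any dependency beyond the standard library; a statistics package would normally be used for two-sided tests.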
Table 2 Increased frequency of de novo CNVs in autism. The numbers of de novo events are listed for our autism sample and for each category of family separately (simplex, multiplex, and nonautism control). The difference between cases and controls was examined, ...
The strong association of de novo CNVs with ASD is consistent with such mutations being a primary cause in most cases rather than merely contributory. A further line of evidence to support this claim is the higher proportion of females among cases with de novo mutations, where the genders of patients consisted of 9 males and 5 females (1.8:1) compared with 163 males and 32 females (5:1) in our overall sample. This reduced gender ratio suggests that de novo CNVs that are detectable by our method have increased penetrance and, thus, contribute to disease more equally in females and males.
A lower rate of de novo mutation in multiplex families is also consistent with a causal role for the mutations reported in this study. An alternative hypothesis is that de novo CNVs are associated with autism indirectly, the consequence of a “fragile-genome disorder” in which many lesions in addition to the ones we detected occur due to an unknown environmental or heritable factor. We regard this alternative as unlikely; first, because we would expect evidence for such a disorder to be present equally in multiplex or simplex families. Another observation that is inconsistent with this alternative hypothesis is that we do not see patients with copy number mutations littered throughout the genome. Instead, de novo CNVs typically involve a single mutational event.
Two of the patients mentioned in Table 1 have a formal diagnosis of Asperger syndrome, which suggests that spontaneous chromosomal imbalances are common across the whole spectrum of the disorder. We examined whether there were many cases of mental retardation [defined as having a nonverbal intelligence quotient (IQ) less than 70] among patients in whom we detected de novo mutations. Clinical data on 60 patients were obtained, and these data included five of the patients in Table 1. The average nonverbal IQ of these five cases was 85 and the minimum was 70. Although this average was lower than the average for all patients (100), these data indicate that the de novo mutations identified in our study were not found generally in patients with mental retardation.
Because the difference in rate between autism and control is so marked, we can make a fair presumption that many of the lesions we observed contribute to the disorder. However, the observation of a de novo mutation in a single family is not sufficient evidence to prove that a mutation is causal, nor does it provide unequivocal evidence for the involvement of a specific gene in autism. When an individual gene candidate can be identified, because the mutation affects a single gene or a small number of functional candidates, a straightforward path to validation can be planned, involving sequencing and higher-resolution CNV analysis in additional samples. The principle is illustrated in the recent study by Durand et al., where an intensive survey of variation in a candidate gene, SHANK3, revealed multiple additional variants, including de novo and inherited mutations (22). SHANK3 is one of the genes within the 4.3-Mb deletion on chromosome 22q13 that we reported in Table 1, and this region is also a site of recurrent deletions in autism (20). Thus, an aggregate of deletions and coding variants that occurs in patients and not in controls can provide further evidence of a gene’s role in disease once that candidate gene is identified by copy number mutation.
Some of the genes contained within the de novo CNVs we identified are good candidates for further study. A list of all RefSeq genes that overlap with the de novo mutations identified in this study is shown in table S3. Five of the de novo events we detected involved only a single gene and are worthy of mention. A spontaneous deletion was identified involving exons 2–8 of the putative sterol desaturase FLJ16237. Little is known about the function of FLJ16237, but its expression has been detected by in situ hybridization in the superior temporal gyrus of fetal brain (D.H.G.). In a pair of monozygotic twins concordant for autism, we detected a spontaneous deletion of exon 1 of the putative sodium bicarbonate cotransporter SLC4A10. Mutations of the related gene, SLC4A4, are associated with renal tubular acidosis and mental retardation (23). Other single-gene mutations were detected affecting ataxin 2–binding protein 1 (A2BP1) and the fragile histidine triad gene (FHIT) (Table 1). A2BP1 is known to interact with SCA2, the gene for spinocerebellar ataxia type 2, and A2BP1 mutations have been identified in mental retardation and epilepsy (24). We observed two independent spontaneous mutations of FHIT, a locus that is one of the most fragile sites in the human genome (25), but one of these was not detected in an extract of the original blood from which the cell line was derived.
All five single-gene mutations we detected involve unusually large genes, the smallest of which (SLC4A10) spans 359 kb of the genome. All four target genes rank among the top 3% of human genes by length. This is consistent with previous observations that large genes are frequently located within unstable regions of the genome (26). Our simulations, made by randomly permuting the location of our CNV regions, indicate that this result may simply reflect that large genes, by virtue of their size alone, are more likely to be affected by random rearrangements. Whatever the explanation, large genes do play prominent roles in spontaneous genetic disorders in humans, such as Duchenne muscular dystrophy (27), retinoblastoma (28), and neurofibromatosis (29); and the same could be true for autism.
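The idea behind the permutation argument can be shown with a toy simulation. The text gives no details of the authors' simulations, so everything below is an assumption made for illustration: the linear "genome", the gene coordinates, the CNV lengths, and the function names are all hypothetical. The sketch places CNV-sized intervals at random positions and counts how often each gene is hit.

```python
import random


def hits_gene(cnv_start, cnv_len, gene_start, gene_end):
    """True if [cnv_start, cnv_start + cnv_len) overlaps [gene_start, gene_end)."""
    return cnv_start < gene_end and cnv_start + cnv_len > gene_start


def hit_fraction(genes, cnv_lengths, genome_len, n_perm=2000, seed=0):
    """Fraction of randomly placed CNVs that overlap each gene."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = {name: 0 for name in genes}
    for _ in range(n_perm):
        for length in cnv_lengths:
            start = rng.randrange(0, genome_len - length)
            for name, (g_start, g_end) in genes.items():
                if hits_gene(start, length, g_start, g_end):
                    counts[name] += 1
    total = n_perm * len(cnv_lengths)
    return {name: c / total for name, c in counts.items()}


# Hypothetical toy genome with one large and one small gene.
GENOME_LEN = 10_000_000
genes = {
    "large_gene": (2_000_000, 2_500_000),  # 500 kb
    "small_gene": (7_000_000, 7_020_000),  # 20 kb
}
cnv_lengths = [100_000, 300_000]

fractions = hit_fraction(genes, cnv_lengths, GENOME_LEN)
for name, frac in fractions.items():
    print(f"{name}: hit by {frac:.1%} of randomly placed CNVs")
```

In this toy setting the larger gene is hit more often purely because it presents a bigger target, which is the null explanation the text raises.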
These studies do not address the mechanisms by which structural mutations of genes contribute to autism. Changes in dosage or structure of genes within a lesion could have quantitative effects on gene function, including haploinsufficiency or altered transcription patterns. Alternatively, hemizygous deletion could result in total loss of function if it is compounded by recessive mutation or monoallelic exclusion of the remaining allele. A genomic rearrangement may also disrupt regulatory elements that influence the expression of neighboring genes; thus, in some cases, a gene related to autism may lie adjacent to the deletion or duplication.
Our findings have implications for an understanding of the genetic basis for ASDs. An important feature of the de novo CNVs we report is that each is individually rare in the population of patients. None of the genomic variants we detected were observed more than twice in our sample, and most were seen but once. Although our sample size is small, these results suggest that lesions at many different loci can contribute to autism, a result consistent with the findings from cytogenetics, as well as consistent with the failure to find common heritable variants with a major effect on disease risk. Lack of recurrence may in fact reflect an underlying reality that autistic behavior can result from many different genetic defects. This would be consistent with the hypothesis that the common features of autism such as failure to develop social skills and repetitive and obsessive behavior may in fact be the consequence of a reaction to many different cognitive impairments, drawing their “commonality” from a normal but maladaptive programmed response of humans early in development to those diverse impairments.
We do not know the full contribution of spontaneous mutation to autism. Population studies divide autism into sporadic and familial or “multiplex.” Our work provides clear evidence that these two classes are indeed genetically distinct. The rate of de novo mutation in multiplex families was significantly lower than for sporadic cases (Table 2, P = 0.04), as would be expected if there were two different genetic mechanisms contributing to risk: spontaneous mutation and inheritance, with the latter being more frequent in families that have multiple affected children.
The rate of spontaneous mutation that we detect in autism is an underestimate. Adding the known rate of cytogenetically visible abnormalities, the total frequency of de novo variation detectable in sporadic cases is ~15% at our current resolution. Because of the limited resolution of genome microarray scans, we expect that we fail to detect the vast majority of CNVs. Much smaller deletions or even point mutations can produce the same consequences as the larger, more easily detectable events. As technology for discovering spontaneous germline mutation in children improves, the proportion of autism cases with detectable events is bound to rise.
We can incorporate a high rate of spontaneous mutation in a genetic model that accounts for both sporadic and familial forms of the disease, based on new mutations that cause autism by haploinsufficiency but have incomplete penetrance, especially in females. Such individuals who escape the phenotypic consequences can then pass on the mutation in an apparently dominant fashion to their children. This model makes very clear predictions that can be tested in the short term.
Our findings highlight how methods for directly detecting CNVs genomewide provide a powerful alternative to traditional gene-mapping approaches for discovering genetic risk factors in autism and in other disorders of complex etiology. Improved technologies for mutation detection, such as high-throughput DNA sequencing and tiling-resolution oligonucleotide arrays, promise to improve our power to identify new mutations associated with disease.