The next-generation sequencing revolution has substantially increased our understanding of the mutated genes that underlie complex neurodevelopmental disease. Exome sequencing has enabled us to estimate the number of genes involved in the etiology of neurodevelopmental disease, whereas targeted sequencing approaches have provided a quick and cost-effective means of sequencing thousands of patient samples to assess the significance of individual genes. By leveraging these technologies together with clinical exome sequencing, a genotype-first approach has emerged in which patients who share a genotype are first identified and then clinically reassessed as a group. This approach has proven to be a powerful method for refining disease subtypes. We propose that the molecular characterization of these genetic subtypes has important implications for diagnostics and for future drug development. Classifying patients into subgroups with a common genetic etiology, and applying treatments tailored to the specific molecular defect they carry, is likely to improve the management of neurodevelopmental disease in the future.
To assess the relative impact of inherited and de novo variants on autism risk, we generated a comprehensive set of exonic single nucleotide variants (SNVs) and copy number variants (CNVs) from 2,377 autism families. We find that private, inherited truncating SNVs in conserved genes are enriched in probands compared to unaffected siblings (odds ratio = 1.14, p = 0.0002), an effect with significant maternal transmission bias to sons. We also observe a bias for inherited CNVs, specifically for small (<100 kbp), maternally inherited events (p = 0.01) that are enriched in CHD8 target genes (p = 7.4×10⁻³). Using a logistic regression model, we show that private truncating SNVs and rare, inherited CNVs are statistically independent autism risk factors, with odds ratios of 1.11 (p = 0.0002) and 1.23 (p = 0.01), respectively. This analysis identifies a second class of candidate genes (e.g., RIMS1, CUL7, and LZTR1) in which transmitted mutations may create a sensitized background but are unlikely to be completely penetrant.
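The odds ratios above compare probands with unaffected siblings. As a minimal sketch, assuming the simplest binary carrier/non-carrier 2×2 table (not the logistic regression model used in the study, which also accounts for the second variant class), an odds ratio and a Woolf (Wald) 95% confidence interval can be computed as follows. The function name and the counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio(carrier_cases: int, noncarrier_cases: int,
               carrier_ctrls: int, noncarrier_ctrls: int) -> Tuple[float, Tuple[float, float]]:
    """Odds ratio and Woolf (Wald) 95% CI for a 2x2 carrier table.

    Cases are probands and controls are unaffected siblings (hypothetical
    framing). Assumes all four cells are nonzero.
    """
    # OR = (a/b) / (c/d) = a*d / (b*c)
    or_ = (carrier_cases * noncarrier_ctrls) / (noncarrier_cases * carrier_ctrls)
    # Standard error of log(OR) is the square root of the summed reciprocal counts.
    se = math.sqrt(1 / carrier_cases + 1 / noncarrier_cases
                   + 1 / carrier_ctrls + 1 / noncarrier_ctrls)
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, (lower, upper)


# Hypothetical counts, not taken from the study.
print(odds_ratio(60, 40, 50, 50))
```

With these illustrative counts the odds ratio is 1.5, but the interval still spans 1, which is why large cohorts are needed to detect modest effects such as an odds ratio of 1.14.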
The human genome is arguably the most complete mammalian reference assembly [1–3], yet more than 160 euchromatic gaps remain [4–6] and aspects of its structural variation remain poorly understood ten years after its completion [7–9]. To identify missing sequence and genetic variation, we sequenced and analyzed a haploid human genome (CHM1) using single-molecule, real-time (SMRT) DNA sequencing [10]. We closed or extended 55% of the remaining interstitial gaps in the human GRCh37 reference genome, 78% of which carried long runs of degenerate short tandem repeats (STRs), often multiple kilobases in length, embedded within GC-rich genomic regions. We resolved the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions, and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kbp in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long STRs. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with this longer-read sequencing technology.
human genome sequence; sequence gaps; genome assembly; PacBio; single-molecule real-time sequencing; structural variation
Autism spectrum disorder (ASD) is a heterogeneous disease for which efforts to define subtypes behaviorally have met with limited success. Hypothesizing that genetically based subtype identification may prove more productive, we resequenced the ASD-associated gene CHD8 in 3,730 children with developmental delay or ASD. We identified a total of 15 independent mutations; no truncating events were identified in 8,792 controls, including 2,289 unaffected siblings. In addition to a high likelihood of an ASD diagnosis among patients bearing CHD8 mutations, characteristics enriched in this group included macrocephaly, distinct facial features, and gastrointestinal complaints. In zebrafish, chd8 disruption recapitulates features of the human phenotype, including increased head size resulting from expansion of the forebrain/midbrain and impaired gastrointestinal motility due to a reduction in post-mitotic enteric neurons. Our findings indicate that CHD8 disruptions define a distinct ASD subtype and reveal unexpected comorbidities involving brain development and enteric innervation.
Autism spectrum disorder; autism subtypes; dysmorphology; macrocephaly; gastrointestinal defect; zebrafish modeling; enteric neurons; forebrain/midbrain expansion
Advances in genome sequencing technologies have begun to revolutionize neurogenetics, allowing the full spectrum of genetic variation to be better understood in relation to disease. Exome sequencing of hundreds to thousands of samples from patients with autism spectrum disorder, intellectual disability, epilepsy, and schizophrenia provides strong evidence of the importance of de novo and gene-disruptive events. There are now several hundred new candidate genes, and targeted resequencing technologies allow dozens of genes to be screened in tens of thousands of individuals with high specificity and sensitivity. The decision of which genes to pursue depends on numerous factors, including recurrence, prior evidence of overlap with pathogenic copy number variants, the position of the mutation within the protein, the mutational burden among healthy individuals, and membership of the candidate gene in disease-implicated protein networks. We discuss these emerging criteria for gene prioritization and their potential impact on the field of neuroscience.
Recurrent deletions of chromosome 15q13.3 are associated with intellectual disability, schizophrenia, autism and epilepsy. To gain insight into the region's instability, we sequenced it in patients, normal individuals and nonhuman primates. We discovered five structural configurations of the human chromosome 15q13.3 region ranging in size from 2 to 3 Mbp. These configurations arose recently (~0.5–0.9 million years ago) as a result of human-specific expansions of segmental duplications and two independent inversion events. All inversion breakpoints map near GOLGA8 core duplicons, a ~14 kbp primate-specific chromosome 15 repeat that became organized into larger palindromic structures. GOLGA8-flanked palindromes also demarcate the breakpoints of recurrent 15q13.3 microdeletions, the expansion of chromosome 15 segmental duplications in the human lineage, and independent structural changes in apes. The significant clustering (p = 0.002) of breakpoints provides mechanistic evidence for the role of this core duplicon and its palindromic architecture in promoting evolutionary and disease-related instability of chromosome 15.
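The clustering p-value above tests whether breakpoints fall closer to GOLGA8 core duplicons than expected by chance. The abstract does not state the test used; one common approach is a permutation test, sketched here under simplifying assumptions: a uniform-placement null over a single linear region, mean distance to the nearest core as the statistic, and hypothetical coordinates and function names.

```python
import random
from bisect import bisect_left
from typing import List, Sequence


def nearest_distance(pos: int, cores: List[int]) -> int:
    """Distance from pos to the closest core duplicon coordinate (cores sorted)."""
    i = bisect_left(cores, pos)
    # The nearest core is either the one just left or just right of pos.
    candidates = cores[max(i - 1, 0): i + 1]
    return min(abs(pos - c) for c in candidates)


def clustering_p_value(breakpoints: Sequence[int], cores: Sequence[int],
                       region_length: int, n_perm: int = 999, seed: int = 0) -> float:
    """Permutation p-value: are breakpoints closer to cores than random positions?"""
    cores = sorted(cores)
    rng = random.Random(seed)
    observed = sum(nearest_distance(b, cores) for b in breakpoints) / len(breakpoints)
    hits = 0
    for _ in range(n_perm):
        # Null model (an assumption): the same number of breakpoints placed uniformly.
        sim = [rng.randrange(region_length) for _ in breakpoints]
        if sum(nearest_distance(b, cores) for b in sim) / len(sim) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical coordinates within a 3 Mbp region.
print(clustering_p_value([500_100, 1_499_900, 2_500_500, 501_000, 1_500_200],
                         [500_000, 1_500_000, 2_500_000], 3_000_000))
```

A real analysis would need a null that respects mappability and the segmental duplication content of the region, which uniform placement ignores.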
Copy number variants (CNVs) are associated with many neurocognitive disorders; however, these events are typically large and the underlying causative gene is unclear. We created an expanded CNV morbidity map from 29,085 children with developmental delay versus 19,584 healthy controls, identifying 70 significant CNVs. We resequenced 26 candidate genes in 4,716 additional cases with developmental delay or autism and 2,193 controls. An integrated analysis of CNV and single-nucleotide variant (SNV) data pinpointed ten genes enriched for putative loss of function. Patient follow-up on a subset identified new clinical subtypes of pediatric disease and the genes responsible for disease-associated CNVs. These include haploinsufficiency of SETBP1 associated with intellectual disability and loss of expressive language, and truncations of ZMYND11 in patients with autism, aggression and complex neuropsychiatric features. This combined CNV and SNV approach facilitates the rapid discovery of new syndromes and neuropsychiatric disease genes despite extensive genetic heterogeneity.
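A morbidity map rests on asking, for each CNV, whether carriers are over-represented among cases relative to controls. The abstract does not name the test; a one-sided Fisher's exact test is one common choice and is sketched here with exact integer arithmetic. The cohort sizes in the usage example come from the abstract, but the carrier counts and the function name are hypothetical.

```python
from math import comb


def enrichment_p_value(case_carriers: int, n_cases: int,
                       ctrl_carriers: int, n_ctrls: int) -> float:
    """One-sided Fisher's exact test: are CNV carriers enriched among cases?

    Under the null, all carriers are split between cases and controls at random
    (hypergeometric), so the p-value is P(X >= case_carriers).
    """
    n_total = n_cases + n_ctrls
    carriers = case_carriers + ctrl_carriers
    # Python integers keep the very large binomial coefficients exact.
    tail = sum(
        comb(carriers, i) * comb(n_total - carriers, n_cases - i)
        for i in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / comb(n_total, n_cases)


# Hypothetical: 15 carriers among 29,085 cases, none among 19,584 controls.
print(enrichment_p_value(15, 29_085, 0, 19_584))
```

Genome-wide scans test many CNV regions, so in practice such per-region p-values must also be corrected for multiple testing.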
Infantile spasms (IS) and Lennox–Gastaut syndrome (LGS) are epileptic encephalopathies characterized by early onset, intractable seizures, and poor developmental outcomes. De novo sequence mutations and copy number variants (CNVs) are causative in a subset of cases. We used exome sequence data from 349 trios with IS or LGS to identify putative de novo CNVs. We confirm 18 de novo CNVs in 17 patients (4.8%), 10 of which are likely pathogenic, giving a firm genetic diagnosis for 2.9% of patients. Confirmation of exome-predicted CNVs by array-based methods is still required because of the false-positive rates of prediction algorithms. Our exome-based results are consistent with recent array-based studies in similar cohorts and highlight novel candidate genes for IS and LGS. Ann Neurol 2015;78:323–328
Epidemiological data suggest that maternal infection and fever are associated with increased risk of ASD. Animal studies show that gestational infections perturb fetal brain development and result in offspring with the core features of autism; they have also demonstrated that the behavioral effects of maternal immune activation (MIA) depend on genetic susceptibility. The goal of this study was to explore the impact of ASD-associated CNVs and prenatal maternal infection on the clinical severity of ASD within a dataset of prenatal history and complete genetic and phenotypic findings.
We analyzed data from the Simons Simplex Collection, including 1,971 children with a diagnosis of ASD, aged 4 to 18 years, who underwent array CGH screening. Information on infection and febrile episodes during pregnancy was collected through parent interview. ASD severity was clinically measured through parent-report interview and questionnaires.
We found significant interactive effects between the presence of CNVs and maternal infection during pregnancy on autistic symptomatology, such that individuals with CNVs and a history of maternal infection demonstrated increased rates of social communicative impairments and repetitive/restricted behaviors. In contrast, no significant interactions were found between the presence of CNVs and prenatal infections on the cognitive and adaptive functioning of individuals with ASD.
Our findings support a gene-environment interaction model of autism impairment, in that individuals with ASD-associated CNVs are more susceptible to the effects of maternal infection and febrile episodes in pregnancy on behavioral outcomes, and suggest that these effects are specific to ASD rather than to global neurodevelopment.
autism; autism spectrum disorders; fever; gene-environment; infection; pregnancy
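An "interactive effect" of two binary factors means that the effect of one factor differs by level of the other. As a minimal sketch, assuming no covariates (the study's actual model is not specified here), the interaction coefficient of a saturated linear model y = b0 + b1·CNV + b2·infection + b3·CNV·infection equals a difference of differences of the four cell means. The function name and the severity scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def interaction_contrast(records: List[Tuple[int, int, float]]) -> float:
    """Interaction (b3) of two 0/1 factors on a continuous severity score.

    records: (has_cnv, maternal_infection, severity) for each child.
    """
    cells: Dict[Tuple[int, int], List[float]] = {}
    for cnv, infection, score in records:
        cells.setdefault((cnv, infection), []).append(score)
    missing = {(0, 0), (0, 1), (1, 0), (1, 1)} - set(cells)
    if missing:
        raise ValueError(f"no observations for cells {sorted(missing)}")
    m = {key: mean(values) for key, values in cells.items()}
    # Effect of infection among CNV carriers minus its effect among non-carriers.
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical scores: infection adds 1 point without a CNV but 4 points with one.
data = [(0, 0, 5.0), (0, 0, 7.0), (0, 1, 6.0), (0, 1, 8.0),
        (1, 0, 7.0), (1, 0, 9.0), (1, 1, 11.0), (1, 1, 13.0)]
print(interaction_contrast(data))
```

A positive contrast, as in this toy example, corresponds to the reported pattern in which infection is associated with greater impairment in CNV carriers than in non-carriers; assessing significance would additionally require a standard error from a fitted model.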
Mountain gorillas are an endangered great ape subspecies and a prominent focus for conservation, yet we know little about their genomic diversity and evolutionary past. We sequenced whole genomes from multiple wild individuals and compared the genomes of all four Gorilla subspecies. We found that the two eastern subspecies have experienced a prolonged population decline over the past 100,000 years, resulting in very low genetic diversity and an increased overall burden of deleterious variation. A further recent decline in the mountain gorilla population has led to extensive inbreeding, such that individuals are typically homozygous at 34% of their sequence, leading to the purging of severely deleterious recessive mutations from the population. We discuss the causes of their decline and the consequences for their future survival.
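Being "homozygous at 34% of their sequence" is the kind of figure summarized by F_ROH, the fraction of the genome lying in runs of homozygosity (ROH). The sketch below assumes ROH segments have already been called (calling is not shown) and are given as half-open intervals on one concatenated coordinate system. The function name and the toy intervals are hypothetical.

```python
from typing import List, Tuple


def froh(roh_segments: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of the genome covered by runs of homozygosity (F_ROH).

    Overlapping or abutting ROH calls are merged so no base is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(roh_segments):
        if cur_end is None or start > cur_end:
            # Close the previous merged block, then open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current block: extend it if needed.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Toy example: two overlapping calls and one separate call in a 1,000 bp genome.
print(froh([(0, 100), (50, 150), (300, 400)], 1000))  # 0.25
```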
To study the evolutionary dynamics of regulatory DNA, we mapped >1.3 million DNase I hypersensitive sites (DHSs) in 45 mouse cell and tissue types, and systematically compared these with human DHS maps from orthologous compartments. The mouse and human genomes have undergone extensive cis-regulatory rewiring that combines branch-specific evolutionary innovation and loss with widespread repurposing of conserved DHSs to alternative cell fates mediated by turnover of transcription factor (TF) recognition elements. Despite pervasive evolutionary remodeling of the location and content of individual cis-regulatory regions, within orthologous mouse and human cell types the global fraction of regulatory DNA bases encoding recognition sites for each TF has been strictly conserved. Our findings provide new insights into the evolutionary forces shaping mammalian regulatory DNA landscapes.
Despite detailed clinical definition and refinement of neurodevelopmental disorders and neuropsychiatric conditions, the underlying genetic etiology has proved elusive. Recent genetic studies have revealed some common themes: considerable locus heterogeneity, variable expressivity for the same mutation, and a role for multiple disruptive events in the same individual affecting genes in common pathways. Recurrent copy number variation (CNV), in particular, has emphasized the importance of either de novo or essentially private mutations creating imbalances for multiple genes. CNVs have foreshadowed a model in which the distinction between milder neuropsychiatric conditions and those of severe developmental impairment may be a consequence of increased mutational burden affecting more genes.
copy number variants; variable penetrance; genomic disorders; autism; schizophrenia; intellectual disability
Here we report inherited dysregulation of protein phosphatase activity as a cause of intellectual disability (ID). De novo missense mutations in 2 subunits of serine/threonine (Ser/Thr) protein phosphatase 2A (PP2A) were identified in 16 individuals with mild to severe ID, long-lasting hypotonia, epileptic susceptibility, frontal bossing, mild hypertelorism, and downslanting palpebral fissures. PP2A comprises catalytic (C), scaffolding (A), and regulatory (B) subunits that determine subcellular anchoring, substrate specificity, and physiological function. Ten patients had mutations within a highly conserved acidic loop of the PPP2R5D-encoded B56δ regulatory subunit, with the same E198K mutation present in 6 individuals. Five patients had mutations in the PPP2R1A-encoded scaffolding Aα subunit, with the same R182W mutation in 3 individuals. Some Aα cases presented with large ventricles, causing macrocephaly and raising suspicion of hydrocephalus, and all Aα cases exhibited partial or complete agenesis of the corpus callosum. Functional evaluation revealed that mutant A and B subunits were stable and uncoupled from phosphatase activity. Mutant B56δ was deficient in binding A and C, whereas mutant Aα subunits bound B56δ well but were unable to bind C or bound a catalytically impaired C, suggesting a dominant-negative effect in which mutant subunits hinder dephosphorylation of B56δ-anchored substrates. Moreover, mutant subunit overexpression resulted in hyperphosphorylation of GSK3β, a B56δ-regulated substrate. This effect was in line with clinical observations, supporting a correlation between the degree of ID and the degree of biochemical disturbance.
Investigations of noninvasive prenatal screening for aneuploidy by analysis of circulating cell-free DNA (cfDNA) have shown high sensitivity and specificity in both high-risk and low-risk cohorts. However, the overall low incidence of aneuploidy limits the positive predictive value of these tests. Currently, the causes of false positive results are poorly understood. We investigated four pregnancies with discordant prenatal test results and found in two cases that maternal duplications on chromosome 18 were the likely cause of the discordant results. Modeling based on population-level copy-number variation supports the possibility that some false positive results of noninvasive prenatal screening may be attributable to large maternal copy-number variants. (Funded by the National Institutes of Health and others.)
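The point that low incidence limits positive predictive value (PPV) follows directly from Bayes' theorem: even a small false-positive rate applied to the many unaffected pregnancies can rival the true positives from the few affected ones. The sensitivity, specificity, and prevalence values in this sketch are illustrative assumptions, not figures from the study.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV = P(affected | positive screen), by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same hypothetical assay, applied at two different prevalences.
print(positive_predictive_value(0.99, 0.999, 1 / 2000))  # about 0.33
print(positive_predictive_value(0.99, 0.999, 0.5))       # about 0.999
```

With these assumed values, roughly two of every three positive screens in the low-prevalence setting are false positives, which is why identifying sources of false positives, such as maternal copy-number variants, matters.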
All genetic variation arises via new mutations; therefore, determining the rates and biases of different classes of mutation is essential for understanding the genetics of human disease and evolution. Because of technical limitations, decades of mutation rate analyses have focused on a relatively small number of loci. However, advances in sequencing technology have allowed for empirical assessments of genome-wide rates of mutation. Recent studies have shown that 76% of new mutations originate in the paternal lineage and have provided unequivocal evidence for an increase in mutation with paternal age. Although most analyses have focused on single nucleotide variants (SNVs), studies have begun to provide insight into the mutation rate for other classes of variation, including copy number variants (CNVs), microsatellites, and mobile element insertions. Here, we review the genome-wide analyses of the mutation rate for several types of variants and suggest areas for future research.
germline mutation rate; de novo mutation; paternal bias; paternal age; genome-wide
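The paternal age effect is usually summarized as a regression slope: the expected number of additional de novo mutations per additional year of paternal age. A minimal ordinary least-squares sketch follows. The trio data are hypothetical and perfectly linear, chosen only so the result is easy to check; real analyses also model maternal age and sequencing coverage.

```python
from typing import Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the n factors cancel.
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical trios: paternal age (years) vs. de novo SNV count in the child.
intercept, slope = ols_fit([20, 30, 40], [50, 70, 90])
print(intercept, slope)  # 10.0 2.0
```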
To explore the diversity and selective signatures of duplication and deletion human copy number variants (CNVs), we sequenced 236 individuals from 125 distinct human populations. We observed that duplications exhibit fundamentally different population genetic and selective signatures from deletions and are more likely to be stratified between human populations. Through reconstruction of the ancestral human genome, we identify megabases of DNA lost in different human lineages and pinpoint large duplications that introgressed from the extinct Denisova lineage and are now found at high frequency exclusively in Oceanic populations. We find that the proportion of CNV base pairs to single nucleotide variant base pairs is greater among non-Africans than among African populations, but we conclude that this difference is likely due to unique aspects of non-African population history rather than to differences in CNV load.
Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of complex sequence architecture and diploid source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line, CHM1htert, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype generated here is 1.25 Mb of contiguous sequence, including four novel V alleles, one novel C allele, and an 11.9 kb insertion. The CH17 IGK haplotype consists of two contigs, 644 kb proximal and 466 kb distal, separated by a large gap of unknown size; these assemblies added 49 kb of unique sequence extending into this gap. Our analysis also resulted in the characterization of seven novel IGKV alleles and a 16.7 kb region exhibiting signatures of interlocus sequence exchange between distal and proximal IGKV gene clusters. Genetic diversity in IGK/IGL was compared to that of the IG heavy chain (IGH) locus within the same haploid genome, revealing 3-fold (IGK) and 6-fold (IGL) higher diversity in the IGH locus, potentially associated with increased levels of segmental duplication and the telomeric location of IGH.
Human Immunoglobulin; Antibody variable gene; Genetic polymorphism; IGH; IGK; IGL; Hydatidiform Mole
de novo SNV mutation; autozygosity; mutation rate
The genetic basis of neurodevelopmental and neuropsychiatric diseases has been advanced by the discovery of large and recurrent copy number variants significantly enriched in cases when compared to controls. The pattern of this variation strongly implies that rare variants contribute significantly to neurological disease; that different genes will be responsible for similar diseases in different families; and that the same “primary” genetic lesions can result in a different disease outcome depending potentially on the genetic background. Next-generation sequencing technologies are beginning to broaden the spectrum of disease-causing variation and provide specificity by pinpointing both genes and pathways for future diagnostics and therapeutics.
The most common recurrent copy number variants associated with autism, developmental delay, and epilepsy are flanked by segmental duplications. Complete genetic characterization of these events is challenging because their breakpoints often occur within high-identity, copy number polymorphic paralogous sequences that cannot be specifically assayed using hybridization-based methods. Here, we provide a protocol for breakpoint resolution with sequence-level precision. Massively parallel sequencing is performed on libraries generated from haplotype-resolved chromosomes, genomic DNA, or molecular inversion probe–captured breakpoint-informative regions harboring paralog-distinguishing variants. Quantifying sequencing depth over informative sites enables breakpoint localization, typically within several kilobases to tens of kilobases. Depending on the approach employed, the sequencing platform, and the accuracy and completeness of the reference genome sequence, this protocol takes from a few days to several months to complete. Once established for a specific genomic disorder, it is possible to process thousands of DNA samples within as little as 3–4 weeks.
breakpoint; segmental duplication; paralog; nonallelic homologous recombination; NAHR; whole-genome sequencing; WGS; molecular inversion probe; MIP; genomic disorder; sequencing; recurrent deletion; recurrent duplication; copy number variant; CNV
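The protocol localizes breakpoints by quantifying sequencing depth over paralog-distinguishing sites: in an NAHR carrier, the paralog-specific signal shifts from one level to another at the breakpoint. As a simplified stand-in for that analysis (a swapped-in technique, not the protocol's own method), the sketch assumes per-site paralog-specific depth has already been normalized to the expected diploid value and fits a single two-segment step function by least squares. The function name and the toy data are hypothetical.

```python
from typing import Sequence, Tuple


def locate_breakpoint(positions: Sequence[int], ratios: Sequence[float]) -> Tuple[int, int]:
    """Return (left_site, right_site), the interval most likely holding one breakpoint.

    positions: coordinates of paralog-distinguishing sites, sorted ascending.
    ratios: normalized paralog-specific depth at each site.
    """
    n = len(ratios)
    if n < 2 or len(positions) != n:
        raise ValueError("need at least two sites and one ratio per site")
    # Prefix sums let each candidate split be scored in O(1).
    s, sq = [0.0], [0.0]
    for r in ratios:
        s.append(s[-1] + r)
        sq.append(sq[-1] + r * r)

    def sse(i: int, j: int) -> float:
        # Sum of squared deviations from the mean of ratios[i:j].
        total = s[j] - s[i]
        return (sq[j] - sq[i]) - total * total / (j - i)

    # Split k puts sites [0, k) on the left and [k, n) on the right.
    best_k = min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))
    return positions[best_k - 1], positions[best_k]


# Toy data: depth drops from diploid (~1.0) to ~0.5 between 300 and 400.
print(locate_breakpoint([100, 200, 300, 400, 500, 600],
                        [1.0, 1.02, 0.98, 0.5, 0.52, 0.48]))  # (300, 400)
```

The returned interval can be no narrower than the spacing of informative sites, consistent with the protocol's stated resolution of several to tens of kilobases.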
As the premier model organism in biomedical research, the laboratory mouse shares the majority of protein-coding genes with humans, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications, and replication domains throughout the mouse genome in diverse cell and tissue types. By comparison with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of other sequences involved in transcriptional regulation, chromatin state, and higher-order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.
We sequenced exomes from more than 2,500 simplex families each having a child with an autistic spectrum disorder (ASD). By comparing affected to unaffected siblings, we estimate that 13% of de novo (DN) missense mutations and 42% of DN likely gene-disrupting (LGD) mutations contribute to 12% and 9% of diagnoses, respectively. Including copy number variants, coding DN mutations contribute to about 30% of all simplex and 45% of female diagnoses. Virtually all LGD mutations occur opposite wild-type alleles. LGD targets in affected females significantly overlap the targets in males of lower IQ, but neither overlaps significantly with targets in males of higher IQ. We estimate that LGD mutation in about 400 genes can contribute to the joint class of affected females and males of lower IQ, with an overlapping and similar number of genes vulnerable to causative missense mutation. LGD targets in the joint class overlap with published targets for intellectual disability and schizophrenia, and are enriched for chromatin modifiers, FMRP-associated genes and embryonically expressed genes. Virtually all significance for the latter comes from affected females.
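Estimates such as "42% of DN LGD mutations contribute to 9% of diagnoses" can be read as excess-rate reasoning. This reading is our assumption, not necessarily the paper's exact estimator: unaffected siblings supply the background rate, so the excess rate in probands is the contributing fraction of diagnoses, and that excess divided by the proband rate is the contributing fraction of mutations. The rates in the usage example are hypothetical round numbers.

```python
from typing import Tuple


def attributable_fractions(rate_affected: float, rate_unaffected: float) -> Tuple[float, float]:
    """Excess-rate estimates from de novo rates in probands vs. unaffected siblings.

    Rates are mean de novo mutations of one class per child. Returns
    (fraction of proband mutations that contribute, fraction of diagnoses explained).
    Assumes a child rarely carries more than one contributing mutation.
    """
    excess = rate_affected - rate_unaffected
    return excess / rate_affected, excess


# Hypothetical rates: 0.20 per proband vs. 0.10 per unaffected sibling.
print(attributable_fractions(0.20, 0.10))  # about (0.5, 0.1)
```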
Gene duplication is an important source of phenotypic change and adaptive evolution. We use a novel genomic approach to identify highly identical sequence missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase activating protein 2 (SRGAP2) was duplicated three times in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ~3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D), ~2.4 and ~1 mya, respectively. Sequence and expression analysis shows that SRGAP2C is the duplicate most likely to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism in which incomplete duplication created, at its birth, a novel gene function antagonizing parental SRGAP2 function 2–3 mya, a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.
It is well established that autism spectrum disorders (ASD) have a strong genetic component. However, for at least 70% of cases, the underlying genetic cause is unknown [1]. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families [2,3]), we sequenced all coding regions of the genome (the exome) for parent–child trios exhibiting sporadic ASD, including 189 new trios and 20 previously reported [4]. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported (n = 19) trios [4], for a total of 677 individual exomes from 209 families. Here we show that de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modestly increased risk of ASD among children of older fathers [5]. Moreover, 39% (49/126) of the most severe or disruptive de novo mutations map to a highly interconnected beta-catenin/chromatin remodeling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes, CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3, and SCN1A. Combined with copy number variant (CNV) data, these results suggest extreme locus heterogeneity but also provide a target for future discovery, diagnostics, and therapeutics.
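A parental-origin bias such as 4:1 can be checked against the null expectation of an equal paternal/maternal split. A minimal sketch of an exact two-sided binomial test with p = 0.5 follows. The function name and the phased-mutation counts in the usage example are hypothetical, and the abstract does not state which test the authors used.

```python
from math import comb


def parental_origin_p_value(paternal: int, total: int) -> float:
    """Exact two-sided binomial test of paternal vs. maternal origin against 1:1.

    With p = 0.5 every outcome k has probability comb(total, k) / 2**total, so
    outcomes at least as extreme as the observed one can be ranked with integers.
    """
    observed = comb(total, paternal)
    as_extreme = sum(comb(total, k) for k in range(total + 1)
                     if comb(total, k) <= observed)
    return as_extreme / 2 ** total


# Hypothetical: 40 of 50 phased de novo mutations are paternal (a 4:1 split).
print(parental_origin_p_value(40, 50))
```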