A key component of genetic architecture is the allelic spectrum influencing trait variability. For autism spectrum disorder (henceforth autism) the nature of its allelic spectrum is uncertain. Individual risk genes have been identified from rare variation, especially de novo mutations1–8. From this evidence one might conclude that rare variation dominates its allelic spectrum, yet recent studies show that common variation, individually of small effect, has substantial impact en masse9,10. At issue is how much of an impact relative to rare variation. Using a unique epidemiological sample from Sweden, novel methods that distinguish total narrow-sense heritability from that due to common variation, and by synthesizing results from other studies, we reach several conclusions about autism’s genetic architecture: its narrow-sense heritability is ≈54% and most traces to common variation; rare de novo mutations contribute substantially to individuals’ liability; still their contribution to variance in liability, 2.6%, is modest compared to heritable variation.
Whole-exome sequencing (WES) studies have demonstrated the contribution of de novo loss-of-function single nucleotide variants to autism spectrum disorders (ASD). However, challenges in the reliable detection of de novo insertions and deletions (indels) have limited inclusion of these variants in prior analyses. Through the application of a robust indel detection method to WES data from 787 ASD families (2,963 individuals), we demonstrate that de novo frameshift indels contribute to ASD risk (OR=1.6; 95%CI=1.0-2.7; p=0.03), are more common in female probands (p=0.02), are enriched among genes encoding FMRP targets (p=6×10−9), and arise predominantly on the paternal chromosome (p<0.001). Based on mutation rates in probands versus unaffected siblings, de novo frameshift indels contribute to risk in approximately 3.0% of individuals with ASD. Finally, through observing clustering of mutations in unrelated probands, we report two novel ASD-associated genes: KMT2E (MLL5), a chromatin regulator, and RIMS1, a regulator of synaptic vesicle release.
Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD “seed” genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology.
Brain development follows a different trajectory in children with Autism Spectrum Disorders (ASD) than in typically developing children. A proxy for neurodevelopment could be head circumference (HC), but studies assessing HC and its clinical correlates in ASD have been inconsistent. This study investigates HC and clinical correlates in the Simons Simplex Collection cohort.
We used a mixed linear model to estimate effects of covariates and the deviation from the expected HC given parental HC (genetic deviation). After excluding individuals with incomplete data, 7225 individuals in 1891 families remained for analysis. We examined the relationship between HC/genetic deviation of HC and clinical parameters.
Gender, age, height, weight, genetic ancestry and ASD status were significant predictors of HC (estimate of the ASD effect=0.2cm). HC was approximately normally distributed in probands and unaffected relatives, with only a few outliers. Genetic deviation of HC was also normally distributed, consistent with a random sampling of parental genes. Whereas larger HC than expected was associated with ASD symptom severity and regression, IQ decreased with the absolute value of the genetic deviation of HC.
Measured against expected values derived from covariates of ASD subjects, statistical outliers for HC were uncommon. HC is a strongly heritable trait and population norms for HC would be far more accurate if covariates including genetic ancestry, height and age were taken into account. The association of diminishing IQ with absolute deviation from predicted HC values suggests HC could reflect subtle underlying brain development and warrants further investigation.
head circumference; body metrics; genetic ancestry; IQ; autism spectrum disorder; ASD
Two common sources of DNA for whole exome sequencing (WES) are whole blood (WB) and immortalized lymphoblastoid cell line (LCL). However, it is possible that LCLs have a substantially higher rate of mutation than WB, causing concern for their use in sequencing studies. We compared results from paired WB and LCL DNA samples for 16 subjects, using LCLs of low passage number (<5). Using a standard analysis pipeline we detected a large number of discordant genotype calls (approximately 50 per subject) that we segregated into categories of “confidence” based on read-level quality metrics. From these categories and validation by Sanger sequencing, we estimate that the vast majority of the candidate differences were false positives and that our categories were effective in predicting valid sequence differences, including LCLs with putative mosaicism for the non-reference allele (3–4 per exome). These results validate the use of DNA from LCLs of low passage number for exome sequencing.
graphical diagnostics; lymphoblastoid cell line; mosaicism; sequence variant call; strand bias; somatic mutation
The liability to addiction has been shown to be highly genetically correlated across drug classes, suggesting nondrug-specific mechanisms.
In 757 subjects, we performed association analysis between 1536 single nucleotide polymorphisms (SNPs) in 106 candidate genes and a drug use disorder diagnosis (DUD).
Associations (p ≤ .0008) were detected with three SNPs in the arginine vasopressin 1A receptor gene, AVPR1A, with a gene-wise p value of 3 × 10−5. Bioinformatic evidence points to a role for rs11174811 (microRNA binding site disruption) in AVPR1A function. Based on literature implicating AVPR1A in social bonding, we tested spousal satisfaction as a mediator of the association of rs11174811 with the DUD. Spousal satisfaction was significantly associated with DUD in males (p <.0001). The functional AVPR1A SNP, rs11174811, was associated with spousal satisfaction in males (p = .007). Spousal satisfaction was a significant mediator of the relationship between rs11174811 and DUD. We also present replication of the association in males between rs11174811 and substance use in one clinically ascertained (n = 1399) and one epidemiologic sample (n = 2231). The direction of the association is consistent across the clinically-ascertained samples but reversed in the epidemiologic sample. Lastly, we found a significant impact of rs11174811 genotype on AVPR1A expression in a postmortem brain sample.
The findings of this study call for expansion of research into the role of the arginine vasopressin and other neuropeptide system variation in DUD liability.
Addiction; alcoholism; gene systems; genetic association; social relationships; vasopressin
We previously reported genome-wide significant evidence for linkage between chromosome 6q and bipolar I disorder (BPI) by performing a meta-analysis of original genotype data from 11 genome scan linkage studies. We now present follow-up linkage disequilibrium mapping of the linked region utilizing 3,047 single nucleotide polymorphism (SNP) markers in a case–control sample (N = 530 cases, 534 controls) and family-based sample (N = 256 nuclear families, 1,301 individuals). The strongest single SNP result (rs6938431, P=6.72× 10−5) was observed in the case–control sample, near the solute carrier family 22, member 16 gene (SLC22A16). In a replication study, we genotyped 151 SNPs in an independent sample (N = 622 cases, 1,181 controls) and observed further evidence of association between variants at SLC22A16 and BPI. Although consistent evidence of association with any single variant was not seen across samples, SNP-wise and gene-based test results in the three samples provided convergent evidence for association with SLC22A16, a carnitine transporter, implicating this gene as a novel candidate for BPI risk. Further studies in larger samples are warranted to clarify which, if any, genes in the 6q region confer risk for bipolar disorder.
bipolar disorder; genetic; association; SLC22A16; 6q
De novo loss-of-function (dnLoF) mutations are found twofold more often in autism spectrum disorder (ASD) probands than their unaffected siblings. Multiple independent dnLoF mutations in the same gene implicate the gene in risk and hence provide a systematic, albeit arduous, path forward for ASD genetics. It is likely that using additional non-genetic data will enhance the ability to identify ASD genes.
To accelerate the search for ASD genes, we developed a novel algorithm, DAWN, to model two kinds of data: rare variations from exome sequencing and gene co-expression in the mid-fetal prefrontal and motor-somatosensory neocortex, a critical nexus for risk. The algorithm casts the ensemble data as a hidden Markov random field in which the graph structure is determined by gene co-expression and it combines these interrelationships with node-specific observations, namely gene identity, expression, genetic data and the estimated effect on risk.
Using currently available genetic data and a specific developmental time period for gene co-expression, DAWN identified 127 genes that plausibly affect risk, and a set of likely ASD subnetworks. Validation experiments making use of published targeted resequencing results demonstrate its efficacy in reliably predicting ASD genes. DAWN also successfully predicts known ASD genes, not included in the genetic data used to create the model.
Validation studies demonstrate that DAWN is effective in predicting ASD genes and subnetworks by leveraging genetic and gene expression data. The findings reported here implicate neurite extension and neuronal arborization as risks for ASD. Using DAWN on emerging ASD sequence data and gene expression data from other brain regions and tissues would likely identify novel ASD genes. DAWN can also be used for other complex disorders to identify genes and subnetworks in those disorders.
Autism; Risk prediction; Gene discovery; Weighted gene co-expression network analysis; Network; Hidden Markov random field; Neurite extension; Neuronal arborization
Given prior evidence for the contribution of rare copy number variations (CNVs) to autism spectrum disorders (ASD), we studied these events in 4,457 individuals from 1,174 simplex families, composed of parents, a proband and, in most kindreds, an unaffected sibling. We find significant association of ASD with de novo duplications of 7q11.23, where the reciprocal deletion causes Williams-Beuren syndrome, featuring a highly social personality. We identify rare recurrent de novo CNVs at five additional regions including two novel ASD loci, 16p13.2 (including the genes USP7 and C16orf72) and Cadherin13, and implement a rigorous new approach to evaluating the statistical significance of these observations. Overall, we find large de novo CNVs carry substantial risk (OR=3.55; CI =2.16-7.46, p=6.9 × 10−6); estimate the presence of 130-234 distinct ASD-related CNV intervals across the genome; and, based on data from multiple studies, present compelling evidence for the association of rare de novo events at 7q11.23, 15q11.2-13.1, 16p11.2, and Neurexin1.
Recent technological advances coupled with large sample sets have uncovered many factors underlying the genetic basis of traits and the predisposition to complex disease, but much is left to discover. A common thread to most genetic investigations is familial relationships. Close relatives can be identified from family records, and more distant relatives can be inferred from large panels of genetic markers. Unfortunately, these empirical estimates can be noisy, especially regarding distant relatives. We propose a new method for denoising genetically inferred relationship matrices by exploiting the underlying structure due to hierarchical groupings of correlated individuals. The approach, which we call Treelet Covariance Smoothing, employs a multiscale decomposition of covariance matrices to improve estimates of pairwise relationships. On both simulated and real data, we show that smoothing leads to better estimates of the relatedness amongst distantly related individuals. We illustrate our method with a large genome-wide association study and estimate the “heritability” of body mass index quite accurately. Traditionally heritability, defined as the fraction of the total trait variance attributable to additive genetic effects, is estimated from samples of closely related individuals using random effects models. We show that by using smoothed relationship matrices we can estimate heritability using population-based samples. Finally, while our methods have been developed for refining genetic relationship matrices and improving estimates of heritability, they have much broader potential application in statistics, most notably for error-in-variables random effects models and settings that require regularization of matrices with block or hierarchical structure.
Covariance estimation; cryptic relatedness; genome-wide association; heritability; kinship
To characterize the role of rare complete human knockouts in autism spectrum disorders (ASD), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a two-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudo-autosomal regions on the X-chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together these results provide compelling evidence that rare autosomal and X-chromosome complete gene knockouts are important inherited risk factors for ASD.
Genome-wide association studies (GWAS) implicate single nucleotide polymorphisms (SNPs) on chromosome 6p21.3-22.1, the human leukocyte antigen (HLA) region, as common risk factors for schizophrenia (SZ). Other studies implicate viral and protozoan exposure. Our study tests chromosome 6p SNPs for effects on SZ risk with and without exposure. Method: GWAS-significant SNPs and ancestry-informative marker SNPs were analyzed among African American patients with SZ (n = 604) and controls (n = 404). Exposure to herpes simplex virus, type 1 (HSV-1), cytomegalovirus (CMV), and Toxoplasma gondii (TOX) was assayed using specific antibody assays. Results: Five SNPs were nominally associated with SZ, adjusted for population admixture (P < .05, uncorrected for multiple comparisons). These SNPs were next analyzed in relation to infectious exposure. Multivariate analysis indicated significant association between rs3130297 genotype and HSV-1 exposure; the associated allele was different from the SZ risk allele. Conclusions: We propose a model for the genesis of SZ incorporating genomic variation in the HLA region and neurotropic viral exposure for testing in additional, independent African American samples.
HLA; gene; HSV-1; cytomegalovirus; schizophrenia; African American
We evaluated the hypothesis that dopaminergic polymorphisms are risk factors for schizophrenia (SZ). In stage I, we screened 18 dopamine-related genes in two independent US Caucasian samples: 150 trios and 328 cases/501 controls. The most promising associations were detected with SLC6A3 (alias DAT), DRD3, COMT and SLC18A2 (alias VMAT2). In stage II, we comprehensively evaluated these four genes by genotyping 68 SNPs in all 478 cases and 501 controls from stage I. Fifteen (23.1%) significant associations were found (p ≤ 0.05). We sought epistasis between pairs of SNPs providing evidence of a main effect and observed 17 significant interactions (169 tests); 41.2% of significant interactions involved rs3756450 (5′ near promoter) or rs464049 (intron 4) at SLC6A3. In stage III, we confirmed our findings by genotyping 65 SNPs among 659 Bulgarian trios. Both SLC6A3 variants implicated in the US interactions were overtransmitted in this cohort (rs3756450, p = 0.035; rs464049, p = 0.011). Joint analyses from stages II and III identified associations at all four genes (joint p < 0.05). We tested 29 putative interactions from stage II and detected replication between seven locus pairs (p ≤ 0.05). Simulations suggested our stage II and stage III interaction results were unlikely to have occurred by chance (p = 0.008 and 0.001, respectively). In stage IV we evaluated rs464049 and rs3756450 for functional effects and found significant allele-specific differences at rs3756450 using electrophoretic mobility shift assays and dual-luciferase promoter assays. Our data suggest that a network of dopaminergic polymorphisms increases risk for SZ.
Supported by National Institute of Mental Health (NIMH), this 12-site international collaboration seeks to identify genetic variants that affect risk for anorexia nervosa (AN).
Four hundred families will be ascertained with two or more individuals affected with AN. The assessment battery produces a rich set of phenotypes comprising eating disorder diagnoses and psychological and personality features known to be associated with vulnerability to eating disorders.
We report attributes of the first 200 families, comprising 200 probands and 232 affected relatives.
These results provide context for the genotyping of the first 200 families by the Center for Inherited Disease Research. We will analyze our first 200 families for linkage, complete recruitment of roughly 400 families, and then perform final linkage analyses on the complete cohort. DNA, genotypes, and phenotypes will form a national eating disorder repository maintained by NIMH and available to qualified investigators.
anorexia nervosa; eating disorders; bulimia nervosa; psychiatric disorders; genetics; linkage analysis; genomics
De novo mutations affect risk for many diseases and disorders, especially those with early onset. An example is autism spectrum disorders (ASD). Four recent whole-exome sequencing (WES) studies of ASD families revealed a handful of novel risk genes, based on independent de novo loss-of-function (LoF) mutations falling in the same gene, and found that de novo LoF mutations occurred at a twofold higher rate than expected by chance. However successful these studies were, they used only a small fraction of the data, excluding other types of de novo mutations and inherited rare variants. Moreover, such analyses cannot readily incorporate data from case-control studies. An important research challenge in gene discovery, therefore, is to develop statistical methods that accommodate a broader class of rare variation. We develop methods that can incorporate WES data regarding de novo mutations, inherited variants present, and variants identified within cases and controls. TADA, for Transmission And De novo Association, integrates these data by a gene-based likelihood model involving parameters for allele frequencies and gene-specific penetrances. Inference is based on a Hierarchical Bayes strategy that borrows information across all genes to infer parameters that would be difficult to estimate for individual genes. In addition to theoretical development we validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and show it has dramatically better power than other common methods of analysis. Thus TADA's integration of various kinds of WES data can be a highly effective means of identifying novel risk genes. Indeed, application of TADA to WES data from subjects with ASD and their families, as well as from a study of ASD subjects and controls, revealed several novel and promising ASD candidate genes with strong statistical support.
The genetic underpinnings of autism spectrum disorder (ASD) have proven difficult to determine, despite a wealth of evidence for genetic causes and ongoing effort to identify genes. Recently investigators sequenced the coding regions of the genomes from ASD children along with their unaffected parents (ASD trios) and identified numerous new candidate genes by pinpointing spontaneously occurring (de novo) mutations in the affected offspring. A gene with a severe (de novo) mutation observed in more than one individual is immediately implicated in ASD; however, the majority of severe mutations are observed only once per gene. These genes create a short list of candidates, and our results suggest about 50% are true risk genes. To strengthen our inferences, we develop a novel statistical method (TADA) that utilizes inherited variation transmitted to affected offspring in conjunction with (de novo) mutations to identify risk genes. Through simulations we show that TADA dramatically increases power. We apply this approach to nearly 1000 ASD trios and 2000 subjects from a case-control study and identify several promising genes. Through simulations and application we show that TADA's integration of sequencing data can be a highly effective means of identifying risk genes.
Progressive supranuclear palsy (PSP) is a neurodegenerative disorder pathologically characterized by intracellular tangles of hyperphosphorylated tau protein distributed throughout the neocortex, basal ganglia, and brainstem. A genome-wide association study identified EIF2AK3 as a risk factor for PSP. EIF2AK3 encodes PERK, part of the endoplasmic reticulum’s (ER) unfolded protein response (UPR). PERK is an ER membrane protein that senses unfolded protein accumulation within the ER lumen. Recently, several groups noted UPR activation in Alzheimer’s disease (AD), Parkinson’s disease (PD), amyotrophic lateral sclerosis, multiple system atrophy, and in the hippocampus and substantia nigra of PSP subjects. Here, we evaluate UPR PERK activation in the pons, medulla, midbrain, hippocampus, frontal cortex and cerebellum in subjects with PSP, AD, and in normal controls.
We found UPR activation primarily in disease-affected brain regions in both disorders. In PSP, the UPR was primarily activated in the pons and medulla and to a much lesser extent in the hippocampus. In AD, the UPR was extensively activated in the hippocampus. We also observed UPR activation in the hippocampus of some elderly normal controls, severity of which positively correlated with both age and tau pathology but not with Aβ plaque burden. Finally, we evaluated EIF2AK3 coding variants that influence PERK activation. We show that a haplotype associated with increased PERK activation is genetically associated with increased PSP risk.
The UPR is activated in disease-affected regions in PSP, and the genetic evidence shows that this activation increases risk for PSP and is not a protective response.
Progressive supranuclear palsy; PERK; Unfolded protein response; EIF2AK3; Alzheimer’s disease
Psychotic symptoms occur in approximately 40% of subjects with Alzheimer’s disease (AD) and are associated with more rapid cognitive decline and increased functional deficits. They show heritability up to 61% and have been proposed as a marker for a disease subtype suitable for gene mapping efforts. We undertook a combined analysis of three genome-wide association studies (GWAS) to identify loci that a) increase susceptibility to AD and subsequent psychotic symptoms; or b) modify risk of psychotic symptoms in the presence of neurodegeneration caused by AD. 1299 AD cases with psychosis (AD+P), 735 AD cases without psychosis (AD-P) and 5659 controls were drawn from GERAD1, the NIA-LOAD family study and the University of Pittsburgh ADRC GWAS. Unobserved genotypes were imputed to provide data on > 1.8 million SNPs. Analyses in each dataset were completed comparing a) AD+P to AD-P cases, and b) AD+P cases with controls (GERAD1, ADRC only). Aside from the APOE locus, the strongest evidence for association was observed in an intergenic region on chromosome 4 (rs753129; ‘AD+PvAD-P’ P=2.85 × 10−7; ‘AD+PvControls’ P=1.11 × 10−4). SNPs upstream of SLC2A9 (rs6834555, P=3.0×10−7) and within VSNL1 (rs4038131, P=5.9×10−7) showed strongest evidence for association with AD+P when compared to controls. These findings warrant further investigation in larger, appropriately powered samples in which the presence of psychotic symptoms in AD has been well characterised.
Alzheimer’s disease; psychosis; behavioural symptoms; genome-wide association study; genetic
Pancreatitis is a complex, progressively destructive inflammatory disorder. Alcohol was long thought to be the primary causative agent, but genetic contributions have been of interest since the discovery that rare PRSS1, CFTR, and SPINK1 variants were associated with pancreatitis risk. We now report two significant genome-wide associations identified and replicated at PRSS1-PRSS2 (1×10−12) and X-linked CLDN2 (p < 1×10−21) through a two-stage genome-wide study (Stage 1, 676 cases and 4507 controls; Stage 2, 910 cases and 4170 controls). The PRSS1 variant affects susceptibility by altering expression of the primary trypsinogen gene. The CLDN2 risk allele is associated with atypical localization of claudin-2 in pancreatic acinar cells. The homozygous (or hemizygous male) CLDN2 genotype confers the greatest risk, and its alleles interact with alcohol consumption to amplify risk. These results could partially explain the high frequency of alcohol-related pancreatitis in men: the male hemizygous frequency is 0.26, whereas the female homozygote frequency is 0.07.
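The two frequencies quoted at the end of the abstract above are consistent with simple X-linked inheritance: males carry one X chromosome, so the hemizygous frequency equals the risk-allele frequency q, whereas females need two copies, giving roughly q². The following is a minimal sketch of that arithmetic, assuming Hardy-Weinberg proportions and the same allele frequency in both sexes; neither assumption is stated in the abstract, and the function name is illustrative only.

```python
def x_linked_genotype_freqs(q):
    """Expected risk-genotype frequencies for an X-linked allele.

    Assumes Hardy-Weinberg proportions and equal allele frequency in both
    sexes (assumptions of this sketch, not claims from the abstract).
    Returns (male hemizygous frequency, female homozygous frequency).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    male_hemizygous = q        # one X chromosome: genotype frequency = q
    female_homozygous = q * q  # two X chromosomes: both must carry the allele
    return male_hemizygous, female_homozygous


# Allele frequency taken from the reported male hemizygous frequency (0.26).
male, female = x_linked_genotype_freqs(0.26)
print(f"male hemizygous: {male:.2f}, female homozygous: {female:.2f}")
```

Under these assumptions the female homozygote frequency is 0.26² ≈ 0.068, which rounds to the reported 0.07.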
Multiple studies have confirmed the contribution of rare de novo copy number variations (CNVs) to the risk for Autism Spectrum Disorders (ASD).1-3 While de novo single nucleotide variants (SNVs) have been identified in affected individuals,4 their contribution to risk has yet to be clarified. Specifically, the frequency and distribution of these mutations has not been well characterized in matched unaffected controls, data that are vital to the interpretation of de novo coding mutations observed in probands. Here we show, via whole-exome sequencing of 928 individuals, including 200 phenotypically discordant sibling pairs, that highly disruptive (nonsense and splice-site) de novo mutations in brain-expressed genes are associated with ASD and carry large effects (OR=5.65; CI: 1.44-22.2; p=0.01 asymptotic test). Based on mutation rates in unaffected individuals, we demonstrate that multiple independent de novo SNVs in the same gene among unrelated probands reliably identifies risk alleles, providing a clear path forward for gene discovery. Among a total of 279 identified de novo coding mutations, there is a single instance in probands, and none in siblings, in which two independent nonsense variants disrupt the same gene, SCN2A (Sodium Channel, Voltage-Gated, Type II, Alpha Subunit), a result that is highly unlikely by chance (p=0.005).
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.
This study evaluates association of rare variants and autism spectrum disorders (ASD) in case and control samples sequenced by two centers. Before doing association analyses, we studied how to combine information across studies. We first harmonized the whole-exome sequence (WES) data, across centers, in terms of the distribution of rare variation. Key features included filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. After filtering, the vast majority of variant calls from seven samples sequenced at both centers matched. We also evaluated whether one should combine summary statistics from data from each center (meta-analysis) or combine data and analyze it together (mega-analysis). For many gene-based tests, we showed that mega-analysis yields more power. After quality control of data from 1,039 ASD cases and 870 controls and a range of analyses, no gene showed exome-wide evidence of significant association. Our results comport with recent results demonstrating that hundreds of genes affect risk for ASD; they suggest that rare risk variants are scattered across these many genes, and thus larger samples will be required to identify those genes.
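The distinction drawn above between meta-analysis (combine per-center statistics) and mega-analysis (pool the data, then test once) can be sketched for a single gene. The sketch below is illustrative only: it uses a simple carrier-count burden statistic (a two-proportion z test) and a sample-size-weighted Stouffer combination, neither of which is claimed to be the test used in the study, and all counts are hypothetical. It shows how the two strategies are computed; it does not by itself demonstrate the power difference reported in the abstract.

```python
import math


def burden_z(case_carriers, n_cases, control_carriers, n_controls):
    """Two-proportion z statistic comparing rare-variant carrier rates.

    A simple stand-in for a gene-based burden test (an assumption of this
    sketch). Undefined when no one, or everyone, carries a variant.
    """
    p_case = case_carriers / n_cases
    p_ctrl = control_carriers / n_controls
    p_pool = (case_carriers + control_carriers) / (n_cases + n_controls)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cases + 1 / n_controls))
    return (p_case - p_ctrl) / se


def meta_z(centers):
    """Meta-analysis: combine per-center z statistics (Stouffer's method),
    weighting each center by the square root of its total sample size."""
    num = 0.0
    den = 0.0
    for case_carriers, n_cases, control_carriers, n_controls in centers:
        w = math.sqrt(n_cases + n_controls)
        num += w * burden_z(case_carriers, n_cases, control_carriers, n_controls)
        den += w * w
    return num / math.sqrt(den)


def mega_z(centers):
    """Mega-analysis: pool the counts across centers, then test once."""
    totals = [sum(column) for column in zip(*centers)]
    return burden_z(*totals)


# Hypothetical counts for one gene at two centers:
# (case carriers, cases, control carriers, controls)
centers = [(10, 500, 4, 500), (6, 300, 3, 400)]
print(f"meta-analysis z = {meta_z(centers):.3f}")
print(f"mega-analysis z = {mega_z(centers):.3f}")
```

When the centers have identical counts the two statistics coincide in this sketch; they diverge as carrier rates and case-control ratios differ between centers.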
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified1,2. To identify further genetic risk factors, we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n= 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant and the overall rate of mutation is only modestly higher than the expected rate. In contrast, there is significantly enriched connectivity among the proteins encoded by genes harboring de novo missense or nonsense mutations, and excess connectivity to prior ASD genes of major effect, suggesting a subset of observed events are relevant to ASD risk. The small increase in rate of de novo events, when taken together with the connections among the proteins themselves and to ASD, is consistent with an important but limited role for de novo point mutations, similar to that documented for de novo copy number variants. Genetic models incorporating these data suggest that the majority of observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (i.e., not necessarily causal). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favor of CHD8 and KATNAL2 as genuine autism risk factors.
To investigate the underlying phenotypic constructs in autism spectrum disorders (ASD) and to identify genetic loci that are linked to these empirically derived factors.
Exploratory factor analysis was applied to two datasets with 28 selected Autism Diagnostic Interview-Revised (ADI-R) algorithm items. The first dataset was from the Autism Genome Project (AGP) phase I (1,236 ASD subjects from 618 families); the second was from the AGP phase II (804 unrelated ASD subjects). Variables derived from the factor analysis were then used as quantitative traits in genome-wide variance components linkage analyses.
Six factors, joint attention, social interaction and communication, non-verbal communication, repetitive sensory-motor behaviour, peer interaction, and compulsion/restricted interests, were retained for both datasets. There was good agreement between the factor loading patterns from the two datasets. All factors showed familial aggregation. Suggestive evidence for linkage was obtained for the joint attention factor on 11q23. Genome-wide significant evidence for linkage was obtained for the repetitive sensory-motor behaviour factor on 19q13.3.
This study demonstrates that the underlying phenotypic constructs based on the ADI-R algorithm items are replicable in independent datasets; and the empirically derived factors are suitable and informative in genetic studies of ASD.
autism; ADI-R; factor analysis; linkage analysis; quantitative trait
Studies of copy number variation (CNV) have successfully characterized loci and molecular pathways involved in a range of neuropsychiatric conditions. We conducted an analysis of rare CNVs in Tourette Syndrome (TS) to identify novel risk regions and relevant molecular pathways, evaluate the burden of structural variation in cases versus controls, and assess the overlap of identified variations with those implicated in other neuropsychiatric syndromes.
We conducted a case-control study of 460 individuals with TS, including 148 parent-child trios, and 1,131 controls. CNV analysis was undertaken using 370K to 1M probe arrays, and genome-wide genotyping data were used to match cases and controls for ancestry. Transmitted and de novo CNVs present in <1% of the population were evaluated.
While there was no significant increase in the number of de novo or transmitted rare CNVs in cases versus controls, pathway analysis using multiple algorithms showed enrichment of genes within histamine receptor (H1R and H2R) signaling pathways (p=5.8×10−4 to 1.6×10−2) as well as “axon guidance”, “cell adhesion”, “nervous system development” and “synaptic structure and function” processes. Genes mapping within rare CNVs in TS showed significant overlap with those previously identified in autism spectrum disorders (ASD), but not intellectual disability or schizophrenia. Three large, likely-pathogenic, de novo events were identified, including one disrupting multiple gamma-aminobutyric acid (GABA) receptor genes.
We identify further evidence supporting recent findings regarding the involvement of histaminergic and GABAergic mechanisms in the etiology of TS and show an overlap of rare CNVs in TS and ASD.
Tourette syndrome; copy number variation; CNV; histamine; GABA; autism
We report on copy number variants (CNVs) found in Palauan subjects ascertained for schizophrenia and related psychotic disorders in extended pedigrees in Palau. We compare CNVs found in this Oceanic population to those seen in other samples, typically of European ancestry. Assessing CNVs in Palauan extended pedigrees yields insight into the evolution of risk CNVs, such as how they arise, are transmitted, and are lost from populations by stochastic or selective processes, none of which is easily measured from case-control samples.
DNA samples from 197 subjects affected with schizophrenia and related psychotic disorders, 185 of their relatives, and 159 controls were successfully characterized for CNVs using Affymetrix Genomewide Human SNP Array 5.0.
CNVs thought to be associated with risk for schizophrenia and related disorders also occur in affected individuals in Palau, specifically 15q11.2 and 1q21.1 deletions, partial duplication of IL1RAPL1 (Xp21.3), and chromosome X duplications (Klinefelter’s syndrome). Partial duplication within A2BP1 appears to convey an 8-fold increased risk in males (95% CI, 0.8–84.4) but not females (OR=0.4, 95% CI, 0.03–4.9). Affected-only linkage analysis using this variant yields a LOD score of 3.5.
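The wide intervals reported above are typical of odds ratios estimated from few carriers. As an illustration, the sketch below computes an odds ratio with a Woolf (log-scale normal approximation) 95% confidence interval from a 2×2 table. The function name and the example counts are hypothetical; the abstract does not report the underlying counts or state which interval method was used.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf confidence interval for a 2x2 table.

    a: affected carriers      b: affected non-carriers
    c: unaffected carriers    d: unaffected non-carriers
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: 4 of 100 affected and 1 of 100 unaffected carry the variant.
estimate, lower, upper = odds_ratio_ci(4, 96, 1, 99)
```

With so few carriers, the interval spans 1 even though the point estimate is well above 1, which is the same pattern as the male-specific estimate in the text. For sparse tables an exact method may be preferable to this approximation.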
This study reveals CNVs that confer risk for schizophrenia and related psychotic disorders in Palau, most of which have been previously observed in samples of European ancestry. Only a few of these CNVs show evidence that they have existed for many generations, consistent with risk variants diminishing reproductive success.
Schizophrenia; Psychotic disorders; Copy Number Variants (CNVs); A2BP1; IL1RAPL1; Palau
Apolipoprotein E (APOE) ε4 alleles increase the risk for late-onset Alzheimer disease (LOAD) and decrease the age of onset. Recently, sequencing the APOE region in a small sample of LOAD subjects identified a variable length poly-T repeat sequence in the nearby gene, TOMM40, which may affect age of onset. We genotyped the TOMM40 poly-T repeat using a novel statistical approach to refine the identification of allele length in 892 LOAD subjects and evaluated its effects on age of onset. Because psychosis in LOAD is a heritable phenotype which has shown conflicting associations with APOE genotype, we also evaluated the association of poly-T repeat length with psychosis. Poly-T repeat lengths had a trimodal distribution which differed between APOE genotype groups. After accounting for APOE ε4 there was no association of poly-T repeat length with age of onset. Neither APOE ε4 nor poly-T repeat length was associated with psychosis. Our findings do not support the association of poly-T repeat length with age of onset in LOAD. The clinical implications of this repeat length polymorphism remain to be elucidated.
Apolipoprotein E (APOE) ε4; late-onset Alzheimer disease (LOAD); psychosis; TOMM40; variable length poly-T repeat sequence