Genome sequencing can identify individuals in the general population who harbor rare coding variants in genes for Mendelian disorders1–7 and who consequently may have increased disease risk. However, previous studies of rare variants in phenotypically extreme individuals have ascertainment bias and may demonstrate inflated effect size estimates8–12. We sequenced seven genes for maturity-onset diabetes of the young (MODY)13 in well-phenotyped population samples14,15 (n=4,003). Rare variants were filtered according to prediction criteria used to identify disease-causing mutations: i) previously reported in MODY, and ii) satisfying stringent de novo thresholds (rare, conserved, protein damaging). Approximately 1.5% and 0.5% of randomly selected Framingham and Jackson Heart Study individuals carried variants from these two classes, respectively. However, the vast majority of carriers remained euglycemic through middle age. Accurate estimates of variant effect sizes from population-based sequencing are needed to avoid falsely predicting a significant fraction of individuals as at risk for MODY or other Mendelian diseases.
Loss-of-function mutations protective against human disease provide in vivo validation of therapeutic targets1,2,3, yet none are described for type 2 diabetes (T2D). Through sequencing or genotyping ~150,000 individuals across five ethnicities, we identified 12 rare protein-truncating variants in SLC30A8, which encodes an islet zinc transporter (ZnT8)4 and harbors a common variant (p.Trp325Arg) associated with T2D risk, glucose, and proinsulin levels5–7. Collectively, protein-truncating variant carriers had 65% reduced T2D risk (p=1.7×10−6), and non-diabetic Icelandic carriers of a frameshift variant (p.Lys34SerfsX50) demonstrated reduced glucose levels (−0.17 s.d., p=4.6×10−4). The two most common protein-truncating variants (p.Arg138X and p.Lys34SerfsX50) individually associate with T2D protection and encode unstable ZnT8 proteins. Previous functional study of SLC30A8 suggested reduced zinc transport increases T2D risk8,9, yet phenotypic heterogeneity was observed in rodent Slc30a8 knockouts10–15. Contrastingly, loss-of-function mutations in humans provide strong evidence that SLC30A8 haploinsufficiency protects against T2D, proposing ZnT8 inhibition as a therapeutic strategy in T2D prevention.
Autosomal recessive hypercholesterolemia (ARH) is a rare inherited disorder characterized by extremely high total and low-density lipoprotein cholesterol levels that has been previously linked to mutations in LDLRAP1. We identified a family with ARH not explained by mutations in LDLRAP1 or other genes known to cause monogenic hypercholesterolemia. The aim of this study was to identify the molecular etiology of ARH in this family.
Approach and Results
We used exome sequencing to assess all protein coding regions of the genome in three family members and identified a homozygous exon 8 splice junction mutation (c.894G>A, also known as E8SJM) in LIPA that segregated with the diagnosis of hypercholesterolemia. Since homozygosity for mutations in LIPA is known to cause cholesterol ester storage disease (CESD), we performed directed follow-up phenotyping by non-invasively measuring hepatic cholesterol content. We observed abnormal hepatic accumulation of cholesterol in the homozygous individuals, supporting the diagnosis of CESD. Given previous suggestions of cardiovascular disease risk in heterozygous LIPA mutation carriers, we genotyped E8SJM in >27,000 individuals and found no association with plasma lipid levels or risk of myocardial infarction, confirming a true recessive mode of inheritance.
By integrating observations from Mendelian and population genetics along with directed clinical phenotyping, we diagnosed clinically unapparent CESD in the affected individuals from this kindred and addressed an outstanding question regarding risk of cardiovascular disease in LIPA E8SJM heterozygous carriers.
hypercholesterolemia; genetics; myocardial infarction
Background & Aims
Liver cirrhosis affects 1%–2% of the population and is the major risk factor for hepatocellular carcinoma (HCC). Hepatitis C cirrhosis-related HCC is the most rapidly increasing cause of cancer death in the US. Non-invasive methods have been developed to identify patients with asymptomatic, early-stage cirrhosis, increasing the burden of HCC surveillance, but biomarkers are needed to identify patients with cirrhosis who are most in need of surveillance. We investigated whether a liver-derived 186-gene signature previously associated with outcomes of patients with HCC is prognostic for patients newly diagnosed with cirrhosis but without HCC.
We performed gene expression profile analysis of formalin-fixed needle biopsies from the livers of 216 patients with hepatitis C-related early-stage (Child-Pugh class A) cirrhosis who were prospectively followed for a median of 10 years at an Italian center. We evaluated whether the 186-gene signature was associated with death, progression of cirrhosis, and development of HCC.
Fifty-five (25%), 101 (47%), and 60 (28%) patients were classified as having poor-, intermediate-, and good-prognosis signatures, respectively. In multivariable Cox regression modeling, the poor-prognosis signature was significantly associated with death (P=.004), progression to advanced cirrhosis (P<.001), and development of HCC (P=.009). The 10-year rates of survival were 63%, 74%, and 85% and the annual incidences of HCC were 5.8%, 2.2%, and 1.5% for patients with poor-, intermediate-, and good-prognosis signatures, respectively.
A 186-gene signature used to predict outcomes of patients with HCC is also associated with outcomes of patients with hepatitis C-related early-stage cirrhosis. This signature might be used to identify patients with cirrhosis most in need of surveillance and of strategies to prevent the development of HCC.
liver cancer prevention; early detection; screening; whole genome gene expression profiling
The analysis of exonic DNA from prostate cancers has identified recurrently mutated genes, but the spectrum of genome-wide alterations has not been profiled extensively in this disease. We sequenced the genomes of 57 prostate tumors and matched normal tissues to characterize somatic alterations and to study how they accumulate during oncogenesis and progression. By modeling the genesis of genomic rearrangements, we identified abundant DNA translocations and deletions that arise in a highly interdependent manner. This phenomenon, which we term “chromoplexy”, frequently accounts for the dysregulation of prostate cancer genes and appears to disrupt multiple cancer genes coordinately. Our modeling suggests that chromoplexy may induce considerable genomic derangement over relatively few events in prostate cancer and other neoplasms, supporting a model of punctuated cancer evolution. By characterizing the clonal hierarchy of genomic lesions in prostate tumors, we charted a path of oncogenic events along which chromoplexy may drive prostate carcinogenesis.
Clonal evolution is a key feature of cancer progression and relapse. We studied intratumoral heterogeneity in 149 chronic lymphocytic leukemia (CLL) cases by integrating whole-exome sequence and copy number to measure the fraction of cancer cells harboring each somatic mutation. We identified driver mutations as predominantly clonal (e.g., MYD88, trisomy 12 and del(13q)) or subclonal (e.g., SF3B1, TP53), corresponding to earlier and later events in CLL evolution. We sampled leukemia cells from 18 patients at two timepoints. Ten of 12 CLL cases treated with chemotherapy (but only 1 of 6 without treatment) underwent clonal evolution, predominantly involving subclones with driver mutations (e.g., SF3B1, TP53) that expanded over time. Furthermore, presence of a subclonal driver mutation was an independent risk factor for rapid disease progression. Our study thus uncovers patterns of clonal evolution in CLL, providing insights into its stepwise transformation, and links the presence of subclones with adverse clinical outcome.
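The idea of converting an allele fraction plus local copy number into a fraction of cancer cells can be illustrated with a simple point estimate. This is a minimal sketch under stated assumptions, not the method used in the study: the function name, the `purity` and `multiplicity` parameters, and the fixed normal copy number of 2 are all hypothetical choices made for illustration, and a real analysis would propagate read-count uncertainty rather than return a single value.

```python
def cancer_cell_fraction(vaf, purity, local_copy_number,
                         multiplicity=1, normal_copy_number=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    vaf               -- observed variant allele fraction in the tumor sample
    purity            -- assumed fraction of sampled cells that are cancer cells
    local_copy_number -- assumed total copy number of the locus in cancer cells
    multiplicity      -- assumed number of mutated copies per mutated cell
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * local_copy_number + (1 - purity) * normal_copy_number
    ccf = vaf * avg_copies / (purity * multiplicity)
    # Sampling noise can push the estimate above 1; cap it at fully clonal.
    return min(ccf, 1.0)


# Hypothetical inputs: a heterozygous mutation in a diploid region.
clonal = cancer_cell_fraction(vaf=0.25, purity=0.5, local_copy_number=2)
subclonal = cancer_cell_fraction(vaf=0.10, purity=0.8, local_copy_number=2)
```

Under these assumptions, a mutation whose estimate is near 1 would be read as clonal and one well below 1 as subclonal.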
Major international projects are now underway aimed at creating a comprehensive catalog of all genes responsible for the initiation and progression of cancer. These studies involve sequencing of matched tumor–normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here, we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false positive findings that overshadow true driver events. Here, we show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumor-normal pairs and discover extraordinary variation in (i) mutation frequency and spectrum within cancer types, which shed light on mutational processes and disease etiology, and (ii) mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and allow true cancer genes to rise to attention.
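The effect of mutational heterogeneity on significance testing can be sketched with a toy calculation. The code below is an illustration only and is not MutSigCV: it takes background mutation rates as given inputs, uses a Poisson approximation chosen here for simplicity, and all gene sizes, sample counts, and rates are hypothetical numbers.

```python
import math


def poisson_tail(observed, expected):
    """Return P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for i in range(observed):
        cdf += term                  # accumulate P(X = i)
        term *= expected / (i + 1)   # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def gene_p_value(observed, gene_length_bp, n_samples, background_rate_per_bp):
    """Upper-tail p-value for the mutation count seen in one gene."""
    expected = gene_length_bp * n_samples * background_rate_per_bp
    return poisson_tail(observed, expected)


# Hypothetical long gene in a region with an elevated background rate.
observed = 25
p_uniform = gene_p_value(observed, 100_000, 100, 1.0e-6)   # cohort-wide rate
p_specific = gene_p_value(observed, 100_000, 100, 2.5e-6)  # gene-specific rate
```

With the cohort-wide rate the gene looks strongly enriched for mutations, while the gene-specific rate makes the same count unremarkable, which is the kind of false positive the paragraph above describes.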
Purine biosynthesis and metabolism, conserved in all living organisms, is essential for cellular energy homeostasis and nucleic acid synthesis. The de novo synthesis of purine precursors is under tight negative feedback regulation mediated by adenosine and guanine nucleotides. We describe a new distinct early-onset neurodegenerative condition resulting from mutations in the adenosine monophosphate deaminase 2 gene (AMPD2). Patients have characteristic brain imaging features of pontocerebellar hypoplasia (PCH), due to loss of brainstem and cerebellar parenchyma. We found that AMPD2 plays an evolutionarily conserved role in the maintenance of cellular guanine nucleotide pools by regulating the feedback inhibition of adenosine derivatives on de novo purine synthesis. AMPD2 deficiency results in defective GTP-dependent initiation of protein translation, which can be rescued by administration of purine precursors. These data suggest AMPD2-related PCH as a new, potentially treatable early-onset neurodegenerative disease.
Purine; pyrimidine; deaminase; salvage; translation; GTP; de novo synthesis; neurodegeneration
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing, and de novo assembly, we found that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
To characterize the role of rare complete human knockouts in autism spectrum disorders (ASD), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a two-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudo-autosomal regions on the X-chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together these results provide compelling evidence that rare autosomal and X-chromosome complete gene knockouts are important inherited risk factors for ASD.
Detection of somatic point substitutions is a key step in characterizing the cancer genome. Mutations in cancer are rare (0.1–100/Mb) and often occur only in a subset of the sequenced cells, either due to contamination by normal cells or due to tumor heterogeneity. Consequently, mutation calling methods need to be both specific, avoiding false positives, and sensitive to detect clonal and sub-clonal mutations. The decreased sensitivity of existing methods for low allelic fraction mutations highlights the pressing need for improved and systematically evaluated mutation detection methods. Here we present MuTect, a method based on a Bayesian classifier designed to detect somatic mutations with very low allele-fractions, requiring only a few supporting reads, followed by a set of carefully tuned filters that ensure high specificity. We also describe novel benchmarking approaches, which use real sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
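A likelihood-ratio score of the kind a Bayesian variant classifier could use is sketched below. The abstract does not specify the model, so everything here is an assumption made for illustration: the per-read error model (errors spread evenly over the three other bases), the use of the observed alternate-read fraction as the allele-fraction estimate, and the function names.

```python
import math


def read_likelihood(base, ref, alt, error, f):
    """Probability of observing `base` if the alt allele fraction is f.

    `error` is the per-base sequencing error probability (must be > 0).
    """
    if base == ref:
        return f * (error / 3) + (1 - f) * (1 - error)
    if base == alt:
        return f * (1 - error) + (1 - f) * (error / 3)
    return error / 3  # a third base can only arise from a sequencing error


def lod_score(reads, ref, alt):
    """log10 likelihood ratio: variant at the observed fraction vs. no variant.

    reads -- list of (base, error_probability) pairs covering one site
    """
    alt_count = sum(1 for base, _ in reads if base == alt)
    f_hat = alt_count / len(reads)
    log_variant = sum(
        math.log10(read_likelihood(b, ref, alt, e, f_hat)) for b, e in reads)
    log_reference = sum(
        math.log10(read_likelihood(b, ref, alt, e, 0.0)) for b, e in reads)
    return log_variant - log_reference


# Hypothetical site: 3 of 30 reads support the alt allele (fraction 0.1).
site = [("T", 0.001)] * 3 + [("C", 0.001)] * 27
score = lod_score(site, ref="C", alt="T")
```

A caller would compare the score against a threshold and then apply further filters; both the threshold and the filters are outside this sketch.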
The incidence of esophageal adenocarcinoma (EAC) has risen 600% over the last 30 years. With a five-year survival rate of 15%, identification of new therapeutic targets for EAC is greatly important. We analyze the mutation spectra from whole exome sequencing of 149 EAC tumor/normal pairs, 15 of which have also been subjected to whole genome sequencing. We identify a mutational signature defined by a high prevalence of A to C transversions at AA dinucleotides. Statistical analysis of exome data identified 26 significantly mutated genes. Of these genes, four (TP53, CDKN2A, SMAD4, and PIK3CA) have been previously implicated in EAC. The novel significantly mutated genes include chromatin modifying factors and candidate contributors: SPG20, TLR4, ELMO1, and DOCK2. Functional analyses of EAC-derived mutations in ELMO1 reveal increased cellular invasion. Therefore, we hypothesize that activation of the RAC1 pathway may contribute to EAC tumorigenesis.
Prior studies have identified recurrent oncogenic mutations in colorectal adenocarcinoma1 and have surveyed exons of protein-coding genes for mutations in 11 affected individuals2,3. Here we report whole-genome sequencing from nine individuals with colorectal cancer, including primary colorectal tumors and matched adjacent non-tumor tissues, at an average of 30.7× and 31.9× coverage, respectively. We identify an average of 75 somatic rearrangements per tumor, including complex networks of translocations between pairs of chromosomes. Eleven rearrangements encode predicted in-frame fusion proteins, including a fusion of VTI1A and TCF7L2 found in 3 out of 97 colorectal cancers. Although TCF7L2 encodes TCF4, which cooperates with β-catenin4 in colorectal carcinogenesis5,6, the fusion lacks the TCF4 β-catenin–binding domain. We found a colorectal carcinoma cell line harboring the fusion gene to be dependent on VTI1A-TCF7L2 for anchorage-independent growth using RNA interference-mediated knockdown. This study shows previously unidentified levels of genomic rearrangements in colorectal carcinoma that can lead to essential gene fusions and other oncogenic events.
Lung adenocarcinoma, the most common subtype of non-small cell lung cancer, is responsible for over 500,000 deaths per year worldwide. Here, we report exome and genome sequences of 183 lung adenocarcinoma tumor/normal DNA pairs. These analyses revealed a mean exonic somatic mutation rate of 12.0 events/megabase and identified the majority of genes previously reported as significantly mutated in lung adenocarcinoma. In addition, we identified statistically recurrent somatic mutations in the splicing factor gene U2AF1 and truncating mutations affecting RBM10 and ARID1A. Analysis of nucleotide context-specific mutation signatures grouped the sample set into distinct clusters that correlated with smoking history and alterations of reported lung adenocarcinoma genes. Whole genome sequence analysis revealed frequent structural re-arrangements, including in-frame exonic alterations within EGFR and SIK2 kinases. The candidate genes identified in this study are attractive targets for biological characterization and therapeutic targeting of lung adenocarcinoma.
Neuroblastoma is a malignancy of the developing sympathetic nervous system that often presents with widespread metastatic disease, resulting in survival rates of less than 50%1. To determine the spectrum of somatic mutation in high-risk neuroblastoma, we studied 240 cases using a combination of whole exome, genome and transcriptome sequencing as part of the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative. Here we report a low median exonic mutation frequency of 0.60 per megabase (0.48 non-silent), and remarkably few recurrently mutated genes in these tumors. Genes with significant somatic mutation frequencies included ALK (9.2% of cases), PTPN11 (2.9%), ATRX (2.5%, an additional 7.1% had focal deletions), MYCN (1.7%, a recurrent p.Pro44Leu alteration), and NRAS (0.83%). Rare, potentially pathogenic germline variants were significantly enriched in ALK, CHEK2, PINK1, and BARD1. The relative paucity of recurrent somatic mutations in neuroblastoma challenges current therapeutic strategies reliant upon frequently altered oncogenic drivers.
Despite recent insights into melanoma genetics, systematic surveys for driver mutations are challenged by an abundance of passenger mutations caused by carcinogenic ultraviolet (UV) light exposure. We developed a permutation-based framework to address this challenge, employing mutation data from intronic sequences to control for passenger mutational load on a per gene basis. Analysis of large-scale melanoma exome data by this approach discovered six novel melanoma genes (PPP6C, RAC1, SNX31, TACC1, STK19 and ARID2), three of which - RAC1, PPP6C and STK19 - harbored recurrent and potentially targetable mutations. Integration with chromosomal copy number data contextualized the landscape of driver mutations, providing oncogenic insights in BRAF- and NRAS-driven melanoma as well as those without known NRAS/BRAF mutations. The landscape also clarified a mutational basis for RB and p53 pathway deregulation in this malignancy. Finally, the spectrum of driver mutations provided unequivocal genomic evidence for a direct mutagenic role of UV light in melanoma pathogenesis.
Systemic lupus erythematosus (SLE) is a common systemic autoimmune disease with complex etiology but strong clustering in families (λS = ~30). We performed a genome-wide association scan using 317,501 SNPs in 720 women of European ancestry with SLE and in 2,337 controls, and we genotyped consistently associated SNPs in two additional independent sample sets totaling 1,846 affected women and 1,825 controls. Aside from the expected strong association between SLE and the HLA region on chromosome 6p21 and the previously confirmed non-HLA locus IRF5 on chromosome 7q32, we found evidence of association with replication (1.1 × 10−7 < Poverall < 1.6 × 10−23; odds ratio 0.82–1.62) in four regions: 16p11.2 (ITGAM), 11p15.5 (KIAA1542), 3p14.3 (PXK) and 1q25.1 (rs10798269). We also found evidence for association (P < 1 × 10−5) at FCGR2A, PTPN22 and STAT4, regions previously associated with SLE and other autoimmune diseases, as well as at ≥9 other loci (P < 2 × 10−7). Our results show that numerous genes, some with known immune-related functions, predispose to SLE.
As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ∼313 genes per genome, and ∼95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.
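The per-person figure above can be checked with one line of arithmetic. This is only a consistency check on the numbers quoted in the abstract; the function name is hypothetical, and note that the product counts SNVs whereas the quoted 313 counts genes, so the match assumes roughly one such SNV per affected gene.

```python
def functional_snv_count(snvs_per_person, functional_fraction):
    """Expected number of SNVs per person predicted to affect protein function."""
    return snvs_per_person * functional_fraction


# Values quoted in the abstract: 13,595 SNVs per person, 2.3% predicted functional.
per_person = functional_snv_count(13_595, 0.023)
rounded = round(per_person)
```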
Establishing the age of each mutation segregating in contemporary human populations is important to fully understand our evolutionary history1,2 and will help facilitate the development of new approaches for disease gene discovery3. Large-scale surveys of human genetic variation have reported signatures of recent explosive population growth4-6, notable for an excess of rare genetic variants, qualitatively suggesting that many mutations arose recently. To more quantitatively assess the distribution of mutation ages, we resequenced 15,336 genes in 6,515 individuals of European (n=4,298) and African (n=2,217) American ancestry and inferred the age of 1,146,401 autosomal single nucleotide variants (SNVs). We estimate that ~73% of all protein-coding SNVs and ~86% of SNVs predicted to be deleterious arose in the past 5,000-10,000 years. The average age of deleterious SNVs varied significantly across molecular pathways, and disease genes contained a significantly higher proportion of recently arisen deleterious SNVs compared to other genes. Furthermore, European Americans had an excess of deleterious variants in essential and Mendelian disease genes compared to African Americans, consistent with weaker purifying selection due to the out-of-Africa dispersal. Our results better delimit the historical details of human protein-coding variation, illustrate the profound effect recent human history has had on the burden of deleterious SNVs segregating in contemporary populations, and provide important practical information that can be used to prioritize variants in disease gene discovery.
Autism spectrum disorders are a genetically heterogeneous constellation of syndromes characterized by impairments in reciprocal social interaction. Available somatic treatments have limited efficacy. We have identified inactivating mutations in the gene BCKDK (Branched Chain Ketoacid Dehydrogenase Kinase) in consanguineous families with autism, epilepsy, and intellectual disability. The encoded protein is responsible for phosphorylation-mediated inactivation of the E1α subunit of branched-chain ketoacid dehydrogenase (BCKDH). Patients with homozygous BCKDK mutations display reductions in BCKDK messenger RNA and protein, E1α phosphorylation, and plasma branched-chain amino acids. Bckdk knockout mice show abnormal brain amino acid profiles and neurobehavioral deficits that respond to dietary supplementation. Thus, autism presenting with intellectual disability and epilepsy caused by BCKDK mutations represents a potentially treatable syndrome.
The somatic genetic basis of chronic lymphocytic leukemia, a common and clinically heterogeneous leukemia occurring in adults, remains poorly understood.
We obtained DNA samples from leukemia cells in 91 patients with chronic lymphocytic leukemia and performed massively parallel sequencing of 88 whole exomes and whole genomes, together with sequencing of matched germline DNA, to characterize the spectrum of somatic mutations in this disease.
Nine genes that are mutated at significant frequencies were identified, including four with established roles in chronic lymphocytic leukemia (TP53 in 15% of patients, ATM in 9%, MYD88 in 10%, and NOTCH1 in 4%) and five with unestablished roles (SF3B1, ZMYM3, MAPK1, FBXW7, and DDX3X). SF3B1, which functions at the catalytic core of the spliceosome, was the second most frequently mutated gene (with mutations occurring in 15% of patients). SF3B1 mutations occurred primarily in tumors with deletions in chromosome 11q, which are associated with a poor prognosis in patients with chronic lymphocytic leukemia. We further discovered that tumor samples with mutations in SF3B1 had alterations in pre–messenger RNA (mRNA) splicing.
Our study defines the landscape of somatic mutations in chronic lymphocytic leukemia and highlights pre-mRNA splicing as a critical cellular process contributing to chronic lymphocytic leukemia.
Prostate cancer is the second most common cancer in men worldwide and causes over 250,000 deaths each year1. Overtreatment of indolent disease also results in significant morbidity2. Common genetic alterations in prostate cancer include losses of NKX3.1 (8p21)3,4 and PTEN (10q23)5,6, gains of the androgen receptor gene (AR)7,8 and fusion of ETS-family transcription factor genes with androgen-responsive promoters9–11. Recurrent somatic base-pair substitutions are believed to be less contributory in prostate tumorigenesis12,13 but have not been systematically analyzed in large cohorts. Here we sequenced the exomes of 112 prostate tumor/normal pairs. Novel recurrent mutations were identified in multiple genes, including MED12 and FOXA1. SPOP was the most frequently mutated gene, with mutations involving the SPOP substrate binding cleft in 6–15% of tumors across multiple independent cohorts. SPOP-mutant prostate cancers lacked ETS rearrangements and exhibited a distinct pattern of genomic alterations. Thus, SPOP mutations may define a new molecular subtype of prostate cancer.
Neighboring genes are often coordinately expressed within cis-regulatory modules, but evidence that nonparalogous genes share functions in mammals is lacking. Here, we report that mutation of either TMEM138 or TMEM216 causes a phenotypically indistinguishable human ciliopathy, Joubert syndrome. Despite a lack of sequence homology, the genes are aligned in a head-to-tail configuration and were joined by chromosomal rearrangement at the amphibian-to-reptile evolutionary transition. Expression of the two genes is mediated by a conserved regulatory element in the noncoding intergenic region. Coordinated expression is important for their interdependent cellular role in vesicular transport to primary cilia. Hence, during vertebrate evolution of genes involved in ciliogenesis, nonparalogous genes were arranged into a functional gene cluster with shared regulatory elements.
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.
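The difference between the two combination strategies can be made concrete with a toy burden-style comparison. This sketch only shows the two workflows and does not reproduce the power comparison, which the authors established by theory and simulation for their own gene-based tests. The two-proportion z-test, the square-root-of-sample-size Stouffer weights, and the per-center carrier counts are all assumptions chosen for illustration; only the totals of 1,039 cases and 870 controls come from the text.

```python
import math


def two_proportion_z(case_carriers, n_cases, control_carriers, n_controls):
    """z statistic comparing rare-variant carrier rates in cases and controls."""
    p_case = case_carriers / n_cases
    p_ctrl = control_carriers / n_controls
    pooled = (case_carriers + control_carriers) / (n_cases + n_controls)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))
    return (p_case - p_ctrl) / se


def meta_z(centers):
    """Meta-analysis: test each center separately, then combine z scores."""
    z_scores = [two_proportion_z(*center) for center in centers]
    # Stouffer weights: square root of each center's total sample size.
    weights = [math.sqrt(center[1] + center[3]) for center in centers]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def mega_z(centers):
    """Mega-analysis: pool the raw counts across centers, then test once."""
    totals = [sum(column) for column in zip(*centers)]
    return two_proportion_z(*totals)


# Hypothetical counts for one gene, as
# (case_carriers, n_cases, control_carriers, n_controls) per center.
centers = [(6, 500, 2, 450), (5, 539, 1, 420)]
z_meta = meta_z(centers)
z_mega = mega_z(centers)
```

For a single gene and a simple test like this one, the two statistics are typically close; the power difference reported above concerns the behavior of gene-based tests across many genes and simulated datasets.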
This study evaluates association of rare variants and autism spectrum disorders (ASD) in case and control samples sequenced by two centers. Before doing association analyses, we studied how to combine information across studies. We first harmonized the whole-exome sequence (WES) data, across centers, in terms of the distribution of rare variation. Key features included filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. After filtering, the vast majority of variant calls from seven samples sequenced at both centers matched. We also evaluated whether one should combine summary statistics from data from each center (meta-analysis) or combine data and analyze it together (mega-analysis). For many gene-based tests, we showed that mega-analysis yields more power. After quality control of data from 1,039 ASD cases and 870 controls and a range of analyses, no gene showed exome-wide evidence of significant association. Our results comport with recent results demonstrating that hundreds of genes affect risk for ASD; they suggest that rare risk variants are scattered across these many genes, and thus larger samples will be required to identify those genes.