Parkinson disease (PD) is a progressive neurodegenerative disease for which susceptibility is linked to genetic and environmental risk factors.
To identify genetic variants contributing to disease risk in familial PD.
DESIGN, SETTING, AND PARTICIPANTS
We used a 2-stage study design comprising a discovery cohort of families with PD and a replication cohort of familial probands. In the discovery cohort, rare exonic variants that segregated in multiple affected individuals in a family and were predicted to be conserved or damaging were retained. Genes with retained variants were prioritized if expressed in the brain and located within PD-relevant pathways. Genes in which prioritized variants were observed in at least 4 families were selected as candidate genes for replication in the replication cohort. Participants were individuals with familial PD enrolled from academic movement disorder specialty clinics across the United States. All participants had a family history of PD.
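The discovery-stage filtering described above can be sketched in code. This is a minimal illustration only, not the study's pipeline: the `Variant` record, its boolean fields, and the function names are hypothetical, and each flag stands in for an annotation step (rarity cutoff, segregation check, conservation or damage prediction, brain expression, pathway membership) whose actual criteria are not given in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-family variant record; every flag is an assumed annotation."""
    family: str
    gene: str
    rare: bool
    exonic: bool
    segregates: bool             # seen in multiple affected relatives of the family
    conserved_or_damaging: bool  # prediction from an unspecified annotation tool
    brain_expressed: bool        # gene-level annotation
    in_pd_pathway: bool          # gene-level annotation


def retained(v: Variant) -> bool:
    """Step 1: keep rare exonic variants that segregate and look conserved or damaging."""
    return v.rare and v.exonic and v.segregates and v.conserved_or_damaging


def prioritized(v: Variant) -> bool:
    """Step 2: keep retained variants in brain-expressed, PD-pathway genes."""
    return retained(v) and v.brain_expressed and v.in_pd_pathway


def candidate_genes(variants, min_families=4):
    """Step 3: genes with prioritized variants in at least `min_families` families."""
    families_by_gene = defaultdict(set)
    for v in variants:
        if prioritized(v):
            families_by_gene[v.gene].add(v.family)
    return sorted(g for g, fams in families_by_gene.items() if len(fams) >= min_families)
```

Counting distinct families per gene, rather than variants, matches the "at least 4 families" criterion: several variants in one family contribute only once.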
MAIN OUTCOMES AND MEASURES
Identification of genes containing rare, likely deleterious, genetic variants in individuals with familial PD using a 2-stage exome sequencing study design.
The 93 individuals from 32 families in the discovery cohort (49.5% [46 of 93] female) had a mean (SD) age at onset of 61.8 (10.0) years. The 49 individuals with familial PD in the replication cohort (32.6% [16 of 49] female) had a mean (SD) age at onset of 50.1 (15.7) years. Discovery cohort recruitment dates were 1999 to 2009, and replication cohort recruitment dates were 2003 to 2014. Data analysis dates were 2011 to 2015. Three genes containing a total of 13 rare and potentially damaging variants were prioritized in the discovery cohort. Two of these genes (TNK2 and TNR) also had rare variants that were predicted to be damaging in the replication cohort. All 9 variants identified in the 2 replicated genes in 12 families across the discovery and replication cohorts were confirmed via Sanger sequencing.
CONCLUSIONS AND RELEVANCE
TNK2 and TNR harbored rare, likely deleterious, variants in individuals having familial PD, with similar findings in an independent cohort. To our knowledge, these genes have not been previously associated with PD, although they have been linked to critical neuronal functions. Further studies are required to confirm a potential role for these genes in the pathogenesis of PD.
Progressive encephalopathy with edema, hypsarrhythmia and optic atrophy (PEHO) syndrome is a distinct neurodevelopmental disorder. Patients without optic nerve atrophy and brain imaging abnormalities but fulfilling other PEHO criteria are often described as a PEHO-like syndrome. The molecular bases of both clinically defined conditions remain unknown in spite of the widespread application of genome analyses in both clinic and research.
We enrolled two patients with a prior diagnosis of PEHO and two individuals with PEHO-like syndrome. All four individuals subsequently underwent whole-exome sequencing and comprehensive genomic analysis.
We identified disease-causing mutations in known genes associated with neurodevelopmental disorders including GNAO1 and CDKL5 in two of four individuals. One patient with PEHO syndrome and a de novo GNAO1 mutation was found to have an additional de novo mutation in HESX1 that is associated with optic atrophy.
We hypothesize that PEHO and PEHO-like syndrome may represent a severe end of the spectrum of the early-onset encephalopathies and, in some instances, its complex phenotype may result from an aggregated effect of mutations at two loci.
PEHO; encephalopathy; whole-exome sequencing; optic atrophy; neurodevelopmental disorder
Blood levels of amino acids are important biomarkers of disease and are influenced by synthesis, protein degradation, and gene–environment interactions. Whole genome sequence analysis of amino acid levels may establish a paradigm for analyzing quantitative risk factors.
In a discovery cohort of 1872 African Americans and a replication cohort of 1552 European Americans we sequenced exons and whole genomes and measured serum levels of 70 amino acids. Rare and low-frequency variants (minor allele frequency ≤5%) were analyzed by three types of aggregating motifs defined by gene exons, regulatory regions, or genome-wide sliding windows. Common variants (minor allele frequency >5%) were analyzed individually. Over all four analysis strategies, 14 gene–amino acid associations were identified and replicated. The 14 loci accounted for an average of 1.8% of the variance in amino acid levels, which ranged from 0.4 to 9.7%. Among the identified locus–amino acid pairs, four are novel and six have been reported to underlie known Mendelian conditions. These results suggest that there may be substantial genetic effects on amino acid levels in the general population that may underlie inborn errors of metabolism. We also identify a predicted promoter variant in AGA (the gene that encodes aspartylglucosaminidase) that is significantly associated with asparagine levels, with an effect that is independent of any observed coding variants.
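The split between individually tested common variants and aggregated rare or low-frequency variants can be illustrated with a short sketch. It assumes variants are given as `(position, minor_allele_frequency)` pairs and uses fixed-width, fixed-step windows; the function names, the window width, and the step are hypothetical, since the abstract does not state how the sliding windows were defined.

```python
def split_by_maf(variants, threshold=0.05):
    """Partition (position, maf) pairs at the 5% minor allele frequency cutoff."""
    rare = [v for v in variants if v[1] <= threshold]    # aggregated in burden-style tests
    common = [v for v in variants if v[1] > threshold]   # tested one at a time
    return rare, common


def sliding_windows(rare, width, step):
    """Group rare-variant positions into overlapping windows of `width` bases.

    Returns (start, end, positions) tuples; windows containing no variant are skipped.
    """
    if not rare:
        return []
    positions = sorted(p for p, _ in rare)
    windows = []
    start, last = positions[0], positions[-1]
    while start <= last:
        members = [p for p in positions if start <= p < start + width]
        if members:
            windows.append((start, start + width, members))
        start += step
    return windows
```

The same grouping idea applies to the other two aggregating motifs: replacing the window boundaries with exon or regulatory-region coordinates yields gene-based or regulatory-region-based units.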
These data provide insights into genetic influences on circulating amino acid levels by integrating -omic technologies in a multi-ethnic population. The results also help establish a paradigm for whole genome sequence analysis of quantitative traits.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1106-x) contains supplementary material, which is available to authorized users.
Amino acids; Whole genome sequence; Metabolomics; Rare variants; Multi-ethnic
Rett Syndrome (RTT) is a neurodevelopmental disorder caused primarily by de novo mutations (DNMs) in MECP2 and sometimes in CDKL5 and FOXG1. However, some RTT cases lack mutations in these genes.
Twenty-two RTT cases without apparent MECP2, CDKL5, and FOXG1 mutations were subjected to both whole exome sequencing and single nucleotide polymorphism array-based copy number variant (CNV) analyses.
Three cases had MECP2 mutations initially missed by clinical testing. Of the remaining 19 cases, 17 (89.5%) had 29 other likely pathogenic intragenic mutations and/or CNVs (10 cases had two or more). Interestingly, 13 cases had mutations in a gene/region previously reported in other NDDs, thereby providing a potential diagnostic yield of 68.4%. These mutations were significantly enriched in chromatin regulators (corrected p = 0.0068) and moderately in postsynaptic cell membrane molecules (corrected p = 0.076) implicating glutamate receptor signaling.
The genetic etiology of RTT without MECP2, CDKL5, and FOXG1 mutations is heterogeneous, overlaps with other NDDs, and is complex owing to a high mutation burden. Dysregulation of chromatin structure and abnormal excitatory synaptic signaling may form two common pathological bases of RTT.
Rett syndrome; chromatin regulation; glutamate signaling; exome sequencing; CNV
Relatively little is known about the genomic basis and evolution of wood-feeding in beetles. We undertook genome sequencing and annotation, gene expression assays, studies of plant cell wall degrading enzymes, and other functional and comparative studies of the Asian longhorned beetle, Anoplophora glabripennis, a globally significant invasive species capable of inflicting severe feeding damage on many important tree species. Complementary studies of genes encoding enzymes involved in digestion of woody plant tissues or detoxification of plant allelochemicals were undertaken with the genomes of 14 additional insects, including the newly sequenced emerald ash borer and bull-headed dung beetle.
The Asian longhorned beetle genome encodes a uniquely diverse arsenal of enzymes that can degrade the main polysaccharide networks in plant cell walls, detoxify plant allelochemicals, and otherwise facilitate feeding on woody plants. It has the metabolic plasticity needed to feed on diverse plant species, contributing to its highly invasive nature. Large expansions of chemosensory genes involved in the reception of pheromones and plant kairomones are consistent with the complexity of chemical cues it uses to find host plants and mates.
Amplification and functional divergence of genes associated with specialized feeding on plants, including genes originally obtained via horizontal gene transfer from fungi and bacteria, contributed to the addition, expansion, and enhancement of the metabolic repertoire of the Asian longhorned beetle, certain other phytophagous beetles, and to a lesser degree, other phytophagous insects. Our results thus begin to establish a genomic basis for the evolutionary success of beetles on plants.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1088-8) contains supplementary material, which is available to authorized users.
Chemoperception; Detoxification; Glycoside hydrolase; Horizontal gene transfer; Phytophagy; Xylophagy
Development of the human nervous system involves complex interactions between fundamental cellular processes and requires a multitude of genes, many of which remain to be associated with human disease. We applied whole exome sequencing to 128 mostly consanguineous families with neurogenetic disorders that often included brain malformations. Rare variant analyses for both single nucleotide variant (SNV) and copy number variant (CNV) alleles allowed for identification of 45 novel variants in 43 known disease genes, 41 candidate genes, and CNVs in 10 families, with an overall potential molecular cause identified in >85% of families studied. Among the candidate genes identified, we found PRUNE, VARS, and DHX37 in multiple families, and homozygous loss of function variants in AGBL2, SLC18A2, SMARCA1, UBQLN1, and CPLX1. Neuroimaging and in silico analysis of functional and expression proximity between candidate and known disease genes allowed for further understanding of genetic networks underlying specific types of brain malformations.
Mitochondrial presequence proteases perform fundamental functions as they process about 70 % of all mitochondrial preproteins that are encoded in the nucleus and imported posttranslationally. The mitochondrial intermediate presequence protease MIP/Oct1, which carries out precursor processing, has not yet been established to have a role in human disease.
Whole exome sequencing was performed on four unrelated probands with left ventricular non-compaction (LVNC), developmental delay (DD), seizures, and severe hypotonia. Proposed pathogenic variants were confirmed by Sanger sequencing or array comparative genomic hybridization. Functional analysis of the identified MIP variants was performed using the model organism Saccharomyces cerevisiae as the protein and its functions are highly conserved from yeast to human.
Biallelic single nucleotide variants (SNVs) or copy number variants (CNVs) in MIPEP, which encodes MIP, were present in all four probands, three of whom had infantile/childhood death. Two patients had compound heterozygous SNVs (p.L582R/p.L71Q and p.E602*/p.L306F) and one patient from a consanguineous family had a homozygous SNV (p.K343E). The fourth patient, identified through the GeneMatcher tool, a part of the Matchmaker Exchange Project, was found to have inherited a paternal SNV (p.H512D) and a maternal CNV (1.4-Mb deletion of 13q12.12) that includes MIPEP. All amino acids affected in the patients’ missense variants are highly conserved from yeast to human and therefore S. cerevisiae was employed for functional analysis (for p.L71Q, p.L306F, and p.K343E). The mutations p.L339F (human p.L306F) and p.K376E (human p.K343E) resulted in a severe decrease of Oct1 protease activity and accumulation of non-processed Oct1 substrates and consequently impaired viability under respiratory growth conditions. The p.L83Q (human p.L71Q) failed to localize to the mitochondria.
Our findings reveal for the first time the role of the mitochondrial intermediate peptidase in human disease. Loss of MIP function results in a syndrome which consists of LVNC, DD, seizures, hypotonia, and cataracts. Our approach highlights the power of data exchange and the importance of an interrelationship between clinical and research efforts for disease gene discovery.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-016-0360-6) contains supplementary material, which is available to authorized users.
Smith-Magenis syndrome (SMS) is a developmental disability/multiple congenital anomaly disorder resulting from haploinsufficiency of RAI1. It is characterized by distinctive facial features, brachydactyly, sleep disturbances, and stereotypic behaviors.
We investigated a cohort of 15 individuals with a clinical suspicion of SMS who showed neither deletion in the SMS critical region nor damaging variants in RAI1 using whole exome sequencing. A combination of network analysis (co-expression and biomedical text mining), transcriptomics, and circularized chromatin conformation capture (4C-seq) was applied to verify whether modified genes are part of the same disease network as known SMS-causing genes.
Potentially deleterious variants were identified in nine of these individuals using whole-exome sequencing. Eight of these changes affect KMT2D, ZEB2, MAP2K2, GLDC, CASK, MECP2, KDM5C, and POGZ, known to be associated with Kabuki syndrome 1, Mowat-Wilson syndrome, cardiofaciocutaneous syndrome, glycine encephalopathy, mental retardation and microcephaly with pontine and cerebellar hypoplasia, X-linked mental retardation 13, X-linked mental retardation Claes-Jensen type, and White-Sutton syndrome, respectively. The ninth individual carries a de novo variant in JAKMIP1, a regulator of neuronal translation that was recently found deleted in a patient with autism spectrum disorder. Analyses of co-expression and biomedical text mining suggest that these pathologies and SMS are part of the same disease network. Further support for this hypothesis was obtained from transcriptome profiling that showed that the expression levels of both Zeb2 and Map2k2 are perturbed in Rai1–/– mice. As an orthogonal approach to potentially contributory disease gene variants, we used chromatin conformation capture to reveal chromatin contacts between RAI1 and the loci flanking ZEB2 and GLDC, as well as between RAI1 and human orthologs of the genes that show perturbed expression in our Rai1–/– mouse model.
These holistic studies of RAI1 and its interactions allow insights into SMS and other disorders associated with intellectual disability and behavioral abnormalities. Our findings support a pan-genomic approach to the molecular diagnosis of a distinctive disorder.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-016-0359-z) contains supplementary material, which is available to authorized users.
Diagnostic; Intellectual disability; Chromatin conformation; Text mining; Disease network
To comprehensively evaluate a European–American child with severe hypertension, whole-exome sequencing (WES) was performed on the child and parents, identifying the causal variation underlying the proband's early-onset disease. The proband's hypertension was resistant to treatment, requiring a multiple drug regimen including amiloride, spironolactone, and hydrochlorothiazide. We suspected a monogenic form of hypertension because of the persistent hypokalemia with low plasma levels of renin and aldosterone. To investigate this, we focused on rare functional variants and indels, performed gene-based tests incorporating linkage scores and allele frequency, and filtered for deleterious functional mutations. Drawing on the clinical presentation, we selected 27 genes with evidence of causing monogenic hypertension and matched them to the gene-based results. This resulted in the identification of a stop-gain mutation in an epithelial sodium channel (ENaC), SCNN1B, an established Liddle syndrome gene, shared by the child and her father. Interestingly, the father also harbored a missense mutation (p.Trp552Arg) in the α-subunit of the ENaC trimer, SCNN1A, possibly pointing to pseudohypoaldosteronism type I. This case is unique in that we present the early-onset disease and treatment response caused by a canonical stop-gain mutation (p.Arg566*) as well as ENaC digenic hits in the father, emphasizing the utility of WES in informing precision medicine.
elevated diastolic blood pressure; elevated mean arterial pressure; elevated systolic blood pressure
Aortic dissection causes splitting of the aortic wall layers, allowing blood to enter a ‘false lumen’ (FL). For type B dissection, a significant predictor of patient outcomes is patency or thrombosis of the FL. Yet, no methods are currently available to assess the chances of FL thrombosis. In this study, we present a new computational model that is capable of predicting thrombus formation, growth and its effects on blood flow under physiological conditions. Predictions of thrombus formation and growth are based on fluid shear rate, residence time and platelet distribution, which are evaluated through convection–diffusion–reaction transport equations. The model is applied to a patient-specific type B dissection for which multiple follow-up scans are available. The predicted thrombus formation and growth patterns are in good qualitative agreement with clinical data, demonstrating the potential applicability of the model in predicting FL thrombosis for individual patients. Our results show that the extent and location of thrombosis are strongly influenced by aortic dissection geometry that may change over time. The high computational efficiency of our model makes it feasible for clinical applications. By predicting which aortic dissection patient is more likely to develop FL thrombosis, the model has great potential to be used as part of a clinical decision-making tool to assess the need for early endovascular intervention for individual dissection patients.
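For orientation, a generic convection–diffusion–reaction transport equation has the form below. This is the textbook form only, offered as an assumption about the class of equation involved; the specific transported species, diffusivities, and reaction terms used in the model are not stated in the abstract. Here $c_i$ denotes a transported quantity (for example a platelet concentration or a residence-time field), $\mathbf{u}$ the blood velocity, $D_i$ a diffusion coefficient, and $S_i$ a reaction (source or sink) term.

```latex
\frac{\partial c_i}{\partial t} + \mathbf{u} \cdot \nabla c_i
  = \nabla \cdot \left( D_i \nabla c_i \right) + S_i
```

The left-hand side describes transport of $c_i$ by the flow, the first term on the right describes diffusion, and $S_i$ collects whatever local production or consumption the model prescribes.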
aortic dissection; thrombus formation and growth; blood flow; computational model
There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
Matchmaking; rare disease; genomic API; gene discovery; Matchmaker Exchange; GA4GH; IRDiRC
The Mediterranean fruit fly (medfly), Ceratitis capitata, is a major destructive insect pest due to its broad host range, which includes hundreds of fruits and vegetables. It exhibits a unique ability to invade and adapt to ecological niches throughout tropical and subtropical regions of the world, though medfly infestations have been prevented and controlled by the sterile insect technique (SIT) as part of integrated pest management programs (IPMs). The genetic analysis and manipulation of medfly has been subject to intensive study in an effort to improve SIT efficacy and other aspects of IPM control.
The 479 Mb medfly genome is sequenced from adult flies from lines inbred for 20 generations. A high-quality assembly is achieved having a contig N50 of 45.7 kb and scaffold N50 of 4.06 Mb. In-depth curation of more than 1800 messenger RNAs shows specific gene expansions that can be related to invasiveness and host adaptation, including gene families for chemoreception, toxin and insecticide metabolism, cuticle proteins, opsins, and aquaporins. We identify genes relevant to IPM control, including those required to improve SIT.
The medfly genome sequence provides critical insights into the biology of one of the most serious and widespread agricultural pests. This knowledge should significantly advance the means of controlling the size and invasive potential of medfly populations. Its close relationship to Drosophila, and other insect species important to agriculture and human health, will further comparative functional and structural studies of insect genomes that should broaden our understanding of gene family evolution.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1049-2) contains supplementary material, which is available to authorized users.
Medfly genome; Tephritid genomics; Insect orthology; Gene family evolution; Chromosomal synteny; Insect invasiveness; Insect adaptation; Medfly integrated pest management (IPM)
Four patients from three Norwegian families presented with a common skin phenotype of warts, molluscum contagiosum, and dermatitis since early childhood, and various other immunological features. Warts are a common manifestation of human papilloma virus (HPV), but when they are overwhelming, disseminated and/or persistent, and presenting together with other immunological features, a primary immunodeficiency disease (PIDD) may be suspected.
Methods and results
The four patients were exome sequenced as part of a larger study for detecting genetic causes of primary immunodeficiencies. No disease‐causing variants were identified in known primary immunodeficiency genes or in other disease‐related OMIM genes. However, the same homozygous missense variant in CARMIL2 (also known as RLTPR) was identified in all four patients. In each family, the variant was located within a narrow region of homozygosity, representing a potential region of autozygosity. CARMIL2 is a protein of undetermined function. A role in T‐cell activation has been suggested and the mouse protein homolog (Rltpr) is essential for costimulation of T‐cell activation via CD28, and for the development of regulatory T cells. Immunophenotyping demonstrated reduced regulatory, CD4+ memory, and CD4+ follicular T cells in all four patients. In addition, they all seem to have a deficiency in IFN‐γ synthesis in CD4+ T cells and NK cells.
We report a novel primary immunodeficiency, and a differential molecular diagnosis to CXCR4‐, DOCK8‐, GATA2‐, MAGT1‐, MCM4‐, STK4‐, RHOH‐, TMC6‐, and TMC8‐related diseases. The specific variant may represent a Norwegian founder variant segregating on a population‐specific haplotype.
Absence of heterozygosity; CARMIL2; exome sequencing; founder variant; lymphocyte function; lymphocyte subpopulation; molluscum contagiosum; primary immunodeficiency; RLTPR; warts
CphA is a Zn2+-dependent metallo-β-lactamase that efficiently hydrolyzes only carbapenem antibiotics. To understand the sequence requirements for CphA function, single codon random mutant libraries were constructed for residues in and near the active site and mutants were selected for E. coli growth on increasing concentrations of imipenem, a carbapenem antibiotic. At high concentrations of imipenem that select for phenotypically wild-type mutants, the active-site residues exhibit stringent sequence requirements in that nearly all residues in positions that contact zinc, the substrate, or the catalytic water do not tolerate amino acid substitutions. In addition, at high imipenem concentrations a number of residues that do not directly contact zinc or substrate are also essential and do not tolerate substitutions. Biochemical analysis confirmed that amino acid substitutions at essential positions decreased the stability or catalytic activity of the CphA enzyme. Therefore, the CphA active site is fragile to substitutions, suggesting active-site residues are optimized for imipenem hydrolysis. These results also suggest that resistance to inhibitors targeted to the CphA active site would be slow to develop because of the strong sequence constraints on function.
The decreasing costs of sequencing are driving the need for cost-effective and real-time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of the computing resources typically available to most research labs. Other infrastructures, such as the AWS cloud environment and supercomputers, also have limitations that make large-scale joint variant calling infeasible, and infrastructure-specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies.
We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset in which joint calling, imputation and phasing of over 5300 whole genome samples was produced in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms.
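One way to picture a binning approach is to cut the genome into regions that can be jointly called independently and distributed across platforms. The sketch below assumes fixed-width, non-overlapping bins; this is an illustrative assumption, as the abstract does not describe how bins were defined, and `make_bins` and its parameters are hypothetical names.

```python
def make_bins(chrom_lengths, bin_size):
    """Split each chromosome into half-open [start, end) bins of at most `bin_size` bases.

    chrom_lengths: mapping of chromosome name to length in bases.
    Returns a list of (chrom, start, end) tuples in input order.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, bin_size):
            # The final bin of a chromosome is truncated at the chromosome end.
            bins.append((chrom, start, min(start + bin_size, length)))
    return bins
```

Because each bin covers all samples but only a slice of the genome, bins can be scheduled as independent jobs, which is what makes joint calling across many samples amenable to heterogeneous infrastructure.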
Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-016-1211-6) contains supplementary material, which is available to authorized users.
WGS; SNV; Variant calling; Joint calling; Supercomputer; Cloud AWS; Scalable; Big data; Ensemble calling
Atrial fibrillation (AF) is a morbid and heritable arrhythmia. Over 35 genes have been reported to underlie AF, most of which were described in small candidate gene association studies. Replication remains lacking for most, and therefore the contribution of coding variation to AF susceptibility remains poorly understood. We examined whole exome sequencing data in a large community-based sample of 1,734 individuals with and 9,423 without AF from the Framingham Heart Study, Cardiovascular Health Study, Atherosclerosis Risk in Communities Study, and NHLBI-GO Exome Sequencing Project and meta-analyzed the results. We also examined whether genetic variation was enriched in suspected AF genes (N = 37) in AF cases versus controls. The mean age ranged from 59 to 73 years; 8,656 (78%) were of European ancestry. None of the 99,404 common variants evaluated was significantly associated after adjusting for multiple testing. Among the most significantly associated variants was a common (allele frequency = 86%) missense variant in SYNPO2L (rs3812629, p.Pro707Leu; odds ratio 1.27, 95% confidence interval 1.13–1.43, P = 6.6 × 10⁻⁵), which lies at a known AF susceptibility locus and is in linkage disequilibrium with a top marker from prior analyses at the locus. We did not observe significant associations between rare variants and AF in gene-based tests. Individuals with AF did not display any statistically significant enrichment for common or rare coding variation in previously implicated AF genes. In conclusion, we did not observe associations between coding genetic variants and AF, suggesting that large-effect coding variation is not the predominant mechanism underlying AF. A coding variant in SYNPO2L requires further evaluation to determine whether it is causally related to AF. Efforts to identify biologically meaningful coding variation underlying AF may require large sample sizes or populations enriched for large genetic effects.
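To see why the SYNPO2L variant does not count as significant despite its small P value, consider a multiple-testing threshold. The sketch below assumes a Bonferroni correction at a family-wise alpha of 0.05; the abstract does not state which correction was applied, so both the method and the alpha level are assumptions made for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 99,404 common variants were evaluated; alpha = 0.05 is an assumed level.
threshold = bonferroni_threshold(0.05, 99_404)

# Reported P value for the SYNPO2L missense variant (rs3812629).
reported_p = 6.6e-5

passes = reported_p < threshold
print(f"threshold = {threshold:.2e}, passes = {passes}")
```

Under these assumptions the per-variant threshold is on the order of 5 × 10⁻⁷, roughly two orders of magnitude below the reported P value, which is consistent with the statement that no common variant was significant after adjustment.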
Atrial fibrillation is a common and morbid cardiac arrhythmia. Atrial fibrillation is heritable, and numerous genome-wide susceptibility loci have been identified, predominantly in non-coding regions. Over 35 genes also have been implicated in atrial fibrillation pathogenesis mostly through prior smaller scale candidate gene association studies, which generally did not have robust replication to support the associations. Therefore, the role of coding variation in the biology of atrial fibrillation is unclear. We examined whole exome sequencing data from 1,734 individuals with and 9,423 without atrial fibrillation, and did not observe any significant associations between coding variation and the arrhythmia. Furthermore, we did not observe any enrichment for association in previously implicated atrial fibrillation genes. In aggregate, our findings suggest that large effect coding variation is unlikely to be a predominant mechanism of common forms of atrial fibrillation encountered in the community.
Loss-of-function variants, which often lead to greatly truncated protein product, influence human metabolite levels.
The metabolome is a collection of small molecules resulting from multiple cellular and biological processes that can act as biomarkers of disease, and African-Americans exhibit high levels of genetic diversity. Exome sequencing of a sample of deeply phenotyped African-Americans allowed us to analyze the effects of annotated loss-of-function (LoF) mutations on 308 serum metabolites measured by untargeted liquid and gas chromatography coupled with mass spectrometry. In an independent sample, we identified and replicated four genes harboring six LoF mutations that significantly affected five metabolites. These sites were related to a 19 to 45% difference in geometric mean metabolite levels, with an average effect size of 25%. We show that some of the affected metabolites are risk predictors or diagnostic biomarkers of disease and, using the principle of Mendelian randomization, are in the causal pathway of disease. For example, LoF mutations in SLCO1B1 elevate the levels of hexadecanedioate, a fatty acid significantly associated with increased blood pressure levels and risk of incident heart failure in both African-Americans and an independent sample of European-Americans. We show that SLCO1B1 LoF mutations significantly increase the risk of incident heart failure, thus implicating the metabolite in the causal pathway of disease. These results reveal new avenues into gene function and the understanding of disease etiology by integrating -omic technologies into a deeply phenotyped population study.
Metabolomics; metabolite; loss-of-function; rare genetic variant
Charcot-Marie-Tooth (CMT) disease is a clinically and genetically heterogeneous distal symmetric polyneuropathy. Whole-exome sequencing (WES) of 40 individuals from 37 unrelated families with CMT-like peripheral neuropathy refractory to molecular diagnosis identified apparent causal mutations in ~45% (17/37) of families. Three candidate disease genes are proposed, supported by a combination of genetic and in vivo studies. Aggregate analysis of mutation data revealed a significantly increased number of rare variants across 58 neuropathy-associated genes in subjects versus controls, a finding confirmed in a second ethnically discrete neuropathy cohort, suggesting that mutation burden potentially contributes to phenotypic variability. Neuropathy genes shown to have highly penetrant Mendelizing variants (HMPVs) and implicated by burden in families were shown to interact genetically in a zebrafish assay, exacerbating the phenotype established by the suppression of single genes. Our findings suggest that the combinatorial effect of rare variants contributes to disease burden and variable expressivity.
The ampulla of Vater is a complex cellular environment from which adenocarcinomas arise to form a group of histopathologically heterogenous tumors. To evaluate the molecular features of these tumors, 98 ampullary adenocarcinomas, were evaluated and compared to 44 distal bile duct and 18 duodenal adenocarcinomas. Genomic analyses revealed mutations in the WNT signaling pathway among half of the patients and in all three adenocarcinomas irrespective of their origin and histological morphology. These tumors were characterized by a high frequency of inactivating mutations of ELF3, a high rate of microsatellite instability, and common focal deletions and amplifications, suggesting common attributes in the molecular pathogenesis are at play in these tumors. The high frequency of WNT pathway activating mutation, coupled with small molecule inhibitors of beta catenin in clinical trials, suggests future treatment decisions for these patients may be guided by genomic analysis.
Focal cortical dysplasia (FCD), hemimegalencephaly (HMEG) and megalencephaly constitute a spectrum of malformations of cortical development with shared neuropathologic features. Collectively, these disorders are associated with significant childhood morbidity and mortality. FCD, in particular, represents the most frequent cause of intractable focal epilepsy in children.
To identify the underlying molecular etiology of FCD, HMEG, and diffuse megalencephaly.
Design, Setting and Participants
We performed whole exome sequencing (WES) on eight children with FCD or HMEG using standard depth (~50-60X) sequencing in peripheral samples (blood, saliva or skin) from the affected child and their parents, and deep (~150-180X) sequencing in affected brain tissue. We used both targeted sequencing and WES to screen a cohort of 93 children with molecularly unexplained diffuse or focal brain overgrowth (42 with FCD-HMEG, and 51 with diffuse megalencephaly). Histopathological and functional assays of PI3K-AKT-MTOR pathway activity in resected brain tissue and cultured neurons were performed to validate mutations.
Main Outcomes and Measures
Whole exome sequencing and targeted sequencing identified variants associated with this spectrum of developmental brain disorders.
We identified low-level mosaic mutations of MTOR in brain tissue in four children with FCD type 2a, with alternate allele fractions ranging from 0.012 to 0.086. We also identified an intermediate-level mosaic mutation of MTOR (p.Thr1977Ile) in three unrelated children with diffuse megalencephaly and pigmentary mosaicism in skin that resembles hypomelanosis of Ito. Finally, we identified a constitutional de novo mutation of MTOR (p.Glu1799Lys) in three unrelated children with diffuse megalencephaly and intellectual disability. Molecular and functional analysis in two children with FCD type 2a from whom multiple affected brain tissue samples were available revealed a gradient of alternate allele fractions with an epicenter in the most epileptogenic area. When expressed in cultured neurons, all MTOR mutations identified here drive constitutive activation of mTORC1 and enlarged neuronal size, establishing a link between the MTOR mutations and neuronal hypertrophy found in patients. The mTORC1 inhibitor RAD001 ameliorated these phenotypes.
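The mosaicism levels above are expressed as alternate allele fractions. As a minimal sketch, an alternate allele fraction can be computed as the number of reads supporting the alternate allele divided by the total read depth at that site. The function names and read counts below are hypothetical, not values from the study. The conversion to a fraction of mutant cells assumes a heterozygous mutation at a diploid locus with no copy number change, which is an assumption of this sketch rather than a statement from the source.

```python
def alt_allele_fraction(alt_reads, total_reads):
    """Fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def mutant_cell_fraction(aaf):
    """Approximate fraction of cells carrying the mutation.

    Assumes a heterozygous mutation at a diploid locus with no copy
    number change, so each mutant cell contributes one alternate
    allele out of two.
    """
    return min(1.0, 2.0 * aaf)


# Hypothetical read counts at ~150X depth (illustration only).
aaf = alt_allele_fraction(alt_reads=9, total_reads=150)
cells = mutant_cell_fraction(aaf)
print(f"AAF = {aaf:.3f}, estimated mutant cell fraction = {cells:.3f}")
```

At low fractions such as these, only a handful of reads support the variant, which is why detecting low-level mosaicism depends on deeper sequencing of the affected tissue.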
Conclusions and Relevance
Our data show that mutations of MTOR are associated with a spectrum of brain overgrowth phenotypes extending from FCD type 2 to diffuse megalencephaly, distinguished by different mutations and levels of mosaicism. These mutations are sufficient to cause cellular hypertrophy in cultured neurons. Our data also provide a compelling demonstration of the pattern of mosaicism in brain, and substantiate the link between mosaic mutations of MTOR and pigmentary mosaicism in skin.
megalencephaly; focal cortical dysplasia; MTOR; mosaicism; epilepsy
The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the last two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.
Echinoderm genome sequences are a corpus of useful information about a clade of animals that serve as research models in fields ranging from marine ecology to cell and developmental biology. Genomic information from echinoids has contributed to insights into the gene interactions that drive the developmental process at the molecular level. Such insights often rely heavily on genomic information, and the kinds of questions that can be asked thus depend on the quality of the sequence information. Here we describe the history of echinoderm genomic sequence assembly and present details about the quality of the data obtained. All of the sequence information discussed here is posted on the echinoderm information web system, Echinobase.org.
To investigate the genetic cause of nonobstructive azoospermia (NOA) in a consanguineous Turkish family through homozygosity mapping followed by targeted exon/whole-exome sequencing to identify genetic variations.
We sequenced the exomes of two siblings in a consanguineous family with NOA.
All variants passing filter criteria were validated with Sanger sequencing to confirm familial segregation and absence in the control population.
Main Outcome Measure
Discovery of a mutation that could potentially cause NOA.
A novel non-synonymous mutation in neuronal PAS 2 domain (NPAS2) was identified in a consanguineous family from Turkey. This mutation in exon 14 (chr2: 101592000 C>G) of NPAS2 is likely a disease-causing mutation as it is predicted to be damaging, is a novel variant, and segregates with the disease. Segregation analysis in the family showed that the mutation was homozygous in the three brothers with NOA and heterozygous in the mother and in one brother and one sister, both of whom were fertile. The mutation is not found in the single nucleotide polymorphism (SNP) database, the 1000 Genomes Project, the Baylor College of Medicine cohort of 500 Turkish patients (indicating that it is not a population-specific polymorphism), or 50 matched fertile controls.
Using WES, we identified a novel homozygous mutation in NPAS2 as a likely disease-causing variant in a Turkish family diagnosed with NOA. Our data reinforce the clinical role of WES in the molecular diagnosis of highly heterogeneous genetic diseases for which conventional genetic approaches have previously failed to reach a molecular diagnosis.
male infertility; circadian rhythm; spermatogenesis; genome; consanguineous
Left ventricular noncompaction (LVNC) is an autosomal dominant, genetically heterogeneous cardiomyopathy with variable severity, which may co-occur with cardiac hypertrophy.
Methods and Results
Here, we generated whole exome sequence (WES) data from multiple members of five families with LVNC. In four of the five families, the candidate causative mutation lies in a known LVNC gene, MYH7 or TPM1, and segregates with disease. Subsequent sequencing of MYH7 in a larger LVNC cohort identified seven novel likely disease-causing variants. In the fifth family, we identified a frameshift mutation in NNT, which encodes a nuclear-encoded mitochondrial protein not implicated previously in human cardiomyopathies. Resequencing of NNT in additional LVNC families identified a second likely pathogenic missense allele. Suppression of nnt in zebrafish caused early ventricular malformation and contractility defects, likely driven by altered cardiomyocyte proliferation. In vivo complementation studies showed that mutant human NNT failed to rescue nnt morpholino-induced heart dysfunction, indicating a probable haploinsufficiency mechanism.
Together, our data expand the genetic spectrum of LVNC and demonstrate how the intersection of WES with in vivo functional studies can accelerate the identification of genes that drive human genetic disorders.
noncompaction cardiomyopathy; genetics; human; genomics; left ventricular noncompaction
Neurodevelopment is orchestrated by a wide range of genes, and the genetic causes of neurodevelopmental disorders are thus heterogeneous. We applied whole exome sequencing (WES) for molecular diagnosis and in silico analysis to identify novel disease gene candidates in a cohort from Saudi Arabia with primarily Mendelian neurologic diseases.
We performed WES in 31 mostly consanguineous Arab families and analyzed both single nucleotide and copy number variants (CNVs) from WES data. Interaction/expression network and pathway analyses, as well as paralog studies were utilized to investigate potential pathogenicity and disease association of novel candidate genes. Additional cases for candidate genes were identified through the clinical WES database at Baylor Miraca Genetics Laboratories and GeneMatcher.
We found known pathogenic or novel variants in known disease genes with phenotypic expansion in 6 families, disease-associated CNVs in 2 families, and 12 novel disease gene candidates in 11 families, including KIF5B, GRM7, FOXP4, MLLT1, and KDM2B. Overall, a potential molecular diagnosis was provided by variants in known disease genes in 17 families (54.8 %) and by novel candidate disease genes in an additional 11 families, making the potential molecular diagnostic rate ~90 %.
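The diagnostic rates above follow directly from the family counts given in the abstract (31 families in total, 17 with variants in known disease genes, 11 more with novel candidate genes). The short sketch below reproduces that arithmetic; the variable names are illustrative only.

```python
# Family counts as stated in the abstract.
total_families = 31
known_gene_families = 17       # variants in known disease genes
candidate_gene_families = 11   # additional families with novel candidate genes

# Rate from known disease genes alone.
known_rate = 100.0 * known_gene_families / total_families

# Combined potential diagnostic rate (known plus candidate genes).
combined_families = known_gene_families + candidate_gene_families
combined_rate = 100.0 * combined_families / total_families

print(f"Known disease genes: {known_rate:.1f}%")
print(f"Known plus candidate genes: {combined_rate:.1f}%")
```

The combined figure counts the 11 candidate-gene families as potential diagnoses, so it is an upper estimate that depends on those candidates being confirmed.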
Molecular diagnostic rate from WES is improved by exome-predicted CNVs. Novel candidate disease gene discovery is facilitated by paralog studies and through the use of informatics tools and available databases to identify additional evidence for pathogenicity.
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-016-0208-3) contains supplementary material, which is available to authorized users.
Whole exome sequencing (WES); Copy Number Variants (CNV); Neurodevelopment; Developmental Delay/Intellectual Disability (DD/ID); GRM7