Mutations in the coding sequence of SOX9 cause campomelic dysplasia (CD), a disorder of skeletal development associated with 46,XY disorders of sex development (DSDs). Translocations, deletions and duplications within a ~2 Mb region upstream of SOX9 can recapitulate the CD-DSD phenotype fully or partially, suggesting the existence of an unusually large cis-regulatory control region. Pierre Robin sequence (PRS) is a craniofacial disorder that is frequently an endophenotype of CD and a locus for isolated PRS at ~1.2-1.5 Mb upstream of SOX9 has been previously reported. The craniofacial regulatory potential within this locus, and within the greater genomic domain surrounding SOX9, remains poorly defined. We report two novel deletions upstream of SOX9 in families with PRS, allowing refinement of the regions harbouring candidate craniofacial regulatory elements. In parallel, ChIP-Seq for p300 binding sites in mouse craniofacial tissue led to the identification of several novel craniofacial enhancers at the SOX9 locus, which were validated in transgenic reporter mice and zebrafish. Notably, some of the functionally validated elements fall within the PRS deletions. These studies suggest that multiple non-coding elements contribute to the craniofacial regulation of SOX9 expression, and that their disruption results in PRS.
SOX9; craniofacial; enhancer; Pierre Robin; long-range regulation; campomelic dysplasia
The GNE gene encodes the rate-limiting, bifunctional enzyme of sialic acid biosynthesis, UDP-N-acetylglucosamine 2-epimerase/N-acetylmannosamine kinase (GNE). Biallelic GNE mutations underlie GNE myopathy, an adult-onset progressive myopathy. GNE myopathy-associated GNE mutations are predominantly missense, resulting in reduced, but not absent, GNE enzyme activities. The exact pathomechanism of GNE myopathy remains unknown, but likely involves aberrant (muscle) sialylation. Here we summarize 154 reported and novel GNE variants associated with GNE myopathy, including 122 missense, 11 nonsense, 14 insertion/deletions and 7 intronic variants. All variants were deposited in the online GNE variation database (http://www.dmd.nl/nmdb2/home.php?select_db=GNE).
We report the predicted effects on protein function of all variants as well as the predicted effects on epimerase and/or kinase enzymatic activities of selected variants. By analyzing exome sequence databases, we identified three frequently occurring, unreported GNE missense variants/polymorphisms, important for future sequence interpretations. Based on allele frequencies, we estimate the world-wide prevalence of GNE myopathy to be ~4–21/1,000,000. This previously unrecognized high prevalence confirms suspicions that many patients may escape diagnosis. Awareness of GNE myopathy among physicians is essential for the identification of new patients, which is required for better understanding of the disorder’s pathomechanism and for the success of ongoing treatment trials.
distal myopathy with rimmed vacuoles (DMRV); GNE myopathy; hereditary inclusion body myopathy (HIBM); adult onset muscular dystrophy; N-acetylmannosamine (ManNAc); disease prevalence; sialic acid
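The abstract above derives a prevalence estimate from allele frequencies but does not state the calculation. A minimal sketch, assuming Hardy-Weinberg proportions for a recessive disorder (an assumption, not the authors' stated method), shows which combined pathogenic allele frequencies would correspond to the reported range of ~4–21 per 1,000,000. The function names are illustrative only.

```python
import math

def recessive_prevalence(q):
    """Expected fraction of biallelic individuals under Hardy-Weinberg (q squared)."""
    return q * q

def combined_allele_frequency(prevalence):
    """Invert q^2 = prevalence to get the combined pathogenic allele frequency."""
    return math.sqrt(prevalence)

def carrier_frequency(q):
    """Expected fraction of heterozygous carriers under Hardy-Weinberg (2pq)."""
    return 2.0 * q * (1.0 - q)

# Reported prevalence range from the abstract: ~4 to 21 per 1,000,000.
prevalence_low, prevalence_high = 4e-6, 21e-6

# Assumption: all pathogenic alleles are pooled into one frequency q.
q_low = combined_allele_frequency(prevalence_low)
q_high = combined_allele_frequency(prevalence_high)

print(f"combined allele frequency: {q_low:.4f} to {q_high:.4f}")
print(f"carrier frequency: {carrier_frequency(q_low):.4f} to {carrier_frequency(q_high):.4f}")
```

Under these assumptions the reported range corresponds to a pooled pathogenic allele frequency of roughly 0.0020 to 0.0046. Population structure, consanguinity, or incomplete penetrance would change the relationship.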
Three causal genes for Idiopathic Basal Ganglia Calcification (IBGC) have been identified. Most recently, mutations in PDGFRB, encoding a member of the platelet-derived growth factor receptor family type β, and PDGFB, encoding PDGF-B, the specific ligand of PDGFRβ, were found, implicating the PDGF-B/PDGFRβ pathway in abnormal brain calcification. In this study we aimed to identify and study mutations in PDGFRB and PDGFB in a series of 26 patients from the Mayo Clinic Florida Brain Bank with moderate to severe basal ganglia calcification (BGC) of unknown etiology. No mutations in PDGFB were found. However, we identified one mutation in PDGFRB, p.R695C located in the tyrosine kinase domain, in one BGC patient. We further studied the function of p.R695C mutant PDGFRβ and two previously reported mutants, p.L658P and p.R987W PDGFRβ, in cell culture. We show that, in response to PDGF-BB stimulation, the p.L658P mutation completely suppresses PDGFRβ autophosphorylation whereas the p.R695C mutation results in partial loss of autophosphorylation. For the p.R987W mutation, our data suggest a different mechanism involving reduced protein levels. These genetic and functional studies provide the first insight into the pathogenic mechanisms associated with PDGFRB mutations and provide further support for a pathogenic role of PDGFRB mutations in BGC.
Basal Ganglia Calcification; BGC; IBGC; Fahr disease; PDGFRB; PDGFB
In recent years, next-generation sequencing (NGS) technologies have accelerated the detection of genetic variants, resulting in the rapid discovery of new disease-associated genes. However, the wealth of variation data made available by NGS alone is not sufficient to understand the mechanisms underlying disease pathogenesis and manifestation. Multidisciplinary approaches combining sequence and clinical data with prior biological knowledge are needed to unravel the role of genetic variants in human health and disease. In this context, it is crucial that these data are linked, organized and made readily available through reliable online resources. The Swiss-Prot section of the Universal Protein Knowledgebase (UniProtKB/Swiss-Prot) provides the scientific community with a collection of information on protein functions, interactions, biological pathways, as well as human genetic diseases and variants, all manually reviewed by experts. In this article, we present an overview of the information content of UniProtKB/Swiss-Prot to show how this knowledgebase can support researchers in the elucidation of the mechanisms leading from a molecular defect to a disease phenotype.
UniProtKB/Swiss-Prot; database; manual curation; genetic variants; disease; functional annotation; controlled vocabulary
As whole genome sequencing (WGS) uncovers variants associated with rare and common diseases, an immediate challenge is to minimize false positive findings due to sequencing and variant calling errors. False positives can be reduced by combining results from orthogonal sequencing methods, but this is costly. Here we present variant filtering approaches using logistic regression (LR) and ensemble genotyping to minimize false positives without sacrificing sensitivity. We evaluated the methods using paired WGS datasets of an extended family prepared using two sequencing platforms and a validated set of variants in NA12878. Using LR- or ensemble genotyping-based filtering, false negative rates were significantly reduced by 1.1- to 17.8-fold at the same levels of false discovery rates (5.4% for heterozygous and 4.5% for homozygous SNVs; 30.0% for heterozygous and 18.7% for homozygous insertions; 25.2% for heterozygous and 16.6% for homozygous deletions) compared to the filtering based on genotype quality scores. Moreover, ensemble genotyping excluded > 98% (105,080 of 107,167) of false positives while retaining > 95% (897 of 937) of true positives in de novo mutation (DNM) discovery, and performed better than a consensus method using two sequencing platforms. Our proposed methods were effective in prioritizing phenotype-associated variants, and ensemble genotyping would be essential to minimize false positive DNM candidates.
whole-genome sequencing; ensemble genotyping; logistic regression; false positive; incidental finding; de novo mutation discovery
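The de novo mutation counts quoted in the abstract above can be checked with a few lines of arithmetic. The sketch below recomputes the two quoted fractions and, as an illustration only, the share of true positives among candidates before and after filtering. That last step assumes the quoted counts describe one and the same candidate set, which the abstract does not state explicitly.

```python
# Counts quoted in the abstract for de novo mutation (DNM) discovery.
false_positives_total = 107_167
false_positives_removed = 105_080
true_positives_total = 937
true_positives_kept = 897

# Fractions reported as "> 98%" and "> 95%".
fp_removed_fraction = false_positives_removed / false_positives_total
tp_kept_fraction = true_positives_kept / true_positives_total

# Assumption: both counts refer to the same candidate pool, so the
# remaining candidates are the kept true positives plus leftover false positives.
false_positives_left = false_positives_total - false_positives_removed
precision_before = true_positives_total / (true_positives_total + false_positives_total)
precision_after = true_positives_kept / (true_positives_kept + false_positives_left)

print(f"false positives removed: {fp_removed_fraction:.2%}")
print(f"true positives kept:     {tp_kept_fraction:.2%}")
print(f"true-positive share of candidates: {precision_before:.2%} -> {precision_after:.2%}")
```

Under that assumption, filtering raises the share of true DNMs among candidates from under 1% to roughly 30%, which illustrates why removing false positives matters so much when true events are rare.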
Genotyping efforts in hemophilia A (HA) populations in many countries have identified large numbers of unique mutations in the Factor VIII gene (F8). To assist HA researchers conducting genotyping analyses, we have developed a listing of F8 mutations including those listed in existing locus-specific databases as well as those identified in patient populations and reported in the literature. Each mutation was reviewed and uniquely identified using Human Genome Variation Society (HGVS) nomenclature standards for coding DNA and predicted protein changes as well as traditional nomenclature based on the mature, processed protein. Listings also include the associated hemophilia severity classified by International Society on Thrombosis and Haemostasis (ISTH) criteria, associations of the mutations with inhibitors, and reference information. The mutation list currently contains 2,537 unique mutations known to cause HA. HA severity caused by the mutation is available for 2,022 mutations (80%) and information on inhibitors is available for 1,816 mutations (72%). The CDC Hemophilia A Mutation Project (CHAMP) Mutation List is available at http://www.cdc.gov/hemophiliamutations for download and search and will be updated quarterly based on periodic literature reviews and submitted reports.
hemophilia A; mutation database; locus-specific database; F8 gene
Glycine substitutions in the conserved Gly-X-Y motif in the triple helical domain of collagen VI are the most commonly identified mutations in the collagen VI myopathies including Ullrich congenital muscular dystrophy, Bethlem myopathy, and intermediate phenotypes. We describe clinical and genetic characteristics of 97 individuals with glycine substitutions in the triple helical domain of COL6A1, COL6A2, or COL6A3 and add a review of 97 published cases, for a total of 194 cases. Clinical findings include severe, intermediate, and mild phenotypes, even among patients with identical mutations. Intermediate phenotypes were most common, accounting for almost half of patients, emphasizing the importance of intermediate phenotypes to the overall phenotypic spectrum. Glycine substitutions in the triple helical domain are heavily clustered in a short segment N-terminal to the 17th Gly-X-Y triplet, where they act as dominant mutations. The most severe cases are clustered in an even smaller region including Gly-X-Y triplets 10 to 15, accounting for only 5% of the triple helical domain. Our findings suggest that clustering of glycine substitutions in the N-terminal region of collagen VI is not based on features of the primary sequence. We hypothesize that this region may represent a functional domain within the triple helix.
collagen VI; Ullrich congenital muscular dystrophy; Bethlem myopathy; genotype-phenotype correlation
The discovery of novel disease-associated variations in genes is often a daunting task in highly heterogeneous disease classes. We seek a generalizable algorithm that integrates multiple publicly available genomic data sources in a machine-learning model for the prioritization of candidates identified in patients with retinal disease. To approach this problem, we generate a set of feature vectors from publicly available microarray, RNA-seq, and ChIP-seq datasets of biological relevance to retinal disease, to observe patterns in gene expression specificity among tissues of the body and the eye, in addition to photoreceptor-specific signals by the CRX transcription factor. Using these features, we describe a novel algorithm, positive and unlabeled learning for prioritization (PULP). This article compares several popular supervised learning techniques as the regression function for PULP. The results demonstrate a highly significant enrichment for previously characterized disease genes using a logistic regression method. Finally, a comparison of PULP with the popular gene prioritization tool ENDEAVOUR shows superior prioritization of retinal disease genes from previous studies.
machine learning; data integration; gene prioritization; prediction
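The abstract above describes ranking candidate genes by learning from known disease genes (positives) and unlabeled genes. The following is a deliberately naive baseline for that idea, not the PULP algorithm itself: it treats unlabeled genes as provisional negatives, fits a small logistic regression by gradient descent, and ranks the unlabeled genes by predicted score. All gene names and feature values are hypothetical toy data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(x, w, b):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: (retina expression specificity, CRX binding signal).
known_disease_genes = {"KNOWN_1": (0.90, 0.80), "KNOWN_2": (0.80, 0.90), "KNOWN_3": (0.85, 0.70)}
unlabeled_genes = {"GENE_A": (0.88, 0.85), "GENE_B": (0.10, 0.20),
                   "GENE_C": (0.20, 0.10), "GENE_D": (0.50, 0.40)}

# Positives are labeled 1; unlabeled genes are provisionally labeled 0.
X = list(known_disease_genes.values()) + list(unlabeled_genes.values())
y = [1] * len(known_disease_genes) + [0] * len(unlabeled_genes)
w, b = fit_logistic(X, y)

# Rank unlabeled genes from most to least disease-like.
ranking = sorted(unlabeled_genes, key=lambda g: score(unlabeled_genes[g], w, b), reverse=True)
print(ranking)
```

The unlabeled gene whose features resemble the known positives ends up at the top of the ranking even though it was trained as a provisional negative, which is the behaviour a positive-and-unlabeled prioritization scheme relies on. A real application would need held-out evaluation and many more features.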
The greatest interpretive challenge of modern medicine may be to functionally annotate the vast variation of human genomes. Demonstrating a proposed approach, we created a library of BRCA2 exon 27 shotgun-mutant plasmids including solitary and multiplex mutations to generate human knockin clones using homologous recombination. This 55-mutation, 13-clone syngeneic variance library (SyVaL) comprised severely affected clones having early-stop nonsense mutations, functionally hypomorphic clones having multiple missense mutations emphasizing the potential to identify and assess hypomorphic mutations in novel proteomic and epidemiologic studies, and neutral clones having multiple missense mutations. Efficient coverage of nonessential amino acids was provided by mutation multiplexing. Severe mutations were distinguished from hypomorphic or neutral changes by chemosensitivity assays (hypersensitivity to mitomycin C and acetaldehyde), by analysis of RAD51 focus formation, and by mitotic multipolarity. A multiplex unbiased approach of generating all-human SyVaLs in medically important genes, with random mutations in native genes, would provide databases of variants that could be functionally annotated without concerns arising from exogenous cDNA constructs or interspecies interactions, as a basis for subsequent proteomic domain mapping or clinical calibration if desired. Such gene-irrelevant approaches could be scaled up for multiple genes of clinical interest, providing distributable cellular libraries linked to public-shared functional databases.
BRCA2; shotgun library; hypomorphs; homologous recombination
Facioscapulohumeral dystrophy (FSHD) is one of the most prevalent muscular dystrophies. The majority of FSHD cases are linked to a decreased copy number of D4Z4 macrosatellite repeats on chromosome 4q (FSHD1). Fewer than 5% of FSHD cases have no repeat contraction (FSHD2), most of which are associated with mutations of SMCHD1. FSHD is associated with the transcriptional derepression of DUX4 encoded within the D4Z4 repeat, and SMCHD1 contributes to its regulation. We previously found that loss of the heterochromatin mark histone H3 lysine 9 trimethylation (H3K9me3) at D4Z4 is a hallmark of both FSHD1 and FSHD2. However, whether this loss contributes to DUX4 expression was unknown. Furthermore, additional D4Z4 homologs exist on multiple chromosomes, but they are largely uncharacterized and their relationship to 4q/10q D4Z4 was undetermined. We found that the suppression of H3K9me3 results in displacement of SMCHD1 at D4Z4 and increases DUX4 expression in myoblasts. The DUX4 open reading frame (ORF) is disrupted in D4Z4 homologs and their heterochromatin is unchanged in FSHD. The results indicate the significance of D4Z4 heterochromatin in DUX4 gene regulation and reveal the genetic and epigenetic distinction between 4q/10q D4Z4 and the non-4q/10q homologs, highlighting the special role of the 4q/10q D4Z4 chromatin and the DUX4 ORF in FSHD.
FSHD1; FSHD2; DUX4; H3K9me3; SMCHD1; D4Z4; heterochromatin
Primary ciliary dyskinesia (PCD) is an autosomal-recessive disorder characterized by impaired ciliary function that leads to clinical phenotypes such as chronic sinopulmonary disease. PCD is also a genetically heterogeneous disorder with many single gene mutations leading to similar clinical phenotypes. Here, we present a novel PCD causal gene, coiled-coil domain containing 151 (CCDC151), which has been shown to be essential in motile cilia of many animals, including other vertebrates, but whose effects in humans had not been observed until now. We observed a novel nonsense mutation in a homozygous state in the CCDC151 gene (NM_145045.4:c.925G>T:p.[E309*]) in a clinically diagnosed PCD patient from a consanguineous family of Arabic ancestry. The variant was absent in 238 randomly selected individuals, indicating that the variant is rare and unlikely to be a founder mutation. Our finding also shows that, given prior knowledge from model organisms, even a single whole-exome sequence can be sufficient to discover a novel causal gene.
primary ciliary dyskinesia; CCDC151; respiratory cilia; ciliopathy
Severe congenital neutropenia (SCN) is a rare hematopoietic disorder, with an estimated incidence of 1 in 200,000 individuals of European descent, many cases of which are inherited in an autosomal dominant pattern. Despite the fact that several causal genes have been identified, the genetic basis for >30% of cases remains unknown. We report a five-generation family segregating a novel single nucleotide variant (SNV) in TCIRG1. There is perfect co-segregation of the SNV with congenital neutropenia in this family; all 11 affected, but none of the unaffected, individuals carry this novel SNV. Western blot analysis shows reduced levels of TCIRG1 protein in affected individuals, compared to healthy controls. Two unrelated patients with SCN, identified by independent investigators, are heterozygous for different, rare, highly conserved, coding variants in TCIRG1.
TCIRG1; Congenital neutropenia; SCN; V-ATPase
Laing early onset distal myopathy and myosin storage myopathy are caused by mutations of slow skeletal/β-cardiac myosin heavy chain encoded by the gene MYH7, as is a common form of familial hypertrophic/dilated cardiomyopathy. The mechanisms by which different phenotypes are produced by mutations in MYH7, even in the same region of the gene, are not known. To explore the clinical spectrum and pathobiology we screened the MYH7 gene in 88 patients from 21 previously unpublished families presenting with distal or generalised skeletal muscle weakness, with or without cardiac involvement. Twelve novel mutations have been identified in thirteen families. In one of these families the grandfather of the proband was found to be a mosaic for the MYH7 mutation. In eight cases de novo mutation appeared to have occurred, which was proven in three. The presenting complaint was footdrop, sometimes leading to delayed walking or tripping, in members of 17 families (81%), with other presentations including cardiomyopathy in infancy, generalised floppiness and scoliosis. Cardiac involvement as well as skeletal muscle weakness was identified in 9 of 21 families. Spinal involvement such as scoliosis or rigidity was identified in 12 (57%). This report widens the clinical and pathological phenotypes, and the genetics of MYH7 mutations leading to skeletal muscle diseases.
MYH7; Laing distal myopathy; MPD1
Mutations affecting skeletal muscle isoforms of the tropomyosin genes may cause nemaline myopathy, cap myopathy, core-rod myopathy, congenital fiber-type disproportion, distal arthrogryposes, and Escobar syndrome. We correlate the clinical picture of these diseases with novel (19) and previously reported (31) mutations of the TPM2 and TPM3 genes. Included are altogether 93 families: 53 with TPM2 mutations and 40 with TPM3 mutations. Thirty distinct pathogenic variants of TPM2 and 20 of TPM3 have been published or listed in the Leiden Open Variant Database (http://www.dmd.nl/). Most are heterozygous changes associated with autosomal-dominant disease. Patients with TPM2 mutations tended to present with milder symptoms than those with TPM3 mutations, distal arthrogryposis (DA) being present only in the TPM2 group. Previous studies have shown that five of the mutations in TPM2 and one in TPM3 cause increased Ca2+ sensitivity resulting in a hypercontractile molecular phenotype. Patients with the hypercontractile phenotype more often had contractures of the limb joints (18/19) and jaw (6/19) than those with the non-hypercontractile phenotype (2/22 and 1/22), whereas patients with the non-hypercontractile molecular phenotype more often (19/22) had axial contractures than the hypercontractile group (7/19). Our in silico predictions show that most mutations affect tropomyosin–actin association or tropomyosin head-to-tail binding.
congenital myopathy; genotype–phenotype correlation; TPM2; TPM3; actin; hypercontractile phenotype
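The contracture counts in the abstract above form 2×2 tables (hypercontractile versus non-hypercontractile, contracture present versus absent). The abstract does not say which statistical test was applied, so the sketch below is an illustration only: it implements a two-sided Fisher exact test with the standard library and applies it to the quoted counts.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalised hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x)

    x_min = max(0, row1 - (n - col1))
    x_max = min(row1, col1)
    observed = weight(a)
    # Sum all tables at least as unlikely as the observed one (integer comparison).
    extreme = sum(weight(x) for x in range(x_min, x_max + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)

# Rows: hypercontractile (n = 19), non-hypercontractile (n = 22).
# Columns: contracture present, contracture absent.
p_limb = fisher_exact_two_sided(18, 1, 2, 20)    # limb joints: 18/19 vs 2/22
p_jaw = fisher_exact_two_sided(6, 13, 1, 21)     # jaw: 6/19 vs 1/22
p_axial = fisher_exact_two_sided(7, 12, 19, 3)   # axial: 7/19 vs 19/22

for name, p in [("limb", p_limb), ("jaw", p_jaw), ("axial", p_axial)]:
    print(f"{name}: p = {p:.3g}")
```

With these counts the limb-joint comparison gives a p-value far below 0.001, consistent with the strong contrast the abstract describes. The calculation treats patients as independent observations, which may not hold within families.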
Central serous chorioretinopathy (CSC) is characterized by leakage of fluid from the choroid into the subretinal space and, consequently, loss of central vision. The disease is triggered by endogenous and exogenous corticosteroid imbalance and psychosocial stress and is much more prevalent in men. We studied the association of genetic variation in 44 genes from stress response and corticosteroid metabolism pathways with the CSC phenotype in two independent cohorts of 400 CSC cases and 1,400 matched controls. The expression of cadherin 5 (CDH5), the major cell–cell adhesion molecule in vascular endothelium, was downregulated by corticosteroids, which may increase permeability of choroidal vasculature, leading to fluid leakage under the retina. We found a significant association of four common CDH5 SNPs with CSC in male patients in both cohorts. Two common intronic variants, rs7499886:A>G and rs1073584:C>T, exhibit strongly significant associations with CSC: P = 0.00012; odds ratio (OR) = 1.5; 95% CI [1.2;1.8], and P = 0.0014; OR = 0.70; 95% CI [0.57;0.87], respectively. A common haplotype was present in 25.4% of male CSC cases and in 35.8% of controls (P = 0.0002; OR = 0.61; 95% CI [0.47;0.79]). We propose that genetically predetermined variation in CDH5, when combined with triggering events such as corticosteroid treatment or severe hormonal imbalance, underlies a substantial proportion of CSC in the male population.
CDH5; Cadherin 5; central serous chorioretinopathy; genetic association; retinal disease
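The haplotype odds ratio quoted in the abstract above can be reproduced from the two reported frequencies. The sketch below is a plain unadjusted calculation. It assumes the percentages can be treated directly as proportions in cases and controls, and it does not attempt to reproduce the confidence interval, which would require the underlying counts.

```python
def odds(proportion):
    """Convert a proportion into odds."""
    return proportion / (1.0 - proportion)

def odds_ratio(proportion_cases, proportion_controls):
    """Unadjusted odds ratio comparing cases with controls."""
    return odds(proportion_cases) / odds(proportion_controls)

# Haplotype frequencies quoted in the abstract: 25.4% of male cases, 35.8% of controls.
haplotype_or = odds_ratio(0.254, 0.358)
print(round(haplotype_or, 2))  # 0.61, matching the reported OR
```

An odds ratio below 1 means the haplotype is less common among cases, so it behaves as a protective haplotype in this comparison.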
Hereditary hearing loss (HHL) is extremely heterogeneous. Over 70 genes have been identified to date, and with the advent of massively parallel sequencing, the pace of novel gene discovery has accelerated. In a family segregating progressive autosomal dominant non-syndromic hearing loss (ADNSHL) we used OtoSCOPE® to exclude mutations in known deafness genes and then performed segregation mapping and whole exome sequencing (WES) to identify a unique variant, p.Ser178Leu, in TBC1D24 that segregates with the hearing loss phenotype. TBC1D24 encodes a GTPase-activating protein expressed in the cochlea. Ser178 is highly conserved across vertebrates and its change is predicted to be damaging. Other variants in TBC1D24 have been associated with a panoply of clinical symptoms including autosomal recessive NSHL (ARNSHL), syndromic hearing impairment associated with onychodystrophy, osteodystrophy, mental retardation and seizures (DOORS syndrome), and a wide range of epileptic disorders.
TBC1D24; autosomal dominant; non-syndromic; hearing loss; hearing impairment; pleiotropy; OtoSCOPE®
ATP-sensitive potassium (KATP) channels, composed of inward-rectifying potassium channel subunits (Kir6.1 and Kir6.2, encoded by KCNJ8 and KCNJ11, respectively) and regulatory sulfonylurea receptor subunits (SUR1 and SUR2, encoded by ABCC8 and ABCC9, respectively), couple metabolism to excitability in multiple tissues. Mutations in ABCC9 cause Cantú syndrome, a distinct multi-organ disease, potentially via enhanced KATP channel activity. We screened KCNJ8 in an ABCC9 mutation-negative patient who also exhibited clinical hallmarks of Cantú syndrome (hypertrichosis, macrosomia, macrocephaly, coarse facial appearance, cardiomegaly, and skeletal abnormalities). We identified a de novo missense mutation encoding Kir6.1[p.Cys176Ser] in the patient. Kir6.1[p.Cys176Ser] channels exhibited markedly higher activity than wild-type channels, as a result of reduced ATP sensitivity, whether co-expressed with SUR1 or SUR2A subunits. Our results not only identify a novel causal gene in Cantú syndrome but also demonstrate that the cardinal features of the disease result from gain of KATP channel function, not from Kir6-independent SUR2 function.
KCNJ8; Kir6.1; Cantú syndrome; KATP; hypertrichosis
Diamond-Blackfan Anemia (DBA) is characterized by a defect of erythroid progenitors and, clinically, by anemia and malformations. DBA exhibits an autosomal dominant pattern of inheritance with incomplete penetrance. Currently nine genes, all encoding ribosomal proteins (RP), have been found mutated in approximately 50% of patients. Experimental evidence supports the hypothesis that DBA is primarily the result of defective ribosome synthesis. By means of a large collaboration among six centers, we report here a mutation update that includes nine genes and 220 distinct mutations, 56 of which are new. The DBA Mutation Database now includes data from 355 patients. Of those where inheritance has been examined, 125 patients carry a de novo mutation and 72 an inherited mutation. Mutagenesis may be ascribed to slippage in 65.5% of indels, whereas CpG dinucleotides are involved in 23% of transitions. Using bioinformatic tools we show that gene conversion is not a common mechanism of RP gene mutagenesis, notwithstanding the abundance of RP pseudogenes. Genotype–phenotype analysis reveals that malformations are more frequently associated with mutations in RPL5 and RPL11 than in the other genes. All currently reported DBA mutations together with their functional and clinical data are included in the DBA Mutation Database.
Diamond-Blackfan anemia; ribosomal protein; erythropoiesis; ribosome biogenesis
A high-density comparative genomic hybridization array was designed to evaluate copy number variants (CNVs) in the genomic region of six familial Parkinson disease (PD) genes in 181 PD cases and 67 controls. No CNV was found in PARK7, ATP13A2, PINK1, and LRRK2. Intronic-only CNVs were found in SNCA and PARK2 but were not associated with PD risk. A whole-gene duplication of SNCA was found in one case. The allele frequency of PARK2 exonic CNVs is significantly higher in cases than in controls (P = 0.02), higher in early-onset (age at onset, AAO ≤ 40) than in late-onset cases (P = 0.001), and higher in familial than in sporadic cases (P = 0.005). Except for single exon 2 duplications, all PARK2 exonic CNVs have different breakpoints, even when the same exon(s) were involved. In conclusion, except for SNCA and PARK2, CNVs are not a major contributing mechanism for the familial PD genes examined. The majority of PARK2 exonic CNVs are not recurrent.
CNV; CGH; Parkinson; PARK2
Tumor-derived cell lines play an important role in the investigation of tumor biology and genetics. Across a wide array of studies, they have been tools of choice for the discovery of important genes involved in cancer and for the analysis of the cellular pathways that are impaired by diverse oncogenic events. They are also invaluable for screening novel anticancer drugs. The TP53 protein is a major component of multiple pathways that regulate cellular response to various types of stress. Therefore, TP53 status affects the phenotype of tumor cell lines profoundly and must be carefully ascertained for any experimental project. In the present review, we use the 2014 release of the UMD TP53 database to show that TP53 status is still controversial for numerous cell lines, including some widely used lines from the NCI-60 panel. Our analysis clearly confirms that, despite numerous warnings, the misidentification of cell lines is still present as a silent and neglected issue, and that extreme care must be taken when determining the status of p53, because errors may lead to disastrous experimental interpretations. A novel compendium gathering the TP53 status of 2,500 cell lines has been made available (http://p53.fr). A stand-alone application can be used to browse the database and extract pertinent information on cell lines and associated TP53 mutations. It will be updated regularly to minimize any scientific issues associated with the use of misidentified cell lines (http://p53.fr).
TP53; cancer cell line; cross-contamination; misidentification; recommendation
The wild-type human p53 (TP53) tumor suppressor can be posttranslationally modified at over 60 of its 393 residues. These modifications contribute to changes in TP53 stability and in its activity as a transcription factor in response to a wide variety of intrinsic and extrinsic stresses, in part through regulation of protein-protein and protein-DNA interactions. The TP53 gene is frequently mutated in cancers, and in contrast to most other tumor suppressors the mutations are mostly missense, often resulting in the accumulation of mutant protein, which may have novel or altered functions. Most mutant TP53s can be posttranslationally modified at the same residues as in wild-type TP53. Strikingly, however, codons for modified residues are rarely mutated in human tumors, suggesting that TP53 modifications are not essential for tumor suppression activity. Nevertheless, these modifications might alter mutant TP53 activity and contribute to a gain-of-function leading to increased metastasis and tumor progression. Furthermore, many of the signal transduction pathways that result in TP53 modifications are altered or disrupted in cancers. Understanding the signaling pathways that result in TP53 modification and the functions of these modifications in both wild-type TP53 and its many mutant forms may contribute to more effective cancer therapies.
TP53; p53; phosphorylation; acetylation; methylation; ubiquitylation; transcription
Methylmalonyl-CoA mutase (MUT) is an essential enzyme in propionate catabolism that requires adenosylcobalamin as a cofactor. Almost 250 inherited mutations in the MUT gene are known to cause the devastating disorder methylmalonic aciduria; however, the mechanism of dysfunction of these mutations, more than half of which are missense changes, has not been thoroughly investigated. Here, we examined 23 patient missense mutations covering a spectrum of exonic/structural regions, clinical phenotypes, and ethnic populations in order to determine their influence on protein stability, using two recombinant expression systems and a thermostability assay, and enzymatic function by measuring MUT activity and affinity for its cofactor and substrate. Our data stratify MUT missense mutations into categories of biochemical defects, including (1) reduced protein level due to misfolding, (2) increased thermolability, (3) impaired enzyme activity, and (4) reduced cofactor response in substrate turnover. We further demonstrate the stabilization of wild-type and thermolabile mutants by chemical chaperones in vitro and in bacterial cells. This in-depth mutation study illustrates the tools available for MUT enzyme characterization, guides future categorization of further missense mutations, and supports the development of alternative, chaperone-based therapy for patients not responding to current treatment.
methylmalonic aciduria; methylmalonyl-CoA mutase; MUT; cobalamin; thermolability
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since their advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog stores publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits.
next generation sequencing (NGS); exome sequencing; whole genome sequencing; RNA sequencing; disease genome; gene fusion; database
Assessment of the functional consequences of variants near splice sites is a major challenge in the diagnostic laboratory. To address this issue, we created expression minigenes (EMGs) to determine the RNA and protein products generated by splice site variants (n = 10) implicated in cystic fibrosis (CF). Experimental results were compared with the splicing predictions of eight in silico tools. EMGs containing the full-length Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) coding sequence and flanking intron sequences generated wild-type transcript and fully processed protein in Human Embryonic Kidney (HEK293) and CF bronchial epithelial (CFBE41o-) cells. Quantification of variant induced aberrant mRNA isoforms was concordant using fragment analysis and pyrosequencing. The splicing patterns of c.1585−1G>A and c.2657+5G>A were comparable to those reported in primary cells from individuals bearing these variants. Bioinformatics predictions were consistent with experimental results for 9/10 variants (MES), 8/10 variants (NNSplice), and 7/10 variants (SSAT and Sroogle). Programs that estimate the consequences of mis-splicing predicted 11/16 (HSF and ASSEDA) and 10/16 (Fsplice and SplicePort) experimentally observed mRNA isoforms. EMGs provide a robust experimental approach for clinical interpretation of splice site variants and refinement of in silico tools.
expression minigene; splicing; CFTR; in silico tools
Whole genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and non-rare diseases. Using next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here we present one of the fastest and most scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared to population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME web server and source code are freely available to the academic community.
whole genome sequences; variant annotation; disease gene discovery; analysis pipeline