Results 1-10 (10)
 

1.  Choice of transcripts and software has a large effect on variant annotation 
Genome Medicine  2014;6(3):26.
Background
Variant annotation is a crucial step in the analysis of genome sequencing data. Functional annotation results can have a strong influence on the ultimate conclusions of disease studies. Incorrect or incomplete annotations can cause researchers both to overlook potentially disease-relevant DNA variants and to dilute interesting variants in a pool of false positives. Researchers are aware of these issues in general, but the extent of the dependency of final results on the choice of transcripts and software used for annotation has not been quantified in detail.
Methods
This paper quantifies the extent of differences in annotation of 80 million variants from a whole-genome sequencing study. We compare results using the RefSeq and Ensembl transcript sets as the basis for variant annotation with the software Annovar, and also compare the results from two annotation software packages, Annovar and VEP (Ensembl’s Variant Effect Predictor), when using Ensembl transcripts.
Results
We found only 44% agreement in annotations for putative loss-of-function variants when using the RefSeq and Ensembl transcript sets as the basis for annotation with Annovar. The rate of matching annotations for loss-of-function and nonsynonymous variants combined was 79% and for all exonic variants it was 83%. When comparing results from Annovar and VEP using Ensembl transcripts, matching annotations were seen for only 65% of loss-of-function variants and 87% of all exonic variants, with splicing variants revealed as the category with the greatest discrepancy. Using these comparisons, we characterised the types of apparent errors made by Annovar and VEP and discuss their impact on the analysis of DNA variants in genome sequencing studies.
Conclusions
Variant annotation is not yet a solved problem. Choice of transcript set can have a large effect on the ultimate variant annotations obtained in a whole-genome sequencing study. Choice of annotation software can also have a substantial effect. The annotation step in the analysis of a genome sequencing study must therefore be considered carefully, and a conscious choice made as to which transcript set and software are used for annotation.
doi:10.1186/gm543
PMCID: PMC4062061  PMID: 24944579
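The agreement rates reported in entry 1 can be illustrated with a minimal sketch. It assumes each annotation run has been reduced to a mapping from variant ID to a single consequence category, and it counts a variant toward a category group if either run places it there. The function name `match_rate`, the category strings, and the toy data are all hypothetical; the paper's exact matching rules are not given in the abstract.

```python
def match_rate(annotations_a, annotations_b, categories):
    """Fraction of variants with identical annotations in both runs.

    Only variants present in both runs, and placed in one of `categories`
    by at least one run, are counted. Returns None if no variant qualifies.
    """
    shared = annotations_a.keys() & annotations_b.keys()
    relevant = [
        v for v in shared
        if annotations_a[v] in categories or annotations_b[v] in categories
    ]
    if not relevant:
        return None
    matches = sum(1 for v in relevant if annotations_a[v] == annotations_b[v])
    return matches / len(relevant)


# Toy data (hypothetical): the same five variants annotated against two transcript sets.
refseq_run = {
    "v1": "stopgain", "v2": "frameshift", "v3": "nonsynonymous",
    "v4": "synonymous", "v5": "splicing",
}
ensembl_run = {
    "v1": "stopgain", "v2": "nonsynonymous", "v3": "nonsynonymous",
    "v4": "synonymous", "v5": "intronic",
}

# Assumed grouping of categories; the real grouping depends on the tool's output.
LOSS_OF_FUNCTION = {"stopgain", "frameshift", "splicing"}
ALL_EXONIC = LOSS_OF_FUNCTION | {"nonsynonymous", "synonymous"}

lof_rate = match_rate(refseq_run, ensembl_run, LOSS_OF_FUNCTION)
exonic_rate = match_rate(refseq_run, ensembl_run, ALL_EXONIC)
```

Taking the union of both runs as the denominator means a variant called loss-of-function by only one run counts as a mismatch, which is one plausible reading of "agreement" for this comparison.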
2.  Clinical whole-genome sequencing in severe early-onset epilepsy reveals new genes and improves molecular diagnosis 
Human Molecular Genetics  2014;23(12):3200-3211.
In severe early-onset epilepsy, precise clinical and molecular genetic diagnosis is complex, as many metabolic and electro-physiological processes have been implicated in disease causation. The clinical phenotypes share many features such as complex seizure types and developmental delay. Molecular diagnosis has historically been confined to sequential testing of candidate genes known to be associated with specific sub-phenotypes, but the diagnostic yield of this approach can be low. We conducted whole-genome sequencing (WGS) on six patients with severe early-onset epilepsy who had previously been refractory to molecular diagnosis, and their parents. Four of these patients had a clinical diagnosis of Ohtahara Syndrome (OS) and two patients had severe non-syndromic early-onset epilepsy (NSEOE). In two OS cases, we found de novo non-synonymous mutations in the genes KCNQ2 and SCN2A. In a third OS case, WGS revealed paternal isodisomy for chromosome 9, leading to identification of the causal homozygous missense variant in KCNT1, which produced a substantial increase in potassium channel current. The fourth OS patient had a recessive mutation in PIGQ that led to exon skipping and defective glycophosphatidyl inositol biosynthesis. The two patients with NSEOE had likely pathogenic de novo mutations in CBL and CSNK1G1, respectively. Mutations in these genes were not found among 500 additional individuals with epilepsy. This work reveals two novel genes for OS, KCNT1 and PIGQ. It also uncovers unexpected genetic mechanisms and emphasizes the power of WGS as a clinical tool for making molecular diagnoses, particularly for highly heterogeneous disorders.
doi:10.1093/hmg/ddu030
PMCID: PMC4030775  PMID: 24463883
3.  Mutations of TCF12, encoding a basic-helix-loop-helix partner of TWIST1, are a frequent cause of coronal craniosynostosis 
Nature Genetics  2013;45(3):304-307.
Craniosynostosis, the premature fusion of the cranial sutures, is a heterogeneous disorder with a prevalence of ~1 in 2,200 (refs. 1,2). A specific genetic etiology can be identified in ~21% of cases (ref. 3), including mutations of TWIST1, which encodes a class II basic helix-loop-helix (bHLH) transcription factor, and causes Saethre-Chotzen syndrome, typically associated with coronal synostosis (refs. 4-6). Starting with an exome sequencing screen, we identified 38 heterozygous TCF12 mutations in 347 samples from unrelated individuals with craniosynostosis. The mutations predominantly occurred in patients with coronal synostosis and accounted for 32% and 10% of subjects with bilateral and unilateral pathology, respectively. TCF12 encodes one of three class I E-proteins that heterodimerize with class II bHLH proteins such as TWIST1. We show that TCF12 and TWIST1 act synergistically in a transactivation assay, and that mice doubly heterozygous for loss-of-function mutations in Tcf12 and Twist1 exhibit severe coronal synostosis. Hence, the dosage of TCF12/TWIST1 heterodimers is critical for coronal suture development.
doi:10.1038/ng.2531
PMCID: PMC3647333  PMID: 23354436
4.  Recessive Mutations in SPTBN2 Implicate β-III Spectrin in Both Cognitive and Motor Development 
PLoS Genetics  2012;8(12):e1003074.
β-III spectrin is present in the brain and is known to be important in the function of the cerebellum. Heterozygous mutations in SPTBN2, the gene encoding β-III spectrin, cause Spinocerebellar Ataxia Type 5 (SCA5), an adult-onset, slowly progressive, autosomal-dominant pure cerebellar ataxia. SCA5 is sometimes known as “Lincoln ataxia,” because the largest known family is descended from relatives of the United States President Abraham Lincoln. Using targeted capture and next-generation sequencing, we identified a homozygous stop codon in SPTBN2 in a consanguineous family in which childhood developmental ataxia co-segregates with cognitive impairment. The cognitive impairment could result from mutations in a second gene, but further analysis using whole-genome sequencing combined with SNP array analysis did not reveal any evidence of other mutations. We also examined a mouse knockout of β-III spectrin in which ataxia and progressive degeneration of cerebellar Purkinje cells has been previously reported and found morphological abnormalities in neurons from prefrontal cortex and deficits in object recognition tasks, consistent with the human cognitive phenotype. These data provide the first evidence that β-III spectrin plays an important role in cortical brain development and cognition, in addition to its function in the cerebellum; and we conclude that cognitive impairment is an integral part of this novel recessive ataxic syndrome, Spectrin-associated Autosomal Recessive Cerebellar Ataxia type 1 (SPARCA1). In addition, the identification of SPARCA1 and normal heterozygous carriers of the stop codon in SPTBN2 provides insights into the mechanism of molecular dominance in SCA5 and demonstrates that the cell-specific repertoire of spectrin subunits underlies a novel group of disorders, the neuronal spectrinopathies, which includes SCA5, SPARCA1, and a form of West syndrome.
Author Summary
β-III spectrin is present in the brain and is known to be important in the function of the cerebellum. Mutations in β-III spectrin cause spinocerebellar ataxia type 5 (SCA5), sometimes called Lincoln ataxia because it was first described in the relatives of United States President Abraham Lincoln. This is generally an adult-onset progressive cerebellar disorder. Recessive mutations have not previously been described in any of the brain spectrins. We identified a homozygous mutation in SPTBN2, which causes a more severe disorder than SCA5, with a developmental cerebellar ataxia, which is present from childhood; in addition there is marked cognitive impairment. We call this novel condition SPARCA1 (Spectrin-associated Autosomal Recessive Cerebellar Ataxia type 1). This condition could be caused by two separate gene mutations; but we show, using a combination of genome-wide mapping, whole-genome sequencing, and detailed behavioural and neuropathological analysis of a β-III spectrin mouse knockout, that both the ataxia and cognitive impairment are caused by the recessive mutations in β-III spectrin. SPARCA1 is one of a family of neuronal spectrinopathies and illustrates the importance of spectrins in brain development and function.
doi:10.1371/journal.pgen.1003074
PMCID: PMC3516553  PMID: 23236289
5.  Projection of gene-protein networks to the functional space of the proteome and its application to analysis of organism complexity 
BMC Genomics  2010;11(Suppl 1):S4.
We consider the problem of biological complexity via a projection of protein-coding genes of complex organisms onto the functional space of the proteome. The latter can be defined as the set of all functions performed by the proteins of an organism. Alternative splicing (AS) allows an organism to generate diverse mature RNA transcripts from a single mRNA strand, and thus it could be one of the key mechanisms for increasing the functional complexity of the organism's proteome and a driving force of biological evolution. Thus, the projection of transcription units (TU) and alternative splice-variant (SV) forms onto proteome functional space could generate new types of relational networks (e.g. SV-protein function networks, SFN) and lead to discoveries of novel evolutionarily conservative functional modules. Such types of networks might provide new reliable characteristics of organism complexity and a better understanding of the evolutionary integration and plasticity of interconnection of genome-transcriptome-proteome functions.
Results
We use the InterPro and UniProt databases to attribute descriptive features (keywords) to protein sequences. The UniProt database includes a controlled and curated vocabulary of specific descriptors or keywords. The keywords have been assigned to a protein sequence via conserved domains or via similarity with annotated sequences. We then consider the unique combinations of keywords as the protein functional labels (FL), which characterize the biological functions of the given protein, and construct the contingency tables and graphs providing the projections of transcription units (TU) and alternative splice-variants (SV) onto all FL of the proteome of a given organism. We constructed SFNs for organisms with different evolutionary history and levels of complexity, and performed detailed statistical parameterization of the networks.
Conclusions
The application of the algorithm to organisms with different evolutionary history and level of biological complexity (nematode, fruit fly, vertebrata) reveals that the parameters describing SFN correlate with the complexity of a given organism. Using statistical analysis of the links of the functional networks, we propose new features of evolution of protein function acquisition. We reveal a group of genes and corresponding functions, which could be attributed to an early conservative part of the cellular machinery essential for cell viability and survival. We identify and provide characteristics of functional switches in the polyform group of TUs in different organisms. Based on comparison of mouse and human SFNs, a role of alternative splicing as a necessary source of evolution towards more complex organisms is demonstrated.
The entire set of FL across many organisms could be used as a draft of the catalogue of the functional space of the proteome world.
doi:10.1186/1471-2164-11-S1-S4
PMCID: PMC2822532  PMID: 20158875
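The construction described in entry 5 (unique keyword combinations as functional labels, then a contingency table projecting transcription units and splice variants onto those labels) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, identifiers, and keywords are hypothetical, and a functional label is modelled simply as an unordered set of keywords. The paper's actual data processing and network statistics are not reproduced here.

```python
from collections import Counter, defaultdict


def build_contingency_table(tu_to_svs, sv_keywords):
    """Count splice variants per (transcription unit, functional label) pair.

    A functional label (FL) is modelled as the frozenset of keywords
    attached to a splice variant's protein product (an assumption).
    """
    table = Counter()
    for tu, splice_variants in tu_to_svs.items():
        for sv in splice_variants:
            label = frozenset(sv_keywords.get(sv, ()))
            table[(tu, label)] += 1
    return table


def labels_per_tu(table):
    """Group the distinct functional labels reached by each transcription unit."""
    grouped = defaultdict(set)
    for tu, label in table:
        grouped[tu].add(label)
    return dict(grouped)


# Toy data (hypothetical identifiers and keywords).
tu_to_svs = {"TU1": ["SV1a", "SV1b"], "TU2": ["SV2a"]}
sv_keywords = {
    "SV1a": ["Kinase", "ATP-binding"],
    "SV1b": ["ATP-binding", "Kinase", "Membrane"],
    "SV2a": ["Kinase", "ATP-binding"],
}

table = build_contingency_table(tu_to_svs, sv_keywords)
tu_labels = labels_per_tu(table)
```

Using a frozenset makes the label independent of keyword order, so `SV1a` and `SV2a` map to the same label even though they belong to different transcription units. A transcription unit whose variants reach more than one label, such as `TU1` here, is the kind of case in which alternative splicing adds functional diversity.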
6.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
doi:10.1093/nar/gkl841
PMCID: PMC1899100  PMID: 17202162
7.  The Proteome Analysis database: a tool for the in silico analysis of whole proteomes 
Nucleic Acids Research  2003;31(1):414-417.
The Proteome Analysis database (http://www.ebi.ac.uk/proteome/) has been developed by the Sequence Database Group at EBI utilizing existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. Three main projects are used, InterPro, CluSTr and GO Slim, to give an overview of families, domains, sites, and functions of the proteins from each of the complete genomes. Complete proteome analysis is available for a total of 89 proteome sets. A specifically designed application enables InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
PMCID: PMC165552  PMID: 12520037
8.  The InterPro Database, 2003 brings increased coverage and new features 
Nucleic Acids Research  2003;31(1):315-318.
InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
PMCID: PMC165493  PMID: 12520011
9.  Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole genomes 
Nucleic Acids Research  2001;29(1):44-48.
The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31–67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
PMCID: PMC29822  PMID: 11125045
10.  Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones 
Imanishi, Tadashi | Itoh, Takeshi | Suzuki, Yutaka | O'Donovan, Claire | Fukuchi, Satoshi | Koyanagi, Kanako O | Barrero, Roberto A | Tamura, Takuro | Yamaguchi-Kabata, Yumi | Tanino, Motohiko | Yura, Kei | Miyazaki, Satoru | Ikeo, Kazuho | Homma, Keiichi | Kasprzyk, Arek | Nishikawa, Tetsuo | Hirakawa, Mika | Thierry-Mieg, Jean | Thierry-Mieg, Danielle | Ashurst, Jennifer | Jia, Libin | Nakao, Mitsuteru | Thomas, Michael A | Mulder, Nicola | Karavidopoulou, Youla | Jin, Lihua | Kim, Sangsoo | Yasuda, Tomohiro | Lenhard, Boris | Eveno, Eric | Suzuki, Yoshiyuki | Yamasaki, Chisato | Takeda, Jun-ichi | Gough, Craig | Hilton, Phillip | Fujii, Yasuyuki | Sakai, Hiroaki | Tanaka, Susumu | Amid, Clara | Bellgard, Matthew | de Fatima Bonaldo, Maria | Bono, Hidemasa | Bromberg, Susan K | Brookes, Anthony J | Bruford, Elspeth | Carninci, Piero | Chelala, Claude | Couillault, Christine | de Souza, Sandro J. | Debily, Marie-Anne | Devignes, Marie-Dominique | Dubchak, Inna | Endo, Toshinori | Estreicher, Anne | Eyras, Eduardo | Fukami-Kobayashi, Kaoru | R. Gopinath, Gopal | Graudens, Esther | Hahn, Yoonsoo | Han, Michael | Han, Ze-Guang | Hanada, Kousuke | Hanaoka, Hideki | Harada, Erimi | Hashimoto, Katsuyuki | Hinz, Ursula | Hirai, Momoki | Hishiki, Teruyoshi | Hopkinson, Ian | Imbeaud, Sandrine | Inoko, Hidetoshi | Kanapin, Alexander | Kaneko, Yayoi | Kasukawa, Takeya | Kelso, Janet | Kersey, Paul | Kikuno, Reiko | Kimura, Kouichi | Korn, Bernhard | Kuryshev, Vladimir | Makalowska, Izabela | Makino, Takashi | Mano, Shuhei | Mariage-Samson, Regine | Mashima, Jun | Matsuda, Hideo | Mewes, Hans-Werner | Minoshima, Shinsei | Nagai, Keiichi | Nagasaki, Hideki | Nagata, Naoki | Nigam, Rajni | Ogasawara, Osamu | Ohara, Osamu | Ohtsubo, Masafumi | Okada, Norihiro | Okido, Toshihisa | Oota, Satoshi | Ota, Motonori | Ota, Toshio | Otsuki, Tetsuji | Piatier-Tonneau, Dominique | Poustka, Annemarie | Ren, Shuang-Xi | Saitou, Naruya | Sakai, Katsunaga | Sakamoto, Shigetaka | Sakate, Ryuichi | Schupp, Ingo | Servant, Florence | Sherry, Stephen | Shiba, Rie | Shimizu, Nobuyoshi | Shimoyama, Mary | Simpson, Andrew J | Soares, Bento | Steward, Charles | Suwa, Makiko | Suzuki, Mami | Takahashi, Aiko | Tamiya, Gen | Tanaka, Hiroshi | Taylor, Todd | Terwilliger, Joseph D | Unneberg, Per | Veeramachaneni, Vamsi | Watanabe, Shinya | Wilming, Laurens | Yasuda, Norikazu | Yoo, Hyang-Sook | Stodolsky, Marvin | Makalowski, Wojciech | Go, Mitiko | Nakai, Kenta | Takagi, Toshihisa | Kanehisa, Minoru | Sakaki, Yoshiyuki | Quackenbush, John | Okazaki, Yasushi | Hayashizaki, Yoshihide | Hide, Winston | Chakraborty, Ranajit | Nishikawa, Ken | Sugawara, Hideaki | Tateno, Yoshio | Chen, Zhu | Oishi, Michio | Tonellato, Peter | Apweiler, Rolf | Okubo, Kousaku | Wagner, Lukas | Wiemann, Stefan | Strausberg, Robert L | Isogai, Takao | Auffray, Charles | Nomura, Nobuo | Gojobori, Takashi | Sugano, Sumio
PLoS Biology  2004;2(6):e162.
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology.
An international team has systematically validated and annotated just over 21,000 human genes using full-length cDNA, thereby providing a valuable new resource for the human genetics community.
doi:10.1371/journal.pbio.0020162
PMCID: PMC393292  PMID: 15103394
