Results 1-11 (11)
 

1.  Community gene annotation in practice 
Manual annotation of genomic data is extremely valuable for producing an accurate reference gene set, but it is expensive compared with automatic methods and so has been limited to model organisms. Annotation tools developed at the Wellcome Trust Sanger Institute (WTSI, http://www.sanger.ac.uk/) are being used to fill that gap, as they can be used remotely and so open up viable community annotation collaborations. We introduce the ‘Blessed’ annotator and ‘Gatekeeper’ approach to Community Annotation using the Otterlace/ZMap genome annotation tool. We also describe the strategies adopted for annotation consistency, quality control and viewing of the annotation.
Database URL: http://vega.sanger.ac.uk/index.html
doi:10.1093/database/bas009
PMCID: PMC3308165  PMID: 22434843
2.  The Vertebrate Genome Annotation browser 10 years on 
Nucleic Acids Research  2013;42(Database issue):D771-D779.
The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).
doi:10.1093/nar/gkt1241
PMCID: PMC3964964  PMID: 24316575
3.  Current status and new features of the Consensus Coding Sequence database  
Nucleic Acids Research  2013;42(Database issue):D865-D872.
The Consensus Coding Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) is a collaborative effort to maintain a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assemblies by the National Center for Biotechnology Information (NCBI) and Ensembl genome annotation pipelines. Identical annotations that pass quality assurance tests are tracked with a stable identifier (CCDS ID). Members of the collaboration, who are from NCBI, the Wellcome Trust Sanger Institute and the University of California Santa Cruz, provide coordinated and continuous review of the dataset to ensure high-quality CCDS representations. We describe here the current status and recent growth in the CCDS dataset, as well as recent changes to the CCDS web and FTP sites. These changes include more explicit reporting about the NCBI and Ensembl annotation releases being compared, new search and display options, the addition of biologically descriptive information and our approach to representing genes for which support evidence is incomplete. We also present a summary of recent and future curation targets.
doi:10.1093/nar/gkt1059
PMCID: PMC3965069  PMID: 24217909
4.  The non-obese diabetic mouse sequence, annotation and variation resource: an aid for investigating type 1 diabetes 
Model organisms are becoming increasingly important for the study of complex diseases such as type 1 diabetes (T1D). The non-obese diabetic (NOD) mouse is an experimental model for T1D, having been bred to develop the disease spontaneously in a process similar to that in humans. Genetic analysis of the NOD mouse has identified around 50 disease loci, which have the nomenclature Idd for insulin-dependent diabetes, distributed across at least 11 different chromosomes. In total, 21 Idd regions across 6 chromosomes that are major contributors to T1D susceptibility or resistance were selected for finished sequencing and annotation at the Wellcome Trust Sanger Institute. Here we describe the generation of 40.4 megabase pairs of finished sequence from 289 bacterial artificial chromosomes for the NOD mouse. Manual annotation has identified 738 genes in the diabetes-sensitive NOD mouse and 765 genes in homologous regions of the diabetes-resistant C57BL/6J reference mouse across 19 candidate Idd regions. This has allowed us to call variation consequences between homologous exonic sequences for all annotated regions in the two mouse strains. We demonstrate the importance of this resource further by illustrating the technical difficulties that regions of inter-strain structural variation between the NOD mouse and the C57BL/6J reference mouse can cause for current next-generation sequencing and assembly techniques. Furthermore, we have established that the variation rate in the Idd regions is 2.3 times higher than the mean found for the whole genome assembly for the NOD/ShiLtJ genome, which we suggest reflects the fact that positive selection for functional variation in immune genes is beneficial in regard to host defence. In summary, we provide an important resource, which aids the analysis of potential causative genes involved in T1D susceptibility.
Database URLs: http://www.sanger.ac.uk/resources/mouse/nod/; http://vega-previous.sanger.ac.uk/info/data/mouse_regions.html
doi:10.1093/database/bat032
PMCID: PMC3668384  PMID: 23729657
5.  Structural and functional annotation of the porcine immunome 
BMC Genomics  2013;14:332.
Background
The domestic pig is known as an excellent model for human immunology and the two species share many pathogens. Susceptibility to infectious disease is one of the major constraints on swine performance, yet the structure and function of genes comprising the pig immunome are not well-characterized. The completion of the pig genome provides the opportunity to annotate the pig immunome, and compare and contrast pig and human immune systems.
Results
The Immune Response Annotation Group (IRAG) used computational curation and manual annotation of the swine genome assembly 10.2 (Sscrofa10.2) to refine the currently available automated annotation of 1,369 immunity-related genes through sequence-based comparison to genes in other species. Within these genes, we annotated 3,472 transcripts. Annotation provided evidence for gene expansions in several immune response families, and identified artiodactyl-specific expansions in the cathelicidin and type 1 Interferon families. We found gene duplications for 18 genes, including 13 immune response genes and five non-immune response genes discovered in the annotation process. Manual annotation provided evidence for many new alternative splice variants and 8 gene duplications. Over 1,100 transcripts without porcine sequence evidence were detected using cross-species annotation. We used a functional approach to discover and accurately annotate porcine immune response genes. A co-expression clustering analysis of transcriptomic data from selected experimental infections or immune stimulations of blood, macrophages or lymph nodes identified a large cluster of genes that exhibited a correlated positive response upon infection across multiple pathogens or immune stimuli. Interestingly, this gene cluster (cluster 4) is enriched for known general human immune response genes, yet contains many un-annotated porcine genes. A phylogenetic analysis of the encoded proteins of cluster 4 genes showed that 15% exhibited an accelerated evolution as compared to 4.1% across the entire genome.
Conclusions
This extensive annotation dramatically extends the genome-based knowledge of the molecular genetics and structure of a major portion of the porcine immunome. Our complementary functional approach using co-expression during immune response has provided new putative immune response annotation for over 500 porcine genes. Our phylogenetic analysis of this core immunome cluster confirms rapid evolutionary change in this set of genes, and that, as in other species, such genes are important components of the pig’s adaptation to pathogen challenge over evolutionary time. These comprehensive and integrated analyses increase the value of the porcine genome sequence and provide important tools for global analyses and data-mining of the porcine immune response.
doi:10.1186/1471-2164-14-332
PMCID: PMC3658956  PMID: 23676093
Immune response; Porcine; Genome annotation; Co-expression network; Phylogenetic analysis; Accelerated evolution
6.  Sequencing and comparative analysis of the gorilla MHC genomic sequence 
Major histocompatibility complex (MHC) genes play a critical role in vertebrate immune response and because the MHC is linked to a significant number of auto-immune and other diseases it is of great medical interest. Here we describe the clone-based sequencing and subsequent annotation of the MHC region of the gorilla genome. Because the MHC is subject to extensive variation, both structural and sequence-wise, it is not readily amenable to study in whole genome shotgun sequence such as the recently published gorilla genome. The variation of the MHC also makes it of evolutionary interest and therefore we analyse the sequence in the context of human and chimpanzee. In our comparisons with human and re-annotated chimpanzee MHC sequence we find that gorilla has a trimodular RCCX cluster, versus the reference human bimodular cluster, and additional copies of Class I (pseudo)genes between Gogo-K and Gogo-A (the orthologues of HLA-K and -A). We also find that Gogo-H (and Patr-H) is coding versus the HLA-H pseudogene and, conversely, there is a Gogo-DQB2 pseudogene versus the HLA-DQB2 coding gene. Our analysis, which is freely available through the VEGA genome browser, provides the research community with a comprehensive dataset for comparative and evolutionary research of the MHC.
doi:10.1093/database/bat011
PMCID: PMC3626023  PMID: 23589541
7.  Analyses of pig genomes provide insight into porcine demography and evolution 
Groenen, Martien A. M. | Archibald, Alan L. | Uenishi, Hirohide | Tuggle, Christopher K. | Takeuchi, Yasuhiro | Rothschild, Max F. | Rogel-Gaillard, Claire | Park, Chankyu | Milan, Denis | Megens, Hendrik-Jan | Li, Shengting | Larkin, Denis M. | Kim, Heebal | Frantz, Laurent A. F. | Caccamo, Mario | Ahn, Hyeonju | Aken, Bronwen L. | Anselmo, Anna | Anthon, Christian | Auvil, Loretta | Badaoui, Bouabid | Beattie, Craig W. | Bendixen, Christian | Berman, Daniel | Blecha, Frank | Blomberg, Jonas | Bolund, Lars | Bosse, Mirte | Botti, Sara | Bujie, Zhan | Bystrom, Megan | Capitanu, Boris | Silva, Denise Carvalho | Chardon, Patrick | Chen, Celine | Cheng, Ryan | Choi, Sang-Haeng | Chow, William | Clark, Richard C. | Clee, Christopher | Crooijmans, Richard P. M. A. | Dawson, Harry D. | Dehais, Patrice | De Sapio, Fioravante | Dibbits, Bert | Drou, Nizar | Du, Zhi-Qiang | Eversole, Kellye | Fadista, João | Fairley, Susan | Faraut, Thomas | Faulkner, Geoffrey J. | Fowler, Katie E. | Fredholm, Merete | Fritz, Eric | Gilbert, James G. R. | Giuffra, Elisabetta | Gorodkin, Jan | Griffin, Darren K. | Harrow, Jennifer L. | Hayward, Alexander | Howe, Kerstin | Hu, Zhi-Liang | Humphray, Sean J. | Hunt, Toby | Hornshøj, Henrik | Jeon, Jin-Tae | Jern, Patric | Jones, Matthew | Jurka, Jerzy | Kanamori, Hiroyuki | Kapetanovic, Ronan | Kim, Jaebum | Kim, Jae-Hwan | Kim, Kyu-Won | Kim, Tae-Hun | Larson, Greger | Lee, Kyooyeol | Lee, Kyung-Tai | Leggett, Richard | Lewin, Harris A. | Li, Yingrui | Liu, Wansheng | Loveland, Jane E. | Lu, Yao | Lunney, Joan K. | Ma, Jian | Madsen, Ole | Mann, Katherine | Matthews, Lucy | McLaren, Stuart | Morozumi, Takeya | Murtaugh, Michael P. | Narayan, Jitendra | Nguyen, Dinh Truong | Ni, Peixiang | Oh, Song-Jung | Onteru, Suneel | Panitz, Frank | Park, Eung-Woo | Park, Hong-Seog | Pascal, Geraldine | Paudel, Yogesh | Perez-Enciso, Miguel | Ramirez-Gonzalez, Ricardo | Reecy, James M. | Zas, Sandra Rodriguez | Rohrer, Gary A. | Rund, Lauretta | Sang, Yongming | Schachtschneider, Kyle | Schraiber, Joshua G. | Schwartz, John | Scobie, Linda | Scott, Carol | Searle, Stephen | Servin, Bertrand | Southey, Bruce R. | Sperber, Goran | Stadler, Peter | Sweedler, Jonathan V. | Tafer, Hakim | Thomsen, Bo | Wali, Rashmi | Wang, Jian | Wang, Jun | White, Simon | Xu, Xun | Yerle, Martine | Zhang, Guojie | Zhang, Jianguo | Zhang, Jie | Zhao, Shuhong | Rogers, Jane | Churcher, Carol | Schook, Lawrence B.
Nature  2012;491(7424):393-398.
For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ~1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
doi:10.1038/nature11622
PMCID: PMC3566564  PMID: 23151582
8.  Manual annotation and analysis of the defensin gene cluster in the C57BL/6J mouse reference genome 
BMC Genomics  2009;10:606.
Background
Host defense peptides are a critical component of the innate immune system. Human alpha- and beta-defensin genes are subject to copy number variation (CNV) and historically the organization of mouse alpha-defensin genes has been poorly defined. Here we present the first full manual genomic annotation of the mouse defensin region on Chromosome 8 of the reference strain C57BL/6J, and the analysis of the orthologous regions of the human and rat genomes. Problems were identified with the reference assemblies of all three genomes. Defensins have been studied for over two decades and their naming has become a critical issue due to incorrect identification of defensin genes derived from different mouse strains and the duplicated nature of this region.
Results
The defensin gene cluster region on mouse Chromosome 8 A2 contains 98 gene loci: 53 are likely active defensin genes and 22 defensin pseudogenes. Several TATA box motifs were found for human and mouse defensin genes that likely impact gene expression. Three novel defensin genes belonging to the Cryptdin Related Sequences (CRS) family were identified. All additional mouse defensin loci on Chromosomes 1, 2 and 14 were annotated and unusual splice variants identified. Comparison of the mouse alpha-defensins in the three main mouse reference gene sets Ensembl, Mouse Genome Informatics (MGI), and NCBI RefSeq reveals significant inconsistencies in annotation and nomenclature. We are collaborating with the Mouse Genome Nomenclature Committee (MGNC) to establish a standardized naming scheme for alpha-defensins.
Conclusions
Prior to this analysis, there was no reliable reference gene set available for the mouse strain C57BL/6J defensin genes, demonstrating that manual intervention is still critical for the annotation of complex gene families and heavily duplicated regions. Accurate gene annotation is facilitated by the annotation of pseudogenes and regulatory elements. Manually curated gene models will be incorporated into the Ensembl and Consensus Coding Sequence (CCDS) reference sets. Elucidation of the genomic structure of this complex gene cluster on the mouse reference sequence, and adoption of a clear and unambiguous naming scheme, will provide a valuable tool to support studies on the evolution, regulatory mechanisms and biological functions of defensins in vivo.
doi:10.1186/1471-2164-10-606
PMCID: PMC2807441  PMID: 20003482
9.  Dynamic instability of the major urinary protein gene family revealed by genomic and phenotypic comparisons between C57 and 129 strain mice 
Genome Biology  2008;9(5):R91.
Targeted sequencing, manual genome annotation, phylogenetic analysis and mass spectrometry were used to characterise major urinary proteins (MUPs) and the Mup clusters of two strains of inbred mice.
Background
The major urinary proteins (MUPs) of Mus musculus domesticus are deposited in urine in large quantities, where they bind and release pheromones and also provide an individual 'recognition signal' via their phenotypic polymorphism. Whilst important information about MUP functionality has been gained in recent years, the gene cluster is poorly studied in terms of structure, genic polymorphism and evolution.
Results
We combine targeted sequencing, manual genome annotation and phylogenetic analysis to compare the Mup clusters of C57BL/6J and 129 strains of mice. We describe organizational heterogeneity within both clusters: a central array of cassettes containing Mup genes highly similar at the protein level, flanked by regions containing Mup genes displaying significantly elevated divergence. Observed genomic rearrangements in all regions have likely been mediated by endogenous retroviral elements. Mup loci with coding sequences that differ between the strains are identified - including a gene/pseudogene pair - suggesting that these inbred lineages exhibit variation that exists in wild populations. We have characterized the distinct MUP profiles in the urine of both strains by mass spectrometry. The total MUP phenotype data is reconciled with our genomic sequence data, matching all proteins identified in urine to annotated genes.
Conclusion
Our observations indicate that the MUP phenotypic polymorphism observed in wild populations results from a combination of Mup gene turnover coupled with currently unidentified mechanisms regulating gene expression patterns. We propose that the structural heterogeneity described within the cluster reflects functional divergence within the Mup gene family.
doi:10.1186/gb-2008-9-5-r91
PMCID: PMC2441477  PMID: 18507838
10.  Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project 
Immunogenetics  2008;60(1):1-18.
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
doi:10.1007/s00251-007-0262-2
PMCID: PMC2206249  PMID: 18193213
Major histocompatibility complex; Haplotype; Polymorphism; Retroelement; Genetic predisposition to disease; Population genetics
11.  Lessons learned from the initial sequencing of the pig genome: comparative analysis of an 8 Mb region of pig chromosome 17 
Genome Biology  2007;8(8):R168.
The sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17 allows the coverage and quality of the pig genome sequencing project to be assessed.
Background
We describe here the sequencing, annotation and comparative analysis of an 8 Mb region of pig chromosome 17, which provides a useful test region to assess coverage and quality for the pig genome sequencing project. We report our findings comparing the annotation of draft sequence assembled at different depths of coverage.
Results
Within this region we annotated 71 loci, of which 53 are orthologous to human known coding genes. When compared to the syntenic regions in human (20q13.13-q13.33) and mouse (chromosome 2, 167.5 Mb-178.3 Mb), this region was found to be highly conserved with respect to gene order. The most notable difference between the three species is the presence of a large expansion of zinc finger coding genes and pseudogenes on mouse chromosome 2 between Edn3 and Phactr3 that is absent from pig and human. All of our annotation has been made publicly available in the Vertebrate Genome Annotation browser, VEGA. We assessed the impact of coverage on sequence assembly across this region and found, as expected, that increased sequence depth resulted in fewer, longer contigs. One-third of our annotated loci could not be fully re-aligned back to the low coverage version of the sequence, principally because the transcripts are fragmented over several contigs.
Conclusion
We have demonstrated the considerable advantages of sequencing at increased read depths and discuss the implications that lower coverage sequence may have on subsequent comparative and functional studies, particularly those involving complex loci such as GNAS.
doi:10.1186/gb-2007-8-8-r168
PMCID: PMC2374978  PMID: 17705864
