Results 1-18 (18)
 

1.  The zebrafish reference genome sequence and its relationship to the human genome 
Howe, Kerstin | Clark, Matthew D. | Torroja, Carlos F. | Torrance, James | Berthelot, Camille | Muffato, Matthieu | Collins, John E. | Humphray, Sean | McLaren, Karen | Matthews, Lucy | McLaren, Stuart | Sealy, Ian | Caccamo, Mario | Churcher, Carol | Scott, Carol | Barrett, Jeffrey C. | Koch, Romke | Rauch, Gerd-Jörg | White, Simon | Chow, William | Kilian, Britt | Quintais, Leonor T. | Guerra-Assunção, José A. | Zhou, Yi | Gu, Yong | Yen, Jennifer | Vogel, Jan-Hinnerk | Eyre, Tina | Redmond, Seth | Banerjee, Ruby | Chi, Jianxiang | Fu, Beiyuan | Langley, Elizabeth | Maguire, Sean F. | Laird, Gavin K. | Lloyd, David | Kenyon, Emma | Donaldson, Sarah | Sehra, Harminder | Almeida-King, Jeff | Loveland, Jane | Trevanion, Stephen | Jones, Matt | Quail, Mike | Willey, Dave | Hunt, Adrienne | Burton, John | Sims, Sarah | McLay, Kirsten | Plumb, Bob | Davis, Joy | Clee, Chris | Oliver, Karen | Clark, Richard | Riddle, Clare | Eliott, David | Threadgold, Glen | Harden, Glenn | Ware, Darren | Mortimer, Beverly | Kerry, Giselle | Heath, Paul | Phillimore, Benjamin | Tracey, Alan | Corby, Nicole | Dunn, Matthew | Johnson, Christopher | Wood, Jonathan | Clark, Susan | Pelan, Sarah | Griffiths, Guy | Smith, Michelle | Glithero, Rebecca | Howden, Philip | Barker, Nicholas | Stevens, Christopher | Harley, Joanna | Holt, Karen | Panagiotidis, Georgios | Lovell, Jamieson | Beasley, Helen | Henderson, Carl | Gordon, Daria | Auger, Katherine | Wright, Deborah | Collins, Joanna | Raisen, Claire | Dyer, Lauren | Leung, Kenric | Robertson, Lauren | Ambridge, Kirsty | Leongamornlert, Daniel | McGuire, Sarah | Gilderthorp, Ruth | Griffiths, Coline | Manthravadi, Deepa | Nichol, Sarah | Barker, Gary | Whitehead, Siobhan | Kay, Michael | Brown, Jacqueline | Murnane, Clare | Gray, Emma | Humphries, Matthew | Sycamore, Neil | Barker, Darren | Saunders, David | Wallis, Justene | Babbage, Anne | Hammond, Sian | Mashreghi-Mohammadi, Maryam | Barr, Lucy | Martin, Sancha | Wray, Paul | Ellington, Andrew | Matthews, Nicholas | Ellwood, Matthew | Woodmansey, Rebecca | Clark, Graham | Cooper, James | Tromans, Anthony | Grafham, Darren | Skuce, Carl | Pandian, Richard | Andrews, Robert | Harrison, Elliot | Kimberley, Andrew | Garnett, Jane | Fosker, Nigel | Hall, Rebekah | Garner, Patrick | Kelly, Daniel | Bird, Christine | Palmer, Sophie | Gehring, Ines | Berger, Andrea | Dooley, Christopher M. | Ersan-Ürün, Zübeyde | Eser, Cigdem | Geiger, Horst | Geisler, Maria | Karotki, Lena | Kirn, Anette | Konantz, Judith | Konantz, Martina | Oberländer, Martina | Rudolph-Geiger, Silke | Teucke, Mathias | Osoegawa, Kazutoyo | Zhu, Baoli | Rapp, Amanda | Widaa, Sara | Langford, Cordelia | Yang, Fengtang | Carter, Nigel P. | Harrow, Jennifer | Ning, Zemin | Herrero, Javier | Searle, Steve M. J. | Enright, Anton | Geisler, Robert | Plasterk, Ronald H. A. | Lee, Charles | Westerfield, Monte | de Jong, Pieter J. | Zon, Leonard I. | Postlethwait, John H. | Nüsslein-Volhard, Christiane | Hubbard, Tim J. P. | Crollius, Hugues Roest | Rogers, Jane | Stemple, Derek L.
Nature  2013;496(7446):498-503.
Zebrafish have become a popular organism for the study of vertebrate gene function [1,2]. The virtually transparent embryos of this species, and the ability to accelerate genetic studies by gene knockdown or overexpression, have led to the widespread use of zebrafish in the detailed investigation of vertebrate gene function and, increasingly, the study of human genetic disease [3–5]. However, for effective modelling of human genetic disease it is important to understand the extent to which zebrafish genes and gene structures are related to orthologous human genes. To examine this, we generated a high-quality sequence assembly of the zebrafish genome, made up of an overlapping set of completely sequenced large-insert clones that were ordered and oriented using a high-resolution high-density meiotic map. Detailed automatic and manual annotation provides evidence of more than 26,000 protein-coding genes [6], the largest gene set of any vertebrate so far sequenced. Comparison to the human reference genome shows that approximately 70% of human genes have at least one obvious zebrafish orthologue. In addition, the high quality of this genome assembly provides a clearer understanding of key genomic features such as a unique repeat content, a scarcity of pseudogenes, an enrichment of zebrafish-specific genes on chromosome 4 and chromosomal regions that influence sex determination.
doi:10.1038/nature12111
PMCID: PMC3703927  PMID: 23594743
2.  Ensembl 2014 
Nucleic Acids Research  2013;42(D1):D749-D755.
Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training.
doi:10.1093/nar/gkt1196
PMCID: PMC3964975  PMID: 24316576
3.  Ensembl 2013 
Nucleic Acids Research  2012;41(D1):D48-D55.
The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidence-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces.
doi:10.1093/nar/gks1236
PMCID: PMC3531136  PMID: 23203987
4.  Ensembl 2012 
Nucleic Acids Research  2011;40(D1):D84-D90.
The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project.
doi:10.1093/nar/gkr991
PMCID: PMC3245178  PMID: 22086963
5.  Dalliance: interactive genome viewing on the web 
Bioinformatics  2011;27(6):889-890.
Summary: Dalliance is a new genome viewer which offers a high level of interactivity while running within a web browser. All data is fetched using the established distributed annotation system (DAS) protocol, making it easy to customize the browser and add extra data.
Availability and Implementation: Dalliance runs entirely within your web browser, and relies on existing DAS server infrastructure. Browsers for several mammalian genomes are available at http://www.biodalliance.org/, and the use of DAS means you can add your own data to these browsers. In addition, the source code (JavaScript) is available under the BSD license, and is straightforward to install on your own web server and embed within other documents.
Contact: thomas@biodalliance.org
doi:10.1093/bioinformatics/btr020
PMCID: PMC3051325  PMID: 21252075
6.  Ensembl 2011 
Nucleic Acids Research  2010;39(Database issue):D800-D806.
The Ensembl project (http://www.ensembl.org) seeks to enable genomic science by providing high quality, integrated annotation on chordate and selected eukaryotic genomes within a consistent and accessible infrastructure. All supported species include comprehensive, evidence-based gene annotations and a selected set of genomes includes additional data focused on variation, comparative, evolutionary, functional and regulatory annotation. The most advanced resources are provided for key species including human, mouse, rat and zebrafish reflecting the popularity and importance of these species in biomedical research. As of Ensembl release 59 (August 2010), 56 species are supported of which 5 have been added in the past year. Since our previous report, we have substantially improved the presentation and integration of both data of disease relevance and the regulatory state of different cell types.
doi:10.1093/nar/gkq1064
PMCID: PMC3013672  PMID: 21045057
7.  iMotifs: an integrated sequence motif visualization and analysis environment 
Bioinformatics  2010;26(6):843-844.
Motivation: Short sequence motifs are an important class of models in molecular biology, used most commonly for describing transcription factor binding site specificity patterns. High-throughput methods have recently been developed for detecting regulatory factor binding sites in vivo and in vitro, and consequently high-quality binding site motif data are becoming available for an increasing number of organisms and regulatory factors. Development of intuitive tools for the study of sequence motifs is therefore important.
iMotifs is a graphical motif analysis environment that allows visualization of annotated sequence motifs and scored motif hits in sequences. It also offers motif inference with the sensitive NestedMICA algorithm, as well as overrepresentation and pairwise motif matching capabilities. All of the analysis functionality is provided without the need to convert between file formats or learn different command line interfaces.
The application includes a bundled and graphically integrated version of the NestedMICA motif inference suite that has no outside dependencies. Problems associated with local deployment of software are therefore avoided.
Availability: iMotifs is licensed under the GNU Lesser General Public License v2.0 (LGPL 2.0). The software and its source are available at http://wiki.github.com/mz2/imotifs and can be run on Mac OS X Leopard (Intel/PowerPC). We also provide a cross-platform (Linux, OS X, Windows) LGPL 2.0 licensed library, libxms, for the Perl, Ruby, R and Objective-C programming languages for input and output of XMS formatted annotated sequence motif set files.
Contact: matias.piipari@gmail.com; imotifs@googlegroups.com
doi:10.1093/bioinformatics/btq026
PMCID: PMC2832821  PMID: 20106815
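The "scored motif hits in sequences" mentioned in the iMotifs abstract can be illustrated with a small sketch. It assumes a motif is represented as a position weight matrix built from per-column base counts and scored as a log-odds ratio against a uniform background; the abstract does not specify iMotifs' or NestedMICA's own scoring, so the function names, the pseudocount and the background value below are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-column base counts into log2-odds weights.

    counts is a list with one dict per motif column, mapping base -> count.
    The pseudocount and uniform background are assumptions of this sketch.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(base, 0) for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix):
    """Score every window of the sequence; returns (start, score) pairs.

    The sequence is expected to contain only upper-case A, C, G and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, matrix):
    """Return the highest-scoring (start, score) pair."""
    return max(scan(sequence, matrix), key=lambda hit: hit[1])
```

A caller would build the matrix once from aligned binding sites and then scan each sequence of interest; windows that resemble the motif receive positive scores and poor matches receive negative ones.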
8.  Ensembl’s 10th year 
Nucleic Acids Research  2009;38(Database issue):D557-D562.
Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.
doi:10.1093/nar/gkp972
PMCID: PMC2808936  PMID: 19906699
9.  CASP5 Target Classification 
Proteins  2003;53(Suppl 6):340-351.
This report summarizes the Critical Assessment of Protein Structure Prediction (CASP5) target proteins, which included 67 experimental models submitted from various structural genomics efforts and independent research groups. Throughout this special issue, CASP5 targets are referred to with the identification numbers T0129–T0195. Several of these targets were excluded from the assessment for various reasons: T0164 and T0166 were cancelled by the organizers; T0131, T0144, T0158, T0163, T0171, T0175, and T0180 were not available in time; T0145 was “natively unfolded”; the T0139 structure became available before the target expired; and T0194 was solved for a different sequence than the one submitted. Table I outlines the sequence and structural information available for CASP5 proteins in the context of existing folds and evolutionary relationships. This information provided the basis for a domain-based classification of the target structures into three assessment categories: comparative modeling (CM), fold recognition (FR), and new fold (NF). The FR category was further subdivided into homologues [FR(H)] and analogs [FR(A)] based on evolutionary considerations, and the overlap between assessment categories was classified as CM/FR(H) and FR(A)/NF. CASP5 domains are illustrated in Figure 1. Examples of nontrivial links between CASP5 target domains and existing structures that support our classifications are provided.
doi:10.1002/prot.10555
PMCID: PMC2656935  PMID: 14579323
10.  Adding Some SPICE to DAS 
Bioinformatics (Oxford, England)  2005;21(Suppl 2):ii40-ii41.
Summary
The distributed annotation system (DAS) defines a communication protocol used to exchange biological annotations. It is motivated by the idea that annotations should not be provided by single centralized databases but instead be spread over multiple sites. Data distribution, performed by DAS servers, is separated from visualization, which is carried out by DAS clients. The original DAS protocol was designed to serve annotation of genomic sequences. We have extended the protocol to be applicable to macromolecular structures. Here we present SPICE, a new DAS client that can be used to visualize protein sequence and structure annotations.
Availability
http://www.efamily.org.uk/software/dasclients/spice/
doi:10.1093/bioinformatics/bti1106
PMCID: PMC2656757  PMID: 16204122
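Because DAS separates data distribution (servers) from visualization (clients), a client essentially only needs to build request URLs and parse the replies. The sketch below builds a features request in the style of the DAS/1 convention, `<server>/das/<source>/features?segment=<reference>:<start>,<stop>`. The server address and source name are hypothetical, the function name is an illustration, and the structure extensions described in the SPICE abstract are not covered here.

```python
from urllib.parse import quote, urlencode


def das_features_url(server, source, reference, start, stop):
    """Build a DAS-style features request for one sequence segment.

    server and source are placeholders chosen by the caller; coordinates
    are treated as 1-based and inclusive in this sketch.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    segment = f"{reference}:{start},{stop}"
    base = f"{server.rstrip('/')}/das/{quote(source)}/features"
    # Keep ':' and ',' readable in the query string.
    return f"{base}?{urlencode({'segment': segment}, safe=':,')}"
```

A client would fetch this URL from each server it knows about and overlay the returned annotations, which is what lets annotations stay spread over multiple sites.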
11.  A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis 
Nature biotechnology  2008;26(7):779-785.
DNA methylation is an indispensable epigenetic modification of mammalian genomes. Consequently there is great interest in strategies for genome-wide/whole-genome DNA methylation analysis, and immunoprecipitation-based methods have proven to be a powerful option. Such methods are rapidly shifting the bottleneck from data generation to data analysis, necessitating the development of better analytical tools. Until now, a major analytical difficulty associated with immunoprecipitation-based DNA methylation profiling has been the inability to estimate absolute methylation levels. Here we report the development of a novel cross-platform algorithm – Bayesian Tool for Methylation Analysis (Batman) – for analyzing Methylated DNA Immunoprecipitation (MeDIP) profiles generated using arrays (MeDIP-chip) or next-generation sequencing (MeDIP-seq). The latter is an approach we have developed to elucidate the first high-resolution whole-genome DNA methylation profile (DNA methylome) of any mammalian genome. MeDIP-seq/MeDIP-chip combined with Batman represent robust, quantitative, and cost-effective functional genomic strategies for elucidating the function of DNA methylation.
doi:10.1038/nbt1414
PMCID: PMC2644410  PMID: 18612301
12.  Data growth and its impact on the SCOP database: new developments 
Nucleic Acids Research  2007;36(Database issue):D419-D425.
The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. The SCOP hierarchy comprises the following levels: Species, Protein, Family, Superfamily, Fold and Class. While keeping the original classification scheme intact, we have changed the production of SCOP in order to cope with a rapid growth of new structural data and to facilitate the discovery of new protein relationships. We describe ongoing developments and new features implemented in SCOP. A new update protocol supports batch classification of new protein structures by their detected relationships at Family and Superfamily levels in contrast to our previous sequential handling of new structural data by release date. We introduce pre-SCOP, a preview of the SCOP developmental version that enables earlier access to the information on new relationships. We also discuss the impact of worldwide Structural Genomics initiatives, which are producing new protein structures at an increasing rate, on the rates of discovery and growth of protein families and superfamilies. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
doi:10.1093/nar/gkm993
PMCID: PMC2238974  PMID: 18000004
13.  Large-Scale Discovery of Promoter Motifs in Drosophila melanogaster 
PLoS Computational Biology  2007;3(1):e7.
A key step in understanding gene regulation is to identify the repertoire of transcription factor binding motifs (TFBMs) that form the building blocks of promoters and other regulatory elements. Identifying these experimentally is very laborious, and the number of TFBMs discovered remains relatively small, especially when compared with the hundreds of transcription factor genes predicted in metazoan genomes. We have used a recently developed statistical motif discovery approach, NestedMICA, to detect candidate TFBMs from a large set of Drosophila melanogaster promoter regions. Of the 120 motifs inferred in our initial analysis, 25 were statistically significant matches to previously reported motifs, while 87 appeared to be novel. Analysis of sequence conservation and motif positioning suggested that the great majority of these discovered motifs are predictive of functional elements in the genome. Many motifs showed associations with specific patterns of gene expression in the D. melanogaster embryo, and we were able to obtain confident annotation of expression patterns for 25 of our motifs, including eight of the novel motifs. The motifs are available through Tiffin, a new database of DNA sequence motifs. We have discovered many new motifs that are overrepresented in D. melanogaster promoter regions, and offer several independent lines of evidence that these are novel TFBMs. Our motif dictionary provides a solid foundation for further investigation of regulatory elements in Drosophila, and demonstrates techniques that should be applicable in other species. We suggest that further improvements in computational motif discovery should narrow the gap between the set of known motifs and the total number of transcription factors in metazoan genomes.
Author Summary
In contrast to the genomic sequences that encode proteins, little is known about the regulatory elements that instruct the cell as to when and where a given gene should be active. Regulatory elements are thought to consist of clusters of short DNA words (motifs), each of which acts as a binding site for a sequence-specific DNA-binding protein. Thus, building a comprehensive dictionary of such motifs is an important step towards a broader understanding of gene regulation. Using the recently published NestedMICA method for detecting overrepresented motifs in a set of sequences, we build a dictionary of 120 motifs from regulatory sequences in the fruitfly genome, 87 of which are novel. Analysis of positional biases, conservation across species, and association with specific patterns of gene expression in fruitfly embryos suggests that the great majority of these newly discovered motifs represent functional regulatory elements. In addition to providing an initial motif dictionary for one of the most intensively studied model organisms, this work provides an analytical framework for the comprehensive discovery of regulatory motifs in complex animal genomes.
doi:10.1371/journal.pcbi.0030007
PMCID: PMC1779301  PMID: 17238282
14.  SISYPHUS—structural alignments for proteins with non-trivial relationships 
Nucleic Acids Research  2006;35(Database issue):D253-D259.
With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis but their automatic identification using the existing computational tools still remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support the emerging concept of an evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of, proteins with such non-trivial structural relationships we have created SISYPHUS ([Σισυϕος], Greek for crafty), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units, or individual domains, to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from .
doi:10.1093/nar/gkl746
PMCID: PMC1635320  PMID: 17068077
15.  NestedMICA: sensitive inference of over-represented motifs in nucleic acid sequence 
Nucleic Acids Research  2005;33(5):1445-1453.
NestedMICA is a new, scalable, pattern-discovery system for finding transcription factor binding sites and similar motifs in biological sequences. Like several previous methods, NestedMICA tackles this problem by optimizing a probabilistic mixture model to fit a set of sequences. However, the use of a newly developed inference strategy called Nested Sampling means NestedMICA is able to find optimal solutions without the need for a problematic initialization or seeding step. We investigate the performance of NestedMICA in a range of scenarios, on synthetic data and a well-characterized set of muscle regulatory regions, and compare it with the popular MEME program. We show that the new method is significantly more sensitive than MEME: in one case, it successfully extracted a target motif from background sequence four times longer than could be handled by the existing program. It also performs robustly on synthetic sequences containing multiple significant motifs. When tested on a real set of regulatory sequences, NestedMICA produced motifs which were good predictors for all five abundant classes of annotated binding sites.
doi:10.1093/nar/gki282
PMCID: PMC1064142  PMID: 15760844
16.  SCOP database in 2004: refinements integrate structure and sequence family data 
Nucleic Acids Research  2004;32(Database issue):D226-D229.
The Structural Classification of Proteins (SCOP) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are hierarchically classified into families, superfamilies, folds and classes. The continual accumulation of sequence and structural data allows more rigorous analysis and provides important information for understanding the protein world and its evolutionary repertoire. SCOP participates in a project that aims to rationalize and integrate the data on proteins held in several sequence and structure databases. As part of this project, starting with release 1.63, we have initiated a refinement of the SCOP classification, which introduces a number of changes mostly at the levels below superfamily. The pending SCOP reclassification will be carried out gradually through a number of future releases. In addition to the expanded set of static links to external resources, available at the level of domain entries, we have started modernization of the interface capabilities of SCOP allowing more dynamic links with other databases. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
doi:10.1093/nar/gkh039
PMCID: PMC308773  PMID: 14681400
17.  SCOP database in 2002: refinements accommodate structural genomics 
Nucleic Acids Research  2002;30(1):264-267.
The SCOP (Structural Classification of Proteins) database is a comprehensive ordering of all proteins of known structure, according to their evolutionary and structural relationships. Protein domains in SCOP are grouped into species and hierarchically classified into families, superfamilies, folds and classes. Recently, we introduced a new set of features with the aim of standardizing access to the database, and providing a solid basis to manage the increasing number of experimental structures expected from structural genomics projects. These features include: a new set of identifiers, which uniquely identify each entry in the hierarchy; a compact representation of protein domain classification; a new set of parseable files, which fully describe all domains in SCOP and the hierarchy itself. These new features are reflected in the ASTRAL compendium. The SCOP search engine has also been updated, and a set of links to external resources added at the level of domain entries. SCOP can be accessed at http://scop.mrc-lmb.cam.ac.uk/scop.
PMCID: PMC99154  PMID: 11752311
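The "compact representation of protein domain classification" mentioned in the SCOP 2002 abstract is assumed here to be the SCOP concise classification string (sccs), which has the form `class.fold.superfamily.family`, for example `b.1.1.1`. The following minimal sketch parses such a string and reports the deepest level two domains share; the class and function names are illustrative and are not part of any SCOP distribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sccs:
    """One parsed classification string, e.g. 'b.1.1.1'."""
    cls: str          # class letter
    fold: int
    superfamily: int
    family: int


def parse_sccs(text):
    """Parse 'class.fold.superfamily.family' into an Sccs record."""
    parts = text.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    cls, fold, superfamily, family = parts
    if len(cls) != 1 or not cls.isalpha():
        raise ValueError(f"class must be a single letter, got {cls!r}")
    return Sccs(cls, int(fold), int(superfamily), int(family))


def shared_level(a, b):
    """Name the deepest hierarchy level at which two domains coincide."""
    if a.cls != b.cls:
        return "none"
    if a.fold != b.fold:
        return "class"
    if a.superfamily != b.superfamily:
        return "fold"
    if a.family != b.family:
        return "superfamily"
    return "family"
```

Comparing the parsed fields from left to right mirrors the hierarchy itself: two domains in the same superfamily necessarily share a fold and a class.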
18.  SCOP: a Structural Classification of Proteins database 
Nucleic Acids Research  2000;28(1):257-259.
The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships of known protein structures. The classification is on hierarchical levels: the first two levels, family and superfamily, describe near and distant evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is a feature that is unique to this database so far. The sequences of proteins in SCOP provide the basis of the ASTRAL sequence libraries that can be used as a source of data to calibrate sequence search algorithms and for the generation of statistics on, or selections of, protein structures. Links can be made from SCOP to PDB-ISL: a library containing sequences homologous to proteins of known structure. Sequences of proteins of unknown structure can be matched to distantly related proteins of known structure by using pairwise sequence comparison methods to find homologues in PDB-ISL. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.mrc-lmb.cam.ac.uk/scop/
PMCID: PMC102479  PMID: 10592240
