Results 1-16 (16)

1.  Ensembl Genomes 2013: scaling up access to genome-wide data 
Nucleic Acids Research  2013;42(D1):D546-D552.
Ensembl Genomes is an integrating resource for genome-scale data from non-vertebrate species. The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project, and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. This article provides an update to the previous publications about the resource, with a focus on recent developments. These include the addition of important new genomes (and related data sets) including crop plants, vectors of human disease and eukaryotic pathogens. In addition, the resource has scaled up its representation of bacterial genomes, and now includes the genomes of over 9000 bacteria. Specific extensions to the web and programmatic interfaces have been developed to support users in navigating these large data sets. Looking forward, analytic tools that allow targeted selection of data for visualization and download are likely to become increasingly important as the number of available genomes grows in all domains of life, and some of the challenges faced in representing bacterial data are likely to become commonplace for eukaryotes.
PMCID: PMC3965094  PMID: 24163254
2.  Comprehensive Skeletal Phenotyping and Linkage Mapping in an Intercross of Recombinant Congenic Mouse Strains HcB-8 and HcB-23 
Cells, Tissues, Organs  2011;194(2-4):244-248.
Bone biomechanical performance is a complex trait or, more properly, an ensemble of complex traits. Biomechanical performance incorporates flexibility under loading, yield and failure load, and energy to failure; all are important measures of bone function. To date, the vast majority of work has focused on yield and failure load and their surrogate, bone mineral density. We performed a reciprocal intercross of the mouse strains HcB-8 and HcB-23 to map and ultimately identify genes that contribute to differences in biomechanical performance. Mechanical testing was performed by 3-point bending of the femora. We measured femoral diaphysis cross-sectional anatomy from photographs of the fracture surfaces. We used beam equations to calculate material-level mechanical properties. We performed a principal component (PC) analysis of normalized whole bone phenotypes (17 input traits). We measured distances separating mandibular landmarks from calibrated digital photographs and performed linkage analysis. Experiment-wide α = 0.05 significance thresholds were established by permutation testing. Three quantitative trait loci (QTLs) identified in these studies illustrate the advantages of the comprehensive phenotyping approach. A pleiotropic QTL on chromosome 4 affected multiple whole bone phenotypes with LOD scores as large as 17.5, encompassing size, cross-sectional ellipticity, stiffness, yield and failure load, and bone mineral density. This locus was linked to 3 of the PCs but unlinked to any of the tissue-level phenotypes. From this pattern, we infer that the QTL operates by modulating the proliferative response to mechanical loading. On this basis, we successfully predicted that this locus also affects the length of a specific region of the mandible. A pleiotropic locus on chromosome 10 displays opposite effects on failure load and toughness, with LOD scores of 4.5 and 5.5, respectively, so that the allele that increases failure load decreases toughness. A chromosome 19 QTL for PC2 with a LOD score of 4.8 was not detected with either the whole bone or tissue-level phenotypes. We conclude, first, that comprehensive, system-oriented phenotyping provides much information that could not be obtained by focusing on bone mineral density alone. Second, mechanical performance includes inherent trade-offs between strength and brittleness. Third, considering the aggregate phenotypic data allows prediction of novel QTLs.
PMCID: PMC3178085  PMID: 21625064
Biomechanics; Bone modeling; Quantitative trait loci; Linkage; Pleiotropy; Principal components
3.  VectorBase: improvements to a bioinformatics resource for invertebrate vector genomics 
Nucleic Acids Research  2011;40(D1):D729-D734.
VectorBase is a NIAID-supported bioinformatics resource for invertebrate vectors of human pathogens. It hosts data for nine genomes: mosquitoes (three Anopheles gambiae genomes, Aedes aegypti and Culex quinquefasciatus), tick (Ixodes scapularis), body louse (Pediculus humanus), kissing bug (Rhodnius prolixus) and tsetse fly (Glossina morsitans). Hosted data range from genomic features and expression data to population genetics and ontologies. We describe improvements and integration of new data that expand our taxonomic coverage. Releases are bi-monthly and include the delivery of preliminary data for emerging genomes. Frequent updates of the genome browser provide VectorBase users with increasing options for visualizing their own high-throughput data. One major development is a new population biology resource for storing genomic variations, insecticide resistance data and their associated metadata. It takes advantage of improved ontologies and controlled vocabularies. Combined, these new features ensure timely release of multiple types of data in the public domain while helping overcome the bottlenecks of bioinformatics and annotation by engaging with our user community.
PMCID: PMC3245112  PMID: 22135296
4.  Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species 
Nucleic Acids Research  2011;40(D1):D91-D97.
Ensembl Genomes is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
PMCID: PMC3245118  PMID: 22067447
5.  SUPERFAMILY 1.75 including a domain-centric gene ontology method 
Nucleic Acids Research  2010;39(Database issue):D427-D434.
The SUPERFAMILY resource provides protein domain assignments at the Structural Classification of Proteins (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and other gene collections such as UniProt. All models and assignments are available to browse and download from the SUPERFAMILY website. A new hidden Markov model library based on SCOP 1.75 has been created, and a previously ignored class of SCOP, coiled coils, is now included. Our scoring component now uses HMMER3, which is orders of magnitude faster and produces superior results. A cloud-based pipeline was implemented and is publicly available on the Amazon Web Services Elastic Compute Cloud. The SUPERFAMILY reference tree of life has been improved, allowing the user to highlight a chosen superfamily, family or domain architecture on the tree of life. The most significant advance in SUPERFAMILY is that it now contains a domain-based gene ontology (GO) at the superfamily and family levels. A new methodology was developed to ensure high-quality GO annotation. The new methodology is general-purpose and has been used to produce domain-based phenotypic ontologies in addition to GO.
PMCID: PMC3013712  PMID: 21062816
6.  Lineage-specific expansion of DNA-binding transcription factor families 
Trends in Genetics  2010;26(9):388-393.
DNA-binding domains (DBDs) are essential components of sequence-specific transcription factors (TFs). We have investigated the distribution of all known DBDs in more than 500 completely sequenced genomes from the three major superkingdoms (Bacteria, Archaea and Eukaryota) and documented conserved and specific DBD occurrence in diverse taxonomic lineages. By combining DBD occurrence in different species with taxonomic information, we have developed an automatic method for inferring the origins of DBD families and their specific combinations with other protein families in TFs. We found that only three of the 131 DBD families (2%) are shared by all three superkingdoms.
PMCID: PMC2937223  PMID: 20675012
7.  Genomic repertoires of DNA-binding transcription factors across the tree of life 
Nucleic Acids Research  2010;38(21):7364-7377.
Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods used to annotate TFs by different databases. By using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as their structural complexity in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic protein, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequences by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As previously reported, the number of prokaryotic TFs increases faster than linearly with the number of genes. We further observe a similar relationship in eukaryotic genomes, with a slower increase in TFs.
PMCID: PMC2995046  PMID: 20675356
8.  FlyTF: improved annotation and enhanced functionality of the Drosophila transcription factor database 
Nucleic Acids Research  2009;38(Database issue):D443-D447.
FlyTF is a database of computationally predicted and/or experimentally verified site-specific transcription factors (TFs) in the fruit fly Drosophila melanogaster. The manual classification of TFs in the initial version of FlyTF, which concentrated primarily on the DNA-binding characteristics of the proteins, has now been extended in the new release to a more fine-grained annotation of both DNA-binding and regulatory properties. Furthermore, experimental evidence from the literature was classified into a defined vocabulary and, in collaboration with FlyBase, translated into Gene Ontology (GO) annotation. Our GO annotations will also become available through FlyBase once they are incorporated into the genes’ official GO annotation, but all of the evidence used for classification, including computational predictions and quotes from the literature, can be accessed through FlyTF. The FlyTF website is now built on the InterMine framework, which provides experimental and computational biologists with powerful search and filter functionality, list management tools and access to genomic information associated with the TFs.
PMCID: PMC2808907  PMID: 19884132
9.  SUPERFAMILY—sophisticated comparative genomics, data mining, visualization and phylogeny 
Nucleic Acids Research  2008;37(Database issue):D380-D386.
SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes and from large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which is accessible online. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family-level assignments are also available. From the website, users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the FTP site.
PMCID: PMC2686452  PMID: 19036790
10.  InterPro: the integrative protein signature database 
Nucleic Acids Research  2008;37(Database issue):D211-D215.
The InterPro database integrates predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of UniProtKB proteins with no signature matches in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and to the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed via the web, via web services, by downloading files by anonymous FTP or by using the InterProScan search software.
PMCID: PMC2686546  PMID: 18940856
11.  DBD––taxonomically broad transcription factor predictions: new content and functionality 
Nucleic Acids Research  2007;36(Database issue):D88-D92.
DNA-binding domain (DBD) is a database of predicted sequence-specific DNA-binding transcription factors (TFs) for all publicly available proteomes. The number of proteomes has increased from 150 in the initial version of DBD to over 700 in the current version. All predicted TFs must contain a significant match to a hidden Markov model representing a sequence-specific DNA-binding domain family. TF predictions are accessible through the DBD website, which now offers new search options such as searching by gene name in model organisms and searching for all proteins in a particular DBD family and organism. We illustrate the application of this type of search facility by contrasting trends of DBD family occurrence throughout the tree of life, highlighting the clear partition between eukaryotic and prokaryotic DBD expansions. The website content has been expanded to include dedicated pages for each TF containing domain assignment details, gene names, links to external databases and links to TFs with similar domain arrangements. We compare the increase in the number of predicted TFs with proteome size in eukaryotes and prokaryotes. Eukaryotes follow a slower rate of increase in TFs than prokaryotes, which could be due to the presence of splice variants or an increase in combinatorial control.
PMCID: PMC2238844  PMID: 18073188
12.  Biomechanical characterization of the three-dimensional kinematic behaviour of the Dynesys dynamic stabilization system: an in vitro study 
European Spine Journal  2005;15(6):913-922.
The Dynesys, a flexible posterior stabilization system that provides an alternative to fusion, is designed to preserve intersegmental kinematics and alleviate loading at the facet joints. Recent biomechanical evidence suggests that the overall range of motion (ROM) with the Dynesys is less than that of the intact spine. The purpose of this investigation was to conduct a comprehensive characterization of the three-dimensional kinematic behaviour of the Dynesys and to determine whether the length of the Dynesys polymer spacer contributes to differences in kinematic behaviour at the implanted level. Ten cadaveric lumbar spine segments (L2–L5) were tested by applying a pure moment of ±7.5 Nm in flexion–extension, lateral bending, and axial rotation, with and without a follower preload of 600 N. Test conditions included: (a) intact; (b) injury; (c) injury stabilized with Dynesys at L3–L4 (standard spacer); (d) long spacer (+2 mm); and (e) short spacer (−2 mm). Intervertebral rotations were measured using an optoelectronic camera system. The intersegmental ROM, neutral zone (NZ), and three-dimensional helical axis of motion (HAM) were calculated. Statistical significance of changes in ROM, NZ, and HAM was determined using repeated measures analysis of variance (ANOVA) and Student–Newman–Keuls post-hoc analysis with P<0.05. Implantation of the standard-length Dynesys significantly reduced ROM compared to the intact and injured specimens, with the smallest changes seen in axial rotation. Injury typically increased the NZ, but implantation of the Dynesys restored the NZ to a magnitude less than that of the intact spine. The Dynesys produced a significant posterior shift in the HAM in flexion–extension and axial rotation. Spacer length had a significant effect on ROM, with the long spacer resulting in the largest ROM in all loading directions without a follower preload. The largest differences were in axial rotation. A 4 mm increase in spacer length (from the short to the long spacer) led to an average intersegmental motion increase of 30% in axial rotation, 23% in extension, 14% in flexion, and 11% in lateral bending. There were no significant changes in NZ with different spacer lengths. Typically, the short spacer caused a greater shift and a greater change in orientation of the HAM than the long spacer. The long spacer resulted in a ROM and a motion pattern, as represented by the HAM, that was closer to that seen in an intact specimen. The results of this study suggest that the length of the Dynesys spacer altered the segmental position and therefore affected kinematic behaviour.
PMCID: PMC3489456  PMID: 16217663
Biomechanics; Lumbar spine; Non-fusion; Stabilization; Surgical treatment
13.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The last two, Gene3D and PANTHER, are new member databases integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a web server and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
PMCID: PMC1899100  PMID: 17202162
14.  The SUPERFAMILY database in 2007: families and functions 
Nucleic Acids Research  2006;35(Database issue):D308-D313.
The SUPERFAMILY database provides protein domain assignments, at the SCOP ‘superfamily’ level, for the predicted protein sequences in over 400 completed genomes. A superfamily groups together domains of different families which have a common evolutionary ancestor based on structural, functional and sequence data. SUPERFAMILY domain assignments are generated using an expert curated set of profile hidden Markov models. All models and structural assignments are available for browsing and download. The web interface includes services such as domain architectures and alignment details for all protein assignments, searchable domain combinations, domain occurrence network visualization, detection of over- or under-represented superfamilies for a given genome by comparison with other genomes, assignment of manually submitted sequences and keyword searches. In this update we describe the SUPERFAMILY database and outline two major developments: (i) incorporation of family level assignments and (ii) a superfamily-level functional annotation. The SUPERFAMILY database can be used for general protein evolution and superfamily-specific studies, genomic annotation, and structural genomics target suggestion and assessment.
PMCID: PMC1669749  PMID: 17098927
16.  Leptospirosis—a diagnostic problem and an industrial hazard 
Three cases of human leptospirosis occurred on a small dairy farm at the foot of the Black Mountains in Powys. We describe the clinical course of these three patients and consider the sources of infection and the industrial implications.
PMCID: PMC1971955  PMID: 7277295