Results 1-25 (34)

1.  Integrating GPCR-specific information with full text articles 
BMC Bioinformatics  2011;12:362.
With the continued growth in the volume both of experimental G protein-coupled receptor (GPCR) data and of the related peer-reviewed literature, the ability of GPCR researchers to keep up-to-date is becoming increasingly curtailed.
We present work that integrates the biological data and annotations in the GPCR information system (GPCRDB) with next-generation methods for intelligently exploring, visualising and interacting with the scientific articles used to disseminate them. This solution automatically retrieves relevant information from GPCRDB and displays it both within and as an adjunct to an article.
This approach allows researchers to extract more knowledge more swiftly from literature. Importantly, it allows reinterpretation of data in articles published before GPCR structure data became widely available, thereby rescuing these valuable data from long-dormant sources.
PMCID: PMC3179973  PMID: 21910883
2.  Low-complexity regions within protein sequences have position-dependent roles 
BMC Systems Biology  2010;4:43.
Regions of protein sequences with biased amino acid composition (so-called Low-Complexity Regions (LCRs)) are abundant in the protein universe. A number of studies have revealed that i) these regions show significant divergence across protein families; ii) the genetic mechanisms from which they arise lend them remarkable degrees of compositional plasticity. They have therefore proved difficult to compare using conventional sequence analysis techniques, and functions remain to be elucidated for most of them. Here we undertake a systematic investigation of LCRs in order to explore their possible functional significance, placed in the particular context of Protein-Protein Interaction (PPI) networks and Gene Ontology (GO)-term analysis.
In keeping with previous results, we found that LCR-containing proteins tend to have more binding partners across different PPI networks than proteins that have no LCRs. More specifically, our study suggests i) that LCRs are preferentially positioned towards the protein sequence extremities and, in contrast with centrally-located LCRs, such terminal LCRs show a correlation between their lengths and degrees of connectivity, and ii) that centrally-located LCRs are enriched with transcription-related GO terms, while terminal LCRs are enriched with translation and stress response-related terms.
Our results suggest not only that LCRs may be involved in flexible binding associated with specific functions, but also that their positions within a sequence may be important in determining both their binding properties and their biological roles.
PMCID: PMC2873317  PMID: 20385029
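As a rough illustration of the two ideas in this abstract (compositional bias and position within the sequence), the sketch below scores a segment by Shannon entropy and labels a region as terminal or central. The function names and the `terminal_fraction` cutoff of 0.2 are assumptions made for the example; the abstract does not say how the authors detected LCRs or defined "terminal".

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment.

    Low values indicate biased composition, i.e. a candidate LCR.
    """
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def lcr_position(start: int, end: int, seq_len: int,
                 terminal_fraction: float = 0.2) -> str:
    """Label a region as 'terminal' or 'central' by its relative midpoint.

    terminal_fraction is a hypothetical cutoff, not taken from the paper.
    """
    relative_midpoint = ((start + end) / 2) / seq_len
    if relative_midpoint < terminal_fraction or relative_midpoint > 1 - terminal_fraction:
        return "terminal"
    return "central"
```

A homopolymeric run such as `"AAAA"` has zero entropy, while a segment of four distinct residues has an entropy of 2 bits.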
3.  LRRCE: a leucine-rich repeat cysteine capping motif unique to the chordate lineage 
BMC Genomics  2008;9:599.
The small leucine-rich repeat proteins and proteoglycans (SLRPs) form an important family of regulatory molecules that participate in many essential functions. They typically control the correct assembly of collagen fibrils, regulate mineral deposition in bone, and modulate the activity of potent cellular growth factors through many signalling cascades. SLRPs belong to the group of extracellular leucine-rich repeat proteins that are flanked at both ends by disulphide-bonded caps that protect the hydrophobic core of the terminal repeats. A capping motif specific to SLRPs has been recently described in the crystal structures of the core proteins of decorin and biglycan. This motif, designated as LRRCE, differs in both sequence and structure from other, more widespread leucine-rich capping motifs. To investigate if the LRRCE motif is a common structural feature found in other leucine-rich repeat proteins, we have defined characteristic sequence patterns and used them in genome-wide searches.
The LRRCE motif is a structural element exclusive to the main group of SLRPs. It appears to have evolved during early chordate evolution and is not found in protein sequences from non-chordate genomes. Our search has expanded the family of SLRPs to include new predicted protein sequences, mainly in fishes but with intriguing putative orthologs in mammals. The chromosomal locations of the newly predicted SLRP genes would support the large-scale genome or gene duplications that are thought to have occurred during vertebrate evolution. From this expanded list we describe a new class of SLRP sequences that could be representative of an ancestral SLRP gene.
Given its exclusivity the LRRCE motif is a useful annotation tool for the identification and classification of new SLRP sequences in genome databases. The expanded list of members of the SLRP family offers interesting insights into early vertebrate evolution and suggests an early chordate evolutionary origin for the LRRCE capping motif.
PMCID: PMC2637281  PMID: 19077264
4.  Best practices in bioinformatics training for life scientists 
Briefings in Bioinformatics  2013;14(5):528-537.
The mountains of data thrusting from the new landscape of modern high-throughput biology are irrevocably changing biomedical research and creating a near-insatiable demand for training in data management and manipulation and data mining and analysis. Among life scientists, from clinicians to environmental researchers, a common theme is the need not just to use, and gain familiarity with, bioinformatics tools and resources but also to understand their underlying fundamental theoretical and practical concepts. Providing bioinformatics training to empower life scientists to handle and analyse their data efficiently, and progress their research, is a challenge across the globe. Delivering good training goes beyond traditional lectures and resource-centric demos, using interactivity, problem-solving exercises and cooperative learning to substantially enhance training quality and learning outcomes. In this context, this article discusses various pragmatic criteria for identifying training needs and learning objectives, for selecting suitable trainees and trainers, for developing and maintaining training skills and evaluating training quality. Adherence to these criteria may help not only to guide course organizers and trainers on the path towards bioinformatics training excellence but, importantly, also to improve the training experience for life scientists.
PMCID: PMC3771230  PMID: 23803301
bioinformatics; training; bioinformatics courses; training life scientists; train the trainers
5.  Biochemical and biophysical characterization of four EphB kinase domains reveals contrasting thermodynamic, kinetic and inhibition profiles 
Bioscience Reports  2013;33(3):e00040.
The Eph (erythropoietin-producing hepatocellular carcinoma) B receptors are important in a variety of cellular processes through their roles in cell-to-cell contact and signalling; their up-regulation and down-regulation has been shown to have implications in a variety of cancers. A greater understanding of the similarities and differences within this small, highly conserved family of tyrosine kinases will be essential to the identification of effective therapeutic opportunities for disease intervention. In this study, we have developed a route to production of multi-milligram quantities of highly purified, homogeneous, recombinant protein for the kinase domain of these human receptors in Escherichia coli. Analyses of these isolated catalytic fragments have revealed stark contrasts in their amenability to recombinant expression and their physical properties: e.g., a >16°C variance in thermal stability, a 3-fold difference in catalytic activity and disparities in their inhibitor binding profiles. We find EphB3 to be an outlier in terms of both its intrinsic stability, and more importantly its ligand-binding properties. Our findings have led us to speculate about both their biological significance and potential routes for generating EphB isozyme-selective small-molecule inhibitors. Our comprehensive methodologies provide a template for similar in-depth studies of other kinase superfamily members.
PMCID: PMC3673036  PMID: 23627399
EphB1; EphB2; EphB3; EphB4; kinase inhibition; protein stability; CMPD3, compound 3; DSF, differential scanning fluorimetry; DTT, dithiothreitol; Eph, erythropoietin-producing hepatocellular carcinoma; GdnHCl, guanidine hydrochloride; ITC, isothermal titration calorimetry; Ni-NTA, Ni2+-nitrilotriacetate; PTP1B, protein tyrosine phosphatase 1B; RTK, receptor tyrosine kinase; SEC, size-exclusion chromatography; TCEP, tris-(2-carboxyethyl)phosphine; TEV, tobacco etch virus; TFA, trifluoroacetic acid
7.  The PRINTS database: a fine-grained protein sequence annotation and analysis resource—its status in 2012 
The PRINTS database, now in its 21st year, houses a collection of diagnostic protein family ‘fingerprints’. Fingerprints are groups of conserved motifs, evident in multiple sequence alignments, whose unique inter-relationships provide distinctive signatures for particular protein families and structural/functional domains. As such, they may be used to assign uncharacterized sequences to known families, and hence to infer tentative functional, structural and/or evolutionary relationships. The February 2012 release (version 42.0) includes 2156 fingerprints, encoding 12 444 individual motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Here, we report the current status of the database, and introduce a number of recent developments that help both to render a variety of our annotation and analysis tools easier to use and to make them more widely available.
Database URL:
PMCID: PMC3326521  PMID: 22508994
8.  Biocurators and Biocuration: surveying the 21st century challenges 
Curated databases are an integral part of the tool set that researchers use on a daily basis for their work. For most users, however, how databases are maintained, and by whom, is rather obscure. The International Society for Biocuration (ISB) represents biocurators, software engineers, developers and researchers with an interest in biocuration. Its goals include fostering communication between biocurators, promoting and describing their work, and highlighting the added value of biocuration to the world. The ISB recently conducted a survey of biocurators to better understand their educational and scientific backgrounds, their motivations for choosing a curatorial job and their career goals. The results are reported here. From the responses received, it is evident that biocuration is performed by highly trained scientists and perceived to be a stimulating career, offering both intellectual challenges and the satisfaction of performing work essential to the modern scientific community. It is also apparent that the ISB has at least a dual role to play to facilitate biocurators’ work: (i) to promote biocuration as a career within the greater scientific community; (ii) to aid the development of resources for biomedical research through promotion of nomenclature and data-sharing standards that will allow interconnection of biological databases and better exploit the pivotal contributions that biocurators are making.
Database URL:
PMCID: PMC3308150  PMID: 22434828
9.  Bioinformatics Training Network (BTN): a community resource for bioinformatics trainers 
Briefings in Bioinformatics  2011;13(3):383-389.
Funding bodies are increasingly recognizing the need to provide graduates and researchers with access to short intensive courses in a variety of disciplines, in order both to improve the general skills base and to provide solid foundations on which researchers may build their careers. In response to the development of ‘high-throughput biology’, the need for training in the field of bioinformatics, in particular, is seeing a resurgence: it has been defined as a key priority by many institutions and research programmes and is now an important component of many grant proposals. Nevertheless, when it comes to planning and preparing to meet such training needs, tension arises between the reward structures that predominate in the scientific community, which compel individuals to publish or perish, and the time that must be devoted to the design, delivery and maintenance of high-quality training materials. Conversely, there is much relevant teaching material and training expertise available worldwide that, were it properly organized, could be exploited by anyone who needs to provide training or needs to set up a new course. To do this, however, the materials would have to be centralized in a database and clearly tagged in relation to target audiences, learning objectives, etc. Ideally, they would also be peer reviewed, and easily and efficiently accessible for downloading. Here, we present the Bioinformatics Training Network (BTN), a new enterprise that has been initiated to address these needs, and compare it with similar initiatives and collections.
PMCID: PMC3357490  PMID: 22110242
Bioinformatics; training; end users; bioinformatics courses; learning bioinformatics
10.  InterPro in 2011: new developments in the family and domain prediction database 
Nucleic Acids Research  2011;40(D1):D306-D312.
InterPro is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
PMCID: PMC3245097  PMID: 22096229
11.  NucleaRDB: information system for nuclear receptors 
Nucleic Acids Research  2011;40(D1):D377-D380.
The NucleaRDB is a Molecular Class-Specific Information System that collects, combines, validates and disseminates large amounts of heterogeneous data on nuclear hormone receptors. It contains both experimental and computationally derived data. The data and knowledge present in the NucleaRDB can be accessed using a number of different interactive and programmatic methods and query systems. A nuclear hormone receptor-specific PDF reader interface is available that can integrate the contents of the NucleaRDB with full-text scientific articles. The NucleaRDB is freely available at
PMCID: PMC3245090  PMID: 22064856
13.  Towards BioDBcore: a community-defined information specification for biological databases 
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources; and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
PMCID: PMC3017395  PMID: 21205783
14.  Towards BioDBcore: a community-defined information specification for biological databases 
Nucleic Acids Research  2010;39(Database issue):D7-D10.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
PMCID: PMC3013734  PMID: 21097465
15.  Calling International Rescue: knowledge lost in literature and data landslide! 
Biochemical Journal  2009;424(Pt 3):317-333.
We live in interesting times. Portents of impending catastrophe pervade the literature, calling us to action in the face of unmanageable volumes of scientific data. But it isn't so much data generation per se, but the systematic burial of the knowledge embodied in those data that poses the problem: there is so much information available that we simply no longer know what we know, and finding what we want is hard – too hard. The knowledge we seek is often fragmentary and disconnected, spread thinly across thousands of databases and millions of articles in thousands of journals. The intellectual energy required to search this array of data-archives, and the time and money this wastes, has led several researchers to challenge the methods by which we traditionally commit newly acquired facts and knowledge to the scientific record. We present some of these initiatives here – a whirlwind tour of recent projects to transform scholarly publishing paradigms, culminating in Utopia and the Semantic Biochemical Journal experiment. With their promises to provide new ways of interacting with the literature, and new and more powerful tools to access and extract the knowledge sequestered within it, we ask what advances they make and what obstacles to progress still exist? We explore these questions, and, as you read on, we invite you to engage in an experiment with us, a real-time test of a new technology to rescue data from the dormant pages of published documents. We ask you, please, to read the instructions carefully. The time has come: you may turn over your papers…
PMCID: PMC2805925  PMID: 19929850
dynamic document content; interactive PDF; linking documents with research data; manuscript mark-up; mark-up standards; semantic publishing; BJ, Biochemical Journal; COHSE, Conceptual Open Hypermedia Services Environment; DOI, Digital Object Identifier; GO, Gene Ontology; GPCR, G protein-coupled receptor; HTML, HyperText Mark-up Language; IUPAC, International Union of Pure and Applied Chemistry; NTD, Neglected Tropical Diseases; OBO, Open Biomedical Ontologies; PDB, Protein Data Bank; PDF, Portable Document Format; PLoS, Public Library of Science; PMC, PubMed Central; PTM, post-translational modification; RSC, Royal Society of Chemistry; SDA, Structured Digital Abstract; STM, Scientific, Technical and Medical; UD, Utopia Documents; XML, eXtensible Mark-up Language; XMP, eXtensible Metadata Platform
16.  Aspergillus Genomes and the Aspergillus Cloud 
Nucleic Acids Research  2008;37(Database issue):D509-D514.
Aspergillus Genomes is a public resource for viewing annotated genes predicted by various Aspergillus sequencing projects. It has arisen from the union of two significant resources: the Aspergillus/Aspergillosis website and the Central Aspergillus Data REpository (CADRE). The former has primarily served the medical community, providing information about Aspergillus and associated diseases to medics, patients and scientists; the latter has focused on the fungal genomic community, providing a central repository for sequences and annotation extracted from Aspergillus Genomes. By merging these databases, genomes benefit from extensive cross-linking with medical information to create a unique resource, spanning genomics and clinical aspects of the genus. Aspergillus Genomes is accessible from
PMCID: PMC2686514  PMID: 19039001
17.  InterPro: the integrative protein signature database 
Nucleic Acids Research  2008;37(Database issue):D211-D215.
The InterPro database integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software.
PMCID: PMC2686546  PMID: 18940856
18.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
PMCID: PMC1899100  PMID: 17202162
19.  Toward bacterial protein sub-cellular location prediction: single-class discrimminant models for all gram- and gram+ compartments 
Bioinformation  2006;1(8):276-280.
Based on Bayesian Networks, methods were created that address protein sequence-based bacterial subcellular location prediction. Distinct predictive algorithms for the eight bacterial subcellular locations were created. Several variant methods were explored. These variations included differences in the number of residues considered within the query sequence - which ranged from the N-terminal 10 residues to the whole sequence - and residue representation - which took the form of amino acid composition, percentage amino acid composition, or normalised amino acid composition. The accuracies of the best performing networks were then compared to PSORTB. All individual location methods outperform PSORTB except for the Gram+ cytoplasmic protein predictor, for which accuracies were essentially equal, and for outer membrane protein prediction, where PSORTB outperforms the binary predictor. The method described here is an important new approach to method development for subcellular location prediction. It is also a new, potentially valuable tool for candidate subunit vaccine selection.
PMCID: PMC1891713  PMID: 17597907
Bayesian networks; prediction method; subcellular location; membrane protein; periplasmic protein; secreted protein
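Two of the residue representations named in this abstract, amino acid composition and percentage amino acid composition over the first N residues, can be sketched as follows. This is only an illustration: the function names and the `n_residues` parameter are assumptions, and the "normalised" variant is left out because the abstract does not define it.

```python
from collections import Counter
from typing import Dict, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str, n_residues: Optional[int] = None) -> Dict[str, int]:
    """Raw residue counts over the N-terminal n_residues (whole sequence if None)."""
    region = seq if n_residues is None else seq[:n_residues]
    counts = Counter(region)
    return {aa: counts.get(aa, 0) for aa in AMINO_ACIDS}


def percentage_composition(seq: str, n_residues: Optional[int] = None) -> Dict[str, float]:
    """Residue counts expressed as percentages of the standard residues counted."""
    comp = composition(seq, n_residues)
    total = sum(comp.values())
    return {aa: (100.0 * c / total if total else 0.0) for aa, c in comp.items()}
```

Calling `composition(seq, 10)` restricts the feature vector to the N-terminal 10 residues, the shortest window mentioned in the abstract; omitting `n_residues` uses the whole sequence.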
20.  Combining algorithms to predict bacterial protein sub-cellular location: Parallel versus concurrent implementations 
Bioinformation  2006;1(8):285-289.
We describe a novel and potentially important tool for candidate subunit vaccine selection through in silico reverse-vaccinology. A set of Bayesian networks able to make individual predictions for specific subcellular locations is implemented in three pipelines with different architectures: a parallel implementation with a confidence level-based decision engine and two serial implementations with a hierarchical decision structure, one initially rooted by prediction between membrane types and another rooted by soluble versus membrane prediction. The parallel pipeline outperformed the serial pipeline, but took twice as long to execute. The soluble-rooted serial pipeline outperformed the membrane-rooted predictor. Assessment using genomic test sets was more equivocal, as many more predictions are made by the parallel pipeline, yet the serial pipeline identifies 22 more of the 74 proteins of known location.
PMCID: PMC1891705  PMID: 17597909
beta barrel transmembrane protein; prokaryotic membrane proteins; Bayesian Networks; prediction method; subcellular location
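The difference between the parallel and serial architectures described in this abstract can be sketched with toy decision engines. Everything here is hypothetical: the `Predictor` type, the 0.5 thresholds and the function names are assumptions for illustration, and the per-location predictors stand in for the paper's Bayesian networks.

```python
from typing import Callable, Dict, Optional

# A predictor maps a sequence to a confidence in [0, 1] (assumed interface).
Predictor = Callable[[str], float]


def parallel_pipeline(seq: str, predictors: Dict[str, Predictor],
                      min_confidence: float = 0.5) -> Optional[str]:
    """Run every location predictor and keep the most confident call."""
    scores = {location: predict(seq) for location, predict in predictors.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_confidence else None


def serial_pipeline(seq: str, root: Predictor,
                    membrane_branch: Dict[str, Predictor],
                    soluble_branch: Dict[str, Predictor]) -> str:
    """Decide membrane vs soluble first, then run only that branch."""
    branch = membrane_branch if root(seq) >= 0.5 else soluble_branch
    scores = {location: predict(seq) for location, predict in branch.items()}
    return max(scores, key=scores.get)
```

The parallel form evaluates every predictor for every sequence, whereas the serial form evaluates the root and a single branch, which is consistent with the abstract's report that the parallel pipeline took longer to run.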
21.  Multi-class subcellular location prediction for bacterial proteins 
Bioinformation  2006;1(7):260-264.
Two algorithms, based on Bayesian Networks (BNs), for bacterial subcellular location prediction, are explored in this paper: one predicts all locations for Gram+ bacteria and the other all locations for Gram- bacteria. Methods were evaluated using different numbers of residues (from the N-terminal 10 residues to the whole sequence) and residue representation (amino acid-composition, percentage amino acid-composition or normalised amino acid-composition). The accuracy of the best resulting BN was compared to PSORTB. The accuracy of this multi-location BN was roughly comparable to PSORTB; the difference in predictions is low, often less than 2%. The BN method thus represents both an important new avenue of methodological development for subcellular location prediction and a potentially valuable new tool for candidate subunit vaccine selection.
PMCID: PMC1891703  PMID: 17597904
Bayesian networks; prediction method; subcellular location; membrane protein; periplasmic protein; secreted protein
22.  A predictor of membrane class: Discriminating α-helical and β-barrel membrane proteins from non-membranous proteins 
Bioinformation  2006;1(6):208-213.
Accurate protein structure prediction remains an active objective of research in bioinformatics. Membrane proteins comprise approximately 20% of most genomes. They are, however, poorly tractable targets of experimental structure determination. Their analysis using bioinformatics thus makes an important contribution to their on-going study. Using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we have addressed the alignment-free discrimination of membrane from non-membrane proteins. The method successfully identifies prokaryotic and eukaryotic α-helical membrane proteins at 94.4% accuracy, β-barrel proteins at 72.4% accuracy, and distinguishes assorted non-membranous proteins with 85.9% accuracy. The method here is an important potential advance in the computational analysis of membrane protein structure. It represents a useful tool for the characterisation of membrane proteins with a wide variety of potential applications.
PMCID: PMC1891694  PMID: 17597890
α-helical membrane proteins; β-barrel membrane proteins; membrane protein discrimination; Bayesian Network; alignment-free prediction
23.  Beta barrel trans-membrane proteins: Enhanced prediction using a Bayesian approach 
Bioinformation  2006;1(6):231-233.
Membrane proteins, which constitute approximately 20% of most genomes, form two main classes: alpha helical and beta barrel transmembrane proteins. Using methods based on Bayesian Networks, a powerful approach for statistical inference, we have sought to address β-barrel topology prediction. The β-barrel topology predictor reports individual strand accuracies of 88.6%. The method outlined here represents a potentially important advance in the computational determination of membrane protein topology.
PMCID: PMC1891693  PMID: 17597895
beta barrel transmembrane protein; prokaryotic membrane proteins; Bayesian Networks; prediction method; sub-cellular location
24.  Alpha helical trans-membrane proteins: Enhanced prediction using a Bayesian approach 
Bioinformation  2006;1(6):234-236.
Membrane proteins, which constitute approximately 20% of most genomes, are poorly tractable targets for experimental structure determination, thus analysis by prediction and modelling makes an important contribution to their on-going study. Membrane proteins form two main classes: alpha helical and beta barrel trans-membrane proteins. By using a method based on Bayesian Networks, which provides a flexible and powerful framework for statistical inference, we addressed α-helical topology prediction. This method has accuracies of 77.4% for prokaryotic proteins and 61.4% for eukaryotic proteins. The method described here represents an important advance in the computational determination of membrane protein topology and offers a useful, and complementary, tool for the analysis of membrane proteins for a range of applications.
PMCID: PMC1891692  PMID: 17597896
trans-membrane protein; alpha helix; static full Bayesian model; prediction; amino acid descriptors
25.  TATPred: a Bayesian method for the identification of twin arginine translocation pathway signal sequences 
Bioinformation  2006;1(5):184-187.
The twin arginine translocation (TAT) system ferries folded proteins across the bacterial membrane. Proteins are directed into this system by the TAT signal peptide present at the amino terminus of the precursor protein, which contains the twin arginine residues that give the system its name. There are currently only two computational methods for the prediction of TAT translocated proteins from sequence. Both methods have limitations that make the creation of a new algorithm for TAT-translocated protein prediction desirable. We have developed TATPred, a new sequence-model method, based on a Naïve-Bayesian network, for the prediction of TAT signal peptides. In this approach, a comprehensive range of models was tested to identify the most reliable and robust predictor. The best model comprised 12 residues: three residues prior to the twin arginines and the seven residues that follow them. We found a prediction sensitivity of 0.979 and a specificity of 0.942.
PMCID: PMC1891679  PMID: 17597885
twin arginine motif; Bayesian Network; TAT translocation; signal sequence; vaccine
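The 12-residue model described in this abstract (three residues before the twin arginines, the two arginines themselves, and the seven residues after) corresponds to a fixed window around each "RR" pair. The sketch below extracts such windows and computes sensitivity and specificity from confusion-matrix counts. The function names are assumptions, it is not TATPred itself, and the metric definitions are the common ones (TP/(TP+FN) and TN/(TN+FP)), which the abstract does not spell out.

```python
from typing import List


def twin_arginine_windows(seq: str, before: int = 3, after: int = 7) -> List[str]:
    """Return every window of `before` + 2 + `after` residues centred on an 'RR' pair.

    Pairs too close to either end of the sequence to give a full window are skipped.
    """
    windows = []
    pos = seq.find("RR")
    while pos != -1:
        start, end = pos - before, pos + 2 + after
        if start >= 0 and end <= len(seq):
            windows.append(seq[start:end])
        pos = seq.find("RR", pos + 1)
    return windows


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives recovered: TN / (TN + FP)."""
    return tn / (tn + fp)
```

Each extracted window would then be the input to a classifier; with the default arguments every window is exactly 12 residues long, matching the best model reported above.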
