Results 1-10 (10)

1.  The BioMart community portal: an innovative alternative to large, centralized data repositories 
Smedley, Damian | Haider, Syed | Durinck, Steffen | Pandini, Luca | Provero, Paolo | Allen, James | Arnaiz, Olivier | Awedh, Mohammad Hamza | Baldock, Richard | Barbiera, Giulia | Bardou, Philippe | Beck, Tim | Blake, Andrew | Bonierbale, Merideth | Brookes, Anthony J. | Bucci, Gabriele | Buetti, Iwan | Burge, Sarah | Cabau, Cédric | Carlson, Joseph W. | Chelala, Claude | Chrysostomou, Charalambos | Cittaro, Davide | Collin, Olivier | Cordova, Raul | Cutts, Rosalind J. | Dassi, Erik | Genova, Alex Di | Djari, Anis | Esposito, Anthony | Estrella, Heather | Eyras, Eduardo | Fernandez-Banet, Julio | Forbes, Simon | Free, Robert C. | Fujisawa, Takatomo | Gadaleta, Emanuela | Garcia-Manteiga, Jose M. | Goodstein, David | Gray, Kristian | Guerra-Assunção, José Afonso | Haggarty, Bernard | Han, Dong-Jin | Han, Byung Woo | Harris, Todd | Harshbarger, Jayson | Hastings, Robert K. | Hayes, Richard D. | Hoede, Claire | Hu, Shen | Hu, Zhi-Liang | Hutchins, Lucie | Kan, Zhengyan | Kawaji, Hideya | Keliet, Aminah | Kerhornou, Arnaud | Kim, Sunghoon | Kinsella, Rhoda | Klopp, Christophe | Kong, Lei | Lawson, Daniel | Lazarevic, Dejan | Lee, Ji-Hyun | Letellier, Thomas | Li, Chuan-Yun | Lio, Pietro | Liu, Chu-Jun | Luo, Jie | Maass, Alejandro | Mariette, Jerome | Maurel, Thomas | Merella, Stefania | Mohamed, Azza Mostafa | Moreews, Francois | Nabihoudine, Ibounyamine | Ndegwa, Nelson | Noirot, Céline | Perez-Llamas, Cristian | Primig, Michael | Quattrone, Alessandro | Quesneville, Hadi | Rambaldi, Davide | Reecy, James | Riba, Michela | Rosanoff, Steven | Saddiq, Amna Ali | Salas, Elisa | Sallou, Olivier | Shepherd, Rebecca | Simon, Reinhard | Sperling, Linda | Spooner, William | Staines, Daniel M. | Steinbach, Delphine | Stone, Kevin | Stupka, Elia | Teague, Jon W. | Dayem Ullah, Abu Z. | Wang, Jun | Ware, Doreen | Wong-Erasmus, Marie | Youens-Clark, Ken | Zadissa, Amonida | Zhang, Shi-Jian | Kasprzyk, Arek
Nucleic Acids Research  2015;43(Web Server issue):W589-W598.
The BioMart Community Portal is a community-driven effort to provide a unified interface to biomedical databases that are distributed worldwide. The portal provides access to numerous database projects supported by 30 scientific organizations. It includes over 800 different biological datasets spanning genomics, proteomics, model organisms, cancer data, ontology information and more. All resources available through the portal are independently administered and funded by their host organizations. The BioMart data federation technology provides a unified interface to all the available data. The latest version of the portal comes with many new databases that have been created by our ever-growing community. It also comes with better support and extensibility for data analysis and visualization tools. A new addition to our toolbox, the enrichment analysis tool, is now accessible through graphical and web service interfaces. The BioMart community portal averages over one million requests per day. Building on this level of service and the wealth of information that has become available, the BioMart Community Portal has introduced a new, more scalable and cheaper alternative to the large data stores maintained by specialized organizations.
PMCID: PMC4489294  PMID: 25897122
2.  Conservation and Losses of Non-Coding RNAs in Avian Genomes 
PLoS ONE  2015;10(3):e0121797.
Here we present the results of a large-scale bioinformatics annotation of non-coding RNA loci in 48 avian genomes. Our approach uses probabilistic models of hand-curated families from the Rfam database to infer conserved RNA families within each avian genome. We supplement these annotations with predictions from the tRNA annotation tool, tRNAscan-SE and microRNAs from miRBase. We identify 34 lncRNA-associated loci that are conserved between birds and mammals and validate 12 of these in chicken. We report several intriguing cases where a reported mammalian lncRNA, but not its function, is conserved. We also demonstrate extensive conservation of classical ncRNAs (e.g., tRNAs) and more recently discovered ncRNAs (e.g., snoRNAs and miRNAs) in birds. Furthermore, we describe numerous “losses” of several RNA families, and attribute these to either genuine loss, divergence or missing data. In particular, we show that many of these losses are due to the challenges associated with assembling avian microchromosomes. These combined results illustrate the utility of applying homology-based methods for annotating novel vertebrate genomes.
PMCID: PMC4378963  PMID: 25822729
3.  Rfam 12.0: updates to the RNA families database 
Nucleic Acids Research  2014;43(Database issue):D130-D137.
The Rfam database is a collection of non-coding RNA families represented by manually curated sequence alignments, consensus secondary structures and annotation gathered from corresponding Wikipedia, taxonomy and ontology resources. In this article, we detail updates and improvements to the Rfam data and website for the Rfam 12.0 release. We describe the upgrade of our search pipeline to use Infernal 1.1 and demonstrate its improved homology detection ability by comparison with the previous version. The new pipeline is easier for users to apply to their own data sets, and we illustrate its ability to annotate RNAs in genomic and metagenomic data sets of various sizes. Rfam has been expanded to include 260 new families, including the well-studied large subunit ribosomal RNA family, and for the first time includes information on short sequence- and structure-based RNA motifs present within families.
PMCID: PMC4383904  PMID: 25392425
4.  Special Focus 
RNA Biology  2013;10(7):1160.
The development of RNA bioinformatic tools began more than 30 years ago with the description of the Nussinov and Zuker dynamic programming algorithms for single-sequence RNA secondary structure prediction. Since then, many tools have been developed for various RNA sequence analysis problems such as homology search, multiple sequence alignment, de novo RNA discovery, read-mapping, and many more. In this issue, we have collected a sampling of reviews and original research that demonstrate some of the many ways bioinformatics is integrated with current RNA biology research.
PMCID: PMC3849163  PMID: 23948768
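The Nussinov dynamic programming approach named in the abstract above can be illustrated with a short sketch. The code below is not taken from the cited issue: the function name, the set of allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are assumptions chosen for illustration. It only counts the maximum number of nested base pairs and does not recover the structure itself.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in an RNA sequence.

    Illustrative assumptions: Watson-Crick and G-U wobble pairs are
    allowed, and paired positions must enclose at least `min_loop`
    unpaired bases.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0

    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]

    # Fill by increasing subsequence span so smaller problems come first.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: j pairs with some k, splitting the problem in two
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

The triple loop gives cubic running time in the sequence length. Energy-based methods such as Zuker's replace the pair count with a thermodynamic score but keep a similar recursion over subsequences.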
5.  Rfam 11.0: 10 years of RNA families 
Nucleic Acids Research  2012;41(Database issue):D226-D232.
The Rfam database is a collection of non-coding RNA families, primarily RNAs with a conserved RNA secondary structure, including both RNA genes and mRNA cis-regulatory elements. Each family is represented by a multiple sequence alignment, predicted secondary structure and covariance model. Here we discuss updates to the database in the latest release, Rfam 11.0, including the introduction of genome-based alignments for large families, the introduction of the Rfam Biomart as well as other user interface improvements. Rfam is available under the Creative Commons Zero license.
PMCID: PMC3531072  PMID: 23125362
7.  Biocurators and Biocuration: surveying the 21st century challenges 
Curated databases are an integral part of the tool set that researchers use on a daily basis for their work. For most users, however, how databases are maintained, and by whom, is rather obscure. The International Society for Biocuration (ISB) represents biocurators, software engineers, developers and researchers with an interest in biocuration. Its goals include fostering communication between biocurators, promoting and describing their work, and highlighting the added value of biocuration to the world. The ISB recently conducted a survey of biocurators to better understand their educational and scientific backgrounds, their motivations for choosing a curatorial job and their career goals. The results are reported here. From the responses received, it is evident that biocuration is performed by highly trained scientists and perceived to be a stimulating career, offering both intellectual challenges and the satisfaction of performing work essential to the modern scientific community. It is also apparent that the ISB has at least a dual role to play to facilitate biocurators’ work: (i) to promote biocuration as a career within the greater scientific community; (ii) to aid the development of resources for biomedical research through promotion of nomenclature and data-sharing standards that will allow interconnection of biological databases and better exploit the pivotal contributions that biocurators are making.
PMCID: PMC3308150  PMID: 22434828
8.  Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation 
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings comprise the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models.
Database URL: The complete InterPro2GO mappings are available at:
PMCID: PMC3270475  PMID: 22301074
9.  InterPro in 2011: new developments in the family and domain prediction database 
Nucleic Acids Research  2011;40(Database issue):D306-D312.
InterPro is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
PMCID: PMC3245097  PMID: 22096229
10.  Quadruplex DNA: sequence, topology and structure 
Nucleic Acids Research  2006;34(19):5402-5415.
G-quadruplexes are higher-order DNA and RNA structures formed from G-rich sequences that are built around tetrads of hydrogen-bonded guanine bases. Potential quadruplex sequences have been identified in G-rich eukaryotic telomeres, and more recently in non-telomeric genomic DNA, e.g. in nuclease-hypersensitive promoter regions. The natural role and biological validation of these structures are starting to be explored, and there is particular interest in them as targets for therapeutic intervention. This survey focuses on the folding and structural features of quadruplexes formed from telomeric and non-telomeric DNA sequences, and examines fundamental aspects of topology and the emerging relationships with sequence. Emphasis is placed on information from the high-resolution methods of X-ray crystallography and NMR, and their scope and current limitations are discussed. Such information, together with biological insights, will be important for the discovery of drugs targeting quadruplexes from particular genes.
PMCID: PMC1636468  PMID: 17012276
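As a rough illustration of how potential quadruplex sequences of the kind described in the abstract above might be located in a DNA string, the sketch below applies a widely used heuristic: four runs of at least three guanines separated by loops of one to seven bases. This pattern is an assumption introduced here, not a rule stated in the abstract, and the names `PQS_PATTERN` and `find_putative_quadruplexes` are hypothetical. A match only marks a candidate; it says nothing about whether the structure forms or which topology it adopts.

```python
import re

# Assumed heuristic: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+ on the given strand.
PQS_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def find_putative_quadruplexes(dna: str) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for each non-overlapping candidate.

    Coordinates are 0-based and end-exclusive. Only the strand passed in
    is scanned; C-rich matches on the complementary strand are not
    reported.
    """
    dna = dna.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_PATTERN.finditer(dna)]


if __name__ == "__main__":
    # Four copies of the human telomeric repeat TTAGGG.
    telomeric = "TTAGGG" * 4
    for start, end, hit in find_putative_quadruplexes(telomeric):
        print(start, end, hit)
```

To cover both strands, the same function could be run on the reverse complement of the input.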
