Results 1-11 (11)
author:("canula, Craig")
1.  InterProScan 5: genome-scale protein function classification 
Bioinformatics  2014;30(9):1236-1240.
Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the open source code is hosted at Google Code.
Availability and implementation: InterProScan is distributed via the EMBL-EBI FTP site and the source code is hosted at Google Code.
PMCID: PMC3998142  PMID: 24451626
2.  EBI metagenomics—a new resource for the analysis and archiving of metagenomic data 
Nucleic Acids Research  2013;42(Database issue):D600-D606.
Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.
PMCID: PMC3965009  PMID: 24165880
3.  Metagenomic analysis: the challenge of the data bonanza 
Briefings in Bioinformatics  2012;13(6):743-746.
Several thousand metagenomes have already been sequenced, and this number is set to grow rapidly in the forthcoming years as the uptake of high-throughput sequencing technologies continues. Hand-in-hand with this data bonanza comes the computationally overwhelming task of analysis. Herein, we describe some of the bioinformatic approaches currently used by metagenomics researchers to analyze their data, the issues they face and the steps that could be taken to help overcome these challenges.
PMCID: PMC3504930  PMID: 22962339
metagenomics; next-generation sequencing (NGS); high-throughput sequencing (HTS); functional analysis; environmental bioinformatics
5.  Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation 
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings are comprised of the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models.
Database URL: The complete InterPro2GO mappings are available at:
PMCID: PMC3270475  PMID: 22301074
6.  InterPro in 2011: new developments in the family and domain prediction database 
Nucleic Acids Research  2011;40(Database issue):D306-D312.
InterPro is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
PMCID: PMC3245097  PMID: 22096229
7.  The InterPro BioMart: federated query and web service access to the InterPro Resource 
The InterPro BioMart provides users with query-optimized access to predictions of family classification, protein domains and functional sites, based on a broad spectrum of integrated computational models (‘signatures’) that are generated by the InterPro member databases: Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. These predictions are provided for all protein sequences from both the UniProt Knowledge Base and the UniParc protein sequence archive. The InterPro BioMart is supplementary to the primary InterPro web interface, providing a web service and the ability to build complex, custom queries that can efficiently return thousands of rows of data in a variety of formats. This article describes the information available from the InterPro BioMart and illustrates its utility with examples of how to build queries that return useful biological information.
PMCID: PMC3170169  PMID: 21785143
8.  InterPro: the integrative protein signature database 
Nucleic Acids Research  2008;37(Database issue):D211-D215.
The InterPro database integrates together predictive models or ‘signatures’ representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually and approximately half of the total ∼58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein–protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed either via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software.
PMCID: PMC2686546  PMID: 18940856
9.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
PMCID: PMC1899100  PMID: 17202162
10.  Chloromethane-Induced Genes Define a Third C1 Utilization Pathway in Methylobacterium chloromethanicum CM4 
Journal of Bacteriology  2002;184(13):3476-3484.
Methylobacterium chloromethanicum CM4 is an aerobic α-proteobacterium capable of growth with chloromethane as the sole carbon and energy source. Two proteins, CmuA and CmuB, were previously purified and shown to catalyze the dehalogenation of chloromethane and the vitamin B12-mediated transfer of the methyl group of chloromethane to tetrahydrofolate. Three genes located near cmuA and cmuB, designated metF, folD and purU and encoding homologs of methylene tetrahydrofolate (methylene-H4folate) reductase, methylene-H4folate dehydrogenase-methenyl-H4folate cyclohydrolase and formyl-H4folate hydrolase, respectively, suggested the existence of a chloromethane-specific oxidation pathway from methyl-tetrahydrofolate to formate in strain CM4. Hybridization and PCR analysis indicated that these genes were absent in Methylobacterium extorquens AM1, which is unable to grow with chloromethane. Studies with transcriptional xylE fusions demonstrated the chloromethane-dependent expression of these genes. Transcriptional start sites were mapped by primer extension and allowed to define three transcriptional units, each likely comprising several genes, that were specifically expressed during growth of strain CM4 with chloromethane. The DNA sequences of the deduced promoters display a high degree of sequence conservation but differ from the Methylobacterium promoters described thus far. As shown previously for purU, inactivation of the metF gene resulted in a CM4 mutant unable to grow with chloromethane. Methylene-H4folate reductase activity was detected in a cell extract of strain CM4 only in the presence of chloromethane but not in the metF mutant. Taken together, these data provide evidence that M. chloromethanicum CM4 requires a specific set of tetrahydrofolate-dependent enzymes for growth with chloromethane.
PMCID: PMC135114  PMID: 12057941
11.  Chloromethane Utilization Gene Cluster from Hyphomicrobium chloromethanicum Strain CM2T and Development of Functional Gene Probes To Detect Halomethane-Degrading Bacteria 
Hyphomicrobium chloromethanicum CM2T, an aerobic methylotrophic member of the α subclass of the class proteobacteria, can grow with chloromethane as the sole carbon and energy source. H. chloromethanicum possesses an inducible enzyme system for utilization of chloromethane, in which two polypeptides (67-kDa CmuA and 35-kDa CmuB) are expressed. Previously, four genes, cmuA, cmuB, cmuC, and purU, were shown to be essential for growth of Methylobacterium chloromethanicum on chloromethane. The cmuA and cmuB genes were used as probes to identify homologs in H. chloromethanicum. A cmu gene cluster (9.5 kb) in H. chloromethanicum contained 10 open reading frames: folD (partial), pduX, orf153, orf207, orf225, cmuB, cmuC, cmuA, fmdB, and paaE (partial). CmuA from H. chloromethanicum (67 kDa) showed high identity to CmuA from M. chloromethanicum and contains an N-terminal methyltransferase domain and a C-terminal corrinoid-binding domain. CmuB from H. chloromethanicum is related to a family of methyl transfer proteins and to the CmuB methyltransferase from M. chloromethanicum. CmuC from H. chloromethanicum shows identity to CmuC from M. chloromethanicum and is a putative methyltransferase. folD codes for a methylene-tetrahydrofolate cyclohydrolase, which may be involved in the C1 transfer pathway for carbon assimilation and CO2 production, and paaE codes for a putative redox active protein. Molecular analyses and some preliminary biochemical data indicated that the chloromethane utilization pathway in H. chloromethanicum is similar to the corrinoid-dependent methyl transfer system in M. chloromethanicum. PCR primers were developed for successful amplification of cmuA genes from newly isolated chloromethane utilizers and enrichment cultures.
PMCID: PMC92571  PMID: 11133460