Results 1-6 (6)

1.  InterProScan 5: genome-scale protein function classification 
Bioinformatics  2014;30(9):1236-1240.
Motivation: Robust large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterize many millions of sequences. Here, we describe a new Java-based architecture for the widely used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete reimplementation of the software framework, resulting in a flexible and stable system that can use multiprocessor machines, conventional clusters, or both to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site, and the open source code is hosted at Google Code.
Availability and implementation: InterProScan is distributed via FTP from the EMBL-EBI site, and the source code is hosted at Google Code.
PMCID: PMC3998142  PMID: 24451626
3.  Manual GO annotation of predictive protein signatures: the InterPro approach to GO curation 
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid with interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings comprise the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are dealt with and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models.
Database URL: The complete InterPro2GO mappings are available at:
PMCID: PMC3270475  PMID: 22301074
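The inference step described in the abstract above (sequence matches to InterPro entries, then GO terms via the InterPro2GO cross-references) can be sketched as a simple lookup. This is only an illustration under stated assumptions: the entry and term identifiers below are hypothetical placeholders, not real InterPro or GO accessions, and the function name `infer_go_terms` is invented for this sketch rather than taken from any InterPro tool.

```python
# Hypothetical InterPro2GO-style mapping: entry identifier -> set of GO terms.
# All identifiers are placeholders invented for illustration.
EXAMPLE_MAPPING = {
    "IPR_EXAMPLE_A": {"GO:EXAMPLE_1", "GO:EXAMPLE_2"},
    "IPR_EXAMPLE_B": {"GO:EXAMPLE_2", "GO:EXAMPLE_3"},
    "IPR_EXAMPLE_C": set(),  # an entry with no appropriate GO term
}


def infer_go_terms(matched_entries, entry_to_go):
    """Collect the GO terms cross-referenced by every entry a sequence matched."""
    terms = set()
    for entry in matched_entries:
        # Entries missing from the mapping contribute no terms.
        terms.update(entry_to_go.get(entry, ()))
    return sorted(terms)


if __name__ == "__main__":
    # A sequence that matched two of the example entries.
    print(infer_go_terms(["IPR_EXAMPLE_A", "IPR_EXAMPLE_B"], EXAMPLE_MAPPING))
```

Taking the union of terms is the simplest possible rule; it does not model the taxonomic constraints or the entry hierarchies that the abstract mentions.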
4.  InterPro in 2011: new developments in the family and domain prediction database 
Nucleic Acids Research  2011;40(Database issue):D306-D312.
InterPro is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
PMCID: PMC3245097  PMID: 22096229
5.  Comparative genomics of lactic acid bacteria reveals a niche-specific gene set 
BMC Microbiology  2009;9:50.
The recently sequenced genome of Lactobacillus helveticus DPC4571 [1] revealed a dairy organism with significant homology (75% of genes are homologous) to the probiotic bacterium Lb. acidophilus NCFM [2]. This led us to hypothesise that a group of genes could be identified that defines an organism's niche.
Taking 11 fully sequenced lactic acid bacteria (LAB) as our target (3 dairy LAB, 5 gut LAB and 3 multi-niche LAB), we demonstrated that the presence or absence of certain genes involved in sugar metabolism, the proteolytic system, and restriction modification enzymes was pivotal in suggesting the niche of a strain. We identified 9 niche-specific genes, of which 6 are dairy specific and 3 are gut specific. The dairy-specific genes identified in Lactobacillus helveticus DPC4571 were lhv_1161 and lhv_1171, encoding components of the proteolytic system, and lhv_1031, lhv_1152, lhv_1978 and lhv_0028, encoding restriction endonucleases, while the bile salt hydrolase genes lba_0892 and lba_1078 and the sugar metabolism gene lba_1689 from Lb. acidophilus NCFM were identified as gut-specific genes.
Comparative analysis revealed that if an organism had homologs to the dairy specific geneset, it probably came from a dairy environment, whilst if it had homologs to gut specific genes, it was highly likely to be of intestinal origin.
We propose that this "barcode" of 9 genes will be a useful initial guide to researchers in the LAB field to indicate an organism's ability to occupy a specific niche.
PMCID: PMC2660350  PMID: 19265535
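The nine-gene "barcode" described in the abstract above can be illustrated with a small presence/absence check. The gene identifiers are taken from the abstract; everything else is an assumption made for this sketch, including the function name `suggest_niche`, the fraction-based comparison, and the idea that homolog detection has already been done upstream and is supplied as a list of barcode gene identifiers. It is not the authors' method.

```python
# Gene identifiers as listed in the abstract.
DAIRY_GENES = frozenset(
    {"lhv_1161", "lhv_1171", "lhv_1031", "lhv_1152", "lhv_1978", "lhv_0028"}
)
GUT_GENES = frozenset({"lba_0892", "lba_1078", "lba_1689"})


def suggest_niche(homologs):
    """Suggest a niche from the barcode genes to which an organism has homologs.

    The scoring rule (compare the fraction of each gene set present) is a
    hypothetical simplification, not a rule stated in the source.
    """
    found = set(homologs)
    dairy_fraction = len(found & DAIRY_GENES) / len(DAIRY_GENES)
    gut_fraction = len(found & GUT_GENES) / len(GUT_GENES)
    if dairy_fraction > gut_fraction:
        return "dairy"
    if gut_fraction > dairy_fraction:
        return "gut"
    # Equal fractions (including no hits at all) give no clear suggestion.
    return "undetermined"


if __name__ == "__main__":
    print(suggest_niche(["lhv_1161", "lhv_1171", "lhv_1031"]))
    print(suggest_niche(["lba_0892", "lba_1689"]))
```

Comparing fractions rather than raw counts keeps the larger dairy set from dominating, but any real use would need a principled threshold and a way to represent multi-niche organisms.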
6.  Genome Sequence of Lactobacillus helveticus, an Organism Distinguished by Selective Gene Loss and Insertion Sequence Element Expansion
Journal of Bacteriology  2007;190(2):727-735.
Mobile genetic elements are major contributing factors to the generation of genetic diversity in prokaryotic organisms. For example, insertion sequence (IS) elements have been shown to specifically contribute to niche adaptation by promoting a variety of genetic rearrangements. The complete genome sequence of the cheese culture Lactobacillus helveticus DPC 4571 was determined and revealed significant conservation compared to three nondairy gut lactobacilli. Despite originating from significantly different environments, 65 to 75% of the genes were conserved between the commensal and dairy lactobacilli, which allowed key niche-specific gene sets to be described. However, the primary distinguishing feature was 213 IS elements in the DPC 4571 genome, 10 times more than for the other lactobacilli. Moreover, genome alignments revealed an unprecedented level of genome stability between these four Lactobacillus species, considering the number of IS elements in the L. helveticus genome. Comparative analysis also indicated that the IS elements were not the primary agents of niche adaptation for the L. helveticus genome. A clear bias toward the loss of genes reported to be important for gut colonization was observed for the cheese culture, but there was no clear evidence of IS-associated gene deletion and decay for the majority of genes lost. Furthermore, an extraordinary level of sequence diversity exists between copies of certain IS elements in the DPC 4571 genome, indicating they may represent an ancient component of the L. helveticus genome. These data suggest a special unobtrusive relationship between the DPC 4571 genome and its mobile DNA complement.
PMCID: PMC2223680  PMID: 17993529