Results 1-12 (12)

1.  pfsearchV3: a code acceleration and heuristic to search PROSITE profiles 
Bioinformatics  2013;29(9):1215-1217.
Summary: The PROSITE resource provides a rich and well annotated source of signatures in the form of generalized profiles that allow protein domain detection and functional annotation. One of the major limiting factors in the application of PROSITE in genome and metagenome annotation pipelines is the time required to search protein sequence databases for putative matches. We describe an improved and optimized implementation of the PROSITE search tool pfsearch that, combined with a newly developed heuristic, addresses this limitation. On a modern x86_64 hyper-threaded quad-core desktop computer, the new pfsearchV3 is two orders of magnitude faster than the original algorithm.
Availability and implementation: Source code and binaries of pfsearchV3, implemented in C and supported on Linux, are freely available for download. PROSITE generalized profiles, including the heuristic cut-off scores, are available from the same source.
PMCID: PMC3634184  PMID: 23505298
2.  New and continuing developments at PROSITE 
Nucleic Acids Research  2012;41(Database issue):D344-D347.
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules that increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE signatures, together with ProRule, are used for the annotation of domains and features of UniProtKB/Swiss-Prot entries. Here, we describe recent developments that allow users to perform whole-proteome annotation as well as a number of filtering options that can be combined to perform powerful targeted searches for biological discovery. The latest version of PROSITE (release 20.85, of 30 August 2012) contains 1308 patterns, 1039 profiles and 1041 ProRules.
PMCID: PMC3531220  PMID: 23161676
3.  TriAnnot: A Versatile and High Performance Pipeline for the Automated Annotation of Plant Genomes 
In support of the international effort to obtain a reference sequence of the bread wheat genome and to provide plant communities dealing with large and complex genomes with a versatile, easy-to-use online automated tool for annotation, we have developed the TriAnnot pipeline. Its modular architecture allows for the annotation and masking of transposable elements, the structural and functional annotation of protein-coding genes with an evidence-based quality indexing, and the identification of conserved non-coding sequences and molecular markers. The TriAnnot pipeline is parallelized on a 712 CPU computing cluster that can run a 1-Gb sequence annotation in less than 5 days. It is accessible through a web interface for small scale analyses or through a server for large scale annotations. The performance of TriAnnot was evaluated in terms of sensitivity, specificity, and general fitness using curated reference sequence sets from rice and wheat. In less than 8 h, TriAnnot was able to predict more than 83% of the 3,748 CDS from rice chromosome 1 with a fitness of 67.4%. On a set of 12 reference Mb-sized contigs from wheat chromosome 3B, TriAnnot predicted and annotated 93.3% of the genes, among which 54% were perfectly identified in accordance with the reference annotation. It also allowed the curation of 12 genes based on new biological evidence, increasing the percentage of perfect gene prediction to 63%. TriAnnot systematically showed a higher fitness than other annotation pipelines that are not improved for wheat. As it is easily adaptable to the annotation of other plant genomes, TriAnnot should become a useful resource for the annotation of large and complex genomes in the future.
PMCID: PMC3355818  PMID: 22645565
Keywords: cluster; gene models; pipeline; plant genome; structural and functional annotation; transposable elements; wheat
4.  PROSITE, a protein domain database for functional characterization and annotation 
Nucleic Acids Research  2009;38(Database issue):D161-D166.
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE is largely used for the annotation of domain features of UniProtKB/Swiss-Prot entries. Among the 983 (DNA-binding) domains, repeats and zinc fingers present in Swiss-Prot (release 57.8 of 22 September 2009), 696 (∼70%) are annotated with PROSITE descriptors using information from ProRule. In order to allow better functional characterization of domains, PROSITE developments focus on subfamily specific profiles and a new profile building method giving more weight to functionally important residues. Here, we describe AMSA, an annotated multiple sequence alignment format used to build a new generation of generalized profiles, the migration of ScanProsite to Vital-IT, a cluster of 633 CPUs, and the adoption of the Distributed Annotation System (DAS) to facilitate PROSITE data integration and interchange with other sources. The latest version of PROSITE (release 20.54, of 22 September 2009) contains 1308 patterns, 863 profiles and 869 ProRules. PROSITE is accessible online.
PMCID: PMC2808866  PMID: 19858104
5.  Functional Differentiation of tbf1 Orthologues in Fission and Budding Yeasts
Eukaryotic Cell  2008;8(2):207-216.
In Saccharomyces cerevisiae, TBF1, an essential gene, influences telomere function but also has other roles in the global regulation of transcription. We have identified a new member of the tbf1 gene family in the mammalian pathogen Pneumocystis carinii. We demonstrate by transspecies complementation that its ectopic expression can provide the essential functions of Schizosaccharomyces pombe tbf1 but that there is no rescue between fission and budding yeast orthologues. Our findings indicate that an essential function of this family of proteins has diverged in the budding and fission yeasts and suggest that effects on telomere length or structure are not the primary cause of inviability in S. pombe tbf1 null strains.
PMCID: PMC2643609  PMID: 19074598
6.  PeroxiBase: a database with new tools for peroxidase family classification 
Nucleic Acids Research  2008;37(Database issue):D261-D266.
Peroxidases (EC 1.11.1.x), which are encoded by small or large multigenic families, are involved in several important physiological and developmental processes. They use various peroxides as electron acceptors to catalyse a number of oxidative reactions and are present in almost all living organisms. We have created a peroxidase database that contains all identified peroxidase-encoding sequences (about 6000 sequences in 940 organisms). They are distributed between 11 superfamilies and about 60 subfamilies. All the sequences have been individually annotated and checked. PeroxiBase can be consulted using six major interlink sections: ‘Classes’, ‘Organisms’, ‘Cellular localisations’, ‘Inducers’, ‘Repressors’ and ‘Tissue types’. General documentation on peroxidases and PeroxiBase is accessible in the ‘Documents’ section containing ‘Introduction’, ‘Class description’, ‘Publications’ and ‘Links’. In addition to the database, we have developed a tool to classify peroxidases based on the PROSITE profile methodology. To improve their specificity and to prevent overlaps between closely related subfamilies, the profiles were built using a new strategy based on the silencing of residues. This new profile construction method and its discriminatory capacity have been tested and validated using the different peroxidase families and subfamilies present in the database. The peroxidase classification tool, called PeroxiScan, is accessible online.
PMCID: PMC2686439  PMID: 18948296
7.  Functional Characterization of Pneumocystis carinii brl1 by Transspecies Complementation Analysis
Eukaryotic Cell  2007;6(12):2448-2452.
Pneumocystis jirovecii is a fungus which causes severe opportunistic infections in immunocompromised humans. The brl1 gene of P. carinii infecting rats was identified and characterized by using bioinformatics in conjunction with functional complementation in Saccharomyces cerevisiae and Schizosaccharomyces pombe. The ectopic expression of this gene rescues null alleles of essential nuclear membrane proteins of the Brr6/Brl1 family in both yeasts.
PMCID: PMC2168235  PMID: 17993570
8.  The 20 years of PROSITE 
Nucleic Acids Research  2007;36(Database issue):D245-D249.
PROSITE consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules based on profiles and patterns, which increases the discriminatory power of profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. In this article, we describe the implementation of a new method to assign a status to pattern matches, the new PROSITE web page and a new approach to improve the specificity and sensitivity of PROSITE methods. The latest version of PROSITE (release 20.19 of 11 September 2007) contains 1319 patterns, 745 profiles and 764 ProRules. Over the past 2 years, about 200 domains have been added, and now 53% of UniProtKB/Swiss-Prot entries (release 54.2 of 11 September 2007) have a PROSITE match. PROSITE is available on the web.
PMCID: PMC2238851  PMID: 18003654
9.  MyHits: improvements to an interactive resource for analyzing protein sequences 
Nucleic Acids Research  2007;35(Web Server issue):W433-W437.
The MyHits web site is an integrated service dedicated to the analysis of protein sequences. Since its first description in 2004, both the user interface and the back end of the server were improved. A number of tools (e.g. MAFFT, Jacop, Dotlet, Jalview, ESTScan) were added or updated to improve the usability of the service. The MySQL schema and its associated API were revamped and the database engine (HitKeeper) was separated from the web interface. This paper summarizes the current status of the server, with an emphasis on the new services.
PMCID: PMC1933190  PMID: 17545200
10.  New developments in the InterPro database 
Nucleic Acids Research  2007;35(Database issue):D224-D228.
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on the website and in the match XML file on the FTP site. Additional matches to proteins in UniParc (UniProt archive) are now available for download in the new match XML files only. The latest InterPro release (13.0) contains more than 13 000 entries, covering over 78% of all proteins in UniProtKB. The database is available for text- and sequence-based searches via a webserver, and for download by anonymous FTP. The InterProScan search tool is now also available via a web service.
PMCID: PMC1899100  PMID: 17202162
11.  The PROSITE database 
Nucleic Acids Research  2005;34(Database issue):D227-D230.
The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE database is now complemented by a series of rules that can give more precise information about specific residues. During the last 2 years, the documentation and the ScanProsite web pages were redesigned to add more functionalities. The latest version of PROSITE (release 19.11 of September 27, 2005) contains 1329 patterns and 552 profile entries. Over the past 2 years more than 200 domains have been added, and now 52% of UniProtKB/Swiss-Prot entries (release 48.1 of September 27, 2005) have a cross-reference to a PROSITE entry. The database is accessible online.
PMCID: PMC1347426  PMID: 16381852
12.  MyHits: a new interactive resource for protein annotation and domain identification 
Nucleic Acids Research  2004;32(Web Server issue):W332-W335.
The MyHits web server is a new integrated service dedicated to the annotation of protein sequences and to the analysis of their domains and signatures. Guest users can use the system anonymously, with full access to (i) standard bioinformatics programs (e.g. PSI-BLAST, ClustalW, T-Coffee, Jalview); (ii) a large number of protein sequence databases, including standard (Swiss-Prot, TrEMBL) and locally developed databases (splice variants); (iii) databases of protein motifs (Prosite, Interpro); (iv) a precomputed list of matches (‘hits’) between the sequence and motif databases. All databases are updated on a weekly basis and the hit list is kept up to date incrementally. The MyHits server also includes a new collection of tools to generate graphical representations of pairwise and multiple sequence alignments including their annotated features. Free registration enables users to upload their own sequences and motifs to private databases. These are then made available through the same web interface and the same set of analytical tools. Registered users can manage their own sequences and annotations using only web tools and freeze their data in their private database for publication purposes.
PMCID: PMC441617  PMID: 15215405
