P2CS provides an integrated environment for the exploration, visualization and annotation of TCS proteins from all available bacterial genomes and metagenomes via [21]. The P2CS homepage contains a navigation bar for browsing the database. Among the menus, P2CS Browse links directly to sortable lists of analysed genomes, plasmids and metagenomes. Selecting a microbe or a microbiome displays the result of the P2CS analysis process: global counts of the different categories of TCSs and detailed class counts for each category. Each class result provides a clickable link to a detailed gene list. Selecting an entry from the list of identifiers displays a detailed gene description page with an image representing the gene in its genomic context, in the appropriate frame. BLAST searches can be performed with the gene using external links, against the NCBI protein database or the annotated databases Swiss-Prot/TrEMBL. To obtain detailed information on a given gene, the software provides database links for investigating the structural and functional domains of the corresponding protein sequence using the Conserved Domain Search service [22], the Simple Modular Architecture Research Tool [23] and the TMHMM transmembrane topology prediction method [24]. The presence and location of signal peptide cleavage sites in amino acid sequences can also be checked using SIG-Pred [25].
A second menu, P2CS Search, provides several search modes that allow users to query genes on the basis of their locus tag, domain content or TCS class. The search module presents its output as a tabular view linked to a full description and genomic context for each selected gene. The gene description page is the core exploration tool, providing the analysis options described above. Users can analyse each gene, display its annotation and propose modifications to it.
P2CS allows TCS data to be downloaded in tab-delimited format, generating a file compatible with spreadsheet programs such as Excel. For each genome and metagenome, users can also download the flat files used for the construction of the database.
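As an illustration of how such a tab-delimited export might be processed outside a spreadsheet, the following minimal Python sketch parses tab-separated text and counts genes per TCS class. The column names (`locus_tag`, `tcs_class`) and the sample rows are hypothetical placeholders; the actual layout of the P2CS download is not described here and may differ.

```python
import csv
import io
from collections import Counter


def count_by_class(tsv_text, class_column="tcs_class"):
    """Count rows per TCS class in tab-delimited text with a header row.

    The default column name is a hypothetical placeholder, not the
    real P2CS header.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[class_column] for row in reader)


# Made-up rows for illustration only; these are not real P2CS data.
SAMPLE_TSV = (
    "locus_tag\ttcs_class\n"
    "gene_0001\tHK\n"
    "gene_0002\tRR\n"
    "gene_0003\tHK\n"
)

class_counts = count_by_class(SAMPLE_TSV)
```

For a downloaded file, the same function could be applied to the file contents read as text, with `class_column` set to whatever header the file actually uses.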
P2CS has been developed for computational analysis of the modular TCSs of prokaryotic genomes and metagenomes. It provides a complete overview of information on TCSs, including predicted candidate proteins and probable proteins, which need further curation/validation. The analysis process recovers each protein presenting N-terminal HisKA or C-terminal HATPase domains and classifies it as a probable incomplete HK. This status can be changed through the manual curation process.
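The domain-based rule described above can be sketched roughly as follows. This is a simplified illustration and not the P2CS pipeline itself: it assumes that domain hits are already available as a set of names, it ignores the N-terminal or C-terminal position of each domain, it reads "HisKA or HATPase" as "exactly one of the two", and the label returned when both domains are present is a hypothetical placeholder.

```python
def classify_hk(domains):
    """Return a status label for a protein given its set of domain names.

    Assumption: a protein carrying exactly one of HisKA / HATPase is
    labelled a probable incomplete HK. Domain positions are ignored.
    """
    has_hiska = "HisKA" in domains
    has_hatpase = "HATPase" in domains
    if has_hiska and has_hatpase:
        # Hypothetical label; the text does not name this case here.
        return "HK candidate"
    if has_hiska or has_hatpase:
        return "probable incomplete HK"
    # Neither domain found: the rule does not apply.
    return None


status = classify_hk({"HisKA", "PAS"})
```

A label assigned this way would remain provisional, since the text states that the status can be revised during manual curation.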
Users can modify annotation parameters and append comments, which are made available for consultation by other users. To ensure the integrity of the database, we invite interested experts to download the formatted data; after manual curation, the same files can serve as an exchange format for an update of the database by the P2CS team.
One of the most important features of P2CS is the ability to search for TCSs within an ORFeome. A common problem in prokaryotic genome annotation is the accuracy of gene prediction: underestimating the number of predicted genes leads to the loss of valuable data. A striking example is the genome of M. magneticum AMB-1, with 23 overlooked TCS genes. A possible explanation for the high number of missing TCS genes is the GC richness of this genome (65%), which may complicate the gene prediction process.
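For readers who wish to check the GC richness of a sequence themselves, a minimal Python sketch is given below. It assumes the sequence is supplied as a plain string and counts only unambiguous bases (A, C, G, T); the function name and the short test sequence are illustrative and are not taken from P2CS.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored. Returns 0.0 when the
    sequence contains no unambiguous base.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    gc_count = sum(1 for b in bases if b in "GC")
    return gc_count / len(bases)


# Toy sequence for illustration: 7 of its 10 bases are G or C.
example_gc = gc_fraction("GGCCGATAGC")
```

Applied to a complete genome sequence, a value near 0.65 would correspond to the 65% GC content quoted above.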