Over 960 completely sequenced prokaryote genomes were analysed (914 bacterial and 55 archaean), and 953 of these contained no detectable homologue of pepsin. This included all of the archaean genomes. However, pepsin homologues were detected in seven bacterial genomes, and these are shown in Table . Characteristics of the predicted proteins of these homologues are shown in Table . All the bacteria containing pepsin homologues are members of the class Gammaproteobacteria, and all except Marinomonas are members of the order Alteromonadales; Marinomonas is a member of the order Oceanospirillales. Despite the genomes having been completely sequenced, no pepsin homologue was detected in the proteomes of the following Shewanella species: S. baltica, S. frigidimarina, S. halifaxensis, S. oneidensis, S. pealeana, S. putrefaciens, S. sp. ANA-3, S. sp. MR-4, S. sp. MR-7, S. sp. W3-18-1 or S. woodyi, nor in Sinorhizobium meliloti.
Homologues of pepsin from completely sequenced bacterial genomes
Predicted characteristics of bacterial homologues of pepsin
BlastP searches of the NCBI non-redundant database using the bacterial pepsin homologues as queries returned only one other bacterial sequence. This was the A5A_A0203 gene product from Vibrio cholerae strain MZO-2 (UniProt accession A6A7Y6). This sequence consists of 382 residues with the peptidase unit predicted to be residues 14-257, and the active site residues to be Asp41, Phe87 and Asp249 (numbered according to the translated coding sequence). The genome of this particular strain is incomplete and no pepsin homologue is present in any of the other thirteen strains of Vibrio cholerae for which the genome sequences are complete. Because this sequence might be a contaminant, it has not been included in the analysis. A search of the UniProt sequence database using HMMER also failed to return any other sequences from bacteria. Two Psi-Blast searches of the NCBI non-redundant protein sequence database were undertaken using the amino acid sequences of the pepsin homologues from Shewanella amazonensis and Marinomonas as the seed sequences for 19 and 14 iterations respectively (returning 2888 known pepsin homologues); these also failed to find any new bacterial homologues.
Four homologues were found among the environmental samples nucleotide sequence database (see Table ), and all were from marine environments. The predicted protein sequences of AACY023056044 and AACY024067159 were virtually identical. None of these fragments was identical to any known pepsin homologue, but each was most closely related to a homologue from Shewanella, and they are likely to be derived from at least three other bacterial species. Because these are fragments only, they are not considered further.
Bacterial pepsin homologues in environmental samples
Several of the bacteria with pepsin homologues are marine and psychrophilic, including Colwellia psychrerythraea (formerly Vibrio psychroerythrus) and Shewanella denitrificans. Shewanella loihica is also marine but associated with deep-sea, hydrothermal vents. Shewanella amazonensis was isolated from marine sediments from the shallow waters of the Amazon river delta, and Shewanella sediminis from sediments in Halifax Harbour, Canada [15]. Marinomonas sp. MWYL1 is a salt marsh species initially isolated from the root surface of the grass Spartina anglica. Exceptionally, Sinorhizobium medicae is found associated with root nodules of Medicago. There are completed genomes for several other Shewanella species that are also marine (S. baltica and S. frigidimarina, for example), but these do not contain pepsin homologues, so there is no apparent correlation between environment and presence of a pepsin homologue.
An alignment containing only bacterial homologues, human pepsin A and memapsins 1 and 2, generated by extracting the sequences from the MUSCLE alignment and removing all the gap-only columns, is shown in Fig. . As can be seen from this figure, both active site aspartates (Asp32 and Asp215) are conserved, indicating that the bacterial homologues are bilobed. This means that each homologue has the same structure for the peptidase unit as pepsin, and each would be active in the monomeric form.
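The extraction step described above (taking a subset of rows from a larger alignment and removing the columns that are then gap-only) can be illustrated with a minimal sketch. The function name, the gap characters and the toy sequences are assumptions made for illustration; they are not the script or the data used to prepare the figure.

```python
def drop_gap_only_columns(aligned, gap_chars="-."):
    """Remove every column in which all rows carry a gap character.

    `aligned` maps a sequence name to its aligned string; all strings
    must have the same length (they come from one alignment).
    """
    if not aligned:
        return {}
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # Keep a column if at least one row has a residue in it.
    keep = [
        i for i in range(length)
        if any(seq[i] not in gap_chars for seq in aligned.values())
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in aligned.items()
    }


# Toy rows, invented for illustration (not real pepsin sequences).
# Columns 3 and 4 became gap-only once other rows were left out.
subset = {
    "seq_a": "MK--DTG-S",
    "seq_b": "MR--DSGAS",
    "seq_c": "M---DTG-T",
}
trimmed = drop_gap_only_columns(subset)
for name, seq in trimmed.items():
    print(name, seq)
```

Columns that retain a residue in at least one row are kept, so the relative alignment of the remaining sequences is unchanged.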
Figure 1 Alignment of bacterial pepsin homologues with human pepsin A and memapsins 1 and 2. Residues are numbered according to mature human pepsin A. Inserts relative to pepsin A are indicated by letters. Each active site residue (Asp32, Tyr75 and Asp215) is (more ...)
None of the proteins were predicted to possess a signal peptide. This means that, unlike the majority of members of peptidase family A1, these bacterial proteins are not secreted (the aspartic peptidase BcAP1 from the plant pathogenic fungus Botrytis cinerea also possesses neither a signal peptide nor disulfide bonds [18]). Many members of family A1 are active at acidic pH, being secreted in the stomach, or to the lysosome or plant and fungal vacuoles, but a cytoplasmic peptidase would presumably be active at neutral pH. Amongst the mammalian pepsin homologues, renin is secreted into the blood and is also active at neutral pH. This change in pH optimum has been explained by the replacement of Thr218 by Ala, thereby preventing the formation of a hydrogen bond that affects the acidity of the active site residue Asp215. That this replacement is also found in the HIV retropepsin, which is also active at neutral pH, supports this hypothesis [19]. Intriguingly, six of the seven bacterial homologues also have Ala218 (see Fig. ). In pepsin A, Thr218 is hydrogen-bonded to Asp303, but in renin both residues are replaced by Ala, and site-directed mutagenesis of Ala303 to Asp lowered the pH optimum [20]. In the sequences from Shewanella, Asp303 is replaced by Leu, which is an isosteric replacement. In the Sinorhizobium sequence Asp303 is replaced by Arg, and in the Marinomonas sequence it is replaced by Ser. There would be no problem accommodating the smaller Ser in the available space, but the larger Arg could be more problematic.
Fig. shows that of the cysteines forming the three disulphide bridges found in many of the eukaryotic homologues, only those forming the first are conserved in the Sinorhizobium homologues. Cys282 from the third disulphide bridge is retained in the sequences from Colwellia, Shewanella loihica and S. sediminis, which may mean that the proteins are thiol-dependent. Disulphide bridges would not be expected in intracellular proteins. The Sinorhizobium homologues also differ in having Phe75 instead of Tyr (although it must be acknowledged that the alignment here is by no means certain); although this is a common replacement, none of the homologues with Phe75 has ever been biochemically characterized or shown to be catalytically active. Replacement of Tyr75 by Phe in rhizopuspepsin by site-directed mutagenesis resulted in weakened but detectable activity [21]. The hydrophobic-hydrophobic-Gly motifs in the psi-loops are conserved in all the bacterial homologues except those from Sinorhizobium. Because these bacterial homologues have inserts here, the alignment is uncertain. Although it is possible that the inserts might compensate for the loss of the hydrophobic-hydrophobic-Gly motif, which would imply a different structure in this region, it is much more likely that the two proteins lacking these motifs are not active as peptidases.
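A simple way to screen a sequence for candidate hydrophobic-hydrophobic-Gly triplets is sketched below. The set of residues treated as hydrophobic, the function name and the toy sequence are all assumptions made for illustration; the analysis above relied on the alignment, not on a pattern search of this kind.

```python
import re

# Assumed hydrophobic residue class (one-letter codes); other
# definitions of "hydrophobic" would change the hits.
HYDROPHOBIC = "AVILMFWY"

# The lookahead makes the match zero-width so that overlapping
# triplets are all reported.
MOTIF = re.compile(rf"(?=([{HYDROPHOBIC}]{{2}}G))")


def find_hhg_motifs(sequence):
    """Return (1-based position, triplet) for each hydrophobic-hydrophobic-Gly hit."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Toy sequence, invented for illustration (not a real pepsin homologue).
toy = "MSDTGAAILGKDSGVVG"
hits = find_hhg_motifs(toy)
print(hits)
```

A hit in the raw sequence shows only that the triplet is present; whether it lies in a psi-loop still has to be judged from the alignment, which is the uncertain step where inserts occur.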
From the solved tertiary structure the residues forming the S1 and S1' substrate-binding pockets for human pepsin A are known. Pepsin A prefers large hydrophobic residues (Phe and Leu) in substrates for both P1 and P1', and the substrate-binding pockets are correspondingly lined with hydrophobic residues [22]. The bacterial homologues, with the exception of that from Marinomonas, have most of the substrate-binding residues conserved except Thr77, which is replaced by a hydrophobic residue, and Thr218, which is replaced by Ala (see Fig. ).
The phylogenetic tree (Fig. ) shows that all bar two of the bacterial sequences are close to the origin of the division between subfamilies on the tree, implying that horizontal transfer of genes from a recent eukaryote species to a bacterium is unlikely (the key is found in Additional File 1). This is confirmed by the results from Alien Hunter (see Table ) which finds horizontal transfer of genes unlikely in all species except Marinomonas. Considering that Marinomonas is a commensal organism living on the root surface of a grass, the origin of a pepsin homologue via horizontal transfer of a gene from the host is not unexpected. The Marinomonas homologue is most closely related to that from Sinorhizobium. These sequences do not cluster with the other bacterial homologues on the tree. Sinorhizobium is also a commensal organism, living in the root nodules of legumes, but the region containing the gene was not predicted to be the result of horizontal transfer. There are therefore two groups of bacterial pepsins, one containing sequences from Shewanella and one containing sequences from Sinorhizobium.
Figure 2 Phylogenetic tree derived from members of peptidase family A1. The tree was generated for all peptidase unit sequences of family A1 holotypes, plus those from the sequences of the bacteria listed in Table 1. The tree is unrooted, but the sequence of plasmepsin-5, (more ...)
To further investigate whether the pepsin homologues in Shewanella species were derived from horizontal gene transfer, we attempted to estimate the rate of mutation. We hypothesized that if the bacterial pepsins are of ancient origin then their rate of mutation must be very low, otherwise we would not be able to see the sequence similarity to eukaryotic proteins. We calculated the percentage identities between the pepsin sequences from Shewanella species and compared these with the percentage identities from other peptidase families. The families of signal peptidase 2 (peptidase family A8) and the ClpP subunit of endopeptidase Clp (family S14) were chosen because these are well conserved in bacteria, and all the Shewanella species have only one homologue per family. The percentage identities with respect to the homologues from Shewanella denitrificans are shown in Table , which shows that the ClpP subunit is the most stable, followed by signal peptidase 2 and then pepsin. The pepsin homologues are changing at twice the rate of the other two families. The closest homologues to S. denitrificans signal peptidase 2 and ClpP are the respective proteins from S. loihica. However, of the Shewanella pepsin homologues, that from S. loihica is most distantly related to S. denitrificans, even more distantly related than the sequence from a species in a different genus, Colwellia. To put the percentage identities into a geological time-frame, percentage identities were calculated for cathepsin D, a pepsin homologue found only in animals where divergence times can be estimated from the fossil record. The percentage identity between cathepsin D sequences from human and pig is 88% (human and pig diverged around 65 million years ago [23]), and that between human and Xenopus tropicalis is 73% (the species diverged around 350 million years ago [24]); while that between human cathepsin D and nemepsin-2 from the nematode Ancylostoma caninum is 52% (the species diverged around 660 million years ago [23]). The mutation rate for pepsin homologues amongst bacteria must therefore be considerably higher than that for eukaryote members of the family, and this rapid mutation rate might explain their placing near the root of the phylogenetic tree.
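The arithmetic behind this comparison can be sketched as follows. The first function computes a percentage identity from two aligned sequences; the choice of denominator (columns where neither sequence has a gap) is an assumption, as is the toy pair. The second part converts the cathepsin D identities and divergence times quoted above into a crude figure for percentage of sites changed per million years. This is an illustrative calculation, not the procedure used to produce the table.

```python
def percent_identity(a, b, gap="-"):
    """Percentage of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def divergence_per_myr(identity_pct, divergence_myr):
    """Crude rate: percentage of differing sites per million years since divergence."""
    return (100.0 - identity_pct) / divergence_myr


# Toy aligned pair, invented for illustration.
print(percent_identity("ACDE-G", "ACDF-G"))

# (comparison, % identity, divergence time in million years), as quoted in the text.
cathepsin_d = [
    ("human vs pig", 88, 65),
    ("human vs Xenopus tropicalis", 73, 350),
    ("human vs Ancylostoma caninum nemepsin-2", 52, 660),
]
rates = {label: divergence_per_myr(ident, t) for label, ident, t in cathepsin_d}
for label, rate in rates.items():
    print(f"{label}: {rate:.3f} % of sites per million years")
```

The figure is deliberately simple: it does not correct for multiple substitutions at the same site, and it divides by the time since the two lineages split, so the change along each lineage separately is about half of the value shown.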
Comparison of replacements among bacteria with pepsin homologues
The evidence as to whether the ancestral pepsin gene in bacteria originated from a horizontal gene transfer from a eukaryote or not can be summed up as follows. The positioning of most of the bacterial pepsins close to the divergence of the two subfamilies on the phylogenetic tree, the cytoplasmic location and lack of disulfide bridges and the scores from Alien Hunter imply no lateral transference; the absence of homologues in other closely related bacterial species and Archaea and the apparent fast mutation rate are points in favour of horizontal gene transfer.
If the bacterial genes are the result of lateral gene transference from an ancient eukaryote gene, then there is no known current eukaryote gene that resembles it, because nearly all modern pepsin homologues are secreted proteins and the ancient gene product would presumably be cytoplasmic. The eukaryotic peptidases that are most closely related to the bacterial homologues are the memapsins, as can be seen in Fig. . The memapsins and bacterial pepsins cluster with high confidence. It is possible that memapsins, which are widely distributed in mammalian tissues, might be closer to the ancestral peptidase from which all other vertebrate pepsin homologues are derived.
Aspartic-type peptidases unrelated to pepsin are known in bacteria, including the gpr peptidase from Bacillus megaterium (peptidase family A25) [25] and omptin from Escherichia coli (peptidase family A26) [26]. Neither peptidase is inhibited by pepstatin. Sporulation factor SpoIIGA (peptidase family U4) [27] has also been claimed to be an aspartic peptidase because of the presence of a single Asp-Ser-Gly motif (the protein is assumed to be active as a dimer), but this alone is not sufficient because the same motif occurs around the active site Asp of the serine-type peptidase subtilisin. Pepstatin-sensitive aspartic peptidases have previously been found in Escherichia coli and Haemophilus influenzae, but the sequences were homologous not to that of pepsin but to gluconate permease [28]. A recent publication [29] has reported an acidic peptidase from a Synergistes species isolated from an anaerobic digester used for treatment of tannery solid waste. This peptidase is inhibited 75% by 0.01 mM pepstatin, but not by inhibitors of other catalytic types. No sequence is available, but this may be the first characterized pepsin homologue from a bacterium. Unlike the homologues from the marine bacteria, this is presumably a secreted protein. No complete genome sequence for any Synergistes species is publicly available, so we were unable to test for the presence of a pepsin homologue. The genomes of Dethiosulfovibrio peptidovorans and Thermanaerovibrio acidaminovorans, both of which are members of the same bacterial phylum Synergistetes, have been partially sequenced (genome projects 20741 and 29531, respectively). These were searched with the amino acid sequences of pepsin homologues from Shewanella amazonensis and Sinorhizobium medicae, but no pepsin homologues were found, even at an E value of 10.