The cells of our resident microbiota are estimated to outnumber human cells by a factor of 10 and, in toto, to encode a significantly more extensive proteome than the human genome
[1]. This vast microbial proteome can be considered an extension of our own, as microorganisms are known to perform numerous metabolic functions not carried out by mammalian cells and to influence important aspects of human development, immunity and nutrition
[1],
[2]. The symbiotic relationship between humans and their microbiota ranges from mutualism through commensalism to parasitism, which can be considered to form a continuum rather than discretely defined phenotypes
[3]. Despite the importance of the human microbiota to health, there are significant gaps in our understanding of the molecular basis of host-microbe interactions, particularly those with mutualistic outcomes. Hence there is tremendous interest in investigating the proteome complement of the human mucosal microbiota: the mucosal surfaces are the dominant interface for host-microbe interactions, and microbial cell-surface and secreted proteins are likely to be key mediators of both mutualistic and pathogenic outcomes
[4],
[5],
[6]. The mucus gel, the defining feature of mucosal surfaces, acts as an important defensive layer protecting the underlying epithelial cells from chemical, physical and microbial attacks
[7],
[8],
[9]. Indeed, many pathogens produce adhesins that bind to, and enzymes that degrade, mucins, the major component of mucus, to gain access to the underlying cells and tissues
[7],
[10]. In addition, a small number of gut mutualists are known to degrade mucins, which represent an important source of nutrients for these microbes; this mucin degradation also contributes to overall mucosal homeostasis
[8],
[9]. Mucins are a family of high-molecular-weight glycoproteins composed of a linear peptide backbone heavily decorated with long oligosaccharide side chains
[7],
[9]. These sugar chains are usually
O-linked and can make up 50–80% of the mucin by weight. Degradation of mucins thus requires the concerted action of both glycosidases and peptidases
[9],
[10], but nothing is currently known about such peptidases in mutualists
[9].
Comparative genomics can reveal proteins or protein domains that are unique to, or enriched in, microorganisms sharing a given phenotype or trait, including the capacity to thrive in a particular habitat
[11],
[12],
[13],
[14]. Such genotypic features are thought to correspond to specific adaptations of autochthonous microbes to their preferred habitat(s). The availability of vast and rapidly expanding genome sequence databases enables comparative genomics to be performed across a wide range of organisms from all three domains of cellular life, encompassing a broad diversity of habitats
[15]. Current sequencing technologies are also enabling metagenomic studies of microbial communities from various habitats, providing additional opportunities to build a more comprehensive understanding of the molecular basis of host-microbe associations
[1],
[2],
[16],
[17].
Sequencing of the genomes of two important human mucosal pathogens, the microbial eukaryotes
Entamoeba histolytica, a pathogen of the gastrointestinal tract (GIT)
[18],
[19], and
Trichomonas vaginalis, a pathogen of the urogenital tract (UGT)
[20],
[21], identified several genes and gene families encoding putative enzymes and surface proteins that are shared, through lateral gene transfer (LGT), with other mucosal microbes, including pathogenic and mutualistic Bacteria
[22],
[23],
[24]. One family of
T. vaginalis candidate surface proteins, two members of which were recently shown to be expressed on the cell surface of the clinical isolates analysed
[25], showed significant sequence similarity
[23] to an
E. histolytica immunodominant surface protein
[26]. However, little is currently known about the function of these surface proteins from
E. histolytica or
T. vaginalis. The
E. histolytica protein contains a domain similar to family 32 carbohydrate-binding modules (CBMs); it was recently shown to accumulate at the surface of the parasite's uropods
[27] and might be involved in phagocytosis
[28],
[29]. CBMs are discrete folded domains that bind complex glycans and are normally found as ancillary modules in carbohydrate-active enzymes
[30],
[31]. They are organised into sequence-based families and display specificity for a wide range of mainly polymeric saccharide ligands
Although ligand specificity is often not conserved within a family, family membership can still indicate the likely ligand specificity of an uncharacterised CBM sequence.
Here we present the in silico characterisation of a novel protein domain shared by the E. histolytica and T. vaginalis surface proteins, covering its taxonomic distribution, structural organisation and potential functions. Our analyses demonstrated that the novel domain (Pfam entry PF13402, named M60-like/PF13402) defines a new sub-family of extracellular zinc (Zn)-metallopeptidases that is conserved amongst a range of host-associated bacterial and eukaryotic microbes, including mutualists and pathogens of invertebrates and vertebrates. The great majority of microbial M60-like/PF13402-containing proteins possess surface-anchoring motifs, and the putative peptidase domain is often associated with sequences implicated in complex glycan recognition, such as CBMs. Biochemical analyses demonstrated that an M60-like/PF13402 protein from Bacteroides thetaiotaomicron, a prominent member of our indigenous gut microbiota, displays proteolytic activity against mammalian mucins that depends on metal ions and on a catalytic glutamate residue, making it the first peptidase with mucinase activity identified from a human mutualist. These data strongly support the hypothesis that M60-like/PF13402-containing proteins play important roles in the colonisation of the invertebrate digestive tract and vertebrate mucosal surfaces by a broad diversity of mutualistic and pathogenic microbes.