Since their discovery in 1979, scavenger receptors have been defined by their ability to ‘scavenge’ modified LDL from their environment for internalization and subsequent degradation
]. As more proteins were discovered that fit this definition, the SRs came to represent a polyphyletic group of receptors with varying domain architectures and protein structures that appear to have arisen independently. For example, although CD36, a class B SR, also binds modified lipids, permutation tests show that it is unrelated to SRAI (data not shown). This prompted the introduction of subclasses to group structurally similar proteins
]. However, even within the class A subclass there is considerable variability. Functionally, for example, MARCO can bind acLDL
], SRAI can bind both oxLDL and acLDL
], and SCARA5 can bind neither
]. Structurally, the cA-SRs differ at their C-terminal region and in the lengths of their other domains (Figure
). There is very little justification for grouping the cA-SRs together based on the original definition of ligand binding unless there is an evolutionary relationship amongst the members.
To investigate the evolutionary connection within the cA-SRs, we first needed to definitively characterize the domain architecture of these proteins. Domain boundaries had previously been defined for individual members of the cA-SRs, but usually by comparison with SRAI rather than with current prediction tools. Our findings (Figure
, Additional file
: Table S2) suggest that there are 4 domains - cytoplasmic, transmembrane, α-helical, and collagenous - shared by all members of the cA-SRs. Conserved motifs in these domains common across the cA-SRs suggest not only a common origin of these proteins, but also that they may share significant functionality with each other (Figure
). While the cytoplasmic and transmembrane domains are largely consistent in length across the receptors, the α-helical and collagenous domains vary in length in a manner consistent with repeats brought about by recombination or duplication events
]. In contrast, the fifth, C-terminal domain differs or is absent among the cA-SRs. While SRAI, MARCO, and SCARA5 share an SRCR domain at their C-terminus, SCARA4 possesses a C-type lectin domain and SCARA3 terminates at its collagenous region. The SRCR and C-type lectin domains are both able to recognize pathogens
], suggesting that the radiation in this region may be due to a domain-swapping event that may have allowed for the diversification of host pathogen recognition.
Data mining was used to identify known and novel cA-SRs in publicly available databases. Conservation of these proteins across vertebrate species was examined via phylogenetics. No cA-SRs were identified in available non-vertebrate genomes, implying that although the individual domains that make up these receptors - specifically the SRCR and C-type lectin domains - are ancient, the modern cA-SR domain architecture likely arose after the divergence of vertebrates from other lineages. Using these sequences, the relationships between the 5 members of the cA-SRs were analyzed.
To determine a shared evolutionary ancestry amongst all 5 members of the cA-SRs, permutation tests were performed using the representative Homo sapiens protein sequences, which revealed significant sequence similarity between all of these proteins (Table
). Additionally, notable motifs shared amongst all or most receptors were identified (Figure
), providing strong support for classifying these proteins as a protein family.
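The logic of a sequence permutation test can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in this study: the scoring scheme (match, mismatch, and linear gap values), the number of permutations, and the toy collagen-like sequence are all assumptions made for the example, and a real analysis would use a substitution matrix and an established tool. The idea is to compare the observed alignment score with the scores obtained after shuffling one sequence, which preserves its length and composition but destroys its residue order.

```python
import random


def nw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Global alignment score (Needleman-Wunsch with a linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def permutation_p_value(a, b, n_perm=200, seed=0):
    """Return (observed score, empirical p-value).

    Sequence b is shuffled n_perm times; the p-value is the fraction of
    shuffled scores at least as high as the observed score, with the usual
    +1 correction so that the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = nw_score(a, b)
    residues = list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(residues)
        if nw_score(a, "".join(residues)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy sequence, not taken from any cA-SR.
toy = "GPPGEKGDRGLPGPRGQSGA"
score, p = permutation_p_value(toy, toy)
print(score, p)
```

A small p-value indicates that the observed similarity is unlikely to arise from composition alone, which is the sense in which permutation tests support homology.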
Phylogenetic analyses allowed us to hypothesize regarding the evolutionary history of this protein family. First, analyses presented in Figures
indicate that SRAI and SCARA5 are more closely related to each other than to the other cA-SRs. This finding is further supported by the fact that the highest sequence similarity is shared between SRAI and SCARA5 (Table
). This is unsurprising given what is known biologically about these 2 proteins. Although little research has been completed on SCARA5, it is known that both it and SRAI bind Gram-positive and -negative bacteria
] and are both hypothesized to be involved in host defense
]. Second, SCARA3 and SCARA4 were also identified as closely related proteins. Not only are their domain lengths similar (Figure
), but these proteins are also presented as an independent cluster in the phylogenetic analysis of all cA-SRs (Figure
). Although neither protein is well studied, current evidence suggests that they do not share much functionality. SCARA4 appears to function in a similar fashion to the SRCR-containing cA-SRs by binding Gram-positive and -negative bacteria and being expressed on cells involved in host defense
]. In contrast, SCARA3 is expressed on fibroblasts and has been proposed to protect against reactive oxygen species by binding and internalizing oxidative molecules
]. However, the lengths and general composition of the SCARA3 and SCARA4 proteins are very similar, as indicated by a shared percent identity of 26.6% across their full lengths (Table
). Perhaps the differences in their biological functions are restricted to the presence of a C-terminal C-type lectin domain in SCARA4 and the potentially lost terminal domain in SCARA3.
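For readers unfamiliar with the measure, percent identity over a pairwise alignment can be computed as in the following sketch. The denominator convention used here (all alignment columns except those where both sequences carry a gap) is an assumption made for illustration; other conventions exist, such as dividing by the length of the shorter sequence, and the convention behind the values reported in this study is not restated here. The function name and the toy strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumed convention: identical residues divided by the number of
    alignment columns, ignoring columns in which both rows are gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned strings, not real receptor sequences.
print(percent_identity("AC-T", "ACGT"))
```

Because the result depends on both the alignment and the denominator, percent identity values are only comparable when computed under the same convention.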
Lastly, the positioning of MARCO is intermediate between the SRAI/SCARA5 and SCARA3/SCARA4 clusters. The phylogenetic evidence presented in Figure
suggests that this protein sequence is most similar to SCARA3/SCARA4 with high posterior probabilities and bootstrap support. However, percent identity measures (Table
) as well as functional evidence suggest that it is most similar to the other SRCR-containing receptors. For example, research conducted by Arredouani et al.
demonstrates that both SRAI and MARCO are essential for clearance of bacteria and inert particles from the lungs
], indicating that even though MARCO is evolutionarily closer to SCARA3 and SCARA4, it is functionally closer to the SRCR-containing receptors. Further analysis of the exon structures of the cA-SR genes, or further functional analyses of all 5 members, may help resolve this inconsistency.
These data support the hypothesis of a single ancestral cA-SR from which duplication events allowed for the diversification of this group. We propose that 4 independent gene duplication events occurred, giving rise to the 5 cA-SRs present in vertebrate species. This common ancestor likely included most of the common features of the cA-SRs, including the transmembrane, α-helical, and collagenous domains, and may have also contained the SRCR domain shared by 3 of the 5 cA-SRs. This ancestral cA-SR may have duplicated (Figure
, Event 1) into 2 distinct proteins (labelled 1.1 and 1.2) which would have contained the domain structure typical of this group (i.e. cytoplasmic, transmembrane, collagenous, and C-terminal domains). A second duplication event of putative proto-gene 1.1 (Figure
, Event 2) would have resulted in the genes that differentiated into SRAI and SCARA5. The putative 1.2 gene would have contained an SRCR-encoding region, and possibly an extended collagenous region (as compared to 1.1). This SRCR-encoding region would likely have been lost in the predecessor of SCARA3 and SCARA4 following a third duplication event, which would also have produced the ancestral gene encoding MARCO (Figure
, Event 3). The SRCR domain may have been replaced by a C-type lectin domain in the predecessor of SCARA3 and SCARA4 and later lost in SCARA3 when a fourth duplication event resulted in the divergence of SCARA3 and SCARA4 (Figure
, Event 4), or may have simply been replaced by the C-type lectin domain in SCARA4 alone.
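The proposed order of duplications can be written down as a small tree to check that it is internally consistent. This sketch encodes only the topology described above (Event 1 separating the SRAI/SCARA5 lineage from the MARCO/SCARA3/SCARA4 lineage, followed by Events 2 to 4); it carries no branch lengths or support values, and the variable and function names are hypothetical.

```python
# Proposed history as a nested tuple: each tuple is one duplication event,
# each string is an extant receptor.
PROPOSED_TREE = (
    ("SRAI", "SCARA5"),                # Event 2, from proto-gene 1.1
    ("MARCO", ("SCARA3", "SCARA4")),   # Events 3 and 4, from proto-gene 1.2
)                                      # the outer tuple is Event 1


def leaves(node):
    """List the receptors at the tips of the tree."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaves(child)]


def count_duplications(node):
    """Count internal nodes, each standing for one duplication event."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_duplications(child) for child in node)


def to_newick(node):
    """Serialize the topology in Newick notation (without the final ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


print(len(leaves(PROPOSED_TREE)), count_duplications(PROPOSED_TREE))
print(to_newick(PROPOSED_TREE) + ";")
```

A strictly bifurcating history with 5 extant members requires exactly 4 duplication events, which matches the number proposed here.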