While relaxin evolution has been the centre of much controversy (relaxin is often cited as a gene that conflicts with the Darwinian theory of evolution [24,32-34]), this report is the first attempt to describe the evolutionary history of the whole relaxin-like peptide family from a phylogenetic perspective. Previous studies have concentrated on the primate relaxins and relaxin-like factors [26] or have not included detailed phylogenetic analyses [27]. We have sought to expand upon these studies by combining sequences identified in all available completed genomes with a subset of cloned relaxin-like sequences, particularly those from non-mammalian species.
None of the phylogenetic tree construction programs used was able to completely resolve the evolution of the relaxin-like peptide family. This is likely due to variable divergence across the family and the short sequence length [35]. Incorporating results from the MP and NJ methods suggested positions for several branches that were unresolved after ML analysis. Minimizing incongruence between the gene and species trees by reducing the number of assumed duplications in the reconciled tree also provided a method to infer the evolutionary history of this family.
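The duplication-minimization criterion rests on reconciling each gene tree with the species tree. The sketch below is for illustration only. It implements the standard least-common-ancestor (LCA) mapping, in which a gene-tree node counts as a duplication when it maps to the same species-tree node as one of its children. It then uses that mapping to compare two candidate resolutions of an unresolved gene tree. The species tree, the gene labels (`RLN3_human`, `RLN3_zfish_a`, etc.) and both candidate topologies are hypothetical toy data. This is not necessarily how the reconciliation software used in this study works, and gene losses are ignored.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# A rooted binary tree: a leaf is a name (str); an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def clades(tree: Tree) -> List[Clade]:
    """Return the leaf set of every node; the last entry is the root's."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    left_clades, right_clades = clades(left), clades(right)
    return left_clades + right_clades + [left_clades[-1] | right_clades[-1]]


def lca_map(species: Clade, species_clades: List[Clade]) -> Clade:
    """Map a set of species to the smallest species-tree clade containing it."""
    return min((c for c in species_clades if species <= c), key=len)


def _reconcile(gene_tree: Tree, species_of: Dict[str, str],
               species_clades: List[Clade]) -> Tuple[Clade, int]:
    """Return (species below this gene node, duplications in its subtree)."""
    if isinstance(gene_tree, str):
        return frozenset([species_of[gene_tree]]), 0
    left, right = gene_tree
    sp_left, dup_left = _reconcile(left, species_of, species_clades)
    sp_right, dup_right = _reconcile(right, species_of, species_clades)
    here = lca_map(sp_left | sp_right, species_clades)
    # Duplication: this node maps to the same species node as one of its children.
    is_dup = here in (lca_map(sp_left, species_clades),
                      lca_map(sp_right, species_clades))
    return sp_left | sp_right, dup_left + dup_right + int(is_dup)


def count_duplications(gene_tree: Tree, species_tree: Tree,
                       species_of: Dict[str, str]) -> int:
    """Number of duplications implied by LCA reconciliation (losses ignored)."""
    return _reconcile(gene_tree, species_of, clades(species_tree))[1]


# Hypothetical toy data (not the trees analysed in this study).
SPECIES_TREE: Tree = ((("human", "mouse"), "chicken"), "zebrafish")
SPECIES_OF = {
    "RLN3_human": "human", "RLN3_mouse": "mouse", "RLN3_chicken": "chicken",
    "RLN3_zfish_a": "zebrafish", "RLN3_zfish_b": "zebrafish",
}
# Candidate 1: the two fish copies arise from one fish-specific duplication.
CANDIDATE_1: Tree = ((("RLN3_human", "RLN3_mouse"), "RLN3_chicken"),
                     ("RLN3_zfish_a", "RLN3_zfish_b"))
# Candidate 2: the two fish copies are scattered across the tree.
CANDIDATE_2: Tree = ((("RLN3_human", "RLN3_zfish_a"), "RLN3_chicken"),
                     ("RLN3_mouse", "RLN3_zfish_b"))

for name, tree in (("candidate 1", CANDIDATE_1), ("candidate 2", CANDIDATE_2)):
    print(name, count_duplications(tree, SPECIES_TREE, SPECIES_OF))
```

In this toy case the first resolution implies one duplication and the second implies two. A duplication-minimizing criterion would therefore prefer the first.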
Consistent with previously published results, searches of available genomic and EST data failed to identify any novel members of the relaxin-like peptide family [28]. Given the stringent and well-described insulin family signature, which is centred on the invariant cysteine residues that confer the insulin-like structure seen across the superfamily, we consider it unlikely that any novel relaxin or insulin-like sequences will be identified.
The presence of an invertebrate relaxin has been a matter of speculation since 1983, when relaxin-like activity was first detected in the protozoan T. pyriformis [22]. Similar activity was reported in H. momus [23] and in C. intestinalis, where a cDNA sequence almost identical to pig relaxin was found [24]. However, our searches of all completed invertebrate genomes (including C. intestinalis) failed to identify any relaxin-like sequences, including the published sequence. Multiple insulin-like peptides have been found in several invertebrates, including Bombyx mori (silkworm) [36], D. melanogaster [37] and C. elegans [38]. As these sequences lack the relaxin-specific motif and show no homology to other relaxin family peptides, they are not considered part of the relaxin subfamily and have therefore not been included in these analyses. Much of the controversy surrounding relaxin evolution concerns this putative invertebrate relaxin, the cDNA sequence from Ciona intestinalis reported to be almost identical to pig relaxin (Georges and Schwabe, 1999). Completion of the C. intestinalis and other invertebrate genomes has allowed us to conclude that no relaxin-like sequence is present in any invertebrate sequenced to date. If an invertebrate relaxin does exist, it does not contain the relaxin-specific motif characterized in vertebrates.
A hallmark of relaxin sequences is their high variability, even amongst closely related species. Relaxin-like peptide sequences isolated from two whales are almost identical to porcine relaxin [21]. However, these sequences were derived by amino acid sequencing and no nucleotide or genomic sequence is available, so they could not be included in these phylogenetic analyses.
The presence of a functional relaxin in the ruminant lineage has yet to be confirmed [25]. More genomic data are required to confirm the presence of a non-functional relaxin gene sequence in cattle, similar to that observed in sheep [25]. Searches of the preliminary bovine genome assembly have failed to find a relaxin gene. Interestingly, a relaxin sequence has been identified in the camel [39], and relaxin expression has been found in the closely related llama and alpaca [40]. Although often grouped with the ruminants, the Camelidae have a unique reproductive anatomy and physiology [41]. A bovine EST (BI682322) with high similarity to exon 2 of human relaxin-3 was identified. Confirmation of the presence of relaxin and relaxin-3 orthologs in ruminants awaits further sequencing of the bovine and ovine genomes.
The presence of an avian relaxin has also been a matter of speculation. Although relaxin-like activity has been reported in the chicken [42], no avian relaxin-like peptide or gene has been identified. Two relaxin-3-like genes were identified in the nearly completed chicken genome, but no avian relaxin gene was found. As no other relaxin-like genes were found, the reported relaxin activity may be due to the product of one of these relaxin-3-like genes.
The phylogeny of the relaxin-like peptide family indicates that relaxin-3 is the ancestral relaxin, having appeared prior to the divergence of teleosts. The finding of multiple relaxin-3-like genes in fugu and zebrafish suggests that multiple lineage-specific duplications of a single relaxin-3-like ancestor have occurred in fish [27]. However, the possibility that the other mammal-specific relaxin-like peptides emerged earlier and were subsequently lost in the teleost lineage cannot be excluded [27]. We consider it more likely that these duplications, and the resulting multiple relaxin-3-like genes, are fish-specific and due to the genome-wide duplications hypothesized to have occurred during fish evolution [43]. Phylogenetic analyses show multiple fish homologs of both the mammalian relaxin-3 and INSL5 genes, implying that two relaxin-3-like genes existed prior to the genome duplication event proposed to have occurred in the teleost ancestor. The putative fish relaxin homolog was either present in the teleost ancestor and duplicated, with the second copy subsequently lost, or it emerged shortly after, or as a result of, the genome-wide duplication event.
Several non-mammalian sequences (OmRLX3, DrRLX3b, DrRLX3d, TrRLX3d, TrRLX3e and GgRLX3b) were termed relaxin-3-like on the basis of sequence similarity, but phylogenetic analysis indicates that they could be INSL5 homologs. None of the sequences found in the complete X. tropicalis genome were placed in this group, although members are present in both the earlier-diverging fish lineage and the later-diverging avian lineage. It is possible that this gene has either been lost from, or remains unidentified in, the X. tropicalis genome. A sequence with similarity only to the B chain of relaxin-3 was also found, but no corresponding A chain match was identified. However, there is a gap in the genome assembly upstream that might contain the missing domain. Future assemblies of the Xenopus genome should resolve this issue. These results suggest that INSL5 could have emerged during teleost evolution, far earlier than previously believed. Unlike the mammal-specific relaxin-like genes, which are clustered together (on chromosome 9 in human and chromosome 19 in mouse), INSL5 is located separately (on chromosome 1 in human and chromosome 4 in mouse). These findings are of particular interest for INSL5, which remains functionally uncharacterised.
All the potential non-mammalian INSL5 homologs retain the relaxin-specific B chain motif (RxxxRxxI/V). They could therefore be capable of interacting with the relaxin receptor, LGR7, in which case they would be functionally classified as relaxins. Recent studies have shown that INSL5 is a high-affinity ligand for GPCR142 but not for GPCR135, LGR7 or LGR8 [19]. As the residues required for interaction with GPCR135 and GPCR142 are not known, it remains unclear whether the non-mammalian INSL5 homologs would interact with GPCR142, GPCR135 and/or LGR7.
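The RxxxRxxI/V notation can be read as a simple sequence pattern: an arginine, any three residues, a second arginine, any two residues, then isoleucine or valine. The following is a minimal sketch of such a motif screen as a regular expression. It assumes the B-chain sequence has already been isolated as a one-letter amino-acid string. The sequences below are hypothetical toy strings, not real relaxin B chains. A motif match is only a sequence feature and is not evidence of LGR7 binding, which is exactly the open question above.

```python
import re
from typing import List

# Relaxin-specific B-chain motif RxxxRxxI/V. The zero-width lookahead tests
# every start position, so overlapping occurrences are all reported.
RELAXIN_B_MOTIF = re.compile(r"(?=(R...R..[IV]))")


def find_relaxin_motif(b_chain: str) -> List[int]:
    """Return the 0-based start position of each RxxxRxxI/V occurrence."""
    return [m.start() for m in RELAXIN_B_MOTIF.finditer(b_chain.upper())]


# Hypothetical toy B-chain strings (made up for illustration).
TOY_WITH_MOTIF = "SAKRSTWRNAIGLC"     # R at 3, R at 7, I at 10
TOY_WITHOUT_MOTIF = "SAKQSTWRNAIGLC"  # first R replaced by Q

print(find_relaxin_motif(TOY_WITH_MOTIF))     # [3]
print(find_relaxin_motif(TOY_WITHOUT_MOTIF))  # []
```

Matching is done on an upper-cased copy, so lower-case input is accepted. Real screens would normally run on aligned B-chain regions rather than whole precursors, to avoid chance matches elsewhere in the sequence.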
Phylogenetic results from this study suggest the presence of a relaxin homolog in fish and frogs, although not in the chicken. Relaxin peptides have previously been isolated and sequenced from the ovaries or testes of the edible frog (Rana esculenta) [30], little skate (Raja erinacea) [44], spiny dogfish (Squalus acanthias) [45], Atlantic stingray (Dasyatis sabina) [46] and sand tiger shark (Odontaspis taurus) [47]. Although these sequences show high similarity to relaxin-3, they are not relaxin-3 homologs. (The B chain of the stingray sequence lacks the relaxin-specific motif, so it is not a functional relaxin [46] and has not been considered further.) All these genes are expressed in reproductive organs such as the testes and ovaries, and northern blot analysis failed to detect expression of the R. esculenta gene in the brain [30]. We therefore believe these to be among the first relaxin peptides with a reproductive function. In view of their similarity to relaxin-3, the ancestral relaxin homolog, together with its new reproductive function, is likely to have emerged prior to the divergence of teleosts. A complete picture of the relaxin-like peptides present in non-mammalian genomes will be invaluable in understanding the evolution of relaxin from neuropeptide to reproductive hormone.
The ancestral RLN3 gene is under very strong purifying selection, highlighting the importance of its highly conserved function, which is likely to be in the brain [2]. As high divergence is a hallmark of relaxin sequences, it is somewhat unsurprising that RLN2 is under only weak purifying selection. We suggest that this weak selective constraint has contributed to the high sequence divergence seen between many relaxins (e.g. human and mouse) and to the differences in relaxin function observed across mammals.
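The strength of selection discussed here is conventionally summarized by the ratio ω = dN/dS of nonsynonymous to synonymous substitutions per site. Values of ω < 1 indicate purifying selection, ω ≈ 1 neutral evolution and ω > 1 positive selection. The sketch below shows only the final step of counting methods such as Nei-Gojobori: converting the observed proportions of differing nonsynonymous and synonymous sites into Jukes-Cantor corrected distances, then taking their ratio. The site and difference counts are hypothetical. The 0.1 tolerance in `classify_omega` is an arbitrary illustrative choice rather than a statistical test. The estimates reported in this study may have been obtained with other methods.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """dN/dS from counts of nonsynonymous (n) and synonymous (s) sites/differences."""
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0.0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


def classify_omega(w: float, tolerance: float = 0.1) -> str:
    """Rough verbal label for omega; NOT a substitute for a likelihood-ratio test."""
    if w < 1.0 - tolerance:
        return "purifying"
    if w > 1.0 + tolerance:
        return "positive"
    return "approximately neutral"


# Hypothetical counts for two gene pairs (not data from this study).
conserved = omega(n_diffs=6, n_sites=300, s_diffs=20, s_sites=100)
divergent = omega(n_diffs=45, n_sites=300, s_diffs=10, s_sites=100)
print(f"{conserved:.3f} {classify_omega(conserved)}")  # 0.087 purifying
print(f"{divergent:.3f} {classify_omega(divergent)}")  # 1.559 positive
```

The correction matters most for the synonymous distance. Synonymous sites saturate first, so the uncorrected proportions would understate dS and inflate ω.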
Information about the selective constraints placed upon these peptides can provide valuable insight into the nature of their receptor interactions. On the basis of these selection pressures, we infer that the interactions of relaxin-3 with GPCR135 and of INSL5 with GPCR142 are highly specific, whereas the binding of relaxin to LGR7 is much less specific. In this context, the cross-reactivity of LGR7 with INSL3 and H1 relaxin is understandable, since both resemble relaxin in sequence and especially in structure. So is the failure of GPCR135 and GPCR142 to bind any other relaxin-like peptide. Unexpectedly, synonymous and nonsynonymous substitution rate estimates for RLN1 and INSL6 indicate that these genes are under positive selection. Positive selection is often difficult to detect using pairwise comparisons that average over the whole length of a sequence, which makes these results even more striking. While pairwise comparisons failed to confirm that positive selection was acting on INSL4, further statistical tests suggested that positive Darwinian selection acted on several sites in the INSL4 sequence after its emergence. Further analysis will be required to confirm the importance of these sites in the acquisition of a new receptor and a new function by INSL4. This is particularly so in light of recent studies that question whether ML methods can accurately detect positive selection acting on individual sites [48-50]. We are encouraged that both branch-specific and site-specific ML models detect positive selection acting on INSL4.
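A simple mixture calculation shows why whole-sequence pairwise comparisons tend to miss positive selection. Assume, for illustration only, that the synonymous rate is the same in every codon. The gene-wide ω is then the proportion-weighted mean of the ω values of its site classes, so a minority of positively selected sites can be masked by a conserved background. The proportions and ω values below are hypothetical and are not estimates from this study.

```python
from typing import Sequence, Tuple


def gene_wide_omega(site_classes: Sequence[Tuple[float, float]]) -> float:
    """Weighted mean omega over (proportion_of_codons, omega) site classes.

    Assumes an equal synonymous rate in every codon, so omega averages linearly.
    """
    total = sum(p for p, _ in site_classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


def min_positive_fraction(w_background: float, w_positive: float) -> float:
    """Smallest fraction of positively selected codons giving gene-wide omega > 1.

    Solves p * w_positive + (1 - p) * w_background = 1 for p.
    """
    if not w_background < 1.0 < w_positive:
        raise ValueError("expects w_background < 1 < w_positive")
    return (1.0 - w_background) / (w_positive - w_background)


# Hypothetical: 10% of codons with omega = 3 in a background with omega = 0.1.
mixed = gene_wide_omega([(0.9, 0.1), (0.1, 3.0)])
print(round(mixed, 2))                            # 0.39
print(round(min_positive_fraction(0.1, 3.0), 3))  # 0.31
```

In this toy case, one codon in ten evolves under strong positive selection, yet the averaged ω of 0.39 would read as purifying selection. Under these assumptions, roughly 31% of codons would need ω = 3 before the gene-wide ratio exceeded 1. This gap is what the site-specific models mentioned above are designed to close.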
When the B and A domains of each gene were analyzed separately, further differences in selection pressures became apparent. The interaction between relaxin and its receptor has been thought to be mediated primarily through the B chain of the peptide [4], so the finding that selection pressures are stronger on the A chain of relaxin-1, INSL4 and INSL6 was unexpected. We also find it noteworthy that INSL4, INSL6 and relaxin-1 are the most recently emerged members of the family and all appear to be under positive Darwinian selection.
INSL6 emerged during mammalian evolution, and INSL4 and RLN1 during primate evolution. All three remain functionally uncharacterized, and INSL4 and INSL6 have no known receptors. For INSL4 and INSL6, the low selection pressure on the B domain and the strong constraints on the A domain suggest that their receptor interaction could depend on the A chain, unlike the B chain-mediated interaction of relaxin and INSL3 with their receptors. The low dN/dS ratio observed for INSL5 indicates that this peptide is evolutionarily stable and functionally important. In particular, the constraints placed on both the A and B chains of INSL5 suggest a well-defined receptor interaction system. By contrast, the complete absence of such constraints on either chain of relaxin-1 suggests the opposite: this peptide may still be evolving its function.