Male reproductive fitness is strongly affected by seminal fluid. In addition to interacting with the female environment, seminal fluid mediates important physiological characteristics of sperm, including capacitation and motility. In mammals, the male reproductive tract shows a striking degree of compartmentalization, with at least six distinct tissue types contributing material that is combined with sperm in an ejaculate. Although studies of whole ejaculates have been undertaken in some species, we lack a comprehensive picture of the specific proteins produced by different accessory tissues. Here, we perform proteomic investigations of six regions of the male reproductive tract in mice—seminal vesicles, anterior prostate, dorsolateral prostate, ventral prostate, bulbourethral gland, and bulbourethral diverticulum. We identify 766 proteins that could be mapped to 506 unique genes and compare them with a high-quality human seminal fluid data set. We find that Gene Ontology functions of seminal proteins are largely conserved between mice and humans. By placing these data in an evolutionary framework, we show that seminal vesicle proteins have experienced a significantly higher rate of nonsynonymous substitution compared with the genome, which could be the result of adaptive evolution. In contrast, proteins from the other five tissues showed significantly lower nonsynonymous substitution, revealing a previously unappreciated level of evolutionary constraint acting on the majority of male reproductive proteins.
For sexually reproducing organisms, the process of fertilization involves complex interactions between male- and female-derived proteins. These interactions extend beyond the fusion of gametes and can influence many important events leading up to fertilization. When internally fertilizing species mate, males contribute not only sperm but also a complex mixture of seminal fluid derived from a suite of accessory tissues.
Seminal fluid has many functions during reproduction, including regulation of sperm motility (Peitz and Olds-Clarke 1986; Agrawal and Vanha-Perttula 1987; Peitz 1988), mediation of sperm capacitation (Huang et al. 2000; Kawano and Yoshida 2007), suppression of the female immune system (Peitz and Bennett 1981; Anderson and Tarter 1982; Thaler 1989), and interaction with sperm from other males (Prout and Clark 2000). In mammals, experimental removal of accessory glands results in reduced pregnancy rate, a reduction in the number of young produced, and/or delays in embryonic development (Pang et al. 1979; Queen et al. 1981; Peitz and Olds-Clarke 1986; O et al. 1988; Henault et al. 1995; Carballada and Esponda 1999). Many seminal fluid proteins bind directly to sperm and comigrate with sperm through the female reproductive tract (Irwin et al. 1983; Robinson et al. 1987; Carballada and Esponda 1997, 1998).
Seminal fluid proteins might also contribute to reproductive isolation between closely related species. For example, Dean and Nachman (2009) showed that fertilization rate was significantly faster in conspecific versus heterospecific matings involving Mus domesticus and Mus musculus, and seminal fluid may mediate these interactions.
Despite the central role of seminal fluid in reproductive processes, relatively little is known about the proteins produced by specific accessory glands in mammals. Previous work usually focused on a few highly abundant proteins derived from the seminal vesicles and the prostate (Platz and Wolfe 1969; Mann and Lutwak-Mann 1981; Shivaji et al. 1990; Carballada and Esponda 1991), whereas more comprehensive expression studies focused on a single region of the reproductive tract, such as the prostate or epididymis (Abbott et al. 2003; Berquin et al. 2005; Johnston et al. 2005; Fujimoto et al. 2006; Jelinsky et al. 2007).
One striking feature of the male reproductive tract of mammals is its complex compartmentalization. Upon ejaculation, sperm is mixed with fluids from at least four distinct male accessory organs—the seminal vesicles, the prostate, the bulbourethral gland, and the bulbourethral diverticulum. Anatomical, histological, and gene expression data suggest that the mouse prostate can be further subdivided into at least three distinct biological regions: the anterior, dorsolateral, and ventral lobes (Hayashi et al. 1991; Abbott et al. 2003; Berquin et al. 2005). Although one of the central findings of molecular evolution has been that genes underlying male reproduction evolve rapidly (reviewed by Clark et al. 2006), it is unknown whether this pattern applies to genes expressed throughout all compartments of the male reproductive tract or whether it is localized to a few tissues that produce a specific suite of proteins.
We performed tandem mass spectrometry (MS) analyses on proteins isolated from extracellular regions of six distinct male reproductive tissues in Mus: the seminal vesicles, the anterior prostate (also referred to as the coagulating gland), the dorsolateral prostate, the ventral prostate, the bulbourethral gland (also referred to as Cowper's gland), and the bulbourethral diverticulum. Other regions of the male reproductive tract, such as the ampullary glands, rete testis, and epididymis, may contribute to seminal fluid but were not included because their small size precluded adequate extraction of proteins. We compared our data with a high-quality human seminal fluid data set (Pilch and Mann 2006) and report four key findings: 1) different regions of the male reproductive tract produce different suites of proteins, 2) some male reproductive genes occur in currently unannotated regions of the reference genome, 3) seminal proteins of mouse and human have similar Gene Ontology (GO) functions, and 4) there is heterogeneity in evolutionary rate among proteins from different regions of the male reproductive tract. Although proteins expressed in the seminal vesicles evolve significantly more rapidly than the genome average, proteins from the other five regions show the opposite pattern of significant evolutionary constraint. Our study provides a resource for future work on variance in male reproductive fitness.
We avoided wild-caught mice in this experiment because wild genotypes cannot be replicated. Classical inbred strains of mice carry replicable genotypes by virtue of their homozygosity, but their genomes include material from three distinct species (M. domesticus, M. musculus, and Mus castaneus, also referred to as subspecies in the literature) (Silver 1995; Wade et al. 2002; Frazer et al. 2007; Yang et al. 2007), so their relevance to natural variation is uncertain. Therefore, we focused on the wild-derived inbred strains LEWES/EiJ (LEWES) and WSB/EiJ (WSB). These strains were both derived from natural populations of M. domesticus outside of the European hybrid zone and have the standard karyotype (2n = 40). They were initially obtained from the Jackson Laboratory (Bar Harbor, ME) and maintained in our laboratory for more than 10 generations.
By crossing a female LEWES to a male WSB, we removed potentially confounding effects of inbreeding depression in their progeny, while maintaining the benefits of a replicable genotype. Progeny from this cross are expected to be similar genetically to wild mice. A single F1(LEWES × WSB) male was used. Testing the effects of different social conditions on seminal protein composition, and quantifying variation in protein composition among individuals, is outside the scope of the current study.
The parents of this male were paired for 1 week and then separated so that the dam gave birth in isolation. At 21 days postpartum, the male was weaned individually to avoid dominance interactions among brothers. Grouped males have reduced fertility compared with singly caged males (Snyder 1967), probably because social hierarchy reduces reproduction of subordinate males. Although fertility declines significantly more rapidly in isolated males vs. males caged with females, this effect is only seen after approximately 22 months of age (Schimidt et al. 2009), much older than the mouse used here. All mice were maintained at the University of Arizona Central Animal Facility in accordance with IACUC regulations.
Artificial ejaculation techniques produce abnormal and inconsistent ejaculates in mice (Snyder 1966; Tecirlioglu et al. 2002). Therefore, we identified proteins directly from dissected tissues. At 60 days of age, the F1(LEWES × WSB) male was sacrificed, and the seminal vesicles, anterior prostate, ventral prostate, dorsolateral prostate, bulbourethral gland, and the bulbourethral diverticulum were quickly dissected into phosphate buffered saline (PBS). Testis weight, testis histology, sperm count, and fertility data all show that F1(LEWES × WSB) males are sexually mature by 60 days of age (Good et al. 2007). Internal fluids were manually pressed out using sterile 28G needles. The anterior, ventral, and dorsolateral prostates are small and tubular, so they had to be sliced to release adequate amounts of extracellular fluids. Cell lysis solutions were intentionally avoided in order to target extracellular proteins likely to be secreted and ejaculated. Isolated proteins in PBS were stored at −80 °C until MS analysis.
MS was performed as described in Findlay et al. (2008), with several modifications. Briefly, proteins from each of the six male reproductive tissues were digested with trypsin and prepared for MS as previously described (Aagaard et al. 2006). For each sample, peptides were then loaded onto a 75-μm internal diameter capillary column that had been packed with 40 cm of Jupiter C12 reverse phase material. Two technical replicates, each of 5 μg, were then analyzed by injecting the sample directly on to a column that had been placed online with an LTQ ion-trap mass spectrometer (ThermoElectron). Peptides were eluted off the column over a 4-h gradient, and mass spectra were acquired using data-dependent acquisition. Data files from each sample were analyzed using the CHARGE-CZAR, SEQUEST, and DTASELECT programs (Eng et al. 1994; Tabb et al. 2002; Klammer et al. 2005). MS2 files were searched against all annotated proteins in the mouse genome (NCBI build 37, Ensembl version 48). Only proteins detected with at least two tryptic peptides were included. The false discovery rate (FDR) was estimated by including a set of “decoy” proteins in the database when performing the search. Each decoy was formed by randomly rearranging the amino acids in an annotated protein. The FDR was calculated by dividing the number of decoy proteins identified by the number of annotated proteins identified; in no case were any decoy proteins detected under our stringent requirement of at least two unique tryptic peptides.
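The decoy-based FDR estimate described above can be sketched in a few lines of Python. This is an illustrative sketch and not the pipeline used in the study: the function names `make_decoy` and `estimate_fdr`, the `DECOY_` identifier prefix, and the toy sequence are all assumptions introduced here.

```python
import random


def make_decoy(sequence, rng):
    """Build a decoy by randomly rearranging the residues of one protein."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def estimate_fdr(identified_ids, decoy_prefix="DECOY_"):
    """FDR = decoy proteins identified / annotated proteins identified."""
    n_decoy = sum(1 for pid in identified_ids if pid.startswith(decoy_prefix))
    n_target = len(identified_ids) - n_decoy
    if n_target == 0:
        raise ValueError("no annotated proteins were identified")
    return n_decoy / n_target


rng = random.Random(0)
target = "MKTAYIAKQRQISFVKSHFSRQ"  # hypothetical toy sequence
decoy = make_decoy(target, rng)    # same composition, scrambled order
```

Because a decoy keeps the amino acid composition of its source protein, it yields peptides of similar mass and length, so decoy hits give a rough measure of how often the search matches spectra by chance.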
We note that MS cannot identify all proteins exhaustively. Some proteins, such as those encoded by “Androgen-binding protein” genes, are resistant to standard trypsin digestion (Karn and Laukaitis 2003). Accordingly, we did not detect this protein even though both the transcript and protein are abundant in mouse prostate (Dlouhy et al. 1986; Laukaitis et al. 2005). Because not all proteins are equally detectable, absolute or relative abundance cannot be reliably compared across proteins.
All spectra were searched against the C57BL/6J reference genome (NCBI build 37, Ensembl version 48). Genetic divergence between the F1(LEWES × WSB) mouse used here and the C57BL/6J reference genome is unlikely to affect protein identification. More than 90% of the C57BL/6J genome is thought to be derived from M. domesticus (Yang et al. 2007). Average pairwise divergence (π, Nei and Li 1979) in the introns of several genes sampled from M. domesticus is <0.003 (Baines and Harr 2007; Geraldes et al. 2008). Average pairwise divergence that results in nonsynonymous substitution is expected to be even lower. Furthermore, because proteins were identified by a median of four peptides (see Results), most proteins would be “hidden” only if amino acid substitutions occurred at multiple peptides.
Previous work in Drosophila has shown that genes encoding reproductive proteins are often not detected by gene prediction algorithms (Findlay et al. 2008, 2009). To identify unannotated reproductive genes in mouse tissues, we searched all mass spectra from each tissue against a six reading frame translation of the C57BL/6J reference genome (NCBI build 37, Ensembl version 48). The DNA sequence of each chromosome was translated in all six reading frames. Open-reading frames were discarded if they translated to fewer than 11 amino acids, consisted solely of a single amino acid repeat, or lacked the K or R residues predicted for tryptic peptides (Findlay et al. 2008). Translating and filtering produced ~137 million possible open-reading frames. We also included annotated proteins that were identified from the searches above, so that six-frame peptides matching them could be easily removed. Because many false positives might be expected from searching against such a large database, we required that six-frame peptides be matched by at least two spectra in order for them to be considered a true positive. We used a combination of BLAST (Altschul et al. 1990) and BLAT (Kent 2002) to map these peptides back to the mouse genome.
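The translation and filtering steps can be sketched as follows. This is a minimal sketch under stated assumptions: an "open-reading frame" is taken here to be any stretch between stop codons (the text does not say whether a start codon was required), the standard genetic code is used, and the names `six_frame_orfs`, `translate`, and `reverse_complement` are hypothetical. A genome-scale run would need to stream chromosome sequences rather than hold them in memory.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(dna, min_len=11):
    """Return (strand, frame, peptide) for every ORF passing the three filters."""
    dna = dna.upper()
    kept = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for orf in translate(seq[frame:]).split("*"):
                if len(orf) < min_len:
                    continue  # fewer than 11 amino acids
                if len(set(orf)) == 1:
                    continue  # single amino acid repeat
                if "K" not in orf and "R" not in orf:
                    continue  # no tryptic cleavage residue
                kept.append((strand, frame, orf))
    return kept


# Toy input: ATG, five Ala codons, one Lys codon, five Ala codons, stop.
toy = "ATG" + "GCT" * 5 + "AAA" + "GCT" * 5 + "TAA"
orfs = six_frame_orfs(toy)
```

The filters are applied to each inter-stop stretch in turn, so a single chromosome contributes candidates from both strands and all three offsets on each strand.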
We compared our data with a high-quality data set identifying human seminal fluid proteins. Specifically, 923 unique proteins (of which 806 could be unambiguously linked to single Ensembl gene names) were identified from purified human seminal fluid (Pilch and Mann 2006).
Orthology was determined using the phylogenetic clustering analyses of Ensembl version 48 (www.ensembl.org) (NCBI mouse build 37). Protein sequences were aligned using CLUSTALW version 1.83 (Thompson et al. 1994), then associated with their coding DNA sequences using REVTRANS version 1.5 (Wernersson and Pedersen 2003). From these alignments, we calculated dN/dS, the number of nonsynonymous substitutions per nonsynonymous site normalized by the number of synonymous substitutions per synonymous site, using the method of Goldman and Yang (1994) as implemented in PAML version 3.15 (Yang 1997). For genes with multiple transcripts, we estimated dN/dS for all possible pairwise comparisons between mouse and rat, then chose the pair with the lowest estimated dS as an indication of the best alignment (following Dean et al. 2008). As quality control, we excluded any genes with fewer than 100 codons (dN/dS is difficult to estimate accurately with small genes), an estimated dN > 1 (probably an analytical artifact indicating more than one nonsynonymous substitution per nonsynonymous site), or an estimated dS ≥ 0.381 (twice the median dS value across the 15,950 genes, which may indicate poor alignment), resulting in 14,963 analyzed genes. We constructed 95% confidence intervals (CIs) of the estimated median by sampling 10,000 bootstrap replicates with R (www.r-project.org).
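The quality-control filter and the bootstrap interval for the median can be illustrated as below. The original analysis was run in R; this Python sketch assumes a simple percentile bootstrap, which the text does not specify, and the function names and toy values are hypothetical.

```python
import random
import statistics


def passes_qc(n_codons, dn, ds, max_ds=0.381):
    """Keep a gene only if it avoids all three exclusion rules in the text."""
    return n_codons >= 100 and dn <= 1 and ds < max_ds


def bootstrap_median_ci(values, n_boot=10_000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the median (assumed method)."""
    rng = random.Random(seed)
    n = len(values)
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy records: (analyzed codons, dN, dS); the values are illustrative only.
genes = [(450, 0.02, 0.18), (80, 0.01, 0.15), (300, 0.05, 0.20),
         (220, 0.03, 0.45), (510, 0.04, 0.16), (130, 0.06, 0.19)]
omegas = [dn / ds for n, dn, ds in genes if passes_qc(n, dn, ds)]
ci_low, ci_high = bootstrap_median_ci(omegas, n_boot=2000)
```

Two tissue medians can then be compared by checking whether their intervals overlap the genome-wide interval, which is how the comparisons in the Results are framed.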
To test for recurrent positive selection acting on genes, we used a maximum likelihood framework implemented in the CODEML routine of PAML version 3.15 (Yang 1997), as well as a fixed effects likelihood (FEL) implemented in HYPHY version 0.9920070619beta (Kosakovsky Pond et al. 2005). The CODEML framework considers dS to be constant across the gene, whereas FEL allows dS to vary among codons (Kosakovsky Pond and Frost 2005).
Using the same pair of sequences chosen in the above mouse–rat comparisons, we retrieved all one-to-one orthologs in human, cow, and dog. A total of 10,912 genes had one-to-one orthologs across these five species. For genes with multiple transcripts in any of these latter three species, we chose the longest transcript. Alignments were made as described above. In a CODEML framework, we fit the data to three alternative models of molecular evolution (the M7, M8a, and M8 models as described by Yang et al. 2000; Swanson et al. 2003). In essence, M7 and M8a represent different null hypotheses, as neither allows for codons within a sequence to experience recurrent positive selection, whereas model M8 relaxes this constraint.
We considered a gene to have experienced recurrent positive selection if all five of the following criteria were met: 1) M8 fit the data significantly better than M7 at P < 0.01, using a likelihood ratio test, 2) M8 fit the data significantly better than M8a at P < 0.01, 3) the additional class of dN/dS estimated by M8 was greater than 1.1, 4) at least 1% of the codons belonged to this additional class of dN/dS, and 5) FEL analyses revealed significant evidence of positive selection in at least one codon (dN/dS > 1.1 at P < 0.10, the P value recommended by Kosakovsky Pond and Frost 2005). As further quality control, we estimated pairwise dS between mouse and each of the four other species using the runmode = −2 option in CODEML. We excluded any genes whose five species alignment produced fewer than 100 analyzed codons or produced pairwise dS of mouse–rat ≥ 0.38, mouse–human ≥ 1.20, mouse–dog ≥ 1.39, or mouse–cow ≥ 1.43 (each representing greater than twice the median dS estimated from these respective genome pairs, possibly indicating poor alignment). These quality control measures resulted in 9,071 analyzed genes.
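The five criteria can be combined into a single decision rule, sketched below. The log-likelihoods and parameter estimates would come from CODEML and HYPHY output, which this sketch does not parse. The degrees of freedom (2 for M7 versus M8, 1 for M8a versus M8) are conventional choices assumed here because the text does not state them, and all function and argument names are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def lrt_p(lnl_null, lnl_alt, df):
    """Likelihood ratio test: 2 * (lnL_alt - lnL_null) against chi-square."""
    return chi2_sf(2 * (lnl_alt - lnl_null), df)


def recurrent_positive_selection(lnl_m7, lnl_m8a, lnl_m8,
                                 omega_extra, prop_extra, fel_sites):
    """fel_sites: iterable of (dN/dS, P value) pairs, one per codon."""
    return (
        lrt_p(lnl_m7, lnl_m8, df=2) < 0.01                     # criterion 1
        and lrt_p(lnl_m8a, lnl_m8, df=1) < 0.01                # criterion 2
        and omega_extra > 1.1                                  # criterion 3
        and prop_extra >= 0.01                                 # criterion 4
        and any(w > 1.1 and p < 0.10 for w, p in fel_sites)    # criterion 5
    )


# Toy values for one gene (illustrative only).
call = recurrent_positive_selection(
    lnl_m7=-1000.0, lnl_m8a=-995.0, lnl_m8=-988.0,
    omega_extra=2.5, prop_extra=0.03,
    fel_sites=[(0.2, 0.50), (3.0, 0.04)],
)
```

Requiring all five criteria at once makes the rule conservative: a gene that passes both likelihood ratio tests is still rejected if the selected class is tiny or if no individual codon is supported by FEL.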
Among the six male reproductive tissues, 43,076 spectra identified 3,056 peptides that mapped to 1,057 known proteins. A median number of 10 spectra from a median of four peptides, covering a median 16% of the protein, were used to identify each protein or protein family. Of the 1,057 proteins identified, 766 could be mapped to a single region in the genome. These 766 proteins annotated to 506 genes (supplementary table 1, Supplementary Material online), indicating a high level of alternative splicing. The other 291 proteins could not be unambiguously assigned to a single region in the genome, usually because they were members of large gene families containing highly similar genes (supplementary table 2, Supplementary Material online). It should be noted that not all proteins identified here are necessarily ejaculated. However, the main conclusions that follow remain unchanged if we consider only the most conservative set of proteins, which are those whose one-to-one ortholog in humans is also found in human ejaculates.
Of the 506 “single region” genes, 228 (45%) were detected in only one of the six tissues (table 1), suggesting that regions of the male reproductive tract often contribute unique proteins. Of the remaining 278 genes, 95, 68, 69, 37, and 9 were detected in two, three, four, five, and six tissues, respectively.
To visualize the degree of specialization across tissues, we constructed Neighbor-Joining phenograms (using Swofford 2002), based on protein presence/absence (fig. 1). There is strong divergence among tissues. For example, the three lobes of the prostate do not form a distinct clade, suggesting they are as divergent in protein complement from each other as they are from other tissues. Other studies have shown distinct patterns of gene expression among prostatic lobes (Abbott et al. 2003; Berquin et al. 2005; Fujimoto et al. 2006).
There is a broad overlap between the proteins identified in mouse male reproductive tissues and orthologous proteins identified in human seminal fluid. Of 506 mouse male reproductive proteins, 367 (73%) have a one-to-one ortholog in human (compared with the genome average of 14,925 of 23,049 = 65%), of which 136 were detected in the human seminal fluid data of Pilch and Mann (2006). The amount of overlap between studies is likely to be an underestimate, because we extracted proteins from dissections whereas Pilch and Mann (2006) isolated ejaculates directly.
Certain models of adaptive evolution predict that male reproductive genes should be favored on the X chromosome (Rice 1984; Vicoso and Charlesworth 2006). The 506 single gene proteins were not significantly more common on the X chromosome (14 of the 506 male reproductive proteins were X-linked compared with 954 of 21,813 genes in the genome, Fisher's Exact Test, P = 0.10).
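The X-linkage comparison is a 2 × 2 contingency test, which can be sketched with the standard library alone. How the table was laid out is not stated in the text; the sketch below assumes the 506 reproductive genes and the 21,813 genome genes are treated as two separate groups, so the P value it prints need not match the reported one exactly. The function names are hypothetical.

```python
import math


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_p(x):
        # Hypergeometric probability of x in the top-left cell.
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(total, col1))

    lo, hi = max(0, col1 - row2), min(row1, col1)
    log_obs = log_p(a)
    # Sum every table that is no more probable than the observed one.
    p = sum(math.exp(log_p(x)) for x in range(lo, hi + 1)
            if log_p(x) <= log_obs + 1e-9)
    return min(1.0, p)


# Counts from the text: X-linked vs. not, reproductive vs. genome (assumed layout).
p_x = fisher_exact_two_sided(14, 506 - 14, 954, 21813 - 954)
print(f"two-sided P = {p_x:.3f}")
```

Working in log space avoids overflow in the binomial coefficients, which become very large for genome-scale counts.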
We performed a search against the entire genome translated in all six reading frames with the goal of identifying genes or splice variants that are currently not annotated. Such a search can potentially improve the annotation of the mouse genome with direct evidence of translation. We detected three regions of the genome that contained peptides identified with high confidence, but where genes are currently not annotated (supplementary table 3, Supplementary Material online). Each of these peptides was at least 10 amino acids long and was detected with at least three spectra.
One of these proteins occurs on chromosome 7 in the vicinity of a gene called “Fc fragment of IgG-binding protein” (Fcgbp) (transcript = ENSMUST00000076648, gene = ENSMUSG00000047730). We detected 1,890 spectra matching 41 unique peptides in the unannotated region that spans the 3′ end of the Fcgbp gene and the 5′ end of the next gene, Fbl (supplementary fig. 1, Supplementary Material online, Supplementary browser track). The human ortholog of Fcgbp spans the entire region between the annotated mouse Fcgbp and the unannotated regions hit by our six-frame search. If the mouse gene model of Fcgbp were annotated in a similar way, it would contain all of our unique peptides and the predicted transcripts ENSMUST00000082859, ENSMUST00000059886, and ENSMUST00000098633 (supplementary fig. 1, Supplementary Material online). We suggest that all of our hits to this unannotated region are part of one large Fcgbp gene. The size of the gene may make it difficult to sequence the complete mRNA (>15 kb) using a high throughput approach. For most of our unannotated peptides, there are no corresponding expressed sequence tag (EST) data (UCSC Genome Browser, http://genome.ucsc.edu).
In addition to this region of chromosome 7, we found two other peptides matching unannotated regions (supplementary table 3, Supplementary Material online). One on chromosome 1 was found four times in the anterior prostate, and the other on chromosome 13 was found three times in the dorsolateral prostate. Currently there is no EST evidence for transcription at either of these loci. Targeted amplification of cDNA from the regions could reveal the extent of their genes.
To better understand the functions of proteins found in the six male reproductive tissues, we tested for overrepresentation of GO terms (Ashburner et al. 2000) using ONTOLOGIZER version 2.0 (Robinson et al. 2004), with the “Term-For-Term” calculation method and Bonferroni-corrected P < 0.05. We compared the functions inferred for mouse seminal proteins with those inferred from a high-quality human seminal fluid data set (Pilch and Mann 2006).
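A term-by-term overrepresentation test of this kind is commonly a one-sided hypergeometric test applied to each term independently, followed by multiple-testing correction. The sketch below illustrates that idea with Bonferroni correction; it is not ONTOLOGIZER's code, it ignores the GO hierarchy, and the term names, gene identifiers, and function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(pop_size, pop_hits, study_size, study_hits):
    """P(X >= study_hits) when study_size genes are drawn without replacement."""
    top = min(pop_hits, study_size)
    numerator = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, study_size - k)
        for k in range(study_hits, top + 1)
    )
    return numerator / comb(pop_size, study_size)


def overrepresented_terms(term_to_genes, population, study, alpha=0.05):
    """Return {term: Bonferroni-adjusted P} for terms passing alpha.

    Assumes `study` is a subset of `population`.
    """
    n_tests = len(term_to_genes)
    results = {}
    for term, genes in term_to_genes.items():
        pop_hits = len(genes & population)
        study_hits = len(genes & study)
        p = hypergeom_upper_tail(len(population), pop_hits,
                                 len(study), study_hits)
        p_adj = min(1.0, p * n_tests)  # Bonferroni correction
        if p_adj < alpha:
            results[term] = p_adj
    return results


# Toy data: 20 background genes, 5 detected genes, 2 terms.
population = {f"g{i}" for i in range(1, 21)}
study = {"g1", "g2", "g3", "g4", "g6"}
terms = {
    "TERM_A": {f"g{i}" for i in range(1, 6)},
    "TERM_B": {f"g{i}" for i in range(6, 16)},
}
hits = overrepresented_terms(terms, population, study)
```

In this toy case only `TERM_A` is retained: four of the five detected genes carry it, compared with five of 20 genes in the background.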
Among the 506 mouse male reproductive proteins, there were 161 GO terms that were significantly overrepresented (supplementary table 4, Supplementary Material online). Under the biological process partition, many GO terms were related to various aspects of metabolism (alcohol metabolism, carbohydrate metabolism, nucleotide metabolism, protein metabolism, and organic acid metabolism), physiological processes (cell organization, muscle contraction), and response to stress (response to chemical, biotic, heat, and protein stimuli). Under the molecular function partition, nucleotide binding, protein binding, enzyme regulation, and catalytic activity were overrepresented among male reproductive proteins.
Analysis of the cellular component partition of the GO yielded a surprising result. Even though our technique of protein isolation targeted extracellular proteins, functional analyses suggested that intracellular proteins were significantly overrepresented among male reproductive proteins. Reanalysis of the human seminal protein data (Pilch and Mann 2006, supplementary table 4, Supplementary Material online) gave the same result. The human seminal proteins are unlikely to be confined to intracellular space because they were identified from real ejaculates rather than dissections. Therefore, this result probably reflects imperfect knowledge about the complex pathways in which proteins are exported, although it is possible that some cellular sloughing or leakage occurs too.
Many of the 161 significantly overrepresented terms in our data set overlap functional terms identified as overrepresented among human seminal proteins. Specifically, 68 of the 161 terms were also overrepresented in the human data (supplementary table 4, Supplementary Material online). These 68 GO terms represent only identical overlap, ignoring similar functions that are assigned different GO terms. However, the hierarchical nature of GO terms also introduces some degree of nonindependence in these analyses. It was not possible to compare functional overlap with nonreproductive tissues, because human expression data derive largely from cell lines (e.g., Su et al. 2002).
We next focused on mouse male reproductive genes without one-to-one orthologs in the human genome. Among the 63 mouse genes without human orthologs, there were five significantly overrepresented terms, all related to enzyme inhibitor activity (table 2). Of these, three were also significantly overrepresented among the 61 human seminal genes with no mouse ortholog (table 2). Therefore, enzyme inhibitors appear to be particularly prone to gene turnover, either as an increased birth–death process or through rapid evolution that hides true orthology.
Genes whose proteins were detected in the seminal vesicles showed significantly higher pairwise dN/dS (95% CI = 0.162–0.360) compared with the genome average (0.125–0.131) (table 1, figs. 2 and 3). In contrast to seminal vesicle genes, genes whose proteins were detected in the other five male reproductive tissues showed significantly lower dN/dS compared with the genome, suggesting that most male reproductive proteins are evolutionarily constrained (table 1, figs. 2 and 3). There was no significant difference in the length of protein analyzed (the length of the protein, minus regions where mouse and rat do not align) between male reproductive proteins and the genome average (Wilcoxon Rank Sum Test [WRST] = 2,613,762, P = 0.69, table 1), so these patterns are unlikely to be caused by differences in power. There was also no significant difference in the frequency of one-to-one orthology among the six reproductive tissues compared with the genome (table 1). Furthermore, genes that failed quality control measures were distributed equally across the six reproductive tissues (supplementary table 1, Supplementary Material online).
Among the 506 genes detected here, those that were exclusively expressed in male reproductive tissue showed significantly higher dN/dS than genes that are also expressed in nonreproductive tissues. We reanalyzed the expression data of Su et al. (2002), focusing on 21 main tissues (following Dean et al. 2008). Of the 506 genes identified here, 378 could be linked to these expression data (not all genes were represented on the microarray chip of Su et al. 2002), of which 280 yielded pairwise estimates of dN/dS. Genes that were never detected in nonreproductive tissues showed significantly higher dN/dS (N = 34, median dN/dS = 0.17) compared with those genes that were detected in nonreproductive tissues (N = 246, median dN/dS = 0.06) (WRST = 6,004, P < 10^−5).
Not all proteins are equally detectable using MS, making it difficult to use the number of spectra identified per protein (or the number of peptides identified per protein) to meaningfully quantify relative or absolute abundance. With this caveat in mind, the 143 proteins without a one-to-one ortholog in rat were detected with significantly more total spectra (median = 25 spectra) compared with the 363 proteins with orthologs (median = 12 spectra) (WRST = 20,170, P < 10^−4). If the number of spectra at least partly reflects relative protein abundance (Liu et al. 2004), this pattern suggests that more abundant proteins experience greater gene turnover or evolve so rapidly that orthology is obscured.
The proteins identified here were collected from dissected male reproductive glands and are not necessarily ejaculated. To address this issue, we repeated the above analyses of dN/dS including only the 136 genes with one-to-one orthologs in humans that occur in human ejaculates (Pilch and Mann 2006). It is likely that such proteins are also found in mouse ejaculates. Qualitatively, our results do not change: seminal vesicle proteins still showed higher median dN/dS, and the proteins from the other five tissues still showed lower median dN/dS, compared with the genome (supplementary fig. 2, Supplementary Material online). The differences were no longer statistically significant for seminal vesicle and anterior prostate proteins, but this is probably due to the large reduction in sample size (supplementary fig. 2, Supplementary Material online). Although direct confirmation that the proteins identified here are included in the ejaculate awaits further study, the overall patterns of evolution (i.e., fig. 3) are robust. Therefore, differences in evolutionary rate cannot be explained by methodological issues, such as differential ease of dissection across the reproductive tract.
The significantly higher pairwise dN/dS observed for seminal vesicle proteins could have arisen through either recurrent adaptive evolution or relaxed selective constraint. To distinguish between these alternatives, we performed codon-based maximum likelihood estimates of the frequency of positive selection across five diverse mammal species. There were only 12 seminal vesicle genes that had a one-to-one ortholog in all five species analyzed. Of these, one gene (transferrin [ENSMUSG00000032554]) showed significant evidence of positive selection (table 1). There was no significant difference in the length of protein analyzed (the length of the protein, minus regions where there was not full alignment across the five species) between male reproductive proteins and the genome average (WRST = 949,238, P = 0.84, table 1). The small number of seminal vesicle genes that could be analyzed in this framework probably did not yield enough power to detect a higher rate of adaptive evolution. For example, six related genes (SVS2 [ENSMUSG00000040132], SVS3a [ENSMUSG00000017003], SVS3b [ENSMUSG00000050383], SVS4 [ENSMUSG00000016998], SVS5 [ENSMUSG00000017004], and SVS6 [ENSMUSG00000017000]) are located in a ~100-kb window that has experienced an unusual history of duplication, deletion, and conversion (Hurle et al. 2007). These dynamics make orthology assignment in this region difficult, in turn causing all these genes to “drop out” of the five species analyses. Still, seminal vesicle genes evolve about as rapidly as testis-specialized genes (fig. 3), where adaptive evolution has been repeatedly reported (Good and Nachman 2005; Turner et al. 2008).
We have undertaken the first comprehensive study of potentially ejaculated proteins in the house mouse. By targeting six distinct regions of the male reproductive tract, we discovered a previously unappreciated amount of heterogeneity in evolutionary rate experienced by reproductive proteins (fig. 3). Although it is true that many reproductive genes are rapidly evolving in mammals (Waterston et al. 2002; Castillo-Davis et al. 2004; Gibbs et al. 2004; Nielsen et al. 2005), proteins from five of six male accessory glands showed significantly lower rates of evolution compared with the genome average. Therefore, although many rapidly evolving genes have reproductive functions (reviewed in Clark et al. 2006), most proteins found in male accessory glands do not evolve rapidly. Although further work is required to confirm the reproductive roles of most of the proteins identified here, these conclusions remain essentially unchanged if we confine our analyses to those proteins that are also found in human ejaculates.
Heterogeneity in evolutionary rate may result from the partitioning of functions across the reproductive tract. Genes whose proteins were detected in the seminal vesicles showed a significantly higher rate of evolution compared with the genomic average. This pattern may be related to a higher degree of specialization for reproductive functions. Seminal vesicle proteins were fewer in number but were identified with significantly more spectra per protein.
To better understand this heterogeneity, we focus on the functions of seminal vesicle proteins. Thirty-seven proteins were identified from seminal vesicles (table 1). Of these, 21 showed the highest number of spectra in seminal vesicles compared with the other five tissues, and all showed >5X the median number of spectra detected in the other five tissues (supplementary table 1, Supplementary Material online). These 21 proteins, which appear to be most common in the seminal vesicles, seem to fall into two functional classes: 1) proteins that participate in the formation of the copulatory plug and 2) proteins that function in immune response. It should be stressed, however, that most functional assignments are inferred from computational annotation rather than direct assay, and probably no gene has been fully characterized with respect to all of its functions.
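The enrichment criterion described above can be sketched as a simple filter on per-tissue spectral counts. This is a minimal illustration, not the pipeline used in the study: the tissue abbreviations, the function name, and the example counts are all hypothetical.

```python
from statistics import median

# Hypothetical labels for the six sampled tissues (SV = seminal vesicles).
TISSUES = ("SV", "AP", "DLP", "VP", "BU", "BD")


def is_sv_enriched(counts, fold=5.0):
    """Return True if a protein looks seminal vesicle enriched.

    counts: dict mapping each tissue label to its spectral count.
    A protein qualifies when its seminal vesicle count is the highest of
    all six tissues and exceeds `fold` times the median count of the
    other five tissues.
    """
    sv = counts["SV"]
    others = [counts[t] for t in TISSUES if t != "SV"]
    return sv > max(others) and sv > fold * median(others)


# Hypothetical spectral counts for two proteins.
plug_like = {"SV": 120, "AP": 3, "DLP": 0, "VP": 2, "BU": 1, "BD": 0}
broad = {"SV": 10, "AP": 8, "DLP": 4, "VP": 3, "BU": 2, "BD": 1}

print(is_sv_enriched(plug_like))
print(is_sv_enriched(broad))
```

Using the median of the other five tissues rather than their mean keeps a single tissue with moderate counts from masking an otherwise clear seminal vesicle signal.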
Shortly after copulation, a hard coagulum forms in the vaginal–cervical region in many rodent species and may persist for more than 24 h. In related work, we have identified the proteins found in the copulatory plug (unpublished data) using MS. The proteins encoded by seven of the 21 seminal vesicle–specific genes—SVS1 (ENSMUSG00000039215), SVS2 (ENSMUSG00000040132), SVS3a (ENSMUSG00000017003), SVS3b (ENSMUSG00000050383), SVS4 (ENSMUSG00000016998), SVS5 (ENSMUSG00000017004), and SVS6 (ENSMUSG00000017000)—were all detected in the copulatory plug with very high spectral counts. The proteins encoded by SVS1, SVS2, and SVS3a or SVS3b have been shown experimentally to undergo transglutaminase-induced cross-linking, a central feature of the copulatory plug (Lundwall et al. 1997; Lin et al. 2002).
Two main hypotheses regarding the role of the copulatory plug in reproductive biology predict rapid evolutionary change in the constituent proteins. First, the copulatory plug may inhibit the sperm of other males. There is a strong correlation between the number of males that a female mates with per estrus cycle and the size and solidification of the copulatory plug in rodents and primates (Dixson and Anderson 2002; Ramm et al. 2005). Furthermore, rodents that do not form a copulatory plug appear to have evolved a different form of mate guarding by which a mechanical lock forms during copulation (Hartung and Dewsbury 1978). In addition, the rate of nonsynonymous substitution in SVS2 is positively correlated with the predicted frequency of multiple mating in rodents (Ramm et al. 2008). Taken together, these patterns suggest that copulatory plugs are an adaptive response to sperm competition. In nature, female mice mate with more than a single male in at least 22% of all estrus cycles (Dean et al. 2006; Firman and Simmons 2008), so sperm competition is a potentially powerful evolutionary force.
Second, the copulatory plug may alter female mating behavior by making the female more resistant to subsequent mating (Goldfoot and Goy 1970; Carter and Schein 1971; Hardy and DeBold 1972). Although these behavioral shifts are likely adaptive from the perspective of the male (i.e., he is reducing the chance of future sperm competition), they could be deleterious to females (i.e., her litter is predicted to be less genetically diverse or more likely to carry meiotic drivers, as shown by Price et al. 2008). Sexual antagonism would predict rapid evolution among copulatory plug proteins as they continually change in response to female proteins.
Patterns of evolution are consistent with the predictions of male–male or male–female interactions. Of the three SVS genes that have a rat ortholog and pass quality control measures, all have pairwise dN/dS above 0.40 (more than 3X the genome median, supplementary table 1, Supplementary Material online). Formal tests of positive selection could not be undertaken for any SVS genes because they do not have clear orthologs in the other four species examined here. Nevertheless, some SVS proteins have been shown to experience recurrent positive selection with broader sampling within rodents (Karn et al. 2008; Ramm et al. 2008).
A literature search revealed that 11 of the 21 seminal vesicle-specific proteins appear to function in the context of immunity activation or inhibition. Immunity genes are a classic example of an arms race model, whereby interacting proteins (usually host–pathogen) continually evolve to detect or evade each other. In the context of reproduction, immunity is an especially interesting biological process, as sperm are foreign bodies that must evade the female's immune response in order to successfully fertilize. The female mounts a massive immune response shortly after copulation, potentially posing a strong barrier to male reproductive success.
Two of these 11 genes, nucb2 (ENSMUSG00000030659) and dnase2B (ENSMUSG00000028185), bind or hydrolyze DNA, respectively. Shortly after copulation, the female reproductive tract is inundated with white blood cells. Some of these white blood cells react to foreign bodies by expelling a net of proteases attached to a DNA backbone (Wartha et al. 2007). DNA binding or hydrolysis by seminal vesicle proteins may counteract this response and free entangled sperm (Alghamdi and Foster 2005).
Four additional genes—serpine2 (ENSMUSG00000026249), spink3 (ENSMUSG00000024503), timp1 (ENSMUSG00000001131), and 9530002K18RIK (ENSMUSG00000053729)—all inhibit certain proteolytic enzymes. It is possible that protease inhibitors in seminal fluid protect sperm from proteolytic attack. It is also plausible that protease inhibitors slow down the degradation of the copulatory plug, as some proteases degrade coagulated seminal vesicle secretions (Lilja 1985; Lundwall et al. 2006). Either scenario potentially places male-expressed protease inhibitors in a sexually antagonistic role with female-expressed proteases. Consistent with this model, rapid evolution has been observed in multiple Drosophila proteases and protease inhibitors (Swanson et al. 2004; Mueller et al. 2005; Panhuis and Swanson 2006; Kelleher et al. 2007; Lawniczak and Begun 2007; Findlay et al. 2008; Wong et al. 2008). Interestingly, protease inhibition is a function that was significantly overrepresented among mouse seminal proteins without a human ortholog (table 2), suggesting this class of genes may experience rapid turnover.
Five of the 11 immune-related genes may be involved in pathogen defense, as they encode proteins that target pathogens (ceacam10 [ENSMUSG00000054169]), present antigens (b2m [ENSMUSG00000060802]), perform lysis (9530003J23RIK [ENSMUSG00000020177]), degrade proteins (plau [ENSMUSG00000021822]), or alter glycosylated proteins (man2b1 [ENSMUSG00000005142]). It is possible that some of these proteins serve as a kind of paternal investment, whereby the male donates a suite of proteins that protect the female from incoming bacterial or viral infections following copulation. Here, rapid evolution would be predicted not because of sexual antagonism, but because of an arms race between host immunity genes and targeted pathogen molecules. The seven immunity-related genes that have a rat ortholog and pass quality control have a median pairwise dN/dS = 0.304 (~2.5X the genome median, range = 0.123–0.95, supplementary table 1, Supplementary Material online).
Although it is true that many reproductive proteins evolve rapidly, our work shows that the majority of proteins isolated from the male reproductive tract have been subjected to strong evolutionary constraint. Proteins found in the seminal vesicles are the notable exception, showing patterns of rapid evolution consistent with positive selection driving recurrent nonsynonymous changes. Interestingly, seminal vesicles produce a variety of proteins that form the copulatory plug and apparently suppress the female immune response. These proteins might mediate competitive outcomes among males or sexual conflict between males and females. Furthermore, many seminal vesicle proteins appear to participate in immune response. All of these interactions predict rapid evolution among seminal vesicle proteins due to either sexual or natural selection. Future investigation of ejaculated proteins will shed light on the functional partitioning across the male reproductive tract and how these functions affect male reproductive variance and/or reproductive isolation.
This work was supported by NIH postdoctoral fellowship F32GM070246-02 to M.D.D., NIH training grant T32 HG00035 to G.D.F., NICHHD senior postdoctoral fellowship 5F33HD055016-02 to R.C.K., NIH grant P41 RR011823 to M.J.M., and NSF and NIH grants to M.W.N. M. Carneiro read an early version of the manuscript. D. Tautz and two anonymous reviewers improved the manuscript.