Polyadenylation occurs in three stages: polyadenylation site choice, cleavage of the pre-mRNA, and addition of the poly(A) tail to the newly formed 3' end [1,2]. The first step, polyadenylation site choice, can be defined as the functional assembly onto the pre-mRNA of the factors necessary for efficient, accurate cleavage (it has also been called the commitment step) [3-5]. Mutation of the pre-mRNA sequence elements involved in polyadenylation site choice [6-12], or mutation of the protein machinery involved in polyadenylation site choice [13-15], results in inefficient polyadenylation of the pre-mRNA. Inefficient polyadenylation in turn prevents export of the mRNA and decreases production of the protein encoded by that mRNA [1]. Polyadenylation site choice is therefore an important first step in polyadenylation and is essential for optimal gene expression.
In mammalian somatic cells, the mechanism of polyadenylation site choice has been intensely studied [1,2]. A number of pre-mRNA sequences have been proposed to be important in choosing the site of polyadenylation [16-20]; however, two seem to play a prominent role in mammalian somatic cells. The first is the hexameric poly(A) signal (most often AAUAAA) found 15–30 bases upstream of the site of polyadenylation [6,21]. The other is the G/U-rich element found 20–40 bases downstream of the site of polyadenylation [11,22]. These elements bind the multi-subunit cleavage and polyadenylation specificity factor (CPSF) [23] and cleavage stimulation factor (CstF) [11], respectively. Formation of this protein/RNA complex thus determines polyadenylation site choice. The next step is cleavage of the pre-mRNA at the polyadenylation site (a process that requires additional factors, possibly including CPSF-73 [24]), followed by addition of the poly(A) tail [1,2].
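The spacing of these two elements around a candidate site can be checked computationally. The following Python sketch is illustrative only and is not part of the methods used in this study. It assumes that distances are counted from the first base of the hexamer to the cleavage position, and that "G/U-rich" can be summarized as the fraction of G and U bases in the downstream window; both conventions, as well as the function names and the example sequence, are hypothetical.

```python
def find_upstream_signals(rna, cleavage_site, signal="AAUAAA",
                          min_dist=15, max_dist=30):
    """Return 0-based start positions of `signal` that lie min_dist..max_dist
    bases upstream of `cleavage_site`.

    `cleavage_site` is the 0-based index of the first base after the cut.
    Assumption: distance is measured from the first base of the hexamer.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    lo = max(0, cleavage_site - max_dist)
    hi = cleavage_site - min_dist
    hits = []
    for start in range(lo, hi + 1):
        if rna[start:start + len(signal)] == signal:
            hits.append(start)
    return hits


def gu_fraction_downstream(rna, cleavage_site, min_dist=20, max_dist=40):
    """Return the fraction of G or U bases in the window min_dist..max_dist
    bases downstream of `cleavage_site` (0.0 if the window is empty).

    Assumption: G/U content is used as a crude proxy for a G/U-rich element.
    """
    rna = rna.upper().replace("T", "U")
    window = rna[cleavage_site + min_dist:cleavage_site + max_dist + 1]
    if not window:
        return 0.0
    return sum(base in "GU" for base in window) / len(window)


# Hypothetical example: AAUAAA starts 20 bases upstream of the cut,
# and a GU repeat occupies positions 20-39 downstream of it.
upstream = "C" * 10 + "AAUAAA" + "C" * 14      # 30 bases
downstream = "C" * 20 + "GU" * 10 + "C" * 5    # 45 bases
example = upstream + downstream
site = len(upstream)

signal_hits = find_upstream_signals(example, site)
gu_content = gu_fraction_downstream(example, site)
```

In this toy sequence, `signal_hits` holds the single hexamer position and `gu_content` is high because the window is dominated by the GU repeat. A sequence lacking AAUAAA in the upstream window would return an empty list.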
However, a number of mRNAs use different polyadenylation sites in different tissues or developmental stages [25]. Changes in the composition of the protein polyadenylation machinery can invoke a change in polyadenylation site choice called alternative polyadenylation [13,26,27]. In addition, inclusion or exclusion of pre-mRNA sequences outside of the polyadenylation region (i.e., by changes in splicing pattern or the presence of a "stronger" polyadenylation site) may also affect polyadenylation site choice [25]. Therefore, changes in the protein/pre-mRNA complex involved in polyadenylation site choice can change where the poly(A) tail is added, and thereby affect gene expression.
We have noticed that the polyadenylation sites chosen in male germ cells are different from those chosen in somatic cells [28]. First, a number of mRNAs use a particular polyadenylation site at higher frequency in male germ cells than in other tissues [25,29-31]. Second, the incidence of the sequence AAUAAA near the 3' ends of male germ cell mRNAs is lower than in somatic mRNAs [28,32]. Third, the polyadenylation sites chosen in male germ cell mRNAs often result in shorter 3' untranslated regions than those of somatic mRNAs [32]. Together, these observations suggest that there are significant differences between the polyadenylation sites chosen in somatic cells and those chosen in male germ cells. There are two possible causes of these differences. Either male germ cell-enriched polyadenylation sites could be used in somatic cells but are not (because they are on pre-mRNAs not expressed in somatic cells, because other pre-mRNA elements prevent their use, or because the somatic polyadenylation sites out-compete them), or they are poor substrates for polyadenylation in somatic cells.
We hypothesized that male germ cell-specific polyadenylation sites are poor substrates for polyadenylation in somatic cells, and therefore would be used inefficiently in somatic cells. To test this, we developed a luciferase-based reporter assay to evaluate the polyadenylation efficiency of different sequences. We then used the assay to show that sequences surrounding male germ cell-specific polyadenylation sites (called polyadenylation cassettes) were inefficiently polyadenylated in somatic cells. Additionally, we developed a 3' RACE-based approach to analyze polyadenylation site positioning. Using this approach, we observed that mRNAs containing these male germ cell-specific polyadenylation sites were not polyadenylated at the site chosen in male germ cells. Rather, they showed aberrant polyadenylation upstream of the male germ cell-specific polyadenylation site. Finally, we showed that introduction of an AAUAAA (an element important to polyadenylation site choice in somatic cells) into a male germ cell-specific pre-mRNA allowed for more efficient polyadenylation of that site in somatic cells. These data suggested that male germ cell-specific polyadenylation sites were inefficiently chosen in somatic cells, and that polyadenylation site choice has different requirements in male germ cells than in somatic cells.