MicroRNAs (miRNAs) are an important class of posttranscriptional gene expression regulators. In the course of mapping novel marsupial-specific miRNAs in the genome of the gray short-tailed opossum, Monodelphis domestica, we encountered a cluster of 39 actual and potential miRNAs spanning 102 kb of the X chromosome. Analysis of the cluster revealed that 37 of the 39 miRNAs are predicted to form thermodynamically stable hairpins, and at least 3 members have been directly cloned from M. domestica tissues. The sequence characteristics of these miRNAs suggest that they all descended from a single common ancestor. Further, 2 distinct families appear to have diversified from the ancestral sequence through different duplication mechanisms: one through a series of simple tandem duplications and the other through a recurrent transposon-mediated duplication process.
MicroRNAs (miRNAs) are an abundant and important class of small regulatory RNAs first identified in the early 1990s (Lee et al. 1993). Since then, thousands of miRNA loci have been found in animals, plants, and viruses (Du and Zamore 2005; Ouellet et al. 2006). Although best known as regulators of gene expression during cellular development and differentiation, miRNAs also play significant regulatory roles in a host of other normal and pathologic processes, including apoptosis and cancer (Zhang, Wang, and Pan 2007). From an evolutionary perspective, 2 patterns appear to hold. The first is that major evolutionary events, such as the emergence of the Vertebrata or, later, the emergence of the Mammalia, are accompanied by the emergence of new miRNA families and that, once acquired in a lineage, new miRNAs are rarely altered or secondarily lost (Sempere et al. 2006; Heimberg et al. 2008). The second is that lineage-specific miRNAs are also acquired, and these often bear the fingerprints of transposons (Smalheiser and Torvik 2005; Piriyapongsa and Jordan 2007; Piriyapongsa et al. 2007; Devor et al. 2009; Lehnert et al. 2009). We recently reported on miRNAs in the genome of a marsupial mammal, the gray short-tailed opossum, Monodelphis domestica (Devor and Samollow 2008). Of a total of 137 miRNAs identified in the M. domestica genome, 124 (91%) are conserved in other mammalian genomes in miRBase (Griffiths-Jones et al. 2006), whereas 13 were identified as new marsupial miRNAs. Eleven of the new miRNAs were also found in the tammar wallaby (Macropus eugenii) genome (Devor and Huang 2011), and 7 contained marsupial-specific transposon sequence signatures (Devor et al. 2009). Two of the new miRNAs, which appear to be unique to the M. domestica genome, map to the small (~79 Mb) M. domestica X chromosome.
In the course of mapping these 2 new X chromosome miRNAs, we found that they were located within a larger cluster of closely related sequences spanning nearly 102 kb and that almost all these sequences are predicted to form thermodynamically stable hairpins, suggesting that this cluster potentially encodes numerous miRNAs.
Here, we report a detailed examination of this cluster that reveals 39 actual and potential miRNAs descended from a single common ancestor likely via both retroposon-mediated and tandem duplication events.
miRNA libraries were prepared from several M. domestica tissues as previously described (Devor and Samollow 2008; Devor et al. 2008). More than 400 individual clones were sequenced on an Applied Biosystems Model 3100 Genetic Analyzer. About 80% of the clones yielded sequence signatures in the correct size range for miRNAs (21–23 nt). After duplicates were eliminated, the final pool comprised 90 unique signatures. Chromosome coordinates were determined by BLAST (Basic Local Alignment Search Tool) searches of the M. domestica genome assembly MonDom5 in Ensembl (http://www.ensembl.org/Monodelphis_domestica/Info/Index), and each signature was then screened via BLAST against miRBase (Griffiths-Jones et al. 2006). Of the 90 unique signatures, 73 were archived in miRBase as miRNAs previously identified in other mammalian species. The remaining 17 sequences were analyzed as possible new miRNAs by obtaining flanking genomic sequence from MonDom5. These genomic DNA sequences were then converted in silico to RNA (potential pre-miRNAs) and analyzed for the hallmark thermodynamically stable hairpin structure with the RNA folding program mFOLD (Zuker 2003). Thirteen of the 17 candidate miRNAs met miRBase criteria for new miRNAs, namely, that they were expressed, that they were part of a thermodynamically stable hairpin, and that the mature sequence had not been previously identified in any other species (Griffiths-Jones et al. 2006).
Sequence alignments were performed with Clustal W (Thompson et al. 1994), and the Kimura 2-parameter model was used to estimate the pairwise distances by the dnadist method in PHYLIP 3.6 (Felsenstein 1989).
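For readers unfamiliar with the Kimura 2-parameter model, the distance separates transitional (P) from transversional (Q) differences among aligned sites: d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The sketch below is a minimal textbook implementation under stated assumptions (gap and ambiguity columns are simply skipped; U is treated as T); the function name `kimura_2p` is hypothetical, and the result is not guaranteed to match PHYLIP dnadist output digit for digit, since that program has its own options and site handling.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def kimura_2p(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two pre-aligned sequences.

    Columns with a gap or ambiguity code in either sequence are skipped.
    U is rewritten as T so RNA and DNA inputs can be mixed.
    """
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = sites = 0
    for a, b in zip(s1, s2):
        if a not in VALID or b not in VALID:
            continue  # gap ('-') or ambiguous base: column not compared
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; purine<->pyrimidine changes are transversions
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1

    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites    # proportion of transitional differences (P)
    q = transversions / sites  # proportion of transversional differences (Q)
    w1 = 1.0 - 2.0 * p - q
    w2 = 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

For instance, `kimura_2p("GAAAAAAAAA", "AAAAAAAAAA")` (one transition in 10 sites, so P = 0.1 and Q = 0) returns −½ ln 0.8, about 0.1116.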
Seventeen unique sequence signatures were assessed as potential new miRNAs, and 13 were seen to be contained within a thermodynamically stable hairpin structure. These 13 sequences were accepted as new candidate miRNAs based on miRBase criteria (Griffiths-Jones et al. 2006). Two of the new miRNAs, designated Mdo-miR-1545 and Mdo-miR-1544, mapped very near each other on the opossum X chromosome. In the process of validating the chromosome coordinates for these 2 miRNAs in MonDom5, we found a large number of similar sequences spread over a ~102-kb region of Xq1.1. Direct inspection of the entire 101932-bp sequence revealed 37 additional miRNA-like sequences, of which 20 were identical or nearly identical to Mdo-miR-1545 and 17 were identical or nearly identical to Mdo-miR-1544. We designated these sequences Mdo-miR-1545a–u and Mdo-miR-1544a–r, respectively, where Mdo-miR-1545a and Mdo-miR-1544a refer to the originally cloned miRNAs.
The newly detected miRNA cluster lies on the M. domestica X chromosome (Mdo X) distal to the miRNA Mdo-miR-223, which is conserved throughout the Mammalia (Figure 1A). RepeatMasker annotation (Jurka et al. 2005) of the region showed that 44302 bp (43.5%) could be identified as transposon remnants. These remnants were highly similar to one another, and virtually all could be identified as derivatives of 5′-UTR sequences of marsupial L1 LINE elements. When the 39 putative precursor miRNA sequences were viewed within the RepeatMasker annotation, a consistent repetitive pattern emerged (Figure 1B). Each pre-miRNA was closely flanked on the 5′ side by a marsupial L1-derived fragment oriented in the opposite direction to the miRNA, and 24 of the 39 pre-miRNAs were also flanked on the 3′ end by another quite small marsupial L1 fragment oriented in the same direction as the miRNA. A Clustal W alignment (Thompson et al. 1994) of the pre-miRNAs and 5′- and 3′-flanking sequences indicates that the L1 fragments plus the miRNAs form a repeating sequence cassette. Moreover, these repeating cassettes are spaced within the region in a fairly regular manner (Figure 1B). Taking these structural features together, we suggest that all 39 members of this cluster are descended from a single common ancestor and that the individual members of the cluster arose via a series of duplication events. The position of the opossum L1 transposon fragments within the duplication cassette raises the possibility that these elements may have played a role in the duplication events that generated this miRNA cluster. We have recently presented evidence that a number of marsupial-specific miRNAs including this cluster arose from marsupial-specific transposons (Devor et al. 2009).
In addition to identifying the flanking opossum L1 transposon 5′-UTR-derived fragments, the RepeatMasker annotation included all or part of 19 (49%) of the 39 miRNAs themselves within the 5′ L1 segments (Devor et al. 2009). Eight of these (miR-1544c and j and miR-1545a, e, f, s, t, and u) were completely masked, 9 (miR-1545b, c, d, h, i, j, k, p, and q) had the 3′ stem masked, and 2 (miR-1544a and miR-1545o) had the 5′ stem masked. This aspect of the annotation raises the possibility that, beyond the involvement of transposon remnants in the serial duplications and insertions that created the overall miR-1545/1544 cluster, the progenitor of both the miR-1545 and miR-1544 groups originally arose within the upstream L1 transposon. There have been numerous suggestions in recent years that transposons of different types could, through fortuitous juxtaposition, give rise to miRNAs (Smalheiser and Torvik 2005, 2006; Borchert et al. 2006; Piriyapongsa and Jordan 2007; Piriyapongsa et al. 2007). The miRNA cluster presented here, along with other new miRNAs in this species (Devor et al. 2009), supports the proposition that transposons are a previously unappreciated source of novel species-specific miRNA loci.
A phylogenetic analysis of the 39 pre-miRNAs is presented in Figure 2. The unrooted neighbor-joining tree indicates that there are 2 distinct families in this cluster, an miR-1545 family and an miR-1544 family. Sequence similarities between the miR-1545 and the miR-1544 families are consistent with their descent from a single common ancestor, as suggested above. The small size of the mature and pre-miRNA sequences limits the statistical power of tests for deviations from neutral evolution, but overall the observed patterns appear consistent with a single ancestral miRNA that gave rise to distinct miR-1545 and miR-1544 families in the Mdo-X cluster.
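The text does not name the software used to build the tree. As a hedged illustration of the neighbor-joining idea itself, the sketch below implements the textbook Saitou–Nei algorithm on a dictionary of pairwise distances (for example, K2P distances) and returns an unrooted Newick string. The function name, the `_internal` node labels, and the input format are all hypothetical choices for this sketch, not the authors' pipeline.

```python
from itertools import combinations


def neighbor_joining(dist: dict) -> str:
    """Saitou-Nei neighbor joining; returns an unrooted tree in Newick format.

    dist maps a pair of taxon labels (a, b) to their distance; every unordered
    pair must be present. At least 3 taxa are required.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    nodes = sorted({taxon for pair in dist for taxon in pair})
    if len(nodes) < 3:
        raise ValueError("need at least 3 taxa")
    newick = {taxon: taxon for taxon in nodes}  # node label -> Newick fragment

    def D(a, b):
        return d[frozenset((a, b))]

    step = 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        a, b = min(combinations(nodes, 2),
                   key=lambda ab: (n - 2) * D(*ab) - r[ab[0]] - r[ab[1]])
        la = 0.5 * D(a, b) + (r[a] - r[b]) / (2 * (n - 2))  # branch length to a
        lb = D(a, b) - la                                    # branch length to b
        step += 1
        u = f"_internal{step}"
        newick[u] = f"({newick[a]}:{la:.4f},{newick[b]}:{lb:.4f})"
        # Distance from the new internal node to every remaining node
        for c in nodes:
            if c not in (a, b):
                d[frozenset((u, c))] = 0.5 * (D(a, c) + D(b, c) - D(a, b))
        nodes = [c for c in nodes if c not in (a, b)] + [u]

    # Resolve the last three nodes as an unrooted trifurcation
    a, b, c = nodes
    la = 0.5 * (D(a, b) + D(a, c) - D(b, c))
    lb = 0.5 * (D(a, b) + D(b, c) - D(a, c))
    lc = 0.5 * (D(a, c) + D(b, c) - D(a, b))
    return f"({newick[a]}:{la:.4f},{newick[b]}:{lb:.4f},{newick[c]}:{lc:.4f});"
```

On additive (tree-like) distances, neighbor joining recovers the generating tree; with real pre-miRNA distances, published packages such as PHYLIP neighbor would normally be preferred over a hand-rolled version.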
Under the proposed transposon-mediated series of duplication events, one would expect little or no association between the genetic distances between pre-miRNAs, as calculated by the Kimura 2-parameter method, and their physical distances on the chromosome. This assumes that transposon-mediated insertions are random with respect to their point of origin. In contrast, a simple tandem duplication model predicts a positive association between genetic and physical distance, because duplicated elements begin as physical neighbors and subsequently diverge. Investigating these nonmutually exclusive duplication mechanisms by plotting the genetic distance between pre-miRNAs against their physical (interlocus) distance reveals 2 different patterns for the miR-1545 versus miR-1544 family members (Figure 3). As seen in Figure 3A, there is a weak but significant positive relationship between genetic and physical distance for the Mdo-miR-1545 family (r = 0.167; H0: r = 0; P < 0.05), which could be consistent with a transposon-mediated duplication model. By contrast, Figure 3B shows a much stronger correlation between genetic and physical distance for the Mdo-miR-1544 family (r = 0.541; H0: r = 0; P < 0.001).
One explanation for the magnitudes of the correlations in the miR-1544 and miR-1545 families is that they are driven by 2 members of the cluster, Mdo-miR-1545g and Mdo-miR-1544r, which, as discussed below, may have become pseudogenized. However, removing these 2 miRNAs from the correlation analysis yields r = 0.175 and r = 0.503 for the miR-1545 and miR-1544 families, respectively. These slight differences suggest that although miR-1545g and miR-1544r may have some influence on the relationship between genetic and physical distance, they are not the primary drivers of its magnitude.
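The text does not state which test produced the P values for H0: r = 0. Because each locus contributes to many pairs, the pairwise points are not independent, and one common way to handle that is a Mantel-style permutation test. The sketch below assumes that approach (our substitution, not necessarily the authors' procedure): it computes Pearson r over all within-family pairs and obtains a one-sided P value by shuffling chromosome coordinates among locus labels. The names `positions`, `genetic`, `paired_distances`, and `mantel_test` are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def paired_distances(positions, genetic):
    """One (physical, genetic) distance pair per unordered pair of loci.

    positions: locus name -> chromosome coordinate in bp
    genetic:   frozenset({name1, name2}) -> pairwise genetic (e.g. K2P) distance
    """
    phys, gen = [], []
    for a, b in combinations(sorted(positions), 2):
        phys.append(abs(positions[a] - positions[b]))
        gen.append(genetic[frozenset((a, b))])
    return phys, gen


def mantel_test(positions, genetic, permutations=999, seed=1):
    """Observed r and a one-sided permutation P value for r > 0.

    Locus coordinates are shuffled among locus labels, which permutes rows and
    columns of the physical distance matrix together (a Mantel-style test).
    """
    observed = pearson_r(*paired_distances(positions, genetic))
    names = sorted(positions)
    coords = [positions[name] for name in names]
    rng = random.Random(seed)
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(coords)
        shuffled = dict(zip(names, coords))
        if pearson_r(*paired_distances(shuffled, genetic)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)
```

Dropping a locus such as miR-1545g or miR-1544r before the call is a matter of deleting its entry from `positions`, which mirrors the sensitivity check described above.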
Most importantly, the correlation for the miR-1544 family is significantly higher than the correlation for the miR-1545 family (H0: 0.167 = 0.541, t = 6.95, degrees of freedom = miR-1544, P = 1.54 × 10⁻¹¹). It is interesting to note that although the miR-1545 and miR-1544 family elements are interspersed within the X-chromosomal cluster, the miR-1545 family carries a signature more consistent with a transposon-mediated duplication model, whereas the miR-1544 family carries a signature more consistent with a tandem duplication model. Of course, a signature consistent with a tandem duplication model could also arise under a modified transposon-mediated model in which transposed copies are more likely to integrate near their source, or more likely to be lost the farther from their source they integrate. Evidence for preferential insertion or deletion is not available, but the observation that the X-chromosomal cluster is highly localized suggests that, at least at this overall level, local insertions or remote deletions are preferred for both families. Finally, we note that the signatures of a tandem duplication model versus a transposon-mediated duplication model could easily be obscured by other forces acting on the miR-1545 or miR-1544 families, but whatever those forces are, they would need to act differentially on the 2 families.
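The comparison above is reported as a t statistic without the exact procedure. One standard way to compare correlations from independent samples is Fisher's r-to-z transformation; the sketch below implements that swapped-in technique as a hedged illustration only. The function name is hypothetical, and the test's independence assumption does not strictly hold for pairwise distances.

```python
import math


def compare_independent_correlations(r1, n1, r2, n2):
    """Fisher r-to-z test for H0: rho1 = rho2 with independent samples.

    Returns (z, two-sided P). n1 and n2 are the numbers of (x, y) observations
    behind each correlation and must each exceed 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)   # variance-stabilizing transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z2 - z1) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided
```

For example, `compare_independent_correlations(0.167, 210, 0.541, 153)` assumes that every within-family pair was used (21 × 20 / 2 = 210 miR-1545 pairs and 18 × 17 / 2 = 153 miR-1544 pairs). Because those pairs are not independent observations, the nominal sample sizes overstate the evidence, and this sketch is not expected to reproduce the reported t value.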
Having demonstrated that the precursors containing the 2 cloned sequences, Mdo-miR-1545a and Mdo-miR-1544a, form the thermodynamically stable hairpin structures that are the hallmark of miRNAs (ΔG = −39.2 and −38.3 kcal/mol, respectively), we sought to determine how many of the other members of the cluster also form thermodynamically stable hairpins. To do so, we converted the precursor sequences of the remaining 37 members into predicted RNAs and analyzed them with the RNA folding program mFOLD (Zuker 2003). The folded RNAs are presented in Supplementary Figure S1. Of these 37 precursor RNAs, 35 form canonical hairpin structures with ΔG values ranging from −22.0 to −40.3 kcal/mol (mean ΔG = −32.5 ± 4.6 kcal/mol). Of the remaining 2 precursors, miR-1545g fails to form a thermodynamically stable hairpin, and miR-1544r forms a nonhairpin secondary structure. In addition, miR-1545g has an 11-bp deletion in its putative mature sequence (Figure 4A). This combination of features raises the possibility that miR-1545g and miR-1544r have accumulated enough sequence changes to render them no longer functional as miRNAs, that is, pseudogenized. More generally, although miR-1545a and miR-1544a were initially detected by direct cloning from RNA, how many of the other members of the cluster are active miRNAs remains largely unanswered. That some of them are expressed is clear, however, from an earlier study in which we cloned a few miRNA sequences during an investigation of PIWI-associated small RNAs in M. domestica testis RNA (Devor et al. 2008). Among these miRNA clones, 3 contained a mature sequence identical to miR-1544d, f, i, and p (Figure 4A). The unexpected aspect of this observation was that the mature miRNA expressed in those 3 clones was not the sequence obtained for miR-1544a but, rather, the sequence directly opposite it on the hairpin, often referred to as the star sequence (Yang et al. 2010).
The orientation of these 2 miR-1544 sequences is such that, after DICER processing of the exported hairpin, they form an imperfectly matched 22-mer double-stranded RNA with the canonical 2-nucleotide 3′ overhangs (Figure 4B). This is the classic DICER product from which the mature miRNA is selected and loaded into the RNA-induced silencing complex for posttranscriptional gene silencing. Following miRBase convention, we designate the 2 expressed miR-1544 sequences miR-1544-5p and miR-1544-3p (Figure 4A).
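The arm sequences shown in Figure 4B are not reproduced in this text, but the geometry of a 22-mer duplex with 2-nt 3′ overhangs can be illustrated. In such a duplex, 20 positions of each arm face each other and the last 2 nt of each arm are unpaired. The sketch below assumes a contiguous helix with no bulges, which real pre-miRNA duplexes often violate, so a folding program's base-pair output remains the better source; `duplex_pairs` and the toy sequences are hypothetical.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def duplex_pairs(five_p: str, three_p: str, overhang: int = 2):
    """Pair two equal-length arms as a bulge-free duplex with `overhang`-nt 3' overhangs.

    Position i (0-based) of the 5p arm faces position (L - overhang - 1 - i) of the
    3p arm, where L is the arm length; the last `overhang` nt of each arm are unpaired.
    Returns (5p position, 3p position, bases, kind) tuples with 1-based positions.
    """
    a = five_p.upper().replace("T", "U")
    b = three_p.upper().replace("T", "U")
    if len(a) != len(b):
        raise ValueError("this sketch assumes arms of equal length")
    paired = len(a) - overhang
    rows = []
    for i in range(paired):
        j = paired - 1 - i  # antiparallel partner on the 3p arm
        bases = (a[i], b[j])
        if bases in WATSON_CRICK:
            kind = "WC"
        elif bases in WOBBLE:
            kind = "GU"
        else:
            kind = "mismatch"
        rows.append((i + 1, j + 1, bases, kind))
    return rows
```

Counting the "mismatch" and "GU" rows gives a quick, rough measure of how imperfectly matched a candidate 5p/3p duplex is under this simplified model.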
Finally, alignment of the pre-miRNAs themselves (see Devor et al. 2009) showed that there is a predictable pattern of sequence variation within each family; specifically, the least conserved parts of the hairpin are in the stem ends and the loop, whereas the most conserved parts are the mature miRNA and the putative star sequence on the other arm of the stem. This is the so-called “camel” pattern proposed by Berezikov and colleagues (Berezikov and Plasterk 2005; Berezikov et al. 2005). However, as seen in Figure 4A, there is sequence variation within the mature sequences, including in the seed regions (positions 2–9), which indicates within-family divergence similar to that seen in other miRNA clusters, particularly the larger rapidly evolving clusters found on human chromosome 19 (Hsa 19; Bentwich et al. 2005) and the platypus X1 chromosome (Murchison et al. 2008). Thus, it seems likely that the Mdo-X cluster, too, is evolving and individual members are taking on new targets and new expression patterns.
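Because the seed region largely determines which targets a miRNA can recognize, one quick way to summarize the within-family divergence described above is to group mature sequences by seed. The sketch below is a hypothetical helper that assumes the positions 2–9 definition used in this text (narrower definitions such as 2–8 are also common and can be passed as parameters); seed identity is only a rough proxy for shared targets, and the names and sequences in the test are invented, not cluster data.

```python
from collections import defaultdict


def seed(mature: str, start: int = 2, end: int = 9) -> str:
    """Seed region of a mature miRNA, using 1-based inclusive positions."""
    return mature.upper().replace("T", "U")[start - 1:end]


def group_by_seed(matures: dict, start: int = 2, end: int = 9) -> dict:
    """Group mature miRNA names by identical seed sequence."""
    groups = defaultdict(list)
    for name in sorted(matures):
        groups[seed(matures[name], start, end)].append(name)
    return dict(groups)
```

Members that fall into different seed groups are candidates for the target divergence suggested above, although that inference would still need target prediction or experimental support.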
The Hsa 19 cluster discussed by Bentwich et al. (2005) comprises 54 miRNAs that appear to have been generated through a mechanism similar to that proposed for the M. domestica Mdo-X cluster. All members of the Hsa 19 cluster are flanked by Alu retrotransposons that were integral to the serial duplication mechanism that generated them. Further, alignment of the mature miRNAs in this cluster shows a level of sequence divergence quite similar to that seen in the Mdo-X cluster. Additional evidence shows that the Hsa 19 cluster is unique to primate genomes and that its expression is limited to placental tissues. Another, smaller primate miRNA cluster, also expanding through duplications, has been mapped to the X chromosome (Bentwich et al. 2005; Zhang, Peng, Wang, and Su 2007); this cluster is expressed in primate testes. A further miRNA cluster with organ-specific expression involves 3 miRNAs, miR-182, miR-96, and miR-183, conserved throughout vertebrates (Xu et al. 2007), whose expression is specific to retinal tissues. However, this cluster displays very little sequence conservation outside the mature sequences themselves, and there is no clear evidence of the mechanism through which it was created. A large miRNA cluster on the X1 chromosome of the platypus, Ornithorhynchus anatinus, a prototherian mammal, was reported by Murchison et al. (2008). Several members of this cluster are also expressed in another prototherian, the short-beaked echidna (Tachyglossus aculeatus). In both species, expression of cluster members is primarily restricted to testes. Alignment of the platypus precursors shows a high degree of sequence variation and, as with the M. domestica miRNAs, much of this variation lies in the mature sequences, probably indicating target divergence. However, Murchison et al. (2008) make no mention of repetitive elements in the region containing the cluster, apart from noting that a LINE element has been inserted into one of the precursors. Most recently, Li et al. (2010) described a small primate-specific X chromosome cluster characterized by duplication events and rapid evolution. Like the 2 other known X chromosome miRNA clusters, this group of 6 miRNAs is expressed almost exclusively in the male reproductive system.
We have identified a cluster of 39 actual and potential miRNAs on the X chromosome of the marsupial M. domestica. Searches of GenBank, Ensembl, and miRBase show that these miRNAs have been identified only in this species. Structural and phylogenetic analyses indicate that this cluster was likely generated from an ancestral miRNA that gave rise to 2 distinct families, the miR-1545 family and the miR-1544 family, through a series of duplications. The duplication events giving rise to the miR-1545 family are consistent with a transposon-mediated model. However, despite the interspersion of the miR-1545 and miR-1544 family members in the cluster, the duplication events giving rise to the miR-1544 family are more consistent with a tandem duplication model. Small RNA cloning from several M. domestica tissues has verified expression of at least 3 members of the cluster, and structural data suggest that 2 other members are probably no longer active as functional miRNAs.
With the discovery of the miR-1544/miR-1545 miRNA cluster in M. domestica reported here and a similar miRNA cluster reported on the platypus X1 chromosome, large miRNA clusters have now been observed in all 3 of the major mammalian lineages. It is interesting to note that most of these clusters are found on the X chromosome and are predominantly associated with the male reproductive system, whereas the one very large cluster that is on an autosome displays placenta-specific expression. All these clusters are thought to be rapidly evolving through duplication and divergence mechanisms, and all display a degree of variation in mature miRNA sequences that is greater than in other miRNA families. Evolution of the type seen here is well known in multigene families, such as the mammalian RNase A family, in which new members are generated by perhaps several cycles of duplication and subsequent divergence (Cho and Zhang 2006). In the case of the M. domestica cluster, even the associated phenomenon of pseudogenization is observed. Finally, in at least 2 of the large mammalian miRNA clusters, rapid evolution is associated with, or at least accompanied by, the presence of flanking transposons.
National Institutes of Health (RR014214).