Retroposition, a leading mechanism for gene duplication, is an important process shaping the evolution of genomes. Retrogenes are also involved in gene structure evolution as a major player in the process of intron deletion. Here, we demonstrate the role of retrogenes in intron gain in mammals. We identified one case of “intronization,” the transformation of exonic sequences into an intron, in the primate-specific retrogene RNF113B and two independent “intronization” events in the retrogene DCAF12L2, one in the common ancestor of primates and rodents and another in the rodent lineage. Intron gain resulted from the origin of new splice variants, and both genes have two transcript forms, one with the intron retained and one with the intron spliced out. Evolution of these genes, especially RNF113B, has been very dynamic and has been accompanied by several additional events including parental gene loss, secondary retroposition, and exaptation of transposable elements.
The majority of protein-coding genes in eukaryotes are interrupted by introns that are removed from the pre-mRNA by an RNA–protein complex called the spliceosome (Cavalier-Smith 1985; Crick 1979). Introns and the splicing machinery have been found in all eukaryotic species with fully sequenced genomes (Chow et al. 1977; Roy and Gilbert 2006). Comparative genomic studies have revealed striking conservation of intron positions in distant eukaryotes such as animals and plants (Fedorov et al. 2002; Rogozin et al. 2003; Carmel et al. 2007). On the other hand, many genome-wide comparisons of eukaryotic species have demonstrated multiple intron losses and intron gains (Roy et al. 2003; Cho et al. 2004; Qiu et al. 2004; Coulombe-Huntington and Majewski 2007b; Li et al. 2009). However, intron gain was found to be a very rare event in vertebrate evolution (Loh et al. 2007), and no intron gains into intact conserved coding regions of mammalian genes are known (Roy et al. 2003; Coulombe-Huntington and Majewski 2007a).
Comparative gene structure studies have not revealed any intron gain into existing exons in mammals. The only reported new introns were acquired, by and large, either by fusion of a retrogene with host genes or de novo from the genomic environment as a result of new exon capture (O'Neill et al. 1998; Vinckenbosch et al. 2006; Sela et al. 2007; Baertsch et al. 2008; Fablet et al. 2009). Here, we report two retrogenes, RNF113B and DCAF12L2, in which the exon sequence was split by the creation of a new intron as the result of mutations and the emergence of new splice sites. The introns we discovered represent cases of intron creation via recruitment of exonic sequence (intronization), as proposed by Irimia et al. (2008) and Lahn and Page (1999).
RNF113A is a retrogene encoding a ring finger protein of unknown function and is present in the genomes of all vertebrates. Interestingly, in mammalian genomes, only intronless copies exist, whereas in all other vertebrates, a ten-exon parental gene is present and no retrogenes were detected. Genomic sequence analysis showed that there are two copies of RNF113 in primates, rodents, carnivores, and even-toed ungulates and only one in the genomes of the other mammals we studied. The first copy of RNF113 was retroposed into an intronic region of the NDUFA1 gene in the genome of the mammalian ancestor. Following the retroposition, the parental gene was lost. This likely took place before the divergence of Prototheria (Monotremes) and Theria (Marsupials and Placentals) because the multiexon form of RNF113 is absent from the genomes of all species representing these lineages. After the mammalian radiation, the RNF113A retrogene was duplicated, by retroposition or segmental duplication, in several lineages. Analysis of the genomic locations of these copies suggests that the duplication events were independent in each lineage. For example, in rodents, the RNF113 copy (RNF113A2) was inserted into an intron of the 2900006K08Rik gene, whereas the primate-specific gene, RNF113B, was copied into an intron of the FARP1 gene. The primate-specific duplication happened before Old World Monkeys and New World Monkeys diverged (fig. 1).
After the retroposition/duplication, the primate-specific RNF113B gene underwent rapid evolution including intron gain. The presence of the intron is surprising; however, it is supported by several GenBank mRNA sequences (accession numbers: AF539427, BC025388, and BC017585). To confirm the existence of the intron and learn about its origin, we compared RNF113B sequences from available primate genomes (human, marmoset, macaque, orangutan, and chimpanzee) with sequences of other mammalian RNF113A genes. Sequence alignment revealed that the intron of RNF113B is not a de novo insertion but rather originated from the exonic sequence (fig. 2a). A double point mutation, AG → GT, generated the donor site (fig. 2a). The origin of the acceptor site is less clear. One possibility is that a point mutation, GG → AG, created the acceptor site. Another is that the acceptor site was introduced during the exonization of an L1 element merged at the 3′ end of RNF113B (fig. 2b). The newly generated splice sites, together with the branch site and the polypyrimidine tract, likely enabled recognition of the new intron by the U2 spliceosome (fig. 2a). The 105 bp intron contains 59 nucleotides of previously coding sequence and 46 nucleotides from the 3′ UTR.
Generation of the splice sites most probably occurred in the primate-specific RNF113B copy, since neither the human RNF113A gene, which gave rise to primate RNF113B, nor RNF113A genes from other mammals have GT or AG at the donor and acceptor positions, respectively. The splicing signals were formed before the Old World Monkeys and New World Monkeys split. Interestingly, loss of the splicing boundaries subsequently converted the intron into a “retained intron” in some primates. In rhesus, for example, the acceptor site was lost due to a point mutation (AG → AA change) (fig. 1b).
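The gain and loss of splice sites described above can be illustrated with a minimal sketch. The sequence below is an invented toy string, not the real RNF113B locus, and the function names (`has_gt_ag_boundaries`, `substitute`, `splice_out`) are hypothetical helpers introduced only for this illustration. The check covers only the terminal GT and AG dinucleotides; the branch site and polypyrimidine tract that the spliceosome also requires are not modeled.

```python
# Toy illustration only: the sequence is invented, not taken from RNF113B.

def has_gt_ag_boundaries(seq: str, start: int, end: int) -> bool:
    """Return True if seq[start:end] begins with GT and ends with AG."""
    candidate = seq[start:end].upper()
    return (
        len(candidate) >= 4
        and candidate.startswith("GT")
        and candidate.endswith("AG")
    )


def substitute(seq: str, pos: int, new_bases: str) -> str:
    """Replace len(new_bases) bases starting at pos (substitutions only)."""
    return seq[:pos] + new_bases + seq[pos + len(new_bases):]


def splice_out(seq: str, start: int, end: int) -> str:
    """Return the sequence with the interval [start, end) removed."""
    return seq[:start] + seq[end:]


# Hypothetical ancestral exonic sequence: AG sits where the donor will
# arise, and GG sits where the acceptor will arise.
UPSTREAM = "ATGCC"
INTERNAL = "CTTCTAACTTTCCTTTCG"
DOWNSTREAM = "TTAA"
ancestral = UPSTREAM + "AG" + INTERNAL + "GG" + DOWNSTREAM

START = len(UPSTREAM)                  # first base of the future intron
END = START + 2 + len(INTERNAL) + 2    # one past its last base
ACCEPTOR_POS = END - 2                 # position of the acceptor dinucleotide

# Intronization: AG -> GT creates the donor, GG -> AG creates the acceptor.
derived = substitute(substitute(ancestral, START, "GT"), ACCEPTOR_POS, "AG")

# Later loss of the acceptor (AG -> AA), as described for rhesus.
rhesus_like = substitute(derived, ACCEPTOR_POS, "AA")

for name, seq in [("ancestral", ancestral),
                  ("derived", derived),
                  ("rhesus_like", rhesus_like)]:
    print(name, has_gt_ag_boundaries(seq, START, END))

# Only the derived sequence can yield the spliced form in this toy model.
spliced = splice_out(derived, START, END)
```

In this toy model, only `derived` passes the boundary check, so only it can give rise to both an unspliced and a spliced form; `ancestral` and `rhesus_like` can produce the unspliced form alone, which mirrors the "retained intron" state described for rhesus.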
The creation of splicing signals was accompanied not only by exaptation of an L1 element but also by exonization of an Alu element. The L1 element inserted within the 3′ end of the gene could have contributed the acceptor site and provided a new polyA signal used for the new splice variant (fig. 2a). The complete AluSx element transposed upstream of the gene was exapted at the 5′ end and most probably delivered some regulatory elements.
Sequencing of the human RNF113B cDNA using primers flanking the intronic sequence revealed that RNF113B produces two variant transcripts. One variant has two exons, as described above, and the other is a single-exon transcript similar to RNF113A. Consequently, most primates have three transcripts of RNF113: one from the RNF113A retrogene and two from RNF113B (fig. 2b). Rodents, cow, and dog have two transcripts, each coming from a different copy of RNF113, and all other mammals have only one RNF113 transcript. The presence of splice variants in a retrogene is very surprising and has only been reported once before (Lahn and Page 1999).
A second case involves DCAF12 (DDB1 and CUL4 associated factor 12), which encodes a WD repeat-containing protein that interacts with the COP9 signalosome (Jin et al. 2006). Although the gene is present in vertebrate and insect genomes, only placental mammals have retrocopies of this gene. One copy, DCAF12L2, has the same location in all placental mammals and therefore was most likely retroposed in the ancestor of placental mammals. Another copy, DCAF12L1, is present only in Euarchontoglires (a clade that includes rodents and primates). It likely emerged as a result of tandem duplication of DCAF12L2, as it is located next to the DCAF12L2 gene. Two events changed the splicing pattern in DCAF12L2. First, an intronization event occurred in the common ancestor of primates and rodents. Second, an alternative donor site emerged in rodents only (fig. 3). The limited available data and the sequence divergence preclude firm conclusions about the exact pattern of splice site evolution. However, there is convincing experimental evidence confirming both splicing events (fig. 3): splicing at the shared rodent–primate intron boundaries is confirmed by two expressed sequence tags (ESTs) (AK034343 and AK047360), and usage of the rodent alternative donor site is confirmed by four ESTs (AK038557, BC068319, AK034472, and AK039767).
Numerous studies have revealed a tendency of retrogenes to be expressed exclusively in testis. It was suggested that the hypertranscription present in meiotic and postmeiotic spermatogenic cells makes possible the transcription of DNA that is usually not transcribed. This may facilitate transcription of retrocopies in the testis during their early evolution (reviewed in Kaessmann et al. 2009). Another hypothesis explains the high expression of retrogenes in testis by the fact that these are, in most cases, retrocopies of spermatogenesis-related genes located on the X chromosome. Because the X chromosome is inactivated during meiosis, retroposition to autosomes enables escape from inactivation and expression during spermatogenesis (Turner 2007).
The retroposition of both genes studied here, RNF113 and DCAF12, was in the opposite direction, from autosomes to chromosome X. In the case of RNF113, the parental gene is detectable by sequence similarity as an apparent pseudogene on chromosome 9. The parental multiexon DCAF12 gene is coincidentally also located on chromosome 9. RNF113A and both DCAF12 retrogenes are on chromosome X. We surveyed the expression pattern of all human RNF113 transcripts (one from RNF113A and two from RNF113B) in 16 human tissues (fig. 4) (for methods, see Supplementary Material online). RNF113A was expressed in all studied tissues, including testis. Interestingly, RNF113B exhibited tissue-specific splicing: the unspliced form of RNF113B was expressed in all tissues but testis, whereas the spliced variant was expressed in testis, prostate, thymus, and lung. Both RNF113B splice variants were present in thymus, prostate, and lung, but in all of these tissues, the form with the intron spliced out had a much lower expression level than the single-exon primary form. Relatively high expression of the new form of RNF113B, the form with the intron spliced out, was observed only in testis.
According to the EST data, the human DCAF12 gene is widely expressed. EST sequences present in the dbEST database represent almost 40 libraries and show the highest expression in testis and trachea. The retrogene DCAF12L1 is expressed only in kidney and testis, and a second human retrogene, DCAF12L2, is expressed in eye and testis. Therefore, both retrogenes show expression patterns very different from that of their parental gene, with very limited, low-level expression and notable expression in testis.
Retroposition, a major mechanism for gene duplication, is an important process shaping the evolution of genomes (Brosius 1991; Marques et al. 2005). Our study confirms the unusual role of retrogenes in shaping genomes and underscores the importance of mobile elements in evolution. It also reveals that retrogenes may be responsible for a wealth of species-specific features, including species-specific introns and splice variants.
Previous analyses of introns in vertebrate genomes did not uncover any intron gain in mammals (Roy et al. 2003). Our study clearly shows that creation of introns has occurred during mammalian evolution. The failure of previous studies to find intron gains can be explained by the fact that they focused on different intron gain mechanisms and did not consider exon intronization. In addition, they examined genes conserved among the studied species, whereas we focused on young and, in many cases, lineage-specific retrogenes.
Interestingly, the retrogenes studied here exhibit the testis-specific expression typically associated with genes escaping from the X chromosome, despite their opposite history (retroposition from an autosome to the X). This biased expression pattern may therefore not be exclusively related to meiotic genes, sex chromosome inactivation, and dosage compensation (Marques et al. 2005; Vinckenbosch et al. 2006; Potrzebowski et al. 2008). The same pattern of high expression in testis is observed in the young, primate-specific splice variant of the retrogene RNF113B as well as in both copies of DCAF12 retroposed onto the human X chromosome. The older, unspliced variant of RNF113B, as well as the earlier retrocopy RNF113A, displays more diverse expression patterns. Therefore, testis-specific expression could be a common feature of all newly evolved transcripts regardless of their chromosomal localization and may reflect transcriptional noise due to “hypertranscription” in testis, facilitating the activation of new transcripts (Kleene et al. 1998).
The small number of observed intron gains in retrogenes may reflect that this is a rare event. Alternatively, the low number of observations could reflect the difficulty of identifying such events. One major complication lies in annotation problems and the common expectation that retrogenes do not have introns. Genome-wide comparative studies currently underway have already shown that intron gain in retrogenes could be more frequent than we expected but that annotations remain a major obstacle to uncovering this phenomenon.
We thank Jurgen Brosius and two reviewers for their comments and insightful suggestions. I.B.R. was supported by the Intramural Research Program of the National Library of Medicine at National Institutes of Health/U.S. Department of Health and Human Services.