The aim of the present study was to determine the evolutionary history of the POMC neuronal enhancer nPE2. We first demonstrated that nPE2 orthologs are highly conserved in nucleotide sequence in all placental and nonplacental mammals but absent in other vertebrates. We then performed a systematic search for nPE2 paralogs in all available mammalian genomes and identified three short sequences similar to opossum nPE2 within the opossum genome. Using these four sequences as queries in further BLAST searches revealed that they are highly similar to various members of the marsupial CORE-SINE retroposon family MAR1. We named this use of progressive searches of genome databases to reconstruct the origin of functional novelties from evolutionary relics “in silico paleogenomics” to distinguish it from “paleogenomics,” a term more commonly used for genomic research involving DNA samples obtained from fossil specimens. Our findings are consistent with the hypothesis that an ancient CORE-SINE retroposon was mobilized into the POMC locus and exapted as a neuronal enhancer in the lineage leading to mammals more than 170 MYA. Around 30 to 40 million years later, after the split that led to marsupials [22], a group of CORE-SINEs now known as MAR1s started to colonize the marsupial genomes, remaining active until very recently (see Results; also [20]). This is in clear contrast to the evolution of CORE-SINEs in placental mammals, which lost transposable activity around 100 MYA and now remain as fossil sequences [28]. The greater similarity of nPE2 to MAR1s seems to be fortuitous, and suggests that MAR1s are more similar than all other members of the superfamily to the ancestral CORE-SINE that was exapted into nPE2. The abundance of similar copies of MAR1s within marsupial genomes was key to uncovering the evolutionary origin of nPE2 and indicates that marsupial genomes represent a uniquely positioned source from which to trace the evolutionary origin of mammalian genes. Evidence that nPE2 derives from the exaptation of a CORE-SINE is based on the relatively high percentage of identity between opossum nPE2 and MAR1s. The similarity is especially remarkable in the core region (59%) and even higher along the 45 bp of its 3′ end (71%). This level of identity is comparable to that reported between different MAR1s (MAR1a and MAR1b cores are 63% identical) and to that between an ancient LF-SINE and the cell-specific enhancer of ISL1 exapted from it, which are 61% identical in their most similar region [21]. To our knowledge, the ISL1 enhancer and nPE2 are the sole functionally proven examples of enhancers whose sequences are derived from ancient retroposons, and nPE2 is the first one discovered to have originated from a member of the CORE-SINE family.
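The identity percentages quoted above depend on how identity is counted over an alignment. As a minimal sketch only, the Python below computes percent identity over the columns of two pre-aligned sequences, and over a sub-region of that alignment (in the way a core region or a 45-bp 3′ end might be scored separately). The function names, the choice of denominator (all columns that are not gaps in both sequences), and the toy sequences are assumptions made for illustration; they are not the alignment procedure or the sequences used in this study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: the denominator is every alignment column except those
    where both sequences carry a gap ('-'). Other conventions exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:  # a gap never matches here, since both-gap was skipped
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def region_identity(aligned_a: str, aligned_b: str, start: int, end: int) -> float:
    """Percent identity restricted to alignment columns [start, end)."""
    return percent_identity(aligned_a[start:end], aligned_b[start:end])


# Toy, hypothetical aligned sequences (not real nPE2 or MAR1 data).
toy_a = "ACGT-ACGTA"
toy_b = "ACGTTACCTA"
print(percent_identity(toy_a, toy_b))       # whole alignment
print(region_identity(toy_a, toy_b, 0, 4))  # first four columns only
```

Scoring regions separately, as in `region_identity`, is what allows a statement such as "higher along the 3′ end than in the core as a whole"; the numbers themselves will shift with the alignment and with the gap convention chosen.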
To dissect the regions of nPE2 involved in POMC neuronal enhancer function, we performed a deletional analysis in transgenic mice and identified two essential nonadjacent 45-bp sequences: regions 1 and 3. Region 3 is almost absolutely conserved among all species, suggesting that the array of transcription factors binding to it has probably been constant since the origin of mammals. Interestingly, the 5′ and 3′ halves of region 3 seem to be mutually redundant, since they can be independently removed without impairing reporter gene expression in hypothalamic POMC neurons (deletion of regions 2 or 4). The presence of two A + T-rich motifs (AATTAAAA and AATTGAAA) with potential binding sites for homeodomain transcription factors in each half of region 3 is provocative. In contrast to region 3, the essential region 1 tolerates many base substitutions, microinsertions, and microdeletions. However, it is well known that cis-acting elements can differ in sequence and still play similar functions, either due to degeneracy in binding site specificity [38] or compensatory mutations in other sites [39]. Region 1 is derived from the 5′ tRNA-like portion of the consensus MAR1, whereas region 3 is derived from the core. This observation is in agreement with other examples of exaptation showing that functionally relevant SINE-derived sequences may come from different portions of the original retroelement [17]. Based on our findings, it is difficult to know whether the CORE-SINE inserted upstream of POMC functioned as an enhancer immediately upon its insertion, as proposed for some Alu elements that carry potential binding sites for nuclear receptors [40]. Alternatively, the retroposon insertion may initially have provided adequate raw material for the accumulation of favorable mutations until it evolved into a novel neuronal POMC enhancer and became fixed in the lineage leading to mammals, before 170 MYA [22].
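The two A + T-rich motifs noted in region 3 (AATTAAAA and AATTGAAA) differ at a single position, so both can be located with one pattern. The following Python sketch scans a sequence for either motif on the given strand. The function name and the toy sequence are hypothetical and serve only to illustrate the scan; a match says nothing about whether a homeodomain factor actually binds there.

```python
import re

# AATTAAAA and AATTGAAA collapse to AATT[AG]AAA. The lookahead lets the
# scan report matches that overlap, although these motifs rarely would.
MOTIF = re.compile(r"(?=(AATT[AG]AAA))")


def find_at_rich_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched motif) for each hit on the given strand."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Toy, hypothetical sequence (not the real region 3 of nPE2).
toy_region = "GGAATTAAAACCCAATTGAAATT"
for start, motif in find_at_rich_motifs(toy_region):
    print(start, motif)
```

Only the strand supplied is searched; scanning the reverse complement as well would be a natural extension if motif orientation is not assumed.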
Although nPE2 is a mammalian novelty, all jawed vertebrates studied to date, including birds, amphibians, and fishes, express POMC in ventral hypothalamic neurons, suggesting that an nPE2-independent regulatory mechanism must control neuronal POMC expression in other vertebrates. This is consistent with our recent findings showing that the entire 5′ flanking region of POMC from the pufferfish Tetraodon nigroviridis is capable of directing the expression of a reporter gene to POMC pituitary cells but not to POMC hypothalamic neurons of transgenic mice (unpublished data). The ability of nonmammalian vertebrates to express POMC in ventral hypothalamic neurons suggests that nPE2, once it appeared, probably replaced the function of an earlier POMC neuronal enhancer. This puzzle will be resolved when neuronal POMC regulatory elements and their cognate trans-acting factors from other vertebrates are identified.
Another important conclusion from our study is that exaptation of CORE-SINEs is probably not restricted to nPE2. Of the several thousand exonic, intronic, and intergenic sequences that we found in the human genome to be derived from the core region of CORE-SINE retroposons, nine constitute strongly suggestive examples of exaptation, since they are highly conserved among all mammalian orthologous loci. There is a growing list of SINE retroposition events that may have contributed to evolutionary novelties in mammals [9], but the vast majority of reported examples correspond to lineage-specific SINEs such as Alu and B1 elements, present in the primate and rodent genomes, respectively. Since Alu and B1 retrotransposition events are relatively modern, their derived sequences are likely to be easily recognized. However, these cases should not all be considered examples of exaptation until novel adaptive functions followed by purifying selection are confirmed.
More recently, several high-throughput studies detected transposed element sequences that are likely to have been exapted, since they are under purifying selection, although their functional properties have not yet been tested [16]. For example, an ancient SINE family that was active in amniotes (mammals, birds, and reptiles) was discovered and named AmnSINE [18]. More than 1,000 AmnSINE-derived instances were found in the human genome, and around 10% of them have been under purifying selection in mammals and likely contributed to adaptive novelties in this class. Another recent study demonstrated the existence of thousands of human transposed element fragments under strong purifying selection, mostly located near developmental genes [16]. Last year, several ultraconserved functional sequences in terrestrial vertebrate genomes were reported to have originated from ancient exaptation events of an LF-SINE, which had been active until recently in the living fossil fish coelacanth [21]. Unlike the case of nPE2, recognition of those elements as derived from an LF-SINE was facilitated by the remarkably high level of conservation between the functional tetrapod sequences and the coelacanth retroposon, which must have diverged around 410 MYA.
In summary, our study documents the evolutionary history of a mammalian regulatory element that originated from an ancient retroposition event. The difficulty in detecting the origin of nPE2 as an exapted CORE-SINE retroposon suggests that this phenomenon is underestimated and encourages the search for the many thousands more examples of retroposon-derived functional elements still hidden within genomes. Their discovery will help us to better understand the dynamics of gene evolution and, at a larger scale, the origin of the macroevolutionary novelties that led to the appearance of new species, orders, or classes.