The recent availability of large amounts of genome sequence from diverse taxa has allowed high-resolution mapping of syntenic chromosomal segment order in efforts to understand the evolutionary trajectory of specific genomic regions. Murphy et al. [12] examined orthologous genomic sequences of syntenic blocks among a broad array of eutherian species and found that breakpoint locations are often reused between divergent species and that these sites correlate strongly with centromere locations in several species. Ruiz-Herrera et al. [25] examined the Murphy et al. dataset and found that not only is there a link between breakpoints and centromeres in karyotype evolution, but EBs also coincide with fragile sites and chromosomal breakpoints identified in human cancers [25]. These studies suggested that EBs might continue to carry "signals" of both past breakpoint activity and a propensity for further instability under cellular stress; however, these studies did not examine EB sequences in a phylogenetic context.
More recently, mapping the trajectory of chromosome segments along species phylogenies in marsupial lineages has shown that breakpoint reuse often coincides with centromere emergence [3], lending support to the hypothesis that EBs serve as latent centromeres [22]. Thus, we can predict that the EBs characterized as latent centromeres might retain common sequence features between divergent taxa. Marsupialia offers an ideal system to study genomic rearrangements and breakpoint reuse; this infraclass represents one of the best-characterized mammalian lineages with respect to chromosome arrangement. Over 70% of extant species have been karyotyped ([32] and reviewed in [7]) and the chromosome trajectories of many families, genera and species have been determined (e.g. [3]). With comparatively little marsupial sequence data available, cross-species reciprocal chromosome painting has been effectively used to delineate conserved chromosome segments (orthologous chromosome blocks) and to identify convergent breakpoint reuse [3].
Our study uses a comparative sequencing approach to test the hypothesis that EBs and CEN share specific sequence features and that such features are retained during periods of genomic instability and species evolution. We have identified specific interspersed repeats, endogenous retroviruses (ERVs) and long interspersed nuclear elements (L1s), enriched in EBs and CEN. These particular groups of repetitive elements (ERVs and L1s) are also found at several breaks of synteny between human and gibbon [34] as well as at two breakpoints examined between human and chimp [35]. We also show that the interspersed repeat distribution of CENs and EBs differs dramatically from that of a previously analyzed euchromatic region (the CFTR
]. In human tumor cell lines, chromosome 3 shows regions of recurrent instability. The distribution of repeats at these loci shows a very similar increase in both L1 and ERV elements [36].
Through BAC mapping and comparative sequence analyses, we show that the EB on tammar 1q is orthologous to human 14q32.33. This locus has been identified as an EB [25], is known to undergo translocations associated with cancer [26], and has been identified as a neocentromere [29]. We have analyzed the repeat content of the tammar EB and surrounding EU and compared them to the repeat distribution of the orthologous human region, 14q32.33, including the immunoglobulin heavy chain region (IGH). As in tammar, the human orthologous EB carries a significant enrichment of ERVs and L1s, with frequencies of both sequences similar to those observed for tammar CEN. These data suggest that repeat content defines distinct chromosome domains and is a conserved feature of mammalian genomes. Moreover, CEN and EBs are enriched for both ancient ERV and recent L1 activity, indicating that these regional domains, and the subsequent instability that manifests as chromosome rearrangement or centric shifts, are directly linked to the activity of mobile DNA. It is worth noting that the primary satellite sequence found in the Cetacea is derived from an ancestral mammalian L1 element [37].
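The kind of regional comparison described above (repeat content of an EB or CEN set against a euchromatic reference) can be illustrated with a minimal Python sketch. It assumes repeat annotations are already available as half-open (start, end, class) tuples; the function names and the toy coordinates are hypothetical and are not taken from the study, and the sketch is not the authors' pipeline.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_fractions(annotations, region_start, region_end):
    """Fraction of [region_start, region_end) covered by each repeat class.

    annotations: iterable of (start, end, repeat_class) tuples.
    Annotations are clipped to the region; overlaps within a class
    are merged so that no base is counted twice.
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")
    by_class = defaultdict(list)
    for start, end, cls in annotations:
        s, e = max(start, region_start), min(end, region_end)
        if s < e:
            by_class[cls].append((s, e))
    return {
        cls: sum(e - s for s, e in merge_intervals(ivs)) / length
        for cls, ivs in by_class.items()
    }


def fold_enrichment(test, reference):
    """Per-class ratio test/reference; classes absent from the reference are skipped."""
    return {
        cls: test[cls] / reference[cls]
        for cls in test
        if reference.get(cls, 0) > 0
    }


# Toy, invented coordinates for illustration only (not data from the study).
eb_annotations = [(0, 300, "ERV"), (200, 500, "ERV"), (600, 800, "L1")]
eu_annotations = [(0, 100, "ERV"), (500, 600, "L1"), (900, 1100, "L1")]

eb_fractions = repeat_fractions(eb_annotations, 0, 1000)
eu_fractions = repeat_fractions(eu_annotations, 0, 1000)
enrichment = fold_enrichment(eb_fractions, eu_fractions)
```

Merging intervals within each class before summing is a deliberate choice here: nested or overlapping annotations would otherwise inflate coverage. A real analysis would also need a statistical test of the difference between regions, which this sketch does not attempt.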
The enrichment of ERVs, and specifically HERV-K retrotransposons, in 14q32.33 is of particular interest given that this class contains primate-specific lineages of elements and thus must be recently derived. HERV-K retroviruses consist of 10 different families of human MMTV-like elements, denoted as HMLs 1–10 [38]. Some of these families, such as HML-2, are characterized by recent activity in the genome and contain intact open reading frames (ORFs) that encode functional proteins [39], while other families, such as HML-3 and HML-5, have not been active for tens of millions of years [30]. The prominent element in the human breakpoint examined is denoted in Repbase as HERV-K22, an HML-5 element [38]. Last active prior to the split of Old World and New World primates, this element would have integrated into this location long before hominoid divergence, and thus has been retained despite breakpoint activity in this region. Moreover, the integration of an HML-5 member in this region parallels the integration of another ancient HERV-K related element, KERV, in the orthologous region within the metatherian lineage (Meu1q).
KERV, while ancient in origin, has retained a cellular function in active centromeres through recruitment of specific centromere proteins and production of novel small RNAs in marsupial and eutherian lineages [20]. Likewise, transcription of HERV-K elements [41] has also been retained, although functional coding sequences for either class of elements have not been identified, nor has any involvement with cellular function been examined. Thus, not only is there a tight correlation between EBs and CEN as regional domains involved in genome rearrangement, instability and karyotypic evolution, there is also a tight correlation between specific sequences found in these regions (i.e. HERV-K type elements). Two scenarios may explain the presence of these elements at orthologous EBs: either HERV-K replaced KERV elements within a eutherian ancestor at the region orthologous to 14q32.33, or the KERV and HERV-K elements independently integrated into orthologous EBs. Understanding the integration site preferences of each class may shed light on the order of integration events.
Given the predisposition of the EB on Meu1q and Hsa14q32.33 for continuous rearrangement through double-strand breaks and ENC formation within both marsupials and humans, the coincidence of specific classes of retroelements at these regions implies they may be integral to the underlying mechanism for prolonged instability. A recent study of double-strand break repair mechanisms in yeast showed that those breaks that give rise to chromosome aberrations were repaired by homologous recombination (HR) between nonallelic Ty retrotransposons [42]. In light of the finding that HR between nonallelic repeat elements contributed to a large portion of the structural variation in the human genome [43], it is intriguing to consider that sustained activity of retroelements, not necessarily through transposition but rather through an inherent propensity for HR between elements at distant genomic locations, may contribute both to the evolutionary novelty of the genome and to its innate instability.