Whole genome sequence data from an ever-increasing number of organisms are improving our understanding of the processes and mechanisms of genomic rearrangement that have occurred during animal evolution. Such insights have mainly been gained through genome comparisons of species that exhibit various degrees of phylogenetic relatedness on the tree of life. One evolutionary process that is believed to have had a significant impact on vertebrate evolution is the occurrence of up to four rounds of whole genome duplication (WGD) in their ancestral past. It is now generally accepted that the first duplication event (1R) occurred at the base of vertebrate evolution and was followed by a second WGD (2R) that preceded the divergence of the Sarcopterygian lineages (including the lobe-finned fishes) from the ray-finned (Actinopterygian) fishes about 500 million years ago (MYA). These polyploidization events were later followed by an additional round of whole genome doubling in the ray-finned fish lineage (i.e., 3R) ~320–350 MYA [1]. The gradual decay of genomes subsequent to WGD, mainly through asymmetric gene losses and interchromosomal rearrangements, has largely obliterated many of the traces of those ancient events [8]. However, the availability of whole genome sequence data from diverse organisms has allowed scientists to infer the proto-karyotypes of not only the teleost ancestor but also the ancestor of all vertebrates [3]. According to a recent model, 10 (denoted as chromosomes A'-J' in this study) and possibly up to 13 chromosomes constituted the genome of the ancient vertebrate ancestor prior to the 1R WGD [15]. Two subsequent polyploidization events, along with some major genome rearrangements, caused the number of chromosomes to increase to about 40–52 in various gnathostomes [15]. In the ray-finned fish lineage, however, intensive fusions reduced the number of linkage groups to about 12–13 (denoted as chromosomes A-M) [3]. Following the 3R duplication, a doubling of this chromosome number would be expected, and is indeed observed in many extant teleosts (i.e., modal linkage group numbers in teleosts are 24–25, 2n = 48–50) [e.g., [3]].
Among fishes, it has long been hypothesized that salmonids originated from an autotetraploid ancestor (i.e., 4R) [1]. A genome size and chromosome arm number approximately twice those detected in closely related species (i.e., NF = 96–104), the observation of multivalent formation during meiosis, and the identification of many duplicated locus pairs that assign to homeologous chromosome arms all support an autopolyploid origin. Furthermore, the observation of meiotic segregation patterns that match both disomic and tetrasomic ratios indicates that salmonids are still in the process of reverting to the diploid state [1]. Recent efforts to construct genetic and physical maps for various fishes in this family have identified many duplicated markers that map or assign to two different linkage groups, which likely arose from a single chromosome in the salmonid ancestor [e.g., [18]]. Further, the characterization of genes and expressed sequence tags (EST) with multiple copies that localize to different linkage groups, and the phylogenetic relationships among these duplicates [e.g., [24]], are in accordance with the evolutionary scenario proposed for this family.
A primary focus of many recent genomic studies in salmonids has been the identification of the 4R chromosomal segments [e.g., [19]]. Although many of the expected homeologies (i.e., the most recent WGD paralogous chromosomal segments) have been identified in several of these fishes, the assignments are still incomplete for any one species. Furthermore, the association of the 4R duplicated homeologous regions with their ancestral counterparts (i.e., 3R and older chromosomal affinities, which we generally refer to as paralogous segments) is incompletely understood at present, although recent data on the pairwise associations of the 4R chromosomal segments in rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar) support the proposed WGD evolutionary model for teleosts [14]. Therefore, for every duplicated 3R chromosomal segment in zebrafish (Danio rerio) and medaka (Oryzias latipes), up to 4 whole-arm orthologous regions (i.e., two sets of homeologs) can be identified in salmonids [27]. Information on these genomic arrangements is of particular interest because it may help to elucidate the inter-relationships associated with transcriptome and functional genomics studies, and may also provide more precise explanations regarding the distribution of duplicated regions throughout the genomes of vertebrates.
A main objective of this study is to identify segments of the rainbow trout genome with a possible shared ancestry, representative not only of the 4R WGD but also of the earlier events. However, a major challenge in detecting anciently derived inter-chromosomal regions in any organism stems from unbalanced gene losses between paralogous segments [8]. Therefore, to partially correct for this uneven pseudogenization among paralogs, we focused mainly on genetically localizing a subset of conserved noncoding elements (CNE), on the assumption that the rate of retention of duplicated CNE should be greater than that of their up- or downstream target regions. It has been suggested that many CNE possess gene regulatory functions [28], and genomic regions surrounding CNE blocks appear to undergo intense purifying selection, highlighting their potential adaptive importance [30]. Hence, examination of the copy number and distribution of CNE within the salmonid genome may provide greater insight into the chromosomal affinities of more ancient paralogous chromosome arms.
CNE, some up to several hundred bases in length, have been reported among all classes of vertebrates, with some elements showing greater sequence conservation or overlap within certain lineages [31]. Although noncoding elements that were initially reported through whole genome comparison between human (Homo sapiens) and pufferfish (Takifugu rubripes) appear to be highly preserved among all jawed vertebrates [28], greater CNE divergence has been detected among teleost species, suggesting that rates of sequence evolution may be somewhat accelerated in fishes [3]. Interestingly, CNE are essentially absent from invertebrates and urochordates [32], and only around 50 single copy elements have been identified in cephalochordates [34]. It has been postulated that such high inter-species sequence conservation, which often exceeds even that detected for protein coding regions [37], is a likely consequence of negative selection, probably due to essential functional properties [38]. Nonetheless, although the regulatory function of some CNE has been supported through in vivo enhancer assays [e.g., [28]], deletions of large genomic regions in mice that contain many conserved elements resulted in no detectable phenotypic variation [41]. This suggests that at least a fraction of these constrained elements might not be functionally important. Counter to this interpretation is the knowledge that many of these elements may have arisen through both segmental and WGD events and thus might exhibit extensive redundancy in the functional enhancer or silencer properties of the cis-regulatory motifs they possess. Hence multiple copies of related regulatory modules, some of which may be only 15–20 bp in length, may be scattered throughout the genome, making their complete elimination next to impossible.
The larger intact tracts of signature CNE are typically dispersed unevenly throughout the genome, with a tendency to congregate in clusters, usually close to genes involved in animal development [28]. CNE can be located within the untranslated or intronic regions of genes, although the majority are found at distances of several hundred kbp to over a Mbp in either direction from their target genes, usually within gene desert segments of the genome [29]. These features make CNE extremely useful for comparative genomics studies, as they can be treated as 'anchor' markers to examine the relative distribution of duplicated chromosomal segments throughout the genome of any study organism. Such 'anchors' can then be used to investigate the loss or retention of orthologous genes that are syntenic within these paralogous regions (i.e., among-species conserved synteny studies).
In the present study, we first revisited the distribution of conserved noncoding elements in zebrafish, humans, and medaka in order to gain a better understanding of their genome-wide characteristics in rainbow trout. Of the teleost species with more complete genetic information currently available, zebrafish, a member of the Ostariophysan lineage, is considered the most closely related to salmonids [44]. This fish has a typical teleost karyotype of 50 chromosomes (i.e., n = 25). We characterized CNE located within each of the zebrafish linkage groups and mapped them onto the genetic map of rainbow trout. We then inferred the possible chromosomal affinities of these elements in the ancient ray-finned fish ancestor prior to the 3R WGD. We also report a list of genes whose syntenic association with these elements appears to have remained unchanged across the vertebrate species that we investigated. The amplification efficiency of the newly developed CNE-based primers was tested in two other salmonid species, Atlantic salmon and Arctic charr (Salvelinus alpinus), where some polymorphic markers were further localized to their respective homologous chromosomes. Through segregation analysis of duplicated CNE, along with synteny comparison with other organisms, we identified 53 duplicated segments within the rainbow trout genome, with possibly 40 of these regions related to the earliest vertebrate 1R, 2R or 3R WGD.