The evolutionary history of α-satellite DNA, the major component of primate centromeres, is poorly defined because of the difficulty in assembling its sequence and its rapid evolution compared with most genomic sequences. By using several approaches, we have cloned, sequenced, and characterized α-satellite sequences from two species representing critical nodes in the primate phylogeny: the white-cheeked gibbon, a lesser ape, and marmoset, a New World monkey. Sequence analyses demonstrate that white-cheeked gibbon and marmoset α-satellite sequences are formed by units of ~171 and ~342 bp, respectively, and that both lack the higher-order structure found in humans and great apes. Fluorescent in situ hybridization characterization shows a broad dispersal of α-satellite in the white-cheeked gibbon genome, including centromeric, telomeric, and chromosomal interstitial localizations. On the other hand, centromeres in marmoset appear organized in highly divergent dimers of roughly 342 bp that show a similarity between monomers much lower than previously reported dimers, thus representing an ancient dimeric structure.
All these data shed light on the evolution of the centromeric sequences in Primates. Our results suggest radical differences in the structure, organization, and evolution of α-satellite DNA among different primate species, supporting the notion that 1) all centromeric sequences in Primates evolved by genomic amplification, unequal crossover, and sequence homogenization using a 171 bp monomer as the basic seeding unit and 2) centromeric function is linked to relatively short repeated elements rather than to higher-order structure.
Moreover, our data indicate that complex higher-order repeat structures are a peculiarity of the hominid lineage, with the most complex organization found in humans.
The centromere is a highly specialized region of a chromosome essential for the correct chromosome segregation during mitosis and meiosis in eukaryotic cells. Primate centromeres are mainly composed of repeated DNA, known as α-satellite DNA, made up of a basic 171-bp unit organized as tandemly repeated units (Maio 1971; Manuelidis 1978). Human α-satellite has been classified into two types according to its organization and sequence properties: higher-order α-satellite and monomeric α-satellite.
The higher-order structure is based on multiple copies of the 171 bp monomers, assembled into subfamilies at constant unit periodicity. In each subfamily, monomers of the same unit (1a, 1b) differ greatly in primary sequence and are not necessarily any closer in sequence similarity than each is to monomers from different subfamilies. In contrast, monomers within different units at specific periodic distance are virtually identical, sharing high sequence similarity (<2% sequence divergence) (1a–2a) (fig. 1). Higher-order structures are composed of these monomers organized into multimeric repeat units ranging in size from 2 to 5 Mb. Organization and unit periodicity are specific to each human centromere (Willard and Waye 1987b; Lee et al. 1997), or to a small group of chromosomes, identifying the different suprachromosomal families (SFs) (Choo et al. 1988; Jorgensen et al. 1988).
The monomeric α-satellite has no detectable higher-order periodicity and its monomers are far less homogeneous than the higher-order repeat (HOR) units (Warburton and Willard 1990; Alexandrov et al. 2001; Rudd and Willard 2004). Phylogenetic analyses suggested that the higher-order α-satellite DNA emerged more recently than the monomeric repeat (Alkan et al. 2004). Recent hypotheses state that the higher-order α-satellite evolved from ancestral arrays of monomeric α-satellite and was subsequently transposed to the centromeric regions of all great-ape chromosomes (Warburton et al. 1996; Alexandrov et al. 2001; Schueler et al. 2001, 2005; Kazakov et al. 2003).
α-Satellite DNA, like other tandemly repeated sequences, undergoes concerted evolution, showing greater similarity within a species than between species (Willard and Waye 1987a). The evolutionary process is known as molecular drive and includes mechanisms such as unequal crossing-over, gene conversion, and transposition (Dover 1982). Thanks to these molecular mechanisms, the structure and genomic organization of centromeric DNA can change very rapidly. Fluorescent in situ hybridization (FISH) studies with human chromosome–specific α-satellite probes against great-ape chromosomes have, in fact, demonstrated that only the organization of the X chromosome α-satellite subset is conserved among closely related species (Baldini et al. 1992; Archidiacono et al. 1995). Furthermore, the rapid evolution of α-satellite DNA has been confirmed by comparing its organization among Primates. For example, every human α-satellite SF maps to nonorthologous chromosomes in chimpanzee, despite the fact that alphoid sequences in human and chimpanzee share high homology. Similarly, comparisons between ape and Old World monkey α-satellite DNA indicate two radically distinct patterns of centromeric organization and chromosome distribution (Haaf and Willard 1998).
Due to this rapid diversification and complex structure, studies on centromeric sequence and organization have been uncoupled from genomewide efforts to sequence genomes. In fact, for each human chromosome assembly, the largest gaps correspond to the centromere gap located between the most proximal p and q arm contigs (Rudd and Willard 2004). For other primate and mammalian genomes, the location and sequence of centromere repeats often remain uncharacterized due to the difficulties in assembling these portions of the genome from whole-genome shotgun sequence (WGSS).
In order to gain an understanding of the evolution, biology, and organization of centromeric sequences, we have cloned and characterized α-satellite sequences from two different primate species: a lesser ape, the white-cheeked gibbon (Nomascus leucogenys, NLE), and a New World monkey (NWM), the common marmoset (Callithrix jacchus, CJA), using two complementary approaches. Our results reveal new features in the organization and evolution of centromeric sequences: 1) both NLE and CJA alphoid sequences lack HOR or subfamily organization, thus supporting the hypothesis that HOR structure arose specifically in the great ape–human lineage of evolution (Alkan et al. 2004); 2) CJA sequence analysis reveals a dimeric structure of ~342 bp that differs largely from the previously reported dimeric structure in other Primates; and 3) in NLE, alphoid sequences are detected at centromeric, telomeric, and interstitial regions. We hypothesize that these noncentromeric loci enriched in repetitive alphoid sequences might represent regions of evolutionary genomic instability and could partially explain the high evolutionary rearrangement rates of the white-cheeked gibbon karyotype.
Metaphase preparations were obtained from a lymphoblastoid cell line of C. jacchus (CJA) and N. leucogenys (NLE), kindly provided by S. Muller (Munchen). Human (HSA) metaphase spreads were prepared from Phytohemagglutinin-stimulated peripheral lymphocytes of normal donors by standard procedures.
DNA extraction from BACs and plasmids has already been reported (Ventura et al. 2001). FISH experiments were essentially performed as previously described (Ventura et al. 2003). Briefly, DNA probes were directly labeled with Cy3-dUTP (Perkin–Elmer) or fluorescein-deoxycytidine triphosphate (dCTP) (Fermentas) by nick translation. Two hundred nanograms of labeled probe was used for the FISH experiments. Hybridization was performed at 37 °C in 2× sodium chloride, sodium citrate (SSC), 50% (v/v) formamide, 10% (w/v) dextran sulfate, 5 mg of COT1 DNA (Roche), and 3 mg of sonicated salmon sperm DNA in a volume of 10 μl. For high stringency, posthybridization washing was at 60 °C in 0.1× SSC (three times). For low stringency, washes were performed at 37 °C in 2× SSC, 50% formamide (three times), followed by washes at 42 °C in 2× SSC (three times).
Digital images were obtained using a Leica DMRXA epifluorescence microscope equipped with a cooled CCD camera (Princeton Instruments). Cy3 (red), fluorescein (green), and 4′,6-diamidino-2-phenylindole (DAPI) (blue) fluorescence signals were detected with specific filters and recorded separately as grayscale images. Pseudocoloring and merging of images were performed using Adobe Photoshop software.
DNA probes were directly labeled with Cy5-dUTP by PCR labeling; 200 ng of labeled probe was used for the FISH experiments. The use of PCR labeling avoids the possible contamination from genomic DNA by nick translation labeling of PCR products.
PCR labeling was carried out in a final volume of 20 μl that contained 100 ng PCR product, 2.5 μl 10× reaction buffer, 2 μl 50 mM MgCl2, 0.5 μl each primer (10 μM), 0.5 μl 2 mM dACG, 2.5 μl 1 mM Cy5-dUTP, 5 μl 2% BSA, and 0.3 μl Taq polymerase (5 U/μl).
Library-hybridization was carried out according to the protocol available at CHORI BACPAC resources (http://bacpac.chori.org/highdensity.htm). The CH271 segment 1 represents a 5.0-fold clone coverage library (http://www.chori.org/bacpac/).
Human genomic DNA and common marmoset genomic DNA were obtained from human lymphoblast cell lines by standard methods.
α27 (CATCACAAAGAAGTTTCTGAGAATGCTTC) and α30 (TGCATTCAACTCACAGAGTTGAACCTTCC) primers were used to amplify genomic DNA by Polymerase Chain Reactions. They were obtained from the most conserved regions of human alphoid consensus (Waye and Willard 1986; Choo et al. 1991).
The PCR cycling parameters were as follows: initial denaturation at 94 °C for 2 min, followed by 10 cycles of 95 °C for 15 s, 60 °C for 30 s, and 72 °C for 1 min, and then 20 cycles of 94 °C for 15 s, 58 °C for 30 s, and 72 °C for 1 min (with 20 s added at each cycle). Final extension was at 72 °C for 7 min, followed by a hold at 12 °C.
The reaction mixture consisted of 5 μl dNTPs (10×), 0.5 μl each primer (10 μM), 0.3 μl Platinum Taq DNA polymerase (5 U/μl), 1.5 μl MgCl2 (50 mM), 5 μl reaction buffer (Invitrogen) (10×), 3 μl of DNA template (50 ng/μl), and water up to 25 μl.
PCR products were analyzed by 1% agarose gel electrophoresis. They were labeled and used as a probe for FISH experiments on HSA and CJA metaphase spreads.
CJA alphoid cloned sequences were analyzed using the NCBI Blast 2 Sequences tool (http://blast.ncbi.nlm.nih.gov/bl2seq/wblast2.cgi) (BlastN program), using 1 as reward for a match, −2 as penalty for a mismatch, and 5 and 2 as open and extension gap penalties. We identified a conserved T-rich motif every 171 bp, GTTTTG(A, T or/)GTTTTAGA, and we used this motif to distinguish in CJA alphoid sequences the monomers of 171 bp. The monomers of 342 and 171 bp were multi-aligned using ClustalW algorithm, and consensus sequences were extracted using a modified version of the MaM software (Alkan et al. 2005). MaM returned one character for every column of the ClustalW alignment by “compressing” all the information in a given base position into a single character. For example, if all the bases in a column are G, then the consensus character for that position is G. However, if there are substitutions in a column, then all observed characters are encoded into a single marker: Y for pyrimidines (C or T), R for purines (A or G), N for any (A, C, G, and T), etc.
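The column-wise "compression" described above can be illustrated with a minimal sketch. This is not the MaM software itself: the function names are hypothetical, the table is the standard IUPAC nucleotide ambiguity code, and the choice to ignore gap characters when other bases are present in a column is an assumption, because the handling of gaps is not described here.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def compress_column(column, gap="-"):
    """Encode all bases observed in one alignment column as a single character."""
    bases = frozenset(b for b in column.upper() if b != gap)
    if not bases:
        return gap  # column made only of gaps
    # Assumption: gaps are ignored; any unexpected symbol falls back to N.
    return IUPAC.get(bases, "N")


def consensus(aligned):
    """Return one consensus character per column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have the same length")
    return "".join(compress_column("".join(col)) for col in zip(*aligned))


# Toy alignment (hypothetical sequences, not data from this study).
print(consensus(["GATC", "GGTT", "GATC"]))  # prints GRTY
```

In the toy alignment, the first and third columns are invariant, the second mixes A and G (R, purines), and the fourth mixes C and T (Y, pyrimidines).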
All pairwise sequence alignments for NLE sequences were performed with an in-house implementation of the Needleman–Wunsch global alignment algorithm (Needleman and Wunsch 1970). The divergence of two sequences is then computed by calculating the ratio of the Hamming distance of the aligned sequences, and the alignment length (Hamming 1950). For all pairs of sequences, the divergence ratio calculation is also repeated for the reverse complement of the second sequence.
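The divergence calculation can be sketched as follows. This is an illustrative reimplementation, not the in-house code: the scoring scheme (match +1, mismatch −1, linear gap −1) is an assumption because the parameters are not stated, and reporting the lower of the two orientations in best_divergence is likewise an assumption about how the forward and reverse-complement values are combined.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner of the matrix.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def divergence(a, b):
    """Hamming distance of the aligned strings divided by the alignment length."""
    x, y = needleman_wunsch(a, b)
    if not x:
        return 0.0
    return sum(p != q for p, q in zip(x, y)) / len(x)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def best_divergence(a, b):
    """Assumption: keep the lower value of the two orientations of b."""
    return min(divergence(a, b), divergence(a, reverse_complement(b)))
```

With this definition, a gap column counts as a mismatch, so a score of 0 corresponds to sequence identity and values approaching 1 to unrelated sequences.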
C410 is a specific NLE alphoid DNA obtained by chromatin immunoprecipitation of NLE lymphoblastoid cells with rabbit polyclonal antibodies directed against human anti-CENP-C centromeric protein (S. Trazzi et al., manuscript in preparation).
PCR products were cloned in pCR-XL-TOPO using the standard protocol of the TOPO XL PCR cloning kit (Invitrogen).
Genomic DNAs from marmoset lymphoblastoid cell lines were prepared by following standard procedures (Maniatis et al. 1982). Endonuclease digestions were performed using a 4-fold excess of enzyme under the conditions suggested by the suppliers. Gel electrophoresis was performed in 1× tris-acetate (TAE) (1× TAE = 40 mM Tris-acetate, 1 mM ethylenediaminetetraacetic acid, EDTA). Genomic DNAs were run in a 0.8% agarose gel for 16–18 h, denatured, and transferred to Hybond membrane (Amersham), using 20× SSC as transfer buffer (1× SSC = 150 mM sodium chloride, 15 mM sodium citrate, pH 7). Clone inserts (50 ng) were labeled with 32P-dCTP (3,000 Ci/mmol; Amersham) by random oligomer priming. Filters were exposed and developed using a Storm imaging system.
Consensus sequence obtained from gibbon BAC end sequences: CACTTGCAGTTTCTACAGAAAGAGTGTTTCAAAACTGCTCAATCAAAAGTAAGGTTCAACTCTGTTAGTTGAATGCACAGAACAGAAAGAAGTTTCACAGAATGCTTCTGTGTAGTTTTTATTTGAAGATATTCCTTTTTCCACTATAGGCCTCTTAGCGCTCTGAATGTCCACTTGCAGTTTCTACAGAAAGAGTGTTTCAGAACTGCTCAATCAAAAGTAAGGTTCAACTCTGTTAGTTGAATGCACAGAACAGAAAGAAGTTTCACAGAATGCTTCTGTGTAGTTTTTATTTGAAGATATTCCTTTTTCCACTATAGGCCTCTTAGCGCTCTGAATGTCCACTTGCAGTTTCTACAGAAAGAGTGTTTCAGAACTGCTCAATCAAAAGTAAGGTTCAACTCTGTTAGTTGAATGCACAGAACAGAAAGAAGTTTCACAGAATGCTTCTGTGTAGTTTTTATTTGAAGATATTCCTTTTTCCACT.
Primers designed on the gibbon consensus sequence: α NLE_F: TCAACTCTGTTAGTTGAATGCACA and α NLE_R: CTCTTTCTGTAGAAACTGCAAGTG.
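As a quick consistency check, the two gibbon primers can be located on the consensus by exact string matching on both strands. The sketch below uses only the opening bases of the consensus given above (copied verbatim) and a hypothetical helper, primer_sites. It is meant to show how primer placement can be verified, not how the primers were designed.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def primer_sites(template, primer):
    """Return (strand, 0-based start) for every exact primer match.

    '+' means the primer sequence itself occurs in the template;
    '-' means its reverse complement occurs, i.e. the primer anneals
    to the template strand as written and extends leftward.
    """
    hits = []
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        start = template.find(probe)
        while start != -1:
            hits.append((strand, start))
            start = template.find(probe, start + 1)
    return hits


# Opening bases of the gibbon BAC end consensus (excerpt only).
CONSENSUS_EXCERPT = (
    "CACTTGCAGTTTCTACAGAAAGAGTGTTTCAAAACTGCTCAATCAAAAGTAAGG"
    "TTCAACTCTGTTAGTTGAATGCACAGAACAG"
)
NLE_F = "TCAACTCTGTTAGTTGAATGCACA"
NLE_R = "CTCTTTCTGTAGAAACTGCAAGTG"

print(primer_sites(CONSENSUS_EXCERPT, NLE_F))
print(primer_sites(CONSENSUS_EXCERPT, NLE_R))
```

Within a single repeat unit the forward primer lies downstream of the reverse primer site, so on a tandem array the two primers would be expected to amplify across the junction between adjacent units.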
Insert sequences from gibbon plasmid clones: pA_12 FJ346627, pGAMMA_3 FJ346628, pGAMMA_5 FJ346629, pGAMMA_6 FJ346630, pGAMMA_8 FJ346631, pGAMMA_23 FJ346632, pGAMMA_24 FJ346633, pGAMMA_34 FJ346634, pGAMMA_35 FJ346635, pGAMMA_36 FJ346636, pGAMMA_37 FJ346637, pGAMMA_38 FJ346638, pGAMMA_41 FJ346639, pGAMMA_42 FJ346640, pGAMMA_43 FJ346641, pK_64 FJ346642, pK_19 FJ346643, pK_23 FJ346644, pK_25 FJ346645, pK_30 FJ346646, pK_41 FJ346647, pK_43 FJ346648, pK_51 FJ346649, pK_69 FJ346650, pK_1 FJ346651, pK_3 FJ346652, pK_7 FJ346653, pK_20 FJ346654, pK_21 FJ346655, pK_34 FJ346656, pK_63 FJ346657, and pK_68 FJ346658.
Gibbon BAC end sequences: CH271_0010P22_B1 FJ346579, CH271_0010P22_G1 FJ346580, CH271_0046E21_G1 FJ346581, CH271_0046E21_B1 FJ346582, CH271_0002J18_G1 FJ346583, CH271_0002J18_B1 FJ346584, CH271_0029E05_B1 FJ346585, CH271_0029E05_G1 FJ346586, CH271_0032G05_G1 FJ346587, CH271_0032G05_B1 FJ346588, CH271_0036I10_B1 FJ346589, CH271_0036I10_G1 FJ346590, CH271_0023A02_G1 FJ346591, CH271_0023A02_B1 FJ346592, CH271_0059L04_G1 FJ346593, CH271_0059L04_B1 FJ346594, CH271_0083K12_B1 FJ346595, CH271_0083K12_G1 FJ346596, CH271_0042E17_G1 FJ346597, CH271_0042E17_B1 FJ346598, CH271_0005P01_G1 FJ346599, CH271_0005P01_B1 FJ346600, CH271_0015O18_B1 FJ346601, CH271_0015O18_G1 FJ346602, CH271_0039I23_B1 FJ346603, CH271_0039I23_G1 FJ346604, CH271_0054E20_B1 FJ346605, CH271_0054E20_G1 FJ346606, CH271_0096M12_B1 FJ346607, CH271_0096M12_G1 FJ346608, CH271_0015L04_G1 FJ346609, CH271_0015L04_B1 FJ346610, CH271_0024A08_B1 FJ346611, CH271_0024A08_G1 FJ346612, CH271_0048A09_B1 FJ346613, CH271_0048A09_G1 FJ346614, CH271_0084N15_G1 FJ346615, CH271_0084N15_B1 FJ346616, CH271_0072O18_G1 FJ346617, CH271_0072O18_B1 FJ346618, CH271_0047O10_B1 FJ346619, CH271_0047O10_G1 FJ346620, CH271_0027P04_b1 FJ346621, CH271_0027P04_G1 FJ346622, CH271_0091K12_G1 FJ346623, CH271_0091K12_B1 FJ346624, CH271_0007O15_G1 FJ346625, and CH271_0007O15_B1 FJ346626.
Insert sequences from marmoset plasmid clones: C1.1.19 FJ867326, C1.1.74 FJ867327, C2.1.37 FJ867328, C2.1.65 FJ867329, C2.1.73 FJ867330, C3.1.5 FJ867331, C3.1.8 FJ867332, C4.1.13 FJ867333, C4.2.21 FJ867334, C5.1.12 FJ867335, C6.1.1 FJ867336, C6.1.29 FJ867337, C7.1.3 FJ867338, and C7.1.4 FJ867339.
The goal of this work was to characterize the structure and map the location of alphoid sequences in the N. leucogenys (NLE) and C. jacchus (CJA) genomes. A variety of complementary experimental and computational methods was employed. We initially performed α-satellite PCR on genomic DNA from NLE, CJA, and human (HSA) using human alphoid primers (α27/α30). The PCR products, α27/α30-NLE, α27/α30-CJA, and α27/α30-HSA, were used as probes to perform cross-species (NLE, CJA, and HSA) FISH experiments to validate the centromeric localization and find sequence homology among species. The probe α27/α30-NLE tested on NLE metaphases generated consistent signals for both centromeres and telomeres of all white-cheeked gibbon chromosomes as well as interstitial heterochromatic regions on chromosomes 3, 5, 9, and 14 (location defined according to the NLE standard karyotype by Rens et al. 2001). In humans, α27/α30-NLE gave strong signals on chromosomes 11, 17, and X; weak signals on the other chromosomes; and no signals on the Y chromosome (fig. 2A and B). No signals were detected on marmoset (CJA) metaphase chromosomes using the α27/α30-NLE probe. No differences were observed in signal pattern or distribution under low or high stringency hybridization conditions (table 1). The probe α27/α30-CJA did not show any signal in any of the tested species, suggesting high sequence divergence between human and CJA. The α27/α30-HSA probe showed signals only in human, hybridizing to all the centromeres except for the Y chromosome (fig. 2C, table 1).
Because our initial results suggested considerable divergence between human and CJA centromeric sequences, we used CJA-specific α-satellite to identify and characterize centromeres in marmoset.
To analyze in detail the sequence organization of NLE centromeric DNA, we subcloned and sequenced 32 clones from α27/α30-NLE α-satellite PCR (sequence accession numbers are reported at the end of this paragraph). Analysis of the sequences by RepeatMasker indicated that all of them were composed entirely of α-satellite. We performed comparative FISH experiments using each of the 32 clones as probes on both white-cheeked gibbon and human chromosome metaphase spreads. Three different hybridization patterns were observed for NLE: 1) exclusively centromeric with variable signal intensity (10/32); 2) to the centromeres and telomeres of all the chromosomes and interstitial regions on chromosomes 3, 5, 9, and 14 (12/32); and 3) to the telomeres of all the chromosomes and interstitial regions on chromosomes 3, 5, 9, and 14 (10/32). The chromosome mapping for each clone is shown in supplementary table S1, Supplementary Material online (the clone pK_68 showed Y chromosome specificity). Identical patterns were observed for clones mapping to centromeric regions at low and high stringency conditions. None of the gibbon-derived clones generated signals on HSA metaphases.
The localization of satellite at interstitial regions was previously reported by Chen et al. (2007), but their mapping is inconsistent with our results based on the standard N. leucogenys karyotype (Rens et al. 2001). We located the interstitial signals to chromosomes 3, 5, 9, and 14, whereas Chen et al. reported interstitial signals for chromosomes 3, 5, 8, and 11. To resolve this discrepancy in mapping data, we performed cohybridization experiments on NLE using the centromeric probe pGAMMA_41, which shows interstitial signals, together with human and NLE probes of known map position (Roberto et al. 2007). These experiments allowed us to establish positively the identities of the NLE chromosomes involved in the rearrangements (fig. 2E).
Due to the mapping of alphoid sequences to telomeric regions, the α-satellite telomeric probe pK_7 was further used in a cohybridization experiment with a telomeric-derived ttaggg probe (telomere PNA FISH, kit Cy3, DAKO) (fig. 2D). At the level of metaphase resolution, the signals completely overlapped even though sequence analysis of the clone pK_7 did not reveal any similarity with telomeric sequences.
In order to explain the different chromosomal locations of the NLE alphoid sequences, we selected two clones for each hybridization pattern (six in total) for restriction analysis (pGamma_35, pGamma_43 as centromeric and telomeric clones; pK_23, pK_7 as telomeric clones; and pK_19, pGamma_8 as centromeric clones). All showed identical HaeIII restriction digest patterns displaying bands in multiples of 171 bp in length, further supporting their alphoid nature. We compared these patterns with those obtained by Southern blot analysis of the NLE genome. We obtained the same characteristic HaeIII ladder pattern for three different probes (PCR product α27/α30-NLE, the centromeric clone [pGamma_8], and the centromeric–telomeric clone [pGamma_35]) (data not shown).
Further, we compared the 32 cloned sequences with one another. For each pairwise alignment, a divergence score ranging from 0 (sequence identity) to 0.882 (highest sequence divergence) was obtained (supplementary table S2, Supplementary Material online). Notably, the degree of sequence identity did not correspond with the FISH hybridization pattern: clones sharing the same mapping pattern showed both high and low sequence similarity, as did clones with different hybridization patterns.
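Why a tandem array yields bands in multiples of the monomer length can be illustrated with an in silico digest. HaeIII cuts at GGCC (blunt, between GG and CC). The sketch assumes one site per monomer, with the site lost in some copies; partial digestion would give a similar ladder. The monomers are toy sequences of 171 bp, not alphoid sequences from this study, and digest is a hypothetical helper.

```python
def digest(seq, site="GGCC", cut_offset=2):
    """Fragment lengths after cutting a linear sequence at every site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)  # HaeIII cuts GG^CC
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


MONOMER_LEN = 171
# Toy 171-bp monomers: one carries the HaeIII site, the other a mutated site.
intact = "A" * 100 + "GGCC" + "T" * 67
mutated = "A" * 100 + "GGTC" + "T" * 67

array = "".join([intact, mutated, intact, intact, mutated, mutated, intact])
fragments = digest(array)

# Terminal fragments depend on where the array is truncated, so only
# the internal fragments are expected to be multiples of the monomer.
internal = fragments[1:-1]
print([length // MONOMER_LEN for length in internal])  # prints [2, 1, 3]
```

Each internal fragment spans the distance between two retained sites, which is a whole number of monomers, hence the ladder.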
To avoid potential biases in the PCR amplification, we screened the large-insert genomic gibbon BAC library CH271 segment 1 (http://www.chori.org/bacpac/) for clones containing centromeric DNA using three different probes: α27/α30-NLE, α27/α30-HSA, and C410. The latter was a specific NLE alphoid DNA obtained by immunoprecipitation with antibody against CENP-C centromeric protein and was used as centromeric positive control in our experiments. These three different hybridization experiments gave the same positive clones (n = 422).
Seventy-four BACs, corresponding to the strongest hybridization signals, were selected and used in FISH experiments on NLE chromosomes under high stringency conditions as well as on HSA under low stringency conditions. These BAC probes gave strikingly different hybridization patterns on NLE chromosomes: telomeres, centromeres, both centromeres and telomeres, telomeres and interstitial regions, or centromeres and interstitial regions. Only a small subset of clones showed signals at centromeric regions on human chromosomes (supplementary table S3, Supplementary Material online). We analyzed BAC end sequences generated from a subset of these (n = 140 end sequences; see Materials and Methods) using RepeatMasker (sequence accession numbers are reported at the end of this paragraph): 110/140 detected alphoid sequences, 20/140 detected other repetitive elements (long interspersed nuclear elements, short interspersed nuclear elements, and associated tandem repeats), and 10/140 did not show any similarity with known classes of repetitive sequences (supplementary table S4, Supplementary Material online). These latter sequences were analyzed by Blast against the human genome and mapped to the human region 3p12.3 and to the pericentromeric regions 22q11.1 and 20p11.1, which correspond, respectively, to the pericentromeric regions of NLE chromosomes 21, 7, and 13 (supplementary table S5, Supplementary Material online). All of them mapped in segmental duplications as reported by the UCSC Genome Browser.
We specifically searched for NLE HOR patterns by comparing (Blast 2) the α-satellite sequences generated from the 110 BAC end sequences and the longest sequences obtained from the genomic DNA subclones (the centromeric and telomeric clones pGamma_35 and pGamma_43, the telomeric clones pK_23 and pK_7, and the centromeric clones pK_19 and pGamma_8). The first 171 bp of each selected sequence was used as query versus the complete sequence from which the 171-bp monomeric unit was derived. Similarity ranging from 76% to 84% was found between the extracted block of 171 bp and each 171-bp monomeric repeat in the complete sequence, but no periodicity was discovered (supplementary table S6, Supplementary Material online), suggesting an alphoid organization characterized by tandemly repeated monomeric units without HOR.
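The logic of this HOR search can be sketched in a few lines. This is a simplified stand-in for the Blast 2 comparison: it compares fixed-size, ungapped blocks position by position (an assumption; real monomers vary slightly in length and need gapped alignment), and the 95% threshold for calling two blocks "near identical" is also an assumption, chosen to be well above the 76–84% background reported here.

```python
from functools import reduce
from math import gcd


def identity(a, b):
    """Percent identity of two blocks compared position by position."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n


def block_profile(seq, unit=171):
    """Identity of the first unit-sized block against every full block."""
    query = seq[:unit]
    blocks = [seq[i:i + unit] for i in range(0, len(seq) - unit + 1, unit)]
    return [identity(query, blk) for blk in blocks]


def hor_period(profile, threshold=95.0):
    """Period (in monomers) at which near-identical blocks recur, or None.

    A period of 1 (every block near identical) is not a higher-order
    repeat, so it is also reported as None.
    """
    hits = [i for i, pid in enumerate(profile) if i > 0 and pid >= threshold]
    if not hits:
        return None
    period = reduce(gcd, hits)
    return period if period > 1 else None


# Toy dimeric array with 4-bp "monomers" (hypothetical, for illustration).
toy = "ACGTTTTT" * 2 + "ACGT"
print(block_profile(toy, unit=4))              # prints [100.0, 25.0, 100.0, 25.0, 100.0]
print(hor_period(block_profile(toy, unit=4)))  # prints 2
```

A profile in which all blocks fall in a similar intermediate range, as observed for the NLE sequences, returns None, that is, no detectable periodicity.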
Further support for the lack of HOR within white-cheeked gibbon was obtained by Blast 2 sequence analysis on known HOR structures in macaque and human alphoid sequences. The analysis was carried out on two macaque dimeric sequences ch250-379m3-sp6 (Pike et al. 1986) and ch250-317d4-sp6 (the latter sequence was obtained by a Blast of the clone ch250-379M3-sp6 against the trace archive, http://www.ncbi.nlm.nih.gov/BLAST/Blast.cgi?PAGE=Nucleotides&PROGRAM=blastn&BLAST_SPEC=TraceArchive&BLAST_PROGRAMS=megaBlast&PAGE_TYPE=BlastSearch) and, respectively, on pentameric, dimeric, and monomeric human clones pHS53 (Zaitsev and Rogaev 1986), pSE16-2 (Alexandrov et al. 1993a), and Z12013 (Alexandrov et al. 1993b). The analysis performed on the sequences of ch250-317d4-sp6, pHS53, and pSE16-2, which contain HORs, showed a recurrent trend with a periodicity of 2, 5, and 2, respectively, of the similarity percentage, between the first block of ~171 bp and the other 171-bp blocks inside the complete sequence (supplementary table S7, Supplementary Material online). The Z12013 Blast 2 sequence output, instead, showed a consistent and similar percentage among all the 171-bp units present in the complete sequence (supplementary table S7, Supplementary Material online), the same result obtained for the NLE alphoid sequences. Further sequence comparisons were performed between white-cheeked gibbon consensus and human consensus sequences and revealed sequence identities between 80.1% and 91.2% (supplementary table S8, Supplementary Material online).
Because the use of human primers in our approach could lead to the generation of products that are not entirely of NLE origin, we generated gibbon-specific primers from NLE consensus sequences obtained by multi-alignment of all gibbon BES with Megalign software. These primers were tested by PCR on NLE genomic DNA, and the amplification product (αBES) was used as probe in FISH experiments on NLE and HSA metaphases. Under low and high stringency conditions, signals for both centromeres and telomeres of all white-cheeked gibbon chromosomes as well as interstitial heterochromatic regions of chromosomes 3, 5, 9, and 14 were detected, whereas no signals were detected on human metaphase chromosomes (table 1). The different FISH results obtained with α27/α30-NLE and αBES probes on human metaphases could be explained by the different primer origin, human and NLE specific, respectively.
Because the gibbon family (Hylobatidae) is divided into four genera, Hylobates, Hoolock, Nomascus, and Symphalangus (Geissmann 2002; Mootnick and Groves 2005), we tested the isolated N. leucogenys centromeric sequences on Hylobates lar. We performed FISH experiments using α27/α30-NLE and αBES: only signals at the centromeric level were observed, thus showing that alphoid centromeric sequences are shared between the Hylobates and Nomascus genera, but that telomeric and interstitial localizations of alphoid sequences are specific to NLE. The unavailability of samples for the other genera prevented us from defining the organization in those genera.
The gibbon karyotype is known to be extensively rearranged when compared with the human and the ancestral primate karyotypes. Evolutionary breakpoint (EB) refinement of the white-cheeked gibbon (N. leucogenys, NLE) has been performed by Roberto et al. (2007) with respect to the human genome. They provided a detailed clone framework map of the gibbon genome and refined the location of 86 EBs to <1-Mb resolution. Comparisons of NLE breakpoints with those of other gibbon species revealed variability in the position, suggesting that chromosomal rearrangement has been a longstanding property of this particular ape lineage. Using the list of refined breakpoints by Roberto et al. (2007), we found that the location of our clones giving signals on interstitial alphoid regions matched four NLE-specific EBs on chromosomes 3, 5, 9 (evolutionary reciprocal translocations), and 14 (evolutionary inversion). In this regard, two- or three-color FISH experiments were performed using human and white-cheeked gibbon BAC clones to compare the exact position of each EB with that of our alphoid clone. At the cytogenetic level, we found a perfect overlap between the interstitial signals and the mentioned EBs. No specific NLE clone was reported by Roberto et al. for chromosome 14 in the white-cheeked gibbon; for this reason, only the human clone (RP11-133M22) was used to compare the EB and interstitial α-satellite locations on NLE14 (supplementary table S9, Supplementary Material online, and fig. 2E).
We compared the localizations of interstitial heterochromatic blocks and previously reported ancestral centromeres or neocentromeres (Cardone et al. 2002, 2006, 2007, 2008; Ventura et al. 2004; Misceo et al. 2005; Carbone et al. 2006a; Ventura et al. 2007; Stanyon et al. 2008), but no evidence of colocalization was found.
Characterization of marmoset centromeric DNA posed additional challenges due to the considerable sequence divergence of New World monkey α-satellite DNA when compared with human. Although α-satellite DNA is frequently not assembled as part of whole-genome sequencing projects, we previously demonstrated that higher-order α-satellite DNA is well represented in such data sets. We therefore downloaded the marmoset whole-genome shotgun (WGS) sequence library for CJA from NCBI Trace Archive (http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?). We followed the method described in Alkan et al. (2007) to extract and classify alphoid sequences from the WGS data. We first constructed a library of ~171 bp alphoid monomers using α-satellite sequences from previously characterized New World monkeys: Cebus apella (GenBank: L07926), Cebuella pygmaea (L07928), Chiropotes satanas (L07929), Callicebus moloch (L07930), and C. jacchus. The WGS reads that contain alphoid-like sequences were detected by aligning against the α-satellite set using Blast (parameters: −v 10,000 −b 10,000). CJA α-satellite monomeric repeat units were then extracted from the WGS reads using the RepeatMasker tool and clustered into sets in which the pairwise divergence between any pair of sequences within a set is at most 2% (Alkan et al. 2007). By this method, we obtained seven different clusters divided into two main branches and constructed the phylogenetic tree of the clustered sequences using ClustalW (fig. 3).
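The clustering criterion (every pair of sequences within a set at most 2% divergent) can be illustrated with a simple greedy procedure. This is only a sketch of the criterion, not the algorithm of Alkan et al. (2007): the greedy, order-dependent assignment and the ungapped divergence on equal-length monomers are assumptions made to keep the example short.

```python
def divergence(a, b):
    """Fraction of mismatching positions between two equal-length monomers."""
    if len(a) != len(b):
        raise ValueError("this sketch expects pre-aligned, equal-length monomers")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster(seqs, max_div=0.02):
    """Greedy grouping: a monomer joins the first set in which it is within
    max_div of every current member; otherwise it starts a new set."""
    clusters = []
    for s in seqs:
        for members in clusters:
            if all(divergence(s, m) <= max_div for m in members):
                members.append(s)
                break
        else:
            clusters.append([s])
    return clusters


# Toy 100-bp "monomers" (hypothetical): s2 is 1% from s1; s3 is 10% from s1.
s1 = "A" * 100
s2 = "C" + "A" * 99
s3 = "C" * 10 + "A" * 90
print([len(c) for c in cluster([s1, s2, s3])])  # prints [2, 1]
```

Because membership requires agreement with all members of a set, the 2% bound holds for every pair within a cluster, at the cost of the result depending on input order.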
We generated marmoset-specific primers (supplementary table S10, Supplementary Material online) from consensus sequences derived from the seven clusters and tested them by PCR on marmoset and human genomic DNAs. No amplification was detected from human DNA. CJA amplification products were then tested in FISH experiments on CJA and HSA metaphases. In CJA, signals were detected on different chromosomes with dissimilar signal intensity at low and high stringency conditions (table 2). The same PCR probes did not generate any signals on human metaphase chromosomal spreads.
The seven PCR products were subcloned (n = 386 clones) and different insert sizes were identified by PCR. Accordingly, we further grouped the clones into distinct classes (supplementary table S11, Supplementary Material online). A subset of clones representing each class (59/386) was tested by FISH on CJA metaphases and gave different hybridization patterns (fig. 4, supplementary table S11, Supplementary Material online). Homogeneous hybridization patterns were detected among clones grouped in the same class. No SF organization was detected, and no correspondence between cluster and map location was observed, suggesting a complex heterogeneity of α-satellite DNA.
To further investigate the sequence structure of the marmoset centromeres, we sequenced 14 clones (accession numbers are reported at the end of this paragraph), selecting the largest inserts for each cluster. No significant sequence similarity was observed between these 14 sequences and representative human alphoid sequences (X07685, Z12013, Z12009, M28031, M28032, and M28033). Similar results were obtained by comparing our sequences with other known primate centromeric sequences, including chimpanzee (L08574 and X97003), gorilla (AJ509823 and M62744), and macaque (X04006).
We analyzed the 14 sequences using Blast 2 Sequences (http://blast.ncbi.nlm.nih.gov/bl2seq/wblast2.cgi). The first 171 bp of each sequence was used as a query against the complete sequence from which it was extracted. The query matched multiple sites along the subject sequence, with a periodicity of 342 bp between high-identity matches (79–99%). Further, thirty-four 171-bp monomers were extracted from the 14 clone sequences, and phylogenetic analysis based on a multiple sequence alignment of these monomers showed that the 171-bp units grouped into two distinct clades with 50–60% sequence divergence between them (supplementary fig. S1, Supplementary Material online). The similarity between first and second monomers is much lower than previously reported for human and macaque (40–50% in marmoset vs. 75–89% in human and macaque) (supplementary table S7, Supplementary Material online); therefore, marmoset shows greater divergence between the monomers of a dimeric structure than other Primates. According to these results, the repetitive alphoid units we identified in the marmoset have an ancient and more divergent dimeric structure. Moreover, Southern blot analysis confirmed the 342-bp periodicity, showing a ladder of hybridizing bands with 342-bp spacing (fig. 5). In the light of these results, we conclude that α-satellite DNA in CJA is organized as units of 342 bp, representing the ancient dimeric structure. We found no evidence of HOR structure among the 342-bp units, but our data are limited by the insert size of the subcloned sequences we analyzed (the longest consisted of three consecutive 342-bp units). In contrast to human and great-ape genomes, analysis of long-range, end-sequence pair data also did not show the presence of any higher-order α-satellite DNA in the marmoset.
(Insert sequences from plasmid clones: C1.1.19 FJ867326, C1.1.74 FJ867327, C2.1.37 FJ867328, C2.1.65 FJ867329, C2.1.73 FJ867330, C3.1.5 FJ867331, C3.1.8 FJ867332, C4.1.13 FJ867333, C4.2.21 FJ867334, C5.1.12 FJ867335, C6.1.1 FJ867336, C6.1.29 FJ867337, C7.1.3 FJ867338, and C7.1.4 FJ867339).
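The self-comparison used to reveal the 342-bp periodicity (the first 171 bp of a clone queried against the whole clone) can be sketched as follows. This is an illustration only: it replaces Blast 2 Sequences with a gapless sliding-window identity scan, uses synthetic random monomers rather than marmoset sequence, and takes the lower bound of the reported identity range (79%) as an assumed threshold. The names `identity`, `find_hits`, and `spacings` are hypothetical.

```python
import random


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences (no gaps)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_hits(query, subject, min_identity=0.79):
    """Start offsets in subject where a query-length window matches at >= min_identity."""
    q = len(query)
    return [i for i in range(len(subject) - q + 1)
            if identity(query, subject[i:i + q]) >= min_identity]


def spacings(hits):
    """Distances between consecutive hit offsets."""
    return [b - a for a, b in zip(hits, hits[1:])]


# Synthetic example (assumption): two unrelated 171-bp monomers forming a 342-bp dimer,
# repeated three times, mimicking the longest insert described in the text.
rng = random.Random(0)
mono_a = "".join(rng.choice("ACGT") for _ in range(171))
mono_b = "".join(rng.choice("ACGT") for _ in range(171))
subject = (mono_a + mono_b) * 3

hits = find_hits(subject[:171], subject)
print(hits, spacings(hits))
```

In a tandem array of highly divergent dimers, the query matches only every second monomer, so consecutive high-identity hits are separated by the dimer length rather than the monomer length. A gapless scan is enough to show this on synthetic data; real clones with insertions or deletions would require a proper local alignment.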
In the present study, we have analyzed the structure and organization of centromeres in the white-cheeked gibbon (lesser ape) and in marmoset (New World monkey) to gain an insight into centromere satellite organization and evolution. Due to differences in sequence divergence among centromeric sequences in human, gibbon, and marmoset, we used two different approaches to isolate and characterize the centromeric sequences in these species.
We used human sequence–derived degenerate primers to isolate centromeric sequences in NLE. The availability of CENP immunoprecipitation C410 further helped us to confirm that our method identifies centromeric alphoid sequences. Comparative FISH analyses moreover confirmed the similarity between NLE α sequences and human centromeres. In particular, the centromeric sequences of human chromosomes 11, 17, and X, grouped in SF 3 (SF3, Jabs and Persico 1987), share the highest degree of similarity with white-cheeked gibbon alphoid sequences, as shown by hybridization using the probe α27/α30-NLE. Moreover, our data support the independent evolution of human Y chromosome centromeric sequences: Both α27/α30-NLE and α27/α30-HSA gave signals on all the human chromosomes except Y, showing high divergence between Y and the rest of the human centromeric sequences. Therefore, the Y-specific variants of α DNA previously described (Wolfe et al. 1985) likely diverged from the common branch of α DNA before the formation of other chromosome-specific variants (Alexandrov et al. 1988).
We further confirmed by Southern blot and sequence analysis that the 171-bp monomeric unit in NLE, as revealed by HaeIII restriction enzyme digest, lacked any HOR. Higher-order organization is reduced in complexity in chimpanzee (Alkan et al. 2007) and orangutan (Haaf and Willard 1998), supporting the idea that this highly organized structure reached its greatest complexity within the human lineage.
The chromosomal localization of the alphoid sequences we found in N. leucogenys is quite intriguing. The clones we obtained showed three distinct localizations: centromeres, both centromeres and telomeres, and centromeres/telomeres/interstitial regions. However, no relevant differences were found at the sequence level and no common sequence motif was observed between clones sharing the same map location. These results support the idea that the organization and repetition of sequences, rather than the sequence itself, are crucial in defining the chromosomal localization.
Previous studies carried out in Hylobates and Symphalangus showed quite different patterns of chromosomal distribution of DAPI-positive heterochromatin between these two genera of gibbons. Terminal, interstitial, and paracentric bands have been reported for Hylobates, whereas no interstitial heterochromatin has been detected in Symphalangus (Wijayanto et al. 2005). Our results showed a much greater extent of heterochromatin accretion in white-cheeked gibbon compared with the Hylobates and Symphalangus genera and disclosed the alphoid nature of interstitial and terminal heterochromatin in Nomascus. Taking this point into consideration, the alphoid centromeric/telomeric signals could represent exchanges between centromeric and telomeric repetitive elements that occurred during the speciation of N. leucogenys, as reported for duplicons in human (Bailey et al. 2002).
Further detailed mapping comparison, by cohybridization experiments, between our α-satellite clones with interstitial signals and chromosomal EBs in white-cheeked gibbon (http://www.biologia.uniba.it/gibbon, Roberto et al. 2007) has shown a clear association between them. In particular, NLE chromosomes 3, 5, 9, and 14 showed interstitial signals that overlapped the NLE-specific EBs previously reported by Roberto et al. (2007) for these chromosomes (fig. 2E). The presence of segmental duplications and various classes of repetitive elements, such as LINE L1, has been recently reported in NLE chromosomal rearrangement breakpoints, suggesting a more complex rearrangement mechanism than simply nonallelic homologous recombination or nonhomologous end joining (Carbone et al. 2006b; Girirajan et al. 2009). According to our findings, the interstitial alphoid regions we detected on chromosomes 3, 5, 9, and 14 could represent a wider accretion of repetitive elements in rearrangement breakpoints, thus underpinning a common evolutionary fate of these regions. In particular, the clustering of repetitive elements in these regions could represent “scars” of evolutionary translocations or inversions that occurred during the evolution of N. leucogenys. Although we found clustering of alphoid sequences at four EBs, this is an N. leucogenys-specific pattern and therefore cannot address the general question of the high evolutionary rate of breakpoints in the group of gibbons.
Due to the greater sequence divergence in marmoset, species-specific sequences from WGSSs were obtained and analyzed in detail. Our sequencing data show that the CJA unit is 342 bp in length without any HOR organization, which is in agreement with the satellite organization reported previously for three New World species: C. satanas, Pithecia irrorata, and Cacajao melanocephalus (Alves et al. 1994). Further, in C. satanas and C. melanocephalus, the monomeric repeat unit is 550 bp, whereas in Pithecia, similarly to CJA, the 340-bp monomer accounts for a substantial proportion of the satellite mass. Because of the evolutionary distance between CJA and Pithecia (von Dornum and Ruvolo 1999), it can be supposed that in the NWM group, the ancestral monomeric unit was a 340-bp unit that evolved to a 550-bp unit in Chiropotes and Cacajao. However, the presence of HOR in CJA cannot be ruled out, as large contiguous sequences have not yet been generated. Based on the insert sizes of our clones (the largest being C2.1.73, 1,142 bp), we could not detect any HOR greater than three monomeric units.
In humans and other studied Primates, the α-satellite unit size has been reported as 171 bp (Rudd et al. 2006; Alkan et al. 2007). This unit has been variously organized during the course of primate evolution, creating human-specific HORs, the monomeric gibbon structure, or dimeric structures as in macaque. In Callithrix, it is likely that two of these ancestral monomers fused, generating the specific ancient dimer in NWM, and that no further homogenization occurred between monomers, thus generating the present-day highly divergent dimers.
In the light of our data and all published data, we propose a complex model for primate satellite evolution involving genomic amplification, unequal crossover, and sequence homogenization. Starting with a 171-bp basic monomeric repeat unit, the centromeric α-satellite evolved by amplification, acquiring increasingly complex genomic structures. In the Platyrrhini lineage, two 171-bp units were first amplified as a dimeric unit, and later the two monomers in the same dimer began to acquire differences through a decrease in sequence homogenization, thus forming the specific New World monkey dimeric repeat unit (~342 bp). In the Catarrhini ancestor, the 171-bp unit continued to amplify and undergo unequal crossover and homogenization, thus forming the dimeric structure common to all the centromeres as reported in macaque, baboon, and African green monkey (Musich et al. 1980; Pike et al. 1986). In contrast, within the anthropoid lineage, the 171-bp monomer amplified and differentiated into a monomeric structure (gibbon and orangutan, present work and Haaf and Willard 1998) or a higher-order organization as reported in human (Willard et al. 1989; Arn and Jabs 1990). In any case, the sequence and the detailed organization differ, with the basic 171-bp repeat unit being the only common theme, supporting the notion that centromeric function is linked to relatively short repeated elements rather than to sequence-specific units (supplementary fig. S2, Supplementary Material online). These data can, moreover, support the idea that neocentromeres can also seed in a repeat-rich DNA domain lacking satellite DNA (Ventura et al. 2004, 2007). The comparison of the evolutionary history of the primate centromeres with other mammalian genomes will likely provide even more insights.
MIUR (Ministero Italiano della Università e della Ricerca; Cluster C03, Prog. L.488/92) and the European Commission (INPRIMAT, QLRI-CT-2002-01325) are gratefully acknowledged for financial support. This work was also supported in part by R01 GM058815 (to E.E.E.). E.E.E. is an investigator of the Howard Hughes Medical Institute. Mammalian cell lines were kindly provided by the Cambridge Resource Centre.