Centromeric DNA is highly variable among species, yet it is essential for chromosome segregation. The proteins that form the kinetochore machinery must interact with this DNA for proper centromere function. The dynamic nature of this relationship is thought to result from female meiotic drive, in which the expanding centromere sequences act selfishly to exploit the asymmetric production of female gametes (Zwick et al. 1999). To balance the resulting skewed transmission of genomic loci, kinetochore proteins adapt to ameliorate drive by restoring epigenetic control of centromere function and, thus, the random segregation of chromosomes (Henikoff et al. 2001; Malik and Henikoff 2002; Dawe and Henikoff 2006; Malik and Bayes 2006).
Comparative genome sequencing can be used to identify genomic regions that have been conserved or have diverged throughout evolution. Broad comparisons of genome sequences from species residing on distant branches of the evolutionary tree can reveal loci that have remained relatively unchanged over time (Pennacchio and Rubin 2001). Conservation implies essential (or universal) function, and such sequences are thought to be constrained by negative selection. Conversely, genomic regions associated with elevated rates of divergence among closely related species indicate positive selection (Boffelli et al. 2003). We have applied these principles to perform evolutionary analyses of the genomic regions containing the foundation kinetochore protein genes CENP-A, -B, and -C. These proteins provide part of the interface between centromeric DNA and the outer kinetochore (Amor et al. 2004) and, as such, are potential mediators of meiotic drive (Henikoff et al. 2001; Malik and Henikoff 2002; Dawe and Henikoff 2006; Malik and Bayes 2006).
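As a rough illustration of the reasoning behind these principles, the sketch below counts codons that differ between two aligned coding sequences and classifies each difference as synonymous (same amino acid) or nonsynonymous (different amino acid). It is a deliberately crude tally and not the analysis used in this study: it does not normalize by the number of synonymous and nonsynonymous sites, does not correct for multiple substitutions, and performs no statistical test. The function name `count_codon_changes` and the toy sequences are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Assumes both sequences are in-frame, gap-free, equal in length,
    and contain only the characters A, C, G, and T.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no information here
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # nucleotide change, same amino acid
        else:
            nonsyn += 1   # nucleotide change, different amino acid
    return syn, nonsyn


# Hypothetical toy sequences (not taken from any real gene).
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))
```

An excess of nonsynonymous over synonymous change, once properly normalized and tested, is the kind of signal that is taken to suggest positive selection, whereas a strong deficit of nonsynonymous change is consistent with negative selection.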
By generating and analyzing orthologous genomic sequences from a diverse set of primates, we have, for the first time, demonstrated positive selection acting on mammalian CENP-A. This protein is a histone H3 variant that plays a central role in the centromere-specific nucleosome (Allshire and Karpen 2008). Prior studies found positive selection acting on Drosophila (Malik and Henikoff 2001; Malik et al. 2002) and Arabidopsis (Talbert et al. 2004) homologs of human CENP-A. However, a broad evolutionary comparison of mammalian (human, chimpanzee, mouse, rat, and bovine) CENP-A homologs failed to reveal evidence of positive selection (Talbert et al. 2004).
Throughout evolution, the CENP-A histone-fold domain has been highly conserved, unlike the N-terminal tail, which is highly variable in both length and sequence (Yoda et al. 2000; Henikoff et al. 2001). Although very few data exist regarding posttranslational modification of CENP-A (Zeitlin et al. 2001), extensive characterization of the closely related histone H3 protein has identified modifications of one threonine, seven lysine, four arginine, and two serine residues within the N-terminal tail (Kouzarides 2007). Interestingly, the identity and modification status of only one of these sites (serine 10 in histone H3 and serine 7 in CENP-A) is conserved in the CENP-A protein. In fact, at each of the other modified histone H3 positions, a different amino-acid residue is present and predicted to be modified in CENP-A.
The major feature of CENP-A evolution highlighted by our comparative analyses is the presence of species-specific DNA sequences, especially in the N-terminal tail. Such variation affects potential posttranslational modification sites and points to the intriguing possibility that each species has a unique combination of centromeric DNA and CENP-A protein sequence. This feature of CENP-A evolution supports both an ongoing genetic conflict at the centromere that is linked with speciation (Henikoff et al. 2001) and epigenetic compensation for rapidly evolving DNA (Dawe and Henikoff 2006).
Homologs of human CENP-C have been shown to be subject to positive selection in all species examined, including some mammals (Talbert et al. 2004). In that study, residues under positive selection in mammalian CENP-C were shown to lie within the central DNA-binding region. Interestingly, we found signatures of positive selection throughout the CENP-C protein; in fact, each region of this protein that has been previously demonstrated to be functionally important was found to contain residues under positive selection.
Although we did not detect evidence of positive selection acting on CENP-B, our analyses highlight an intriguing relationship between CENP-B and CENP-C. The CENP-C-interaction domain of the CENP-B protein is highly variable among primate species, yet the rest of the protein is highly conserved. Positioned in the central portion of the protein, the CENP-C-interaction domain appears to have been subjected to numerous insertion and deletion events throughout evolution. CENP-B binds to DNA via its N-terminus and forms homodimers at its C-terminus. It is thus intriguing that the one evolutionarily dynamic region of CENP-B represents the portion of the protein that interacts with other kinetochore components.
Species-specific length variation within the central domain of the CENP-B protein may enable the resulting dimer to “reach” binding sites within alpha-satellite DNA that are uniquely positioned within each species. Emergence of the CENP-B box within alpha-satellite DNA 15–25 Ma in the primate lineage (Haaf et al. 1995) has been followed by continued evolution of the frequency and organization of CENP-B boxes within centromeric regions (Schueler et al. 2001, 2005). Recent coevolution of CENP-B and CENP-C (or other kinetochore proteins) may account for improved artificial chromosome formation by alpha-satellite DNA that contains CENP-B boxes versus that which lacks them (Harrington et al. 1997; Ikeno et al. 1998; Masumoto et al. 1998; Ohzeki et al. 2002). Although CENP-B does not appear to be necessary (Hudson et al. 1998; Kapoor et al. 1998; Perez-Castro et al. 1998) or sufficient (Sullivan and Schwartz 1995; Sullivan and Willard 1998) for centromere activation, it may have recently evolved an important role in current centromere function. CENP-B-binding sites may represent the most recent efforts of “selfish centromeric DNA” to gain genetic control. In the ongoing conflict between genetic and epigenetic control of centromere function (Dawe and Henikoff 2006), kinetochore proteins may continuously be evolving to compensate for improved centromere function via the emergence of new CENP-B-binding sites within centromeric DNA.
In summary, our comparative genomic studies provide new insights relevant to the evolution of primate centromeres and the epigenetic mechanisms controlling chromosome transmission. The latter involves a complex cellular choreography, with centromeric DNA and the kinetochore proteins that associate with it being central elements. Studying the evolution of these elements continues to reveal many interesting species- and lineage-specific findings. Further understanding the basis for these evolutionary changes may provide valuable clues about centromere function and, perhaps, speciation.