Cell. 2010 September 3; 142(5): 810–821.
PMCID: PMC2982257

The Protein Composition of Mitotic Chromosomes Determined Using Multiclassifier Combinatorial Proteomics

Summary

Despite many decades of study, mitotic chromosome structure and composition remain poorly characterized. Here, we have integrated quantitative proteomics with bioinformatic analysis to generate a series of independent classifiers that describe the ~4,000 proteins identified in isolated mitotic chromosomes. Integrating these classifiers by machine learning uncovers functional relationships between protein complexes in the context of intact chromosomes and reveals which of the ~560 uncharacterized proteins identified here merit further study. Indeed, of 34 GFP-tagged predicted chromosomal proteins, 30 were chromosomal, including 13 with centromere association. Of 16 GFP-tagged predicted nonchromosomal proteins, 14 were confirmed to be nonchromosomal. An unbiased analysis of the whole chromosome proteome from genetic knockouts of kinetochore protein Ska3/Rama1 revealed that the APC/C and RanBP2/RanGAP1 complexes depend on the Ska complex for stable association with chromosomes. Our integrated analysis predicts that up to 97 new centromere-associated proteins remain to be discovered in our data set.

Keywords: CELLBIO, PROTEINS, SYSBIO

Abstract

Highlights

► Method to define the protein composition of a complex nonpurifiable organelle
► Combines genetics and proteomics to study complexes in whole chromosomes
► Use of machine learning to uncover functional relationships between proteins
► Comprehensive list of >4000 mitotic chromosome-associated proteins

Introduction

As cells enter mitosis, chromosomes undergo a remarkable series of physiological and structural transformations known as chromosome condensation. This process involves individualization of the chromosomal territories to create the characteristic mitotic chromosome morphology and maturation of the kinetochores so that chromosomes can align and segregate on the mitotic spindle. Our understanding of the mechanisms underlying chromosome condensation is still fragmentary. These processes can be fully understood only when all components of mitotic chromosomes have been identified and functional relationships between them determined. We have developed a new approach that we term multiclassifier combinatorial proteomics (MCCP) to do this.

The current list of mitotic chromatin proteins reported in proteomic studies is surprisingly short. Early analyses described 62 and 79 proteins, respectively, in mitotic chromosome scaffolds (Morrison et al., 2002; Gassmann et al., 2005). A later study identified > 250 proteins that bound to sperm chromatin in Xenopus egg extracts in vitro, revealing the kinetochore protein Bod1 (Porter et al., 2007). Other studies identified ~240 proteins, subsequently corrected to roughly 50 bona fide putative structural proteins (Uchiyama et al., 2005; Takata et al., 2007). In a targeted study, 98 proteins were identified as shared in isolated telomeres from wild-type and ALT cells (Dejardin and Kingston, 2009). Despite these efforts, currently available proteomics reports miss a significant fraction of known mitotic chromosomal proteins, particularly kinetochore components.

Biochemical analysis of important chromosome substructures such as kinetochores is extremely challenging. The kinetochore is one of the most complex cellular substructures (Cheeseman and Desai, 2008), with over 120 constituents described by a range of approaches (Earnshaw and Rothfield, 1985), including targeted proteomic studies (Obuse et al., 2004; Foltz et al., 2006; Okada et al., 2006; Hori et al., 2008; Amano et al., 2009). Biochemical dissection of kinetochores is complicated by the fact that it is not known to what extent the constituent protein complexes can be recovered in soluble form from chromosomes with their relevant intermolecular associations intact. As described here, those problems can be circumvented by purifying and analyzing whole mitotic chromosomes.

Purifying large cellular structures or organelles free of contaminants is virtually impossible. Genuine components have been distinguished from contaminants in such preparations by subtractive (Schirmer et al., 2003) or quantitative (Foster et al., 2003) proteomics by determining the difference between two near-identical fractions, one enriched and the other depleted of the target structure. In protein correlation profiling, a set of known components was used to define a common intensity profile across neighboring biochemical fractions from sucrose gradients during purification of organelles and this was used to select other proteins that show a similar profile (Andersen et al., 2003). These methods do not recognize cellular background proteins that adhere to the structure of interest due to nonspecific hydrophobic or electrostatic interactions. This type of contamination is particularly relevant for vertebrate mitotic chromosomes.

In the present study, we identified approximately 4000 polypeptides in highly purified chromosomes. We developed a statistical approach for analysis of proteomic data to confirm which known and uncharacterized proteins from this long list are chromosomal. An experimental test of our method led to the identification of 32 chromosomal proteins, including 13 kinetochore-associated proteins. Key to our analysis is the innovative use of stable isotope labeling with amino acids in cell culture (SILAC) (Ong et al., 2002), plus development of a framework to integrate data from multiple classifiers, including nonproteomic classifiers, to reveal proteins of interest and determine functional relationships between protein complexes in the context of whole chromosomes.

Results

Identification of the Proteome of Isolated Mitotic Chromosomes

We isolated mitotic chromosomes from chicken DT40 cells by a refinement of the polyamine method (Lewis and Laemmli, 1982) (Figures 1A and 1B). These preparations are negative for porin in immunoblots (Figure 1C), indicating that mitochondrial contamination (common in chromosome preparations) is low.

Figure 1
Proteomic Analysis of Mitotic Chromosome Proteins

Proteomic analysis (Cox and Mann, 2008; de Godoy et al., 2008) of 250 μg of total chromosomal protein identified 4029 proteins in 28 functional categories (Figure 1D and Table S1 available online), including essentially all previously described chromosomal proteins.

When vertebrate cells enter mitosis, the nuclear envelope breaks down and chromosomes are newly exposed to cytoplasmic proteins, organelles and cellular membranes. Since highly positively charged histones contribute ~38% of the chromosome mass and an equivalent amount is highly negatively charged DNA, many charged nonchromosomal proteins exhibit strong adventitious binding to chromosomes. These “hitchhikers” differ from conventional contaminants (e.g., mitochondria), as they are physically associated with the chromosomes before cell lysis and apparently cannot be separated by conventional purification protocols. Thus, the 1331 cytoskeletal, cytoplasmic, mitochondrial, membrane and receptor proteins found in our preparations may be physically associated with chromosomes following nuclear envelope disassembly, but may not be functionally relevant.

A Classifier Approach to Identify Genuine Mitotic Chromosome Proteins

The presence of hitchhiker proteins complicates the definition of what constitutes a “true” chromosomal protein, as well as the design of biochemical control experiments. For example, comparing mitotic to interphase chromatin is of limited use, since the latter is shielded from the cytoplasm by the nuclear envelope. In such a comparison, cytoplasmic hitchhiker proteins would be scored as mitosis-specific chromosomal proteins.

Here, we describe an approach to study the complex chromosomal proteome that both identifies proteins that merit further study and reveals functional relationships between all chromosomal proteins. We quantify the chromosomal association of each protein in a series of quantitative proteomics experiments, mostly using SILAC technology (Ong et al., 2002). Each experiment provides an independent measure of a protein's association with mitotic chromosomes, which we term a “classifier.” Integration of the data obtained with all classifiers enables us to detect patterns in the behavior of groups of proteins that reveal shared membership in protein complexes as well as functional dependency relationships.

The experimental protocols that define the five proteomic classifiers are shown in Figure 1E and all classifiers are summarized below.

Classifier I: Abundance estimation

To estimate the amounts of individual proteins in mitotic chromosomes, we used an established protocol (Rappsilber et al., 2002; Ishihama et al., 2005, 2008) to calculate a scaled protein abundance index based on the number of peptides observed and the number of times that each peptide is observed (spectral count) for each protein. This calculation and its validation are discussed in Extended Experimental Procedures.

In the conventional pie chart of Figure 1D, all proteins are weighted equally, independent of their actual abundance in isolated chromosomes. A more informative representation of chromosome composition is obtained by normalizing each class by its mass, obtained by multiplying the estimated abundance by the predicted molecular mass of each protein (Figure 1F). As expected, histones comprise the bulk of mitotic chromosomal protein (48%). Overall 68% of the protein mass is annotated as chromosomal.
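The mass normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the tuple layout, and the toy numbers are all hypothetical, and only the rule "mass contribution = estimated abundance × predicted molecular mass" is taken from the text.

```python
from collections import defaultdict

def mass_fractions(proteins):
    """Return each functional category's share of the total protein mass.

    proteins: iterable of (category, abundance, molecular_mass_kda) tuples.
    The mass contribution of one protein is abundance * molecular mass.
    """
    mass_by_category = defaultdict(float)
    for category, abundance, mass_kda in proteins:
        mass_by_category[category] += abundance * mass_kda
    total = sum(mass_by_category.values())
    return {cat: mass / total for cat, mass in mass_by_category.items()}

# Hypothetical toy values for illustration only, not data from this study.
toy_proteins = [
    ("histone", 100.0, 14.0),
    ("condensin", 2.0, 135.0),
    ("ribosomal", 10.0, 20.0),
]
fractions = mass_fractions(toy_proteins)
```

Weighting by mass rather than by protein count is what lets a small number of very abundant proteins, such as histones, dominate the composition chart.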

Classifier II: Enrichment in Chromosomes

We expected core chromosomal components like histones or structural proteins would be more abundant in isolated chromosomes than in cytoplasmic extracts. The reverse would be true of background proteins. We therefore mixed isolated chromosomes from mitotic DT40 cells grown in light medium with an equal mass of protein from post-chromosomal extracts of parallel cultures grown with heavy SILAC medium (Figure 1E). Classifier II was calculated as the ratio of light/heavy peaks for each protein. Among the 20% most enriched proteins, chromosomal proteins outnumbered nonchromosomal proteins by 3 to 1. Conversely, among the 20% least enriched proteins, background proteins outnumbered chromosomal proteins 6 to 1.
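A ratio-based classifier of this kind, and the selection of the most enriched 20% of proteins, might be computed along the following lines. This is a sketch under stated assumptions: the log2 transform, the function names, and the intensity values are illustrative choices and are not described in the text, which specifies only the light/heavy ratio.

```python
import math

def log2_ratio(light, heavy):
    """log2(light/heavy); None when either intensity is missing or non-positive."""
    if light is None or heavy is None or light <= 0 or heavy <= 0:
        return None
    return math.log2(light / heavy)

def top_fraction(scores, fraction=0.2):
    """Names of the highest-scoring `fraction` of proteins that have a score."""
    ranked = sorted((name for name, s in scores.items() if s is not None),
                    key=lambda name: scores[name], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Hypothetical (light, heavy) peak intensities, not measurements from this study.
intensities = {
    "H4": (900.0, 30.0),
    "SMC2": (400.0, 40.0),
    "RPL3": (50.0, 200.0),
    "TUBB": (20.0, 400.0),
    "ACTB": (10.0, 100.0),
}
enrichment = {name: log2_ratio(l, h) for name, (l, h) in intensities.items()}
most_enriched = top_fraction(enrichment)
```

Proteins without a usable ratio are left as `None` rather than dropped, so they remain available to analyses that tolerate missing values.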

Classifier III: In Vitro Exchange on Chromosomes

We ranked proteins based on their ability to stably bind to chromosomes during an incubation in cytosol. A crude light chromosome fraction obtained by gentle centrifugation was mixed with an excess of heavy post-chromosomal extract and incubated to allow proteins to exchange at 14°C for 30 min (Figure 1E). The chromosomes were then subjected to rigorous purification. All heavy proteins identified must have bound to the chromosomes during the incubation in vitro. Classifier III is the light/heavy ratio for each protein identified in this experiment.

The most stable chromosomal-associated proteins were histones (average classifier III = 63), topoisomerase IIα (classifier III = 61) and condensin I (average classifier III = 59). Interestingly, condensin II was more exchangeable (average classifier III = 14). In contrast, the ratios for ~75% of ribosomal proteins were in the range from 0.45 to 3.0 (Figure 1G), indicating significant binding to chromosomes from cytosol during the incubation.

Although we sought to optimize purity rather than preserve functionality of chromosomes, the exchange experiment revealed that at least one aspect of kinetochore function was retained in purified chromosomes. Similar to one recent study (Kulukian et al., 2009), kinetochores of the purified chromosomes can recruit spindle checkpoint proteins Mad2, Bub3, and BubR1 from cytosol (Figures 1H and 1I) but not Mad1 (Figure 1J).

Classifier IV: SMC2 Dependency

We used a conditional genetic knockout of SMC2 in DT40 SMC2 ON/OFF cells (Hudson et al., 2003) to compare the composition of mitotic chromosomes formed in the presence or absence of condensin, which is required for structural integrity of mitotic chromosomes. DT40 SMC2-ON cells were cultured in SILAC-heavy medium. To obtain chromosomes depleted of condensin, cells grown in SILAC-light medium were cultured with doxycycline for 30 hr to shut down SMC2 expression prior to the nocodazole block (SMC2-OFF). Equal numbers of mitotic cells from the two different populations were mixed and mitotic chromosomes isolated (Figure 1E). Classifier IV is the heavy/light ratio (SMC2-ON/SMC2-OFF) for each of the proteins identified in this experiment.

SMC2-depleted chromosomes contained 5.8% of the wild-type level of SMC2. It is not possible to isolate chromosomes from cultures completely lacking SMC2 as SMC2 is an essential gene and dead cells do not accumulate in mitosis. These chromosomes were similarly depleted of all condensin I and II subunits.

Classifier V: Ska3/Rama1 Dependency

To demonstrate the targeted analysis possible with our approach, we compared the association of kinetochore proteins with chromosomes in cells with or without Ska3/Rama1/C13orf3. Ska3/Rama1, which was identified in this analysis as a chromosomal protein, was described in several recent publications (Daum et al., 2009; Gaitanos et al., 2009; Raaijmakers et al., 2009; Theis et al., 2009; Welburn et al., 2009).

To obtain chromosomes depleted of Ska complex, we isolated a genetic knockout of the Ska3/Rama1 gene (Figures S5A and S5B). Ska3/Rama1−/− cells (homozygous knockouts are viable) were grown in SILAC-light medium (Figure 1E). Equal numbers of mitotic cells from the Ska3/Rama1−/− and wild-type DT40 (cultured in SILAC-heavy medium) populations were mixed and mitotic chromosomes isolated. Classifier V is the heavy/light ratio (wild-type/Ska3/Rama1−/−) for each protein identified in this experiment.

Figure S5
Generation of a Knockout Cell Line and Phenotypic Analysis of Ska3/RAMA1, Related to Figure 5

Classifier VI: Domain Analysis

We added an additional nonproteomic classifier to our analysis using the protein domains found in chromosomal and nonchromosomal proteins (red/pink and green wedges in Figure 1D). This made use of bioinformatic analysis in order to segregate chromosomal from nonchromosomal proteins, but importantly did not consider a protein's relevance to mitosis. We counted how often each domain was observed in chromosomal and nonchromosomal proteins and assigned it this frequency as a score (Table S2). Classifier VI was then determined for each protein based on the sum of its domain scores.
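The domain-based classifier can be illustrated with a short sketch. The text states only that each domain is scored by how often it occurs in chromosomal and nonchromosomal proteins and that a protein's score is the sum of its domain scores; the specific normalized-difference formula below, the function names, and the domain labels are assumptions made for illustration.

```python
from collections import Counter

def domain_scores(chromosomal, nonchromosomal):
    """Score each domain by its bias toward chromosomal training proteins.

    Both arguments are lists of domain sets, one set per training protein.
    Assumed score: (n_chrom - n_nonchrom) / (n_chrom + n_nonchrom),
    ranging from -1 (only nonchromosomal) to +1 (only chromosomal).
    """
    chrom = Counter(d for domains in chromosomal for d in domains)
    non = Counter(d for domains in nonchromosomal for d in domains)
    return {d: (chrom[d] - non[d]) / (chrom[d] + non[d])
            for d in set(chrom) | set(non)}

def classifier_vi(domains, scores):
    """Sum of the scores of a protein's known domains; None if it has none."""
    known = [scores[d] for d in domains if d in scores]
    return sum(known) if known else None

# Hypothetical training sets; the domain labels are placeholders.
chromosomal_training = [{"SMC_hinge", "ABC"}, {"Histone"}, {"PHD"}]
nonchromosomal_training = [{"ABC"}, {"Ribosomal_L3"}, {"ABC", "Kinase"}]
scores = domain_scores(chromosomal_training, nonchromosomal_training)
```

Returning `None` for a protein with no scored domain mirrors the fact that proteins lacking known domains gain nothing from this classifier.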

Multiclassifier Combinatorial Proteomics

Traditional one-dimensional analysis (e.g., sorting the various proteins according to their value for each classifier) was of limited utility, as the data lacked a clear boundary between chromosomal (red/pink in Figure 2A) and nonchromosomal proteins (green in Figure 2A) for each classifier (Figure 2A and Figures S2A–S2E).

Figure 2
Combining Classifiers Increases Specificity
Figure S2
Smoothed Histograms Showing the Distribution of Proteins Across Classifiers and RF Analysis, Related to Figure 2

By contrast, when classifiers were combined, our ability to identify chromosomal proteins was vastly improved. For example, an enrichment of centromeric or chromosomal proteins relative to nonchromosomal proteins was obtained when classifiers I (abundance) and II (enrichment) were plotted (Figures 2C–2E). The clustering of protein complex subunits in this plot (Figure 2D) reflects both their relative stoichiometry (x axis), and the similar degree to which subunits in a complex are all present either on or off chromosomes (y axis). Members of the APC/C, Ndc80 and Mis12 complexes form closely knit clusters. It is important to note that this was achieved in the context of entire chromosomes and without requiring solubilization of the complexes.

We used random forest (RF) analysis, a machine learning approach, to progress beyond two-dimensional analyses and integrate the information present in all proteomics classifiers. This analysis offered two powerful benefits. First, it enabled us to work with data sets that contain missing values. This is a significant advantage in proteomics studies where not every protein is observed in every experiment, as seen in Figure 2A and Figure S2F. Second, RF analysis allowed us to use any descriptor of our proteins as a classifier and integrate it into our overall analysis. Here, we also included a bioinformatic analysis of the distribution of protein domains in our analysis distinguishing chromosomal from nonchromosomal proteins (classifier VI).

In brief, RF is a decision tree analysis that separates data sets into “true” and “false” groups. The decision trees are trained on defined data sets and randomly built to optimize the separation between them. Analysis of the experimental data set then occurs by running each protein through all trees and adding up its overall RF score (i.e., the fraction of trees that scored it as “true”). RFs perform much better on training data than application data, so their performance is evaluated by ten-fold cross-validation. The training data are split into random sections of 90% for training and 10% for evaluation, so that successively the entire set is used for evaluation. Here, the two training data sets chosen were “nonchromosomal” (green wedges in Figure 1D) and “chromosomal” (red + pink wedges in Figure 1D), and the RF score for a given protein is the fraction of trees that scored it as “chromosomal.”
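Two ideas from this paragraph, the RF score as a vote fraction and the ten-fold cross-validation split, can be sketched as follows. This is an illustration under assumptions rather than the analysis pipeline used in the study: the trees are hand-written toy rules instead of trained decision trees, a missing value is simply treated as "no vote for chromosomal", and all names and thresholds are hypothetical.

```python
import random

def ten_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_idx, eval_idx) pairs; every item is evaluated exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, folds[k]

def rf_score(trees, protein):
    """Fraction of trees that vote 'chromosomal' (True) for one protein."""
    votes = [tree(protein) for tree in trees]
    return sum(votes) / len(votes)

# Toy stand-ins for trained trees; a missing classifier value counts as 0 here.
toy_trees = [
    lambda p: (p.get("enrichment") or 0.0) > 2.0,
    lambda p: (p.get("exchange") or 0.0) > 10.0,
    lambda p: (p.get("abundance") or 0.0) > 1.0,
    lambda p: (p.get("enrichment") or 0.0) > 0.5,
]
toy_protein = {"enrichment": 3.0, "exchange": None, "abundance": 5.0}
score = rf_score(toy_trees, toy_protein)
```

With 50 training items, each of the ten splits trains on 45 and evaluates on 5, and the ten evaluation sets together cover all 50 items.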

RF analysis readily discriminated chromosomal from nonchromosomal proteins. In the RF rug plot of Figure 2A, which represents the ranked list of proteins generated by RF analysis, the left side is predominantly red, while the right side is predominantly green. To reach the 500th chromosomal protein on the RF-ranked list only 229 nonchromosomal proteins are included (Figure 2B). In contrast, 410-671 nonchromosomal proteins would be included when considering ranked lists from individual classifiers. Note that the RF-based sorting was done on the complete data set, including proteins that failed to be observed with some classifiers. Therefore, adding information from additional experiments did not decrease the number of proteins covered.
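The comparison used here, how many nonchromosomal proteins are passed before the Nth chromosomal protein is reached on a ranked list, is simple to compute. The sketch below assumes a list of boolean labels ordered from best to worst rank; the function name and the toy list are hypothetical.

```python
def false_hits_before_nth(ranked_labels, n):
    """Count nonchromosomal entries passed before the n-th chromosomal one.

    ranked_labels: list of bools, best-ranked first; True means chromosomal.
    Returns None if the list holds fewer than n chromosomal entries.
    """
    found = 0
    false_hits = 0
    for is_chromosomal in ranked_labels:
        if is_chromosomal:
            found += 1
            if found == n:
                return false_hits
        else:
            false_hits += 1
    return None

# Toy ranked list for illustration only.
toy_ranking = [True, True, False, True, False, False, True]
```

Applied to the real ranked lists with n = 500, a lower count indicates a ranking that separates the two training classes more cleanly.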

The advantage of combining classifiers can be statistically expressed by ROC curves (Figure S3A), with increased area under the curve (AUC) for our combined analysis when compared to each of the individual classifiers (AUC for RF on classifiers I-V = 0.81; AUCs for individual classifiers I-V = 0.41-0.76). The combined classifiers assigned 88.8% of our gold standard, the 125 centromere proteins, correctly as chromosomal, at the cut-off that minimizes misassignment of chromosomal versus nonchromosomal proteins. This specificity was further improved when bioinformatic domain analysis (cVI) was integrated with our proteomic classifiers (AUC for RF on classifiers I-VI = 0.97; identification of the first 500 chromosomal proteins yields 17 nonchromosomal proteins, Figures S3B and S3C; proteins lacking known domains are excluded from this boost, Figure S3D). Now, 92.8% of the centromere proteins were assigned as chromosomal. In summary, RF analysis provides us with a tool for productively combining the outcomes of our individual proteomics classifier experiments and further empowering our analysis by including data from other sources.
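For readers unfamiliar with the AUC, it equals the probability that a randomly chosen chromosomal protein receives a higher score than a randomly chosen nonchromosomal one. A minimal pairwise implementation of that definition is sketched below; the function name and the toy scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy scores for illustration only.
auc_example = roc_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to a ranking no better than chance, and 1.0 to perfect separation of the two classes.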

Figure S3
The Performance of the Random Forest, Related to Figure 3

When the results of a random forest based on the five proteomic classifiers plus the bioinformatics-based classifier VI were plotted against those from the initial random forest analysis (Figure 3A), a near-perfect separation of the training data was achieved. When a separation line was placed manually, only a single chromosomal protein of the training set and two nonchromosomal proteins were misassigned.

Figure 3
Random Forest Analysis Predicts New Proteins of Interest

Using ten-fold cross-validation, we found that 118 centromere proteins were positioned to the right of the separation line and only 7 to the left (Figure 3B). This compares to 14 and 9 centromere proteins being missed when using the one-dimensional ranked lists by classifiers I-V and I-VI, respectively. Accepting the line as a threshold returns known centromere proteins with a yield of 94.4% and all other chromosomal proteins with 93.1% success. In contrast, 83.1% of nonchromosomal proteins are rejected. Thus, the classifier approach is sufficiently powerful to suggest chromosomal proteins from among hitherto uncharacterized proteins.

Identification of New Chromosomal Proteins

To test the predictive power of our RF analysis, we cloned and tested the location in mitosis of 50 previously uncharacterized proteins including 15 without known domains: 34 predicted to be and 16 predicted not to be on chromosomes in mitosis.

Reasoning that important proteins would be conserved, we expressed GFP-tagged human homologs of these chicken proteins in U2OS cells. Remarkably, 30 of 34 cloned proteins from the chromosomal region were confirmed as chromosomal, whereas only 2 of the 16 proteins predicted ab initio to be nonchromosomal proved to be chromosomal. This confirms the power of our analysis and indicates a success rate of 88%, with 44 of 50 tagged proteins localizing as predicted. Of the 50 newly cloned proteins, 13 were associated with kinetochores in mitosis, 12 had a more general distribution on mitotic chromosomes and 7 others were perichromosomal, a class whose new members we propose to term chromosome periphery proteins (cPERPs A-G) (Figures S4B–S4G and Table S3). The chromosome periphery (perichromosomal layer) is enriched in ribosomal and nucleolar hitchhiker proteins, and is of unknown function (Van Hooser et al., 2005).
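The validation figures reported above can be checked with a few lines of arithmetic. The counts are taken from the text (30 of 34 predicted chromosomal proteins confirmed, 14 of 16 predicted nonchromosomal proteins confirmed); the variable names are illustrative.

```python
# Counts reported in the text for the 50 GFP-tagged proteins.
predicted_chromosomal = 34
confirmed_chromosomal = 30
predicted_nonchromosomal = 16
confirmed_nonchromosomal = 14

total_tested = predicted_chromosomal + predicted_nonchromosomal
correct = confirmed_chromosomal + confirmed_nonchromosomal
success_rate = correct / total_tested

# Chromosomal proteins found overall: confirmed predictions plus the
# proteins predicted nonchromosomal that turned out to be chromosomal.
chromosomal_found = confirmed_chromosomal + (
    predicted_nonchromosomal - confirmed_nonchromosomal)

# Localization classes reported for the chromosomal proteins.
kinetochore, general, perichromosomal = 13, 12, 7
```

The three localization classes sum to the same 32 chromosomal proteins obtained from the prediction tally, so the reported numbers are internally consistent.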

Figure S4
Chromosomal Localization of Novel Proteins Identified in This Study, Related to Figure 4

The new centromere proteins all appeared to localize to the outer kinetochore, relative to CENP-C and HEC1 as standards (Figures 4B–4J, Figure S4A). In keeping with established nomenclature, we propose to name these proteins CENP-Y, CENP-Z and CENP-27 through CENP-37. Because ‘Z’ is the 26th and last letter of the basic modern Latin alphabet, we propose to designate further new proteins with numbers, starting with CENP-27.

Figure 4
Cluster Analysis and Imaging of New Kinetochore Proteins

Functional Analysis of New Kinetochore-Associated Proteins

We focused our initial functional analysis on kinetochore proteins. Clustering analysis (Gentleman et al., 2004) allowed us to combine data for proteins identified by all classifiers, and look for informative groupings. This revealed a striking tendency for functionally related proteins to form clusters, as exemplified by members of the NDC80, CPC, Nup107-160 and APC/C complexes (Figure 4). Interestingly, our clustering sorted CENP-27 as a component of the APC/C. This was confirmed and the protein named APC16 in three recent reports (Hutchins et al., 2010; Kops et al., 2010; Hubner et al., 2010). To further test the predictive value of cluster analysis for proteomic data sets, we examined two kinetochore proteins in greater detail.

Ska3/Rama1 and Functional Analysis of Kinetochore Subcomplexes

C13orf3 was located adjacent to Ska2 in the cluster analysis. This protein now known as Ska3/Rama1 has been suggested to be involved in microtubule attachment to kinetochores (Gaitanos et al., 2009; Raaijmakers et al., 2009; Welburn et al., 2009) or coordination of the spindle checkpoint response (Daum et al., 2009; Theis et al., 2009). We analyzed the kinetochore proteome in the presence or absence of Ska3/Rama1 (defined as classifier V) in order to determine the role of this protein in kinetochore structure (Figure 5).

Figure 5
Ska3/Rama1 Dependency for Chromosomal Association and Analysis of CENP-27/APC16

A map of the Ska3/Rama1 locus in DT40 is shown in Figures S5A and S5B, together with a targeting strategy for inactivating the gene. The Ska3/Rama1 gene is not essential for life in DT40 cells (Figure S5C). However, these cells struggle to achieve a normal chromosome alignment, and show a ~3× increase in mitotic index (Figures S5D and S5E), a ~3× increase in the percentage of apoptotic cells and a ~6× increase in the number of bi-nucleated cells.

Proteomic analysis of isolated chromosomes revealed that loss of Ska3/Rama1 was accompanied by the loss of Ska1 and Ska2. Loss of the Ska complex caused no systematic changes in the chromosomal association of proteins of the constitutive centromere-associated network (CCAN), Knl-1/Mis12/Ndc80 (KMN), Mis18, Ndc80, CPC, and Nup107-160 kinetochore subcomplexes. However, striking changes were seen in the levels of the APC/C, RanBP2/RanGAP1, spindle checkpoint, Rod/Zw10/Zwilch (RZZ), and dynein/dynactin complexes. We confirmed that the RanBP2/RanGAP1 complex is indeed depleted from kinetochores when Ska3/Rama1 is depleted in HeLa cells (Figures S5F and S5G). Attempts to confirm the specific kinetochore depletion of the APC/C were uninformative, as we were unable to reproducibly obtain kinetochore staining for the APC/C in HeLa cells using four independent antibodies.

We conclude that combining genetic and SILAC analysis provides a powerful new method for analysis of multicomplex protein superstructures.

A Protein Involved in Chromosome Alignment and Spindle Organization

A second new kinetochore-associated protein, CENP-32/C9orf114, was sorted on our kinetochore cluster diagram next to CLASP1 and CLASP2, two paralogues known to be involved in the regulation of microtubule dynamics. Like CLASPs, CENP-32/C9orf114 mapped to the outer kinetochore (Figure 6A), and its depletion caused a significant accumulation of cells in late prometaphase (Figure S6A) with misaligned chromosomes (Figure 6B). These cells frequently had bipolar spindles; however, 60% of those spindles exhibited striking abnormalities in which centrosomes appeared to have detached from the poles (Figures 6C and 6D and Figure S6B). In one remarkable case, the centrosomes appeared at the midzone of a bipolar spindle (Figure 6C, panels 9-12).

Figure 6
Initial Characterization of CENP-32
Figure S6
Characterization of CENP-32, Related to Figure 6

Discussion

Multiclassifier Combinatorial Proteomics

The approach described here for analysis of the proteome of vertebrate mitotic chromosomes can be used to study any complex proteome. The key approach of combining classifier data can in principle be expanded indefinitely and can include nonproteomics data sets such as the bioinformatic protein domain analysis used here. Other classifiers that could be used in the future include microarray, protein interaction (e.g., two hybrid screens or pull-down), protein phosphorylation, and localization data and, indeed, data from any experimental approach in which the proteins of interest are sorted systematically.

We first showed that plotting pairs of classifiers against one another improved our ability to delineate potential chromosomal proteins. As that approach could not be generalized when the number of classifiers exceeded three, we adopted a random forest (RF) analysis approach. This allowed us to integrate information from all classifiers into decision trees on which known and unknown proteins could be classified. Importantly, RF analysis handles missing values systematically. This is crucial when not every protein is observed in every experiment. In contrast, cluster analysis, which has been used both in this study and in other recent work (Theis et al., 2009; Neumann et al., 2010), can only integrate data for proteins that have a value for every classifier.
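The limitation of cluster analysis noted above can be made concrete with a small sketch: proteins lacking a value for any classifier must be dropped before distances between classifier profiles can be computed. The profile values below are hypothetical, and the nearest-neighbour lookup is a simplified stand-in for the hierarchical clustering used in the study.

```python
import math

def complete_cases(profiles):
    """Keep only proteins that have a value for every classifier."""
    return {name: values for name, values in profiles.items()
            if all(v is not None for v in values)}

def nearest_neighbour(name, profiles):
    """Closest other protein by Euclidean distance between classifier profiles."""
    target = profiles[name]
    others = [n for n in profiles if n != name]
    return min(others, key=lambda n: math.dist(target, profiles[n]))

# Hypothetical classifier profiles; None marks a protein not observed
# in one of the experiments.
toy_profiles = {
    "Ska2": (1.0, 2.0, 3.0),
    "C13orf3": (1.1, 2.1, 2.9),
    "RPL3": (-2.0, 0.1, 0.0),
    "NovelX": (1.0, None, 3.0),
}
usable = complete_cases(toy_profiles)
neighbour = nearest_neighbour("C13orf3", usable)
```

In this toy example one of four proteins is lost to the complete-case filter, which is the coverage cost that the RF analysis avoids.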

Integrity of the Isolated Chromosomes

Our methods focused on optimizing the purity of the chromosomes. Thus, our list of proteins is likely to represent the minimal, stably associated components of mitotic chromosomes. Nonetheless kinetochores of isolated chromosomes retain some function, as judged by their ability to recruit components of the mitotic checkpoint complex from cytoplasm. This may be because chromosomes were isolated from nocodazole-treated cells, with kinetochores actively engaged in spindle checkpoint signaling.

New Insights into Kinetochore Functional Organization

Remarkably, although no biochemical enrichment for centromeres was performed, our data set contained all known centromeric subcomplexes, with peptides from 125 reported centromere proteins (eight present as multiple isoforms) (Table S1). We identified all members of the CCAN, KMN, Mis12 and Mis18 complexes and all members of the RZZ complex except Zwint (which is not yet annotated in the chicken genome). Our success in identifying centromere and telomere proteins may be explained by the fact that 66 of the 78 chicken DT40 chromosomes are microchromosomes, whose purification provides a natural enrichment for centromeres and telomeres because the chromosome arms are so short.

We combined genetics with whole proteome analysis in order to identify complexes and structural dependencies in their “native environment” (e.g., kinetochore proteins in actual kinetochores). Chromosomes lacking Ska3/Rama1 were depleted for the entire Ska complex, confirming that these three proteins are interdependent for chromosome binding. Similarly, depletion of key condensin subunit SMC2 caused a loss of all eight subunits of the condensin I and II complexes from chromosomes. Importantly, this analysis did not require tagging of any proteins or attempts to solubilize functional complexes from large subcellular structures.

In addition to these primary effects of depletion, loss of Ska3/Rama1 also caused a significant secondary depletion of the APC/C and RanBP2/RanGAP1 complexes from chromosomes but had no consistent effect on most other kinetochore proteins. Importantly, all members of the secondary-depleted complexes also behaved coordinately. Our results (e.g., the behavior of CENP-27/APC16 as a component of the APC/C) confirm the utility of single protein depletion analysis for the identification of protein complexes and determination of their mutual interdependencies for association with chromosomes.

Our data suggest that the Ska complex may provide a docking site for the APC/C in the outer kinetochore. Alternatively, sumoylation by RanBP2 may have a role in APC/C binding to chromosomes. The RanBP2-RanGAP1 complex is known to be involved in kinetochore-microtubule interactions and localization of several spindle checkpoint proteins (Joseph et al., 2004). We note that among the recent spate of publications on Ska3/Rama1, our observations appear to support a role in integration and regulation of the spindle checkpoint response (Daum et al., 2009; Theis et al., 2009).

Our whole-proteome analysis revealed that Cdc20 behaved like a member of the APC/C and was distinct from other components of the spindle checkpoint pathway with respect to its Ska complex dependency. Spindle checkpoint components associate with one another in cytoplasm as a mitotic checkpoint complex (MCC), containing BubR1, Bub3, Cdc20 and Mad2. Our data suggest that once the MCC associates with chromosomes, Cdc20 stably associates with the APC/C.

What Classes of Kinetochore Proteins Remain to Be Discovered?

We identified 13 kinetochore-associated proteins among previously uncharacterized proteins and, as discussed below, we predict that many more remain to be described. We therefore asked whether there is any functional relationship between these new proteins. That is, what sorts of kinetochore proteins had been missed in the many previous genetic and biochemical screens? An interesting answer has emerged.

Since the new kinetochore proteins were identified solely based on their occurrence in chromosomes, they could potentially represent a wide range of functions. Nevertheless, it is striking that five of the new centromere proteins (namely, CENP-28, −29, −31, −35, and −36) are subunits of complexes that modify and/or bind histones (Figure 6E). Yeast orthologs of two of these proteins (namely, CENP-28/C1orf149 and CENP-29/CFDP1) contribute to NuA4 histone acetyltransferase (HAT) and SWR1 ATP-dependent chromatin remodelling complexes, respectively. These complexes are known to share components (Wu et al., 2005) and together stimulate the exchange of histone H2A for H2A.Z, following acetylation of H2A or H4 (Altaf et al., 2010). A third centromere-associated protein, CENP-36/MSL1v1, is necessary for the activity of MOF, another HAT, on nucleosomal H4 (Li et al., 2009). Finally, CENP-31/PHF6 and CENP-35/PHF2 each contain PHD (plant homeodomain) zinc fingers, which are usually associated with chromatin-mediated transcriptional regulation. The PHD of CENP-35, which also contains a JmjC (likely histone demethylase) domain, appears to be required for demethylation of H3 at the promoters of ribosomal RNA genes (Wen et al., 2010).

Why were these proteins not discovered earlier as kinetochore-associated? One likely explanation is that they may have essential functions in other chromatin regions as well. Thus, mutations might have pleiotropic phenotypes not recognized as specific for mitosis. Furthermore, their association might depend on a fully assembled kinetochore and thus be lost when attempting other than whole chromosome analysis.

Characterization of a Kinetochore Protein

CENP-32 is required both for chromosome alignment and for association of the centrosomes with the poles of the bipolar spindle during metaphase. This latter phenotype is very similar to an unusual spindle morphology phenotype seen in Drosophila cells following depletion of the CLASP homolog Mast/Orbit (Maiato et al., 2002). Indeed, in our analysis, CENP-32 clusters with CLASP1 and CLASP2. A yeast homolog of CENP-32 interacts with CBF5, an enzyme involved in the posttranscriptional modification of rRNA, that was shown to bind to budding yeast centromeres and microtubules (Jiang et al., 1993). Bioinformatics analysis suggests that CENP-32 is a member of the SPOUT family of methyltransferases but is atypical in possessing a possible RNA-binding OB fold inserted into its catalytic domain (Tkaczuk et al., 2007). It is tempting to speculate that CENP-32 may function at kinetochores by interacting with an as-yet unknown RNA.

How Complex Is the Kinetochore?

MCCP analysis allows us to predict how many more kinetochore proteins remain to be identified in our data set. In the plot of Figure 3, where the chromosomal proteome is displayed in two dimensions, we found 35% of novel tagged proteins from region R and 6% from region L to associate with the kinetochore during mitosis. These regions have 224 and 287 as-yet uncharacterized novel proteins, respectively. Assuming no bias in the proteins we cloned, this suggests that approximately 97 more kinetochore proteins remain to be discovered. Taking into account the 13 kinetochore-associated proteins confirmed in our work, this roughly doubles the currently known protein complexity of the kinetochore during mitosis, confirming it as one of the most complex cellular substructures.
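The extrapolation above is a simple product of hit rate and pool size, summed over the two regions. The short Python sketch below reproduces that arithmetic using only the rounded figures quoted in the text; the variable and function names are illustrative, not taken from the original analysis. With the rounded percentages the sum comes to about 96, in line with the approximately 97 reported (the small difference presumably reflects unrounded hit rates, which is an assumption here).

```python
# Hit rates and pool sizes as quoted in the text (rounded values).
regions = {
    "R": {"hit_rate": 0.35, "uncharacterized": 224},
    "L": {"hit_rate": 0.06, "uncharacterized": 287},
}


def expected_kinetochore_proteins(regions):
    """Sum (hit rate x number of uncharacterized proteins) over all regions."""
    return sum(r["hit_rate"] * r["uncharacterized"] for r in regions.values())


estimate = expected_kinetochore_proteins(regions)
print(f"Expected additional kinetochore proteins: {estimate:.1f}")
```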

Conclusions

Multiclassifier combinatorial proteomics and the data sets described here open the door to the identification of all functional components of mitotic chromosomes despite the adventitious binding of cellular background proteins during mitosis. Furthermore, MCCP can be extended by adding additional classifiers to delineate protein complexes and define functional dependencies between them in the context of intact mitotic chromosomes. This will serve both as a starting point for systematic determination of the full range of functions involved in mitotic chromosome segregation, and as a basis for the development of detailed structural and functional interaction maps of key chromosomal subdomains. MCCP should also prove useful for the analysis of other cellular structures that lack defined boundaries, e.g., membrane associated complexes like the post-synaptic density.

Experimental Procedures

Preparation of Mitotic Chromosomes

DT40 cells were incubated with Nocodazole for 13 hr, resulting in a mitotic index of 70%–90%. Mitotic chromosomes were isolated in the polyamine-EDTA buffer system optimized for chicken DT40 cells (Lewis and Laemmli, 1982). 19.3 OD260 units were obtained from pooling the material of four independent preparations totaling 7.5 × 10⁹ DT40 cells and solubilized in SDS-polyacrylamide gel electrophoresis (SDS-PAGE) sample buffer.

Preparation of Chromosome-Free Mitotic Cell Extracts

Nocodazole-blocked DT40 cells were dounce-homogenized under hypotonic conditions. Mitotic chromosomes were removed by centrifuging the supernatant twice at 10,000 × g and discarding the pellets to prepare a cell extract free of chromosomes.

To measure the ratios between chromosomal and nonchromosomal proteins, SILAC-based mass spectrometry was performed with 150 μg of labeled cell extract from 7.0 × 10⁶ cells and 150 μg of nonlabeled proteins contained in isolated chromosomes from 2.0 × 10⁹ cells.

To measure the exchange ratio, we isolated mitotic chromosomes from roughly 1.0 × 10⁹ cells. Mitotic chromosomes were pelleted by centrifugation at 3,000 × g and mixed into 10 ml of cell extract that was made from 3.0 × 10⁸ cells. This mixture was incubated at 14°C for 30 min. Finally, we re-isolated the chromosomes as described above.

Mass Spectrometric Analysis

Proteins were separated into a high and a low molecular weight fraction by SDS-PAGE, in-gel digested using trypsin (Shevchenko et al., 2006), and fractionated into 30 fractions each using SCX. The individual SCX fractions were desalted using StageTips (Rappsilber et al., 2003) and analyzed using LC-MS on a LTQ-Orbitrap (Thermo Fisher Scientific) coupled to HPLC via a nanoelectrospray ion source. The six most intense ions of a full MS acquired in the orbitrap analyzer were fragmented and analyzed in the linear ion trap. The MS data were analyzed using MaxQuant (Cox and Mann, 2008) and proteins identified by searching MS and MS/MS data using the MASCOT search engine (Matrix Science, UK). For more details, see the Extended Experimental Procedures.

Extended Experimental Procedures

Cell Culture

DT40 cells with the SMC2 conditional knockout or Ska3/RAMA1 knockout were maintained in RPMI 1640 medium supplemented with 10% (v/v) FBS, 1% CS, 100 U/ml penicillin, and 100 μg/ml streptomycin (GIBCO-BRL) at 39°C and 5% CO2 in a humid incubator. For labeling lysine and arginine with carbon-13, cells were maintained in RPMI without L-lysine and L-arginine (GIBCO-BRL) supplemented with 10% (v/v) dialyzed FBS (10,000 molecular weight cut-off; Sigma), 100 μg/ml U-13C6-L-Lysine:2HCl, 30 μg/ml U-13C6-L-Arginine:HCl, 100 U/ml penicillin, and 100 μg/ml streptomycin (GIBCO-BRL) at 37°C and 5% CO2 in a humid incubator.

For RNAi, U2OS or HeLa cells in exponential growth were seeded onto coverslips and grown overnight at 37°C in RPMI/10% FBS (GIBCO-BRL), 5% CO2. 200 ng of plasmid DNA was administered to the cells at 30%–40% confluence by transfection with FuGENE6 (Roche) in complete medium without antibiotics. Cells were maintained in this medium for 24 hr.

Preparation of Mitotic Chromosomes

DT40 cells at densities of 8–10 × 10⁵/ml were incubated with 0.5 μg/ml Nocodazole for 13 hr, resulting in a mitotic index of 70%–90%. Mitotic chromosomes were isolated in the polyamine-EDTA buffer system optimized for chicken DT40 cells (Lewis and Laemmli, 1982). 19.3 OD260 units were obtained from pooling the material of four independent preparations totaling 7.5 × 10⁹ DT40 cells. The pooled material was sonicated for 15 min, the proteins were precipitated with TCA, and the pellet was solubilized in SDS-polyacrylamide gel electrophoresis (SDS-PAGE) sample buffer.

Preparation of Chromosome-Free Mitotic Cell Extracts

Nocodazole-blocked cells were swollen under hypotonic conditions using 40 mM KCl and dounced with 10 vigorous strokes. Mitotic chromosomes were removed by centrifuging the supernatant twice at 10,000 × g and discarding the pellets to prepare a cell extract free of chromosomes.

To measure the ratios between chromosomal and nonchromosomal proteins, SILAC-based mass spectrometry was performed with 150 μg of labeled cell extract from 7.0 × 10⁶ cells and 150 μg of nonlabeled proteins contained in isolated chromosomes from 2.0 × 10⁹ cells.

To measure the exchange ratio, we isolated mitotic chromosomes from roughly 1.0 × 10⁹ cells. Mitotic chromosomes were pelleted by centrifugation at 3,000 × g and mixed into 10 ml of cell extract that was made from 3.0 × 10⁸ cells. This mixture was incubated at 14°C for 30 min. Finally, we re-isolated the chromosomes as described above.

Control Experiments

Control experiments in which beads loaded with DNA were exposed to cytoplasm did not clearly differentiate between chromosomal proteins and contaminants (data not shown). Although those experiments did identify many obvious contaminants that can bind to DNA, they also yielded many true chromosomal proteins (e.g., histones and topoisomerases) whose presence in the cytoplasmic fractions was likely due to unavoidable nuclear damage during extract preparation. Likewise, although procedures have been published for the proteomic analysis of interphase chromatin from C. elegans sperm (Chu et al., 2006) and from rice (Tan et al., 2007), comparing mitotic to interphase chromatin is of limited use. Interphase chromatin is shielded from the cytoplasm by the nuclear envelope. Thus, in such a comparison, many cytoplasmic hitchhiker proteins would be scored as mitosis-specific chromosomal proteins.

Sample Preparation and Fractionation for Mass Spectrometry

300 μg of protein was separated into a high and a low molecular weight fraction on a 1 mm 4%–12% tris-glycine NuPAGE gel in MOPS buffer (Invitrogen), fixed with 50% methanol, 5% acetic acid, and stained with Invitrogen's Colloidal Blue kit. The two fractions were excised, and proteins were digested using trypsin at an enzyme-to-protein ratio of 1:50 as described (Shevchenko et al., 1996). In brief, proteins were reduced in 10 mM DTT for 30 min at 37°C, alkylated in 55 mM iodoacetamide for 20 min at room temperature in the dark, and digested overnight at 37°C with 12.5 ng/μl trypsin (proteomics grade, Sigma). The digestion medium was then quenched with phosphoric acid (Sigma-Aldrich) until pH 3.0 was reached. The sample was then diluted with Buffer A (5 mM KH2PO4, 10% acetonitrile, pH 3.0) to 20% acetonitrile and fractionated using strong cation exchange chromatography (2.1-mm × 20-cm polysulfethyl A column, Poly LC, Columbia, MD) on an UltiMate 3000 HPLC instrument (Dionex UK Ltd., Surrey, UK). Peptides were separated using Buffer A, Buffer B (Buffer A with 1 M KCl), a flow rate of 200 μl/min, and a 50 min gradient as follows: 0–1 min, 0% Buffer B; 1–5 min, raise to 0.5% Buffer B; 5–14 min, raise to 1.0% Buffer B; 14–41 min, raise to 70% Buffer B (curve 7 equation, CHROMELEON software version 6.80, Dionex UK Ltd.); 41–42 min, 70% Buffer B. Fractions were collected every 1 min. All the fractions were diluted with 0.1% TFA at a ratio of 1:1 and desalted using StageTips (Rappsilber et al., 2003) for subsequent LC-MS/MS analysis.

Mass Spectrometric Analysis

An LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled online to an Agilent 1100 binary nanopump and an HTC PAL autosampler (CTC Analytics) was used to analyze the fractionated peptides. The analytical column with a self-assembled particle frit (Ishihama et al., 2002) was prepared by packing C18 material (ReproSil-Pur C18-AQ 3 μm, Dr. Maisch, GmbH) into a spray emitter (75-μm inner diameter, 8-μm opening, 70-mm length; New Objective) using an air pressure pump (Proxeon Biosystems). Mobile phase A consisted of 5% acetonitrile and 0.5% acetic acid, while mobile phase B comprised acetonitrile and 0.5% acetic acid. Peptides were loaded onto the column at a flow rate of 0.7 μl/min and eluted at a flow rate of 0.3 μl/min. The gradient used for each fraction varied from 2 to 4 hr depending on the estimated concentration of peptides in the fraction. For a 2 hr gradient run, elution used a gradient from 0% B to 20% B over 75 min and then from 20% B to 80% B over 13 min. Orbitrap Fourier-transform mass spectra were recorded at 30,000 resolution, and the six most intense ions were selected in the ion trap for fragmentation (normal scan; wideband activation; 750,000 and 15,000 ions fill target for MS and MS/MS respectively; maximum fill time, 150 ms; dynamic exclusion for 60 s).

Data Processing and Protein Identification

MS data were processed using MaxQuant 1.0.11.2 (Cox and Mann, 2008) using the default settings (slice peaks: true; recalibrate: true; correct mass errors: true; max. peptide FDR: 0.01; max. peptide FPR: 0.1; max. protein FDR: 0.01; min. peptide length: 6; peptides used for protein quantitation: razor). Searches were conducted using Mascot version 2.1.3 (Matrix Science) against the chicken IPI database (version 3.49) in target/decoy configuration. Parameters used for the Mascot search were Trypsin/P for cleavage, Oxidation (M) and Acetyl (Protein N-term) for variable modifications, Label:13C(6) (R) and Label:13C(6) (K) for SILAC labels, monoisotopic mass, 7 ppm for MS tolerance and 0.5 Da for MS2 tolerance. Protein identification required two peptides, of which at least one was required to be unique in the database. A minimum of two peptide SILAC ratios was required to report a protein ratio.
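The two acceptance rules stated above (two peptides with at least one unique for identification; at least two peptide SILAC ratios for a reported protein ratio) can be written as simple predicates. The following Python sketch is illustrative only: the `ProteinEvidence` record and its field names are hypothetical, and treating a reportable ratio as also requiring identification is an assumption rather than something the text states.

```python
from dataclasses import dataclass


@dataclass
class ProteinEvidence:
    """Hypothetical per-protein summary of search results."""
    peptides: int         # peptides matched to this protein
    unique_peptides: int  # of those, peptides unique in the database
    silac_ratios: int     # peptide-level heavy/light ratios measured


def is_identified(p: ProteinEvidence) -> bool:
    """Two peptides, at least one of them unique in the database."""
    return p.peptides >= 2 and p.unique_peptides >= 1


def has_reportable_ratio(p: ProteinEvidence) -> bool:
    """At least two peptide SILAC ratios (assumed to also need identification)."""
    return is_identified(p) and p.silac_ratios >= 2


example = ProteinEvidence(peptides=3, unique_peptides=1, silac_ratios=1)
print(is_identified(example), has_reportable_ratio(example))
```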

Proteins that share some but not all of the observed peptides were combined for the purpose of abundance estimation into a group, with a summed, unique spectral count and summed mass of all group members minus sequence redundancies provided that they were at least the length of a tryptic peptide. The group was then split following a tree based on sequence similarities between proteins. At the first branch of the tree, for each subgroup the count was estimated again based on the unique spectral count and unique mass. The abundances of the subgroups were accepted if their sum was within a two-fold window of the total group's abundance. This was continued until each protein obtained an estimation of its own abundance or until the sum of abundances deviated from the group estimate on the previous level by more than two-fold (“out of range”).

To obtain the name and functions of each protein, we referred to the following online databases: International Protein Index (IPI; http://www.ebi.ac.uk/IPI/IPIhelp.html), Ensembl (http://www.ensembl.org), Uniprot (http://beta.uniprot.org), Entrez Gene (http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene) and InterPro (http://www.ebi.ac.uk/interpro). Over 3,000 individual searches were performed to generate the correspondence between IPI numbers and gene names listed in Table S1.

Replicate Analyses

All experiments minimized the impact of biases introduced due to fluctuations in the preparation by using pooled material from four independent preparations of mitotic chromosomes, as described above. In addition, the experiments were performed in duplicate and thus classifier values are averages of two measurements. Finally, no protein measurement is considered alone but only in conjunction with measurements of the same protein for other classifiers in a multidimensional analysis. Alternatively, members of the same complex are considered together. Both strategies provide a safeguard mechanism against artifacts resulting from outliers despite pooling and duplicate analysis.

Classifier I: Abundance

In order to estimate the abundance of proteins in mitotic chromosomes based on mass spectrometric data we rely on emPAI (Rappsilber et al., 2002; Ishihama et al., 2005, 2008) in slightly modified form. Briefly, emPAI is calculated based on the number of peptides observed (Table S4) and the number of times that each peptide is observed (spectral count) for each protein as follows:

$$\mathrm{emPAI} = 10^{(\text{spectral count})/(\text{protein-specific normalization})} - 1$$

Protein molecular weight was used here as a specific normalization, to adjust for differences between proteins in the number of observable peptides.

In order to estimate absolute abundance of proteins based on emPAI we have made three amendments to emPAI. First, a proportionality constant a was introduced, reflecting the fact that emPAI might in the best case be directly proportional to protein abundance. Second, a scaling constant b was introduced. The values obtained by emPAI have been shown to estimate stoichiometry within a factor of 1.7 ± 0.8 accuracy for proteins varying 50-fold in their abundance (Ishihama et al., 2005). However, it is unclear how this accuracy would scale when considering stoichiometries that span several orders of magnitude. These adjustments give the following formula:

$$\text{estimated copy number} = a\left(10^{\,b\,(\text{spectral count})/(\text{protein-specific normalization})} - 1\right)$$

Third, calibrating emPAI by using internal standards for which the absolute abundance is known (or can be reliably estimated) and that span the dynamic range in protein abundance allows minimizing the scaling effects and determination of the proportionality and scaling constants. Further calibration and a full evaluation of our approach will be required before mechanistic conclusions should be drawn from estimated stoichiometries in our data. Such an evaluation exceeds the needs for the analysis presented here, however. In the present study, we use the estimated abundances in order to estimate the proportions by which different protein classes contribute to mitotic chromosomes and to construct classifier I. It is remarkable (though not emphasized in the text) that members of protein complexes whose abundances were all estimated independently, nonetheless turned out to consistently have similar values for classifier I. Ultimately, the success of our RF analysis demonstrates that classifier I must suffice in its current form.

Independent experimental evidence supporting our estimation of copy numbers is provided by Coomassie-based quantitation of histones, which reveals that these proteins contribute 60% to the protein mass of mitotic chromosomes. This is remarkably close to our emPAI-based estimate of 48%. Since histones are so abundant, their estimation involves multiplication of by far and away the largest numbers. Thus even a minor scaling error could easily skew the proportion of histones versus all other proteins, particularly since the remainder of chromosomes is composed of many low abundance proteins. To illustrate this point, taking emPAI values without our scaling factor and proportionality constant returns histones as 100% of the chromosomal protein. In contrast, using spectral count alone would see histones at 10% of the chromatin protein mass.

It should also be noted that we predict 24 copies of CENP-A to be present at each of the 156 DT40 centromeres. This compares to 25-40 copies of CENP-A as measured at DT40 kinetochores recently by quantitative super-resolution microscopy (Ribeiro et al., 2010). The close coincidence of these two values measured by two independent approaches offers further support for our classifier I.

For approximate scaling of emPAI we used three proteins for which quantities had either been measured or estimated in chicken cells: CENP-H (4680 molecules/cell measured by quantitative fluorescence; Ribeiro et al., 2009), topoisomerase IIα (400,000 molecules/mitotic cell measured in similar chicken cell lines by quantitative immunoprecipitation) and histone H4 (5 × 10⁷ molecules/cell estimated based on a diploid genome size of 2.4 × 10⁹ base pairs and 190 base pairs per nucleosome; J. Allan, personal communication). The proportionality and scaling constants in our modified equation for emPAI were chosen so that the values for the three reference proteins approximate the reported or predicted values as closely as possible. This resulted in the proportionality constant a = 285097 and the scaling constant b = 0.01055. Plotting the reported/predicted abundance against the emPAI-calculated abundance for the three proteins (Table S1) reveals a linear relationship over several orders of magnitude (Figure S1A). This shows that the two constants can be chosen to fit all three data points. Abundance determined for all proteins observed in our analyses was reproducible when using the modified emPAI equation with the approximated constants, as shown by plotting the results from two independent experiments against one another (Figure S1B). In this analysis, 60% of the determinations were within a factor of 2 and 96% were within a factor of 10. For comparison, the reproducibility of all classifier experiments is plotted in Figure S1C.
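The reproducibility figures quoted above are fractions of proteins whose two replicate abundance estimates agree within a given fold range. A minimal Python sketch of that kind of summary follows; the function name and the four replicate pairs are hypothetical illustrations, not data from this study.

```python
def fraction_within_fold(pairs, fold):
    """Fraction of (replicate 1, replicate 2) pairs that agree within `fold`.

    A pair agrees when the larger value divided by the smaller is <= fold.
    Pairs containing a non-positive value are counted as disagreeing.
    """
    if not pairs:
        raise ValueError("no replicate pairs supplied")
    agreeing = sum(
        1 for x, y in pairs
        if x > 0 and y > 0 and max(x, y) / min(x, y) <= fold
    )
    return agreeing / len(pairs)


# Hypothetical abundance estimates from two replicate experiments.
replicates = [(100.0, 150.0), (10.0, 25.0), (5.0, 5.0), (1000.0, 90.0)]
within_2 = fraction_within_fold(replicates, 2)
within_10 = fraction_within_fold(replicates, 10)
print(within_2, within_10)
```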

Figure S1
Proteomic Analysis of Mitotic Chromosome Proteins Gave Reproducible Classifier Values, Related to Figure 1

Classifier VI: Domain Analysis

We included prior knowledge in the form of domain annotations as part of our analysis. In order to obtain value from domain annotations, we first counted how many chromosomal and nonchromosomal proteins each domain was found in (Table S2), and expressed these as a proportion of all the training-set proteins the domain was found in (i.e., so that these two proportions add up to 1). For each protein, we then summed the proportions for all domains found in that protein, giving us a sum of chromosomal domain proportions and a sum of nonchromosomal domain proportions. We used these two values as variables for each protein in classifier VI. We allowed the relative importance of these columns to be determined by the random forest itself.

Example Calculations for the Classifiers

Abundance Estimate Calculation

Spc24 (IPI00576606) was identified 36 independent times in our reference experiment. This means that 36 individual fragmentation spectra were matched to peptide sequences from Spc24. This number is known as the spectral count. The protein abundance index (PAI) attempts to normalize for the size of the protein, since larger proteins have a greater chance of being seen because they can generate more peptides. In this case we divide the spectral count by the molecular weight of this protein in kDa, 22.831, to give a PAI of 1.58.

Plugging this into our calibrated emPAI equation gives the following:

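$$
\begin{aligned}
\text{MSMS count} &= 36, \qquad \text{MW} = 22.831 \\
\text{PAI} &= \text{MSMS count}/\text{MW} = 36/22.831 = 1.57680346896763 \\
a &= 285097.187780369, \qquad b = 0.01054633137937 \\
\text{Abundance estimate} &= a\left(10^{\,b \cdot \text{PAI}} - 1\right) \\
&= 285097.187780369 \times \left(10^{\,0.01054633137937 \times 1.57680346896763} - 1\right) \\
&= 11128.3013920301
\end{aligned}
$$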

And so we estimate that there are 11,128 Spc24 molecules per cell-equivalent in our preparation. Dividing by the number of kinetochores in DT40 cells (156), this gives 71.3 copies of Spc24 per kinetochore.
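The Spc24 calculation can be checked with a few lines of Python. This is a sketch of the calibrated equation as described in this section, using the constants a and b and the kinetochore count of 156 quoted in the text; the function name is illustrative.

```python
A = 285097.187780369   # proportionality constant a (from the text)
B = 0.01054633137937   # scaling constant b (from the text)


def estimate_copy_number(spectral_count, mw_kda, a=A, b=B):
    """Modified emPAI: a * (10 ** (b * PAI) - 1), with PAI = count / MW in kDa."""
    pai = spectral_count / mw_kda
    return a * (10 ** (b * pai) - 1)


spc24_copies = estimate_copy_number(spectral_count=36, mw_kda=22.831)
per_kinetochore = spc24_copies / 156  # 156 kinetochores in DT40 cells

print(f"Spc24 copies per cell-equivalent: {spc24_copies:.0f}")
print(f"Spc24 copies per kinetochore: {per_kinetochore:.1f}")
```

Because b × PAI is small for most proteins, the estimate is close to linear in spectral count over much of the range; the exponential form matters mainly for very abundant proteins such as histones.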

Ratio-Based Classifiers

Classifiers II-V are taken directly from the MaxQuant Ratio (the heavy-to-light ratio of signals from all observed peptides of a protein are taken together to give the heavy-to-light ratio of this protein) calculated for each protein (Cox and Mann, 2008). For classifiers II and III those ratios are inverted in order that for each classifier, higher values should represent enrichment for chromosomal proteins.
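The orientation step described above amounts to taking the reciprocal of the heavy-to-light ratio for classifiers II and III and leaving IV and V unchanged. A small Python sketch, with an illustrative function name and a made-up input ratio:

```python
# Classifiers whose MaxQuant heavy-to-light ratio is inverted (from the text).
INVERTED_CLASSIFIERS = {"II", "III"}


def oriented_ratio(classifier, heavy_to_light):
    """Return the ratio oriented so that higher means chromosomal enrichment."""
    if heavy_to_light <= 0:
        raise ValueError("ratio must be positive")
    if classifier in INVERTED_CLASSIFIERS:
        return 1.0 / heavy_to_light
    return heavy_to_light


print(oriented_ratio("II", 0.25))  # inverted
print(oriented_ratio("IV", 0.25))  # used as reported
```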

Domains Calculation

For example, INCENP (IPI00823144) is annotated with 3 domains: an Aurora-related kinase (ARK) binding domain (PF03941), a tropomyosin signature domain (PR00194) and a glutamic acid-rich region (PS50313). Table S5 shows the number of TRUE (chromosomal) and FALSE (nonchromosomal) proteins that each domain is associated with in our dataset. We calculated the proportion of TRUE and FALSE proteins that each domain was associated with, e.g., the tropomyosin domain was found in 20 TRUE proteins and 42 FALSE proteins, so the proportions are 20/62 = 0.32 and 42/62 = 0.68 for TRUE and FALSE respectively. We then summed these proportions for all domains in each protein, as shown at the bottom of the table.

These two sums provide the two columns for this classifier.
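The domain-based calculation can be sketched in Python as follows. Only the tropomyosin counts (20 TRUE, 42 FALSE) come from the text; the counts shown for PF03941 and PS50313 are hypothetical placeholders (the real values are in Table S5), and the function name is illustrative.

```python
# Domain accession -> (TRUE count, FALSE count) in the training set.
domain_counts = {
    "PR00194": (20, 42),  # tropomyosin signature: counts quoted in the text
    "PF03941": (9, 1),    # hypothetical counts, for illustration only
    "PS50313": (30, 30),  # hypothetical counts, for illustration only
}


def domain_scores(domains, counts):
    """Sum, over a protein's domains, the TRUE and FALSE proportions."""
    true_sum = 0.0
    false_sum = 0.0
    for accession in domains:
        n_true, n_false = counts.get(accession, (0, 0))
        total = n_true + n_false
        if total == 0:
            continue  # domain absent from the training set: contributes nothing
        true_sum += n_true / total
        false_sum += n_false / total
    return true_sum, false_sum


incenp_domains = ["PF03941", "PR00194", "PS50313"]
true_score, false_score = domain_scores(incenp_domains, domain_counts)
print(f"TRUE score: {true_score:.2f}, FALSE score: {false_score:.2f}")
```

Since each domain's two proportions add up to 1, the two scores for a protein always sum to the number of its domains that occur in the training set.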

Combination of Classifiers by Random Forest Analysis

Combination of classifiers was performed using the WEKA implementation of random forest (RF) (Breiman, 2001; Frank et al., 2004; Hall et al., 2009). We defined two classes of proteins whose assignment as chromosomal or nonchromosomal was in general nonambiguous: centromere, telomere, chromosomal periphery, other chromosomal, replication and repair, transcription factors, RNA polymerases, histones and histone modifying proteins were labeled as true and cytoskeletal, cytoplasmic, mitochondrial, membrane and receptor proteins were labeled as false. Note that the choice of training data sets to establish the random forest is crucial. Our choice of the training data was validated in retrospect by the characterization of the 50 novel cloned proteins as shown in Figure 3. We asked the RF to train on the training data and give values on all of our identified proteins. However, additional steps were taken to improve the value of the dataset prior to combination. Each classifier was subject to k-nearest-neighbors (k-NN) (Cover and Hart, 1980) using in-house software, and these values were added to the raw classifier data to be used in the RF. The k-NN was optimized at 50 nearest neighbors using receiver operator characteristic (ROC) across a range of values from 10 to 100. In addition to k-NN, we also added the number of missing data points in a protein as another variable to the random forest.
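The text does not spell out how the k-NN value for each classifier was computed by the in-house software. One plausible reading, shown here purely as an assumption, is that each protein receives the fraction of chromosomal (true) proteins among its k nearest training-set neighbors along a single classifier axis. The Python sketch below implements that reading with a toy training set; the function name and data are hypothetical.

```python
def knn_score(value, training, k=50):
    """Fraction of chromosomal proteins among the k nearest training points.

    `training` is a list of (classifier value, is_chromosomal) pairs.
    Distance is the absolute difference in classifier value (assumed metric).
    A protein that is itself in the training set is not excluded here.
    """
    if not training:
        raise ValueError("training set is empty")
    nearest = sorted(training, key=lambda item: abs(item[0] - value))[:k]
    return sum(1 for _, is_true in nearest if is_true) / len(nearest)


# Toy training set: (classifier value, chromosomal?)
toy_training = [(0.1, False), (0.2, False), (0.9, True), (1.0, True), (1.1, True)]
high = knn_score(0.95, toy_training, k=3)
low = knn_score(0.15, toy_training, k=3)
print(high, low)
```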

Random forests work by constructing a series of decision trees, the aim of each of which is to discriminate different classes of data (e.g., true and false). The trees are built randomly, so at each node (decision point) in the tree any classifier could be used. Then the data are divided on this classifier, each decision giving rise to another node where another random decision is made. Eventually, a node will adequately segregate the data on which it operates into two classes, and no more nodes are needed. Each tree is capable of distinguishing different classes of data, but a single tree does not perform definitively on data that was not in the training set, so multiple trees are used. The summation of all trees provides a “vote” on each data point to decide which class (true/chromosomal, false/nonchromosomal) the data point (e.g., a protein) is likely to be.

The data were subjected to RF using WEKA, via Java. The number of trees used was optimized using 10-fold cross-validation and ROC. The optimal number of trees was found to be 450. The depth of the trees was not limited. 10-fold cross-validation is a standard method of evaluating RF, since RF are susceptible to over-training and evaluation on the training set gives deceivingly good results, not likely to be representative of how the RF will perform on new data.

WEKA does not enable output of the 10-fold cross-validation data itself, so to obtain these data for proteins in our training set, we randomly split our training set into 10 portions of roughly equal size (within 10%) and used 9 of the 10 portions to train the RF, which was then used to produce values for the tenth portion. This was repeated until each protein had a value from an RF in which it was not a member of the training set. All the RF data we show on chromosomal (true) and nonchromosomal (false) proteins, except in Figure 3A, are derived from these cross-validation data.
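The manual cross-validation procedure described above can be sketched generically in Python. The study used WEKA's random forest; to keep this example self-contained, a trivial midpoint-threshold classifier stands in for the RF, and all function names and the toy data are hypothetical.

```python
import random


def out_of_fold_scores(items, fit, predict, n_folds=10, seed=0):
    """Score every item with a model that never saw it during training.

    `items` is a list of (feature, label) pairs. The items are shuffled,
    split into `n_folds` portions of near-equal size, and each portion is
    scored by a model trained on the remaining portions.
    """
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = [None] * len(items)
    for held_out in folds:
        held = set(held_out)
        model = fit([items[i] for i in indices if i not in held])
        for i in held_out:
            scores[i] = predict(model, items[i][0])
    return scores


def fit_midpoint(train):
    """Stand-in for RF training: midpoint between the two class means."""
    pos = [x for x, label in train if label]
    neg = [x for x, label in train if not label]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_midpoint(threshold, x):
    """Stand-in for the RF probability: 1.0 above the threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0


# Toy training set: ten nonchromosomal (low) and ten chromosomal (high) values.
toy = [(float(i), False) for i in range(10)] + [(float(100 + i), True) for i in range(10)]
cv_scores = out_of_fold_scores(toy, fit_midpoint, predict_midpoint)
print(cv_scores)
```

Swapping `fit_midpoint` and `predict_midpoint` for wrappers around a real random forest would give the out-of-fold probabilities of the kind reported in the "10CV" columns.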

The output of the random forest, in this context, is a number between 0 and 1 that represents the probability that a protein is chromosomal as opposed to nonchromosomal. It does not consider any other class of protein, and makes the implicit assumption that our training set (in this case 772 chromosomal and 1331 nonchromosomal proteins) adequately represents the unknown proteins that we would like to classify.

RNAi and Indirect Immunofluorescence Microscopy

100 pmol of siRNA (Control: AACGTACGCGGAATACTTCGAdT; Ska3/Rama1: ATGGATATAGTCCACGTGTCAdT - Hs_C13orf3_5; CENP-27: AAGCCGCTGAGTGAAGTAAGAdT - Hs_C10orf104_7; CENP-32: TCGCAGGACCCTCGCACCAAAdT - Hs_C9orf114_3, QIAGEN) was administered to U2OS or HeLa cells at 30%–40% confluence by transfection with OligoFectamine (Invitrogen) in complete medium without antibiotics. Cells maintained in this medium for 48 hr (Ska3/Rama1 and CENP-32) or 72 hr (CENP-27) were fixed for 5 min with 4% (v/v) paraformaldehyde (Electron Microscopy Services) in PBS or for 20 min with methanol at −20°C. After permeabilization with 0.15% (v/v) Triton X-100 in PBS, coverslips were blocked with 1% (v/v) BSA in PBS. Cells were probed with antibodies against: CENP-C (1:600, rabbit 554), ACA (1:200, human), α-tubulin (B512, 1:2000, mouse; Sigma), RanBP2 (1:1000, goat) (Hutten et al., 2008), RacGAP1 (N-19, 1:100, goat, Santa-Cruz Biotechnology), SBP (1:300, mouse 20), Pericentrin (ab4448, 1:1000, rabbit, Abcam), γ-tubulin (AK15, 1:1000, rabbit; Sigma). Cells were washed three times with cytoskeleton buffer for 5 min, fluorescence-labeled secondary antibodies were applied at 1:200, and the DNA was counterstained with DAPI at 0.1 μg/ml.

Bioinformatics Analysis

Multiple sequence alignments were produced with T-Coffee (Notredame et al., 2000) using default parameters, slightly refined manually and viewed with the Belvu program (Sonnhammer and Hollich, 2005). The amino acid coloring scheme indicates average BLOSUM62 scores (correlated with amino acid conservation) for each alignment column: red (greater than 3), violet (between 3 and 1.5) and light yellow (between 1.5 and 0.5).

Sequence names (and UniProt accession codes in parentheses) for the analysis of CENP-32 are as follows: hCENP32 (Q5T280) Homo sapiens; Fly (Q8MSZ6) Drosophila melanogaster; Worm (Q10950) Caenorhabditis elegans; Plant (Q6NLH7) Arabidopsis thaliana; YGR283C (P53336) Saccharomyces cerevisiae; YMR310C (Q04867) S. cerevisiae; S_pombe (O13641) Schizosaccharomyces pombe; 1K3R (O26109) Methanobacterium thermoautotrophicum protein of known structure.

The ribbon representation and electrostatic surface potential map for a homology model of the human CENP-32 protein was based on the published crystal structure of the CENP-32 homolog from M. thermoautotrophicum (Zarembinski et al., 2003) and obtained using MODELER (Marti-Renom et al., 2000). Images and electrostatic potentials were created with Pymol (http://pymol.sourceforge.net) and APBS (Baker et al., 2001).

Among the other novel kinetochore proteins, CENP-30/FBOX28 contains an F-box domain characteristic of proteins that link substrates with E3 ubiquitin ligases. Skp1, a member of the SCF complex (Skp1-Cullin-F-box) regulates the stability of kinetochore protein Ctf13 in budding yeast (Kaplan et al., 1997), and with Sgt1 (another component of an E3 ubiquitin ligase) is involved in checkpoint regulation (Kitagawa et al., 1999). Thus, CENP-30 may be involved in regulating protein turnover within the kinetochore.

Acknowledgments

We dedicate this paper to Uli Laemmli on the occasion of his 70th birthday. We thank Mayumi Takahashi for assistance with preparation of the Ska3/Rama1 knockout, Frauke Melchior for anti-RanBP2 and David Tollervey, Margarete Heck, Robin Allshire, Iain Cheeseman, Kumiko Samejima, Susana Ribeiro and Jan Bergmann for critical reading of the manuscript. This work was supported by a European Community Marie Curie Excellence Grant (JR), the MRC (CPP), EMBO (LSP) and the Wellcome Trust (WCE, JR). WCE and JR are Principal and Senior Research Fellows of The Wellcome Trust, respectively.

Notes

Published: September 2, 2010

Footnotes

Supplemental Information includes Extended Experimental Procedures, six figures, and five tables and can be found with this article online at doi:10.1016/j.cell.2010.07.047.

Supplemental Information

Table S1. List of All Proteins Identified in This Study:

Details of the proteins identified across the five proteomic experiments. Column “TF” refers to whether the protein is considered true (chromosomal) or false (nonchromosomal) in the training set. Columns “Localization,” “Function,” “Name” and “Complex” are various annotations we have made to the proteins. “Domain Accessions” are the accession numbers from Ensembl of domains annotated in each protein. “cI” through “cV” are the values for classifiers I through V and “k-NN cI” through “k-NN cV” are the nearest neighbor results for each classifier. “Q” is the number of missing values among the classifiers. “True Domain Score” and “False Domain Score” are as described in Supplementary Experimental Procedures. In brief, they represent the trueness (tendency to occur on chromosomal proteins) or falseness (tendency to occur on nonchromosomal proteins) of domains annotated in each protein. Columns “Cloned IPI,” “Cloned Ensembl,” “Cloned MW,” “Cloned Localization” and “Cloned Name” give information about the novel/unknown proteins we cloned in this study. The remaining columns give the results of random forest analysis, with column names ending “TF” giving the predicted class of each protein and columns ending “probability T” giving the proportion of trees that voted a protein as true (chromosomal). “RF(cI..V)” refers to classifiers I through V being combined using random forest analysis. “10CV” denotes that the values are the result of 10-fold cross-validation.

Table S2. Frequency of Domains in Training Set:

This table provides the accession numbers for all domains annotated for proteins in our dataset. The frequency of each domain in chromosomal and nonchromosomal proteins is also given. These data were used to generate classifier VI. Notably, 92 chromosomal and 120 nonchromosomal proteins had no domains annotated and so could not be segregated based on classifier VI alone.
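
The tally described for this table can be sketched as follows. This is a minimal Python illustration under stated assumptions: the input layout (a list of class label and domain accession pairs), the function name `domain_frequencies`, and the placeholder accessions `DOM_A` and `DOM_B` are hypothetical, and the sketch counts each domain once per protein, which may differ from how the frequencies in Table S2 were counted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def domain_frequencies(
    proteins: Iterable[Tuple[str, List[str]]],
) -> Tuple[Counter, Counter, Counter]:
    """Tally domain occurrences separately for each training class.

    Each item is (label, domains), where label is "T" (chromosomal)
    or "F" (nonchromosomal). Returns three counters: domain counts in
    "T" proteins, domain counts in "F" proteins, and the number of
    proteins per class that have no annotated domain.
    """
    true_counts: Counter = Counter()
    false_counts: Counter = Counter()
    no_domain: Counter = Counter()
    for label, domains in proteins:
        unique_domains = set(domains)  # count a domain once per protein
        if not unique_domains:
            # Such proteins cannot be separated by a domain-based classifier.
            no_domain[label] += 1
            continue
        target = true_counts if label == "T" else false_counts
        target.update(unique_domains)
    return true_counts, false_counts, no_domain


# Hypothetical training set with placeholder domain accessions.
training = [
    ("T", ["DOM_A", "DOM_B"]),
    ("T", ["DOM_A"]),
    ("F", ["DOM_B"]),
    ("F", []),
]
true_counts, false_counts, no_domain = domain_frequencies(training)
```

Proteins that land in the third counter correspond to the cases noted above that have no annotated domains.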

Table S3. Novel Proteins Tagged in This Study
Table S4. List of All Peptides Identified in This Study
Table S5. Example of Domain-Based Classifier Calculation
Document S1. Article Plus Supplemental Information

References

Altaf M., Auger A., Monnet-Saksouk J., Brodeur J., Piquet S., Cramet M., Bouchard N., Lacoste N., Utley R.T., Gaudreau L. NuA4-dependent acetylation of nucleosomal histone H4 and H2A directly stimulates incorporation of H2A.Z by the SWR1 complex. J. Biol. Chem. 2010 in press. [PubMed]
Amano M., Suzuki A., Hori T., Backer C., Okawa K., Cheeseman I.M., Fukagawa T. The CENP-S complex is essential for the stable assembly of outer kinetochore structure. J. Cell Biol. 2009;186:173–182. [PMC free article] [PubMed]
Andersen J.S., Wilkinson C.J., Mayor T., Mortensen P., Nigg E.A., Mann M. Proteomic characterization of the human centrosome by protein correlation profiling. Nature. 2003;426:570–574. [PubMed]
Andrade M.A., Ponting C.P., Gibson T.J., Bork P. Homology-based method for identification of protein repeats using statistical significance estimates. J. Mol. Biol. 2000;298:521–537. [PubMed]
Cheeseman I.M., Desai A. Molecular architecture of the kinetochore-microtubule interface. Nat. Rev. Mol. Cell Biol. 2008;9:33–46. [PubMed]
Cox J., Mann M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008;26:1367–1372. [PubMed]
Daum J.R., Wren J.D., Daniel J.J., Sivakumar S., McAvoy J.N., Potapova T.A., Gorbsky G.J. Ska3 is required for spindle checkpoint silencing and the maintenance of chromosome cohesion in mitosis. Curr. Biol. 2009;19:1467–1472. [PMC free article] [PubMed]
de Godoy L.M., Olsen J.V., Cox J., Nielsen M.L., Hubner N.C., Frohlich F., Walther T.C., Mann M. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature. 2008;455:1251–1254. [PubMed]
Dejardin J., Kingston R.E. Purification of proteins associated with specific genomic Loci. Cell. 2009;136:175–186. [PMC free article] [PubMed]
Earnshaw W.C., Rothfield N. Identification of a family of human centromere proteins using autoimmune sera from patients with scleroderma. Chromosoma (Berl) 1985;91:313–321. [PubMed]
Finn R.D., Mistry J., Tate J., Coggill P., Heger A., Pollington J.E., Gavin O.L., Gunasekaran P., Ceric G., Forslund K. The Pfam protein families database. Nucleic Acids Res. 2010;38:D211–D222. [PMC free article] [PubMed]
Foltz D.R., Jansen L.E., Black B.E., Bailey A.O., Yates J.R., Cleveland D.W. The human CENP-A centromeric nucleosome-associated complex. Nat. Cell Biol. 2006;8:458–469. [PubMed]
Foster L.J., De Hoog C.L., Mann M. Unbiased quantitative proteomics of lipid rafts reveals high specificity for signaling factors. Proc. Natl. Acad. Sci. USA. 2003;100:5813–5818. [PubMed]
Gaitanos T.N., Santamaria A., Jeyaprakash A.A., Wang B., Conti E., Nigg E.A. Stable kinetochore-microtubule interactions depend on the Ska complex and its new component Ska3/C13Orf3. EMBO J. 2009;28:1442–1452. [PubMed]
Gassmann R., Henzing A.J., Earnshaw W.C. Novel components of human mitotic chromosomes identified by proteomic analysis of the chromosome scaffold fraction. Chromosoma. 2005;113:385–397. [PubMed]
Gentleman R.C., Carey V.J., Bates D.M., Bolstad B., Dettling M., Dudoit S., Ellis B., Gautier L., Ge Y., Gentry J. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80. [PMC free article] [PubMed]
Hori T., Amano M., Suzuki A., Backer C.B., Welburn J.P., Dong Y., McEwen B.F., Shang W.H., Suzuki E., Okawa K. CCAN makes multiple contacts with centromeric DNA to provide distinct pathways to the outer kinetochore. Cell. 2008;135:1039–1052. [PubMed]
Hubner N.C., Bird A.W., Cox J., Splettstoesser B., Bandilla P., Poser I., Hyman A., Mann M. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J. Cell Biol. 2010;189:739–754. [PMC free article] [PubMed]
Hudson D.F., Vagnarelli P., Gassmann R., Earnshaw W.C. Condensin is required for nonhistone protein assembly and structural integrity of vertebrate mitotic chromosomes. Dev. Cell. 2003;5:323–336. [PubMed]
Hutchins J.R., Toyoda Y., Hegemann B., Poser I., Heriche J.K., Sykora M.M., Augsburg M., Hudecz O., Buschhorn B.A., Bulkescher J. Systematic analysis of human protein complexes identifies chromosome segregation proteins. Science. 2010;328:593–599. [PMC free article] [PubMed]
Ishihama Y., Oda Y., Tabata T., Sato T., Nagasu T., Rappsilber J., Mann M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics. 2005;4:1265–1272. [PubMed]
Ishihama Y., Schmidt T., Rappsilber J., Mann M., Hartl F.U., Kerner M.J., Frishman D. Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics. 2008;9:102. [PMC free article] [PubMed]
Jiang W., Middleton K., Yoon H.J., Fouquet C., Carbon J. An essential yeast protein, CBF5p, binds in vitro to centromeres and microtubules. Mol. Cell. Biol. 1993;13:4884–4893. [PMC free article] [PubMed]
Joseph J., Liu S.T., Jablonski S.A., Yen T.J., Dasso M. The RanGAP1-RanBP2 complex is essential for microtubule-kinetochore interactions in vivo. Curr. Biol. 2004;14:611–617. [PubMed]
Kops G.J., van der Voet M., Manak M.S., van Osch M.H., Naini S.M., Brear A., McLeod I.X., Hentschel D.M., Yates J.R., 3rd, van den Heuvel S., Shah J.V. APC16 is a conserved subunit of the anaphase-promoting complex/cyclosome. J. Cell Sci. 2010;123:1623–1633. [PubMed]
Kulukian A., Han J.S., Cleveland D.W. Unattached kinetochores catalyze production of an anaphase inhibitor that requires a Mad2 template to prime Cdc20 for BubR1 binding. Dev. Cell. 2009;16:105–117. [PMC free article] [PubMed]
Letunic I., Copley R.R., Schmidt S., Ciccarelli F.D., Doerks T., Schultz J., Ponting C.P., Bork P. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004;32:D142–D144. [PMC free article] [PubMed]
Lewis C.D., Laemmli U.K. Higher order metaphase chromosome structure: evidence for metalloprotein interactions. Cell. 1982;29:171–181. [PubMed]
Li X., Wu L., Corsa C.A., Kunkel S., Dou Y. Two mammalian MOF complexes regulate transcription activation by distinct mechanisms. Mol. Cell. 2009;36:290–301. [PMC free article] [PubMed]
Maiato H., Sampaio P., Lemos C.L., Findlay J., Carmena M., Earnshaw W.C., Sunkel C.E. MAST/Orbit has a role in microtubule-kinetochore attachment and is essential for chromosome alignment and maintenance of spindle bipolarity. J. Cell Biol. 2002;157:749–760. [PMC free article] [PubMed]
Marín I. Evolution of chromatin-remodeling complexes: comparative genomics reveals the ancient origin of “novel” compensasome genes. J. Mol. Evol. 2003;56:527–539. [PubMed]
Morrison C., Henzing A.J., Jensen O.N., Osheroff N., Dodson H., Kandels-Lewis S.E., Adams R.R., Earnshaw W.C. Proteomic analysis of human metaphase chromosomes reveals topoisomerase II alpha as an Aurora B substrate. Nucleic Acids Res. 2002;30:5318–5327. [PMC free article] [PubMed]
Neumann B., Walter T., Heriche J.K., Bulkescher J., Erfle H., Conrad C., Rogers P., Poser I., Held M., Liebel U. Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Nature. 2010;464:721–727. [PMC free article] [PubMed]
Obuse C., Iwasaki O., Kiyomitsu T., Goshima G., Toyoda Y., Yanagida M. A conserved Mis12 centromere complex is linked to heterochromatic HP1 and outer kinetochore protein Zwint-1. Nat. Cell Biol. 2004;6:1135–1141. [PubMed]
Okada M., Cheeseman I.M., Hori T., Okawa K., McLeod I.X., Yates J.R., Desai A., Fukagawa T. The CENP-H-I complex is required for the efficient incorporation of newly synthesized CENP-A into centromeres. Nat. Cell Biol. 2006;8:446–457. [PubMed]
Ong S.E., Blagoev B., Kratchmarova I., Kristensen D.B., Steen H., Pandey A., Mann M. Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol. Cell. Proteomics. 2002;1:376–386. [PubMed]
Porter I.M., McClelland S.E., Khoudoli G.A., Hunter C.J., Andersen J.S., McAinsh A.D., Blow J.J., Swedlow J.R. Bod1, a novel kinetochore protein required for chromosome biorientation. J. Cell Biol. 2007;179:187–197. [PMC free article] [PubMed]
Raaijmakers J.A., Tanenbaum M.E., Maia A.F., Medema R.H. RAMA1 is a novel kinetochore protein involved in kinetochore-microtubule attachment. J. Cell Sci. 2009;122:2436–2445. [PubMed]
Rappsilber J., Ishihama Y., Mann M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 2003;75:663–670. [PubMed]
Rappsilber J., Ryder U., Lamond A.I., Mann M. Large-scale proteomic analysis of the human spliceosome. Genome Res. 2002;12:1231–1245. [PubMed]
Schirmer E.C., Florens L., Guan T., Yates J.R., 3rd, Gerace L. Nuclear membrane proteins with potential disease links found by subtractive proteomics. Science. 2003;301:1380–1382. [PubMed]
Shevchenko A., Tomas H., Havlis J., Olsen J.V., Mann M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 2006;1:2856–2860. [PubMed]
Takata H., Uchiyama S., Nakamura N., Nakashima S., Kobayashi S., Sone T., Kimura S., Lahmers S., Granzier H., Labeit S. A comparative proteome analysis of human metaphase chromosomes isolated from two different cell lines reveals a set of conserved chromosome-associated proteins. Genes Cells. 2007;12:269–284. [PubMed]
Theis M., Slabicki M., Junqueira M., Paszkowski-Rogacz M., Sontheimer J., Kittler R., Heninger A.K., Glatter T., Kruusmaa K., Poser I. Comparative profiling identifies C13orf3 as a component of the Ska complex required for mammalian cell division. EMBO J. 2009;28:1453–1465. [PubMed]
Tkaczuk K.L., Dunin-Horkawicz S., Purta E., Bujnicki J.M. Structural and evolutionary bioinformatics of the SPOUT superfamily of methyltransferases. BMC Bioinformatics. 2007;8:73. [PMC free article] [PubMed]
Uchiyama S., Kobayashi S., Takata H., Ishihara T., Hori N., Higashi T., Hayashihara K., Sone T., Higo D., Nirasawa T. Proteome analysis of human metaphase chromosomes. J. Biol. Chem. 2005;280:16994–17004. [PubMed]
Van Hooser A.A., Yuh P., Heald R. The perichromosomal layer. Chromosoma. 2005;114:377–388. [PubMed]
Welburn J.P., Grishchuk E.L., Backer C.B., Wilson-Kubalek E.M., Yates J.R., 3rd, Cheeseman I.M. The human kinetochore Ska1 complex facilitates microtubule depolymerization-coupled motility. Dev. Cell. 2009;16:374–385. [PMC free article] [PubMed]
Wen H., Li J., Song T., Lu M., Kan P.Y., Lee M.G., Sha B., Shi X. Recognition of histone H3K4 trimethylation by the plant homeodomain of PHF2 modulates histone demethylation. J. Biol. Chem. 2010;285:9322–9326. [PubMed]
Wu W.H., Alami S., Luk E., Wu C.H., Sen S., Mizuguchi G., Wei D., Wu C. Swc2 is a widely conserved H2AZ-binding module essential for ATP-dependent histone exchange. Nat. Struct. Mol. Biol. 2005;12:1064–1071. [PubMed]

Supplemental References

Andrade, M.A., Ponting, C.P., Gibson, T.J., and Bork, P. (2000). Homology-based method for identification of protein repeats using statistical significance estimates. J. Mol. Biol. 298, 521–537. [PubMed]
Baker, N.A., Sept, D., Joseph, S., Holst, M.J., and McCammon, J.A. (2001). Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98, 10037–10041. [PubMed]
Breiman, L. (2001). Random Forests. Mach. Learn. 45, 1.
Chu, D.S., Liu, H., Nix, P., Wu, T.F., Ralston, E.J., Yates, J.R., 3rd, and Meyer, B.J. (2006). Sperm chromatin proteomics identifies evolutionarily conserved fertility factors. Nature 443, 101–105. [PMC free article] [PubMed]
Cover, T.M., and Hart, P.E. (1980). Nearest neighbor pattern classification. In Pattern Recognition, K.S. Fu, ed. (Hong Kong: Chinese University Press). Originally published in IEEE Trans. Inf. Theory IT-13, 21–27 (1967).
Cox, J., and Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372. [PubMed]
Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K., et al. (2009). The Pfam protein families database. Nucleic Acids Res. 38, D211–D222. [PMC free article] [PubMed]
Frank, E., Hall, M., Trigg, L., Holmes, G., and Witten, I.H. (2004). Data mining in bioinformatics using Weka. Bioinformatics 20, 2479–2481. [PubMed]
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I.H. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11, 1.
Hutten, S., Flotho, A., Melchior, F., and Kehlenbach, R.H. (2008). The Nup358-RanGAP complex is required for efficient importin alpha/beta-dependent nuclear import. Mol. Biol. Cell 19, 2300–2310. [PMC free article] [PubMed]
Ishihama, Y., Oda, Y., Tabata, T., Sato, T., Nagasu, T., Rappsilber, J., and Mann, M. (2005). Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics 4, 1265–1272. [PubMed]
Ishihama, Y., Rappsilber, J., Andersen, J.S., and Mann, M. (2002). Microcolumns with self-assembled particle frits for proteomics. J. Chromatogr. A 979, 233–239. [PubMed]
Ishihama, Y., Schmidt, T., Rappsilber, J., Mann, M., Hartl, F.U., Kerner, M.J., and Frishman, D. (2008). Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9, 102. [PMC free article] [PubMed]
Kaplan, K.B., Hyman, A.A., and Sorger, P.K. (1997). Regulating the yeast kinetochore by ubiquitin-dependent degradation and Skp1p-mediated phosphorylation. Cell 91, 491–500. [PubMed]
Kitagawa, K., Skowyra, D., Elledge, S.J., Harper, J.W., and Hieter, P. (1999). SGT1 encodes an essential component of the yeast kinetochore assembly pathway and a novel subunit of the SCF ubiquitin ligase complex. Mol. Cell 4, 21–33. [PubMed]
Letunic, I., Copley, R.R., Schmidt, S., Ciccarelli, F.D., Doerks, T., Schultz, J., Ponting, C.P., and Bork, P. (2004). SMART 4.0: toward genomic data integration. Nucleic Acids Res. 32, D142–D144. [PMC free article] [PubMed]
Lewis, C.D., and Laemmli, U.K. (1982). Higher order metaphase chromosome structure: evidence for metalloprotein interactions. Cell 29, 171–181. [PubMed]
Marin, I. (2003). Evolution of chromatin-remodeling complexes: comparative genomics reveals the ancient origin of “novel” compensasome genes. J. Mol. Evol. 56, 527–539. [PubMed]
Marti-Renom, M.A., Stuart, A.C., Fiser, A., Sanchez, R., Melo, F., and Sali, A. (2000). Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291–325. [PubMed]
Notredame, C., Higgins, D.G., and Heringa, J. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217. [PubMed]
Rappsilber, J., Ishihama, Y., and Mann, M. (2003). Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670. [PubMed]
Rappsilber, J., Ryder, U., Lamond, A.I., and Mann, M. (2002). Large-scale proteomic analysis of the human spliceosome. Genome Res. 12, 1231–1245. [PubMed]
Ribeiro, S.A., Gatlin, J.C., Dong, Y., Joglekar, A., Cameron, L., Hudson, D.F., Farr, C.J., McEwen, B.F., Salmon, E.D., Earnshaw, W.C., et al. (2009). Condensin regulates the stiffness of vertebrate centromeres. Mol. Biol. Cell 20, 2371–2380. [PMC free article] [PubMed]
Ribeiro, S.A., Vagnarelli, P., Dong, Y., Hori, T., McEwen, B.F., Fukagawa, T., Flors, C., and Earnshaw, W.C. (2010). A super-resolution map of the vertebrate kinetochore. Proc. Natl. Acad. Sci. USA, in press. [PubMed]
Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V., and Mann, M. (2006). In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 1, 2856–2860. [PubMed]
Sonnhammer, E.L., and Hollich, V. (2005). Scoredist: a simple and robust protein sequence distance estimator. BMC Bioinformatics 6, 108. [PMC free article] [PubMed]
Tan, F., Li, G., Chitteti, B.R., and Peng, Z. (2007). Proteome and phosphoproteome analysis of chromatin associated proteins in rice (Oryza sativa). Proteomics 7, 4511–4527. [PubMed]
Zarembinski, T.I., Kim, Y., Peterson, K., Christendat, D., Dharamsi, A., Arrowsmith, C.H., Edwards, A.M., and Joachimiak, A. (2003). Deep trefoil knot implicated in RNA binding found in an archaebacterial protein. Proteins 50, 177–183. [PMC free article] [PubMed]