Understanding the design logic of living systems is now mainly based on genomics and DNA sequence comparisons. Typically, protein comparisons are evaluated by sequence alignments. However, living systems run programs that are written both as passive information (the genome) and as dynamic, molecular ecologies (the proteome). This dichotomy drives proteomic research because no living system can be solely described by its DNA sequence. Accurate proteomic maps are logically the next dataset required to complement complete genome sequences. However, the generation of reliable proteomic data remains challenging [1
The budding yeast, Saccharomyces cerevisiae, has led eukaryotic research in several fields, particularly genomics, reverse genetics, cell biology and proteomics. For proteomic mapping, S. cerevisiae has been the main venue for the evaluation of various methodologies, which led to the clear conclusion that biochemical methods based on physiological expression levels deliver the most accurate results. In contrast, bioinformatic, yeast two-hybrid and overexpression approaches generate less accurate data that require validation by a different means [1
In contrast to a genome sequence, it is unlikely that a proteomic map can ever be complete because proteomes change in response to alterations of cellular condition. Proteomes include a very large number of post-translational modifications that are inherently variable, as well as protein-protein interactions that vary over a wide range of stabilities. Nevertheless, a proteome is based on a stable core of protein complexes, which can be accurately mapped by biochemical approaches [2
]. Hence, an accurate proteomic map will be based on the constellation of stable protein complexes for a given cellular condition. The map then provides a scaffold onto which transient interactions and post-translational modifications can be organized. Thereby, proteomes can be rationalized [5
The quest to understand proteomes has led to the definition of new perspectives and terms, such as a proteomic 'environment', which describes the local relationships within a group of interacting proteins; 'hubs', which is applied to proteins that interact with many other proteins [2
]; and 'hyperlinks', which is a term we applied to proteins that are present in more than one stable protein complex [7
]. Similarly, insight into proteomes can be gleaned from comparative proteomics [8
]. However, without accurate proteomic maps, these new terms and perspectives, particularly those derived from comparative proteomics, have limited meaning.
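The definitions of 'hyperlinks' and 'hubs' given above can be stated as simple set operations. The following is a minimal sketch, not a method taken from this work: the function names, the `min_partners` threshold and the placeholder identifiers (`P1`, `complex_A`, and so on) are assumptions chosen purely for illustration.

```python
from collections import defaultdict


def find_hyperlinks(complexes):
    """Return proteins that belong to more than one complex.

    `complexes` maps a complex name to the set of its subunits.
    The result maps each hyperlink protein to its sorted complex names.
    """
    membership = defaultdict(set)
    for name, subunits in complexes.items():
        for protein in subunits:
            membership[protein].add(name)
    return {p: sorted(c) for p, c in membership.items() if len(c) > 1}


def find_hubs(interactions, min_partners):
    """Return proteins with at least `min_partners` distinct partners.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    The threshold is an assumption; the text does not define one.
    """
    partners = defaultdict(set)
    for a, b in interactions:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    return {p for p, q in partners.items() if len(q) >= min_partners}


# Hypothetical placeholder data, not real yeast complexes.
example_complexes = {
    "complex_A": {"P1", "P2", "P3"},
    "complex_B": {"P3", "P4"},
    "complex_C": {"P5"},
}
example_interactions = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]

hyperlinks = find_hyperlinks(example_complexes)
hubs = find_hubs(example_interactions, min_partners=3)
```

In this toy input, `P3` is the only hyperlink because it is the only protein listed in two complexes. Both functions are only as reliable as the complex assignments they are given, which is why the accuracy of the underlying map matters.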
To map the budding yeast proteome accurately, methodologies for physiological expression and purification of tagged proteins were developed based on gene targeting with the tandem affinity purification (TAP) tag [9
]. The high throughput application of these methods by two different groups led to the best proteomic map datasets for any cell, whether prokaryotic or eukaryotic [11
]. Collins et al. consolidated both datasets into one of even higher quality; nevertheless, they recommended more intensely focused data gathering to evaluate accuracy [13
Here we address the issue of proteomic accuracy by intense exploration of a section of the budding yeast proteome that is related to chromatin regulation. Chromatin is regulated by multiprotein complexes, which dynamically target nucleosomes with a multitude of reversible modifications, such as acetylation, methylation, phosphorylation and ubiquitination (reviewed in [14
]). Also, in budding yeast, many of these complexes have been individually isolated and functionally characterized, which provides a rich and detailed source of reference information. Previously, we concluded that greater accuracy can be attained by sequential tagging to reciprocally validate interactions [10
]. Sequential tagging of candidate interactors to map a proteomic environment has also been termed proteomic navigation or SEAM (short for Sequential rounds of Epitope tagging, Affinity isolation and Mass spectrometry). For a low throughput approach, which also permits a more intense focus on individual experiments, sequential tagging will deliver improvements in accuracy.
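The logic of reciprocal validation by sequential tagging can be sketched as follows. This is an illustrative outline under stated assumptions, not the procedure used in this work: each pull-down is reduced to a bait and the set of proteins it recovered, and the function names and placeholder identifiers are hypothetical.

```python
def reciprocal_interactions(pulldowns):
    """Return bait-prey pairs that were recovered in both directions.

    `pulldowns` maps each tagged bait to the set of proteins identified
    in its purification. A pair counts as reciprocally validated only if
    each partner, used as bait, recovered the other.
    """
    confirmed = set()
    for bait, preys in pulldowns.items():
        for prey in preys:
            if prey != bait and bait in pulldowns.get(prey, set()):
                confirmed.add(frozenset((bait, prey)))
    return confirmed


def next_baits(pulldowns):
    """Return recovered proteins that have not yet been tagged.

    These are the candidates for the next round of tagging.
    """
    recovered = set().union(*pulldowns.values())
    return recovered - set(pulldowns)


# Hypothetical placeholder data: three rounds of tagging.
example_pulldowns = {
    "A": {"B", "C", "X"},
    "B": {"A", "C"},
    "C": {"A"},
}

validated = reciprocal_interactions(example_pulldowns)
candidates = next_baits(example_pulldowns)
```

Here the pairs A-B and A-C are confirmed in both directions, whereas B-C is seen in one direction only and `X` has not yet been tagged, so it is returned as a candidate bait for a further round.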
Several other factors may reduce mapping accuracy. In the S. cerevisiae proteome, every fourth protein is apparently a proteomic hyperlink [5], that is, a member of more than one distinct protein complex. Hence, many pull-downs are mixtures of completely or partially co-purified complexes, together with other sub-stoichiometric and pair-wise interactors. Also, sorting out background proteins from genuine interactors remains challenging [5
], especially when proteins are identified by mass spectrometric techniques with enhanced dynamic range, such as liquid chromatography tandem mass spectrometry (LC-MS/MS) or LC matrix-assisted laser desorption/ionization (MALDI) MS/MS, which produce a large number of confident protein identifications in each pull-down. Furthermore, until recently, mass spectrometric identifications have mostly neglected the quantitative aspect. It was (and, largely, still is) difficult to determine which proteins are bona fide members of a tagged complex and, therefore, stoichiometric, and which interactors are sub-stoichiometric. Here we address these issues to develop refinements for improved accuracy of mapping, including working criteria to identify common background proteins and stoichiometric interactors.
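One simple way to think about common background proteins is by how often they recur across unrelated purifications. The sketch below is an assumption made for illustration and does not reproduce the working criteria developed in this work: the frequency cutoff `min_fraction`, the function name and the placeholder identifiers are all hypothetical.

```python
from collections import Counter


def common_background(pulldowns, min_fraction=0.5):
    """Flag proteins recovered in a large fraction of pull-downs.

    `pulldowns` maps each bait to the set of proteins identified with it.
    A protein is flagged if it appears in at least `min_fraction` of all
    pull-downs. The cutoff is an assumed, illustrative value.
    """
    if not pulldowns:
        return set()
    counts = Counter()
    for bait, preys in pulldowns.items():
        counts.update(set(preys) - {bait})  # do not count the bait itself
    cutoff = min_fraction * len(pulldowns)
    return {p for p, n in counts.items() if n >= cutoff}


# Hypothetical placeholder data: four baits assumed to be unrelated.
example_pulldowns = {
    "A": {"B", "R1", "R2"},
    "D": {"E", "R1"},
    "F": {"G", "R1", "R2"},
    "H": {"R1"},
}

strict = common_background(example_pulldowns, min_fraction=0.75)
lenient = common_background(example_pulldowns)
```

A frequency filter of this kind only makes sense when the baits are unrelated. Within a single proteomic environment, hyperlink proteins also recur across pull-downs, so recurrence alone cannot distinguish them from background.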
Using the sequential strategy and these refinements, we mapped a large proteomic environment that we term 'Chromatin Central' because it includes eight protein complexes interconnected by hyperlinks encompassing the major histone acetyltransferases and deacetylases in budding yeast. As evidence for mapping accuracy, we made several discoveries, including the identification of new subunits of known complexes and new complexes.
To exploit the quality of the map for comparative proteomics, we then explored the same proteomic environment in the distantly related yeast Schizosaccharomyces pombe. This enabled a detailed comparison of two highly accurate proteomic environments to shed light on the evolution of proteomic architecture.