Direct, pair-wise (binary), physical protein-protein interactions (PPIs) underlie virtually all biological processes. Efforts to map the interaction network of all proteins within a cell or organism, termed the interactome, have helped reveal the architectural and functional blueprint of cellular processes in a range of eukaryotes, including yeast [1]–[5], Drosophila [6]–[9], C. elegans [10]–[14], Plasmodium [15], Arabidopsis [16]–[17], mouse [18] and humans [19]–[22]. Mapping PPIs has advanced our understanding of key biological processes and structures such as the mitotic spindle [23], cell polarity [24], the proteasome [25] and the editosome [26]. It has also helped assign roles to proteins of previously unknown function [5], and it has deepened our understanding of, and aided progress against, human diseases [27]–[28].
There are two main methods for detecting direct PPIs in vivo: the yeast two-hybrid (Y2H) assay and its many derivatives [29], and, more recently, the protein-fragment complementation assay (PCA) [30]. In the Y2H assay, the interaction of bait and prey fusion proteins within the nucleus reconstitutes a transcription factor that activates expression of a reporter gene. PCA works on a similar principle but takes place in the cytoplasm, and it replaces the transcription-reporter system with a reconstituted reporter protein that allows cells to grow in the presence of a toxic compound.
The PPIs of the yeast Saccharomyces cerevisiae have been extensively explored. There are currently three genome-wide high-throughput yeast two-hybrid (HT-Y2H) surveys [1]–[3] and one genome-wide PCA study of the yeast interactome [4]. However, although these large-scale Y2H and PCA screens have established proteome-wide protein interaction networks (PINs) for yeast, statistical analysis indicates that their combined datasets cover less than 30% of the entire yeast interactome [3]. Furthermore, there is surprisingly little overlap in PPIs among the four studies, and between these studies and the literature-curated (LC) interaction dataset. The LC data (also known as the “community” dataset) are derived from small-scale Y2H studies that each focus narrowly on a few proteins or an interactome sub-network. Despite recent reports to the contrary [21], [31]–[32], the LC dataset is commonly believed to be of higher quality than the HT-Y2H interactions because of its narrow focus on the PPIs of a few well-characterized proteins [33]–[36]. In addition, LC studies often report reciprocal interactions (interactions detected in both orientations, with protein A as bait and protein B as prey and vice versa), confirm their results with multiple independent, orthogonal methods, and integrate their findings with other biochemical and genetic data [37]–[51]. The poor overlap among the large-scale screens, and between them and the LC dataset, has led to the suggestion that the current HT-Y2H studies were not performed to saturation and therefore miss additional interactions [35]. Several factors may contribute to this. First, most genome-wide HT-Y2H studies do not include all of the protein-coding genes in the yeast genome, and the absence of even a few proteins from HT-Y2H screens can significantly reduce interactome coverage [3]. Second, the enormous scope of genome-wide HT-Y2H screens often necessitates a pooling strategy in which pools of 96 or more baits or preys are tested for interaction together. In pooled screens, however, proteins that are toxic when expressed at high levels may display a dominant-negative phenotype, and interactions involving weakly expressed proteins may be under-reported [35]. Similarly, certain proteins may be inefficiently imported into the nucleus, the site of the Y2H assay. Furthermore, PPIs that are not physiologically relevant (so-called “biological false positives”) may be detected between proteins that normally reside in different cellular compartments, or that are expressed at different stages of the cell cycle or in different tissues. These confounding factors are thought to make pooled HT-Y2H screening strategies less sensitive than array-based, one-by-one screens, while potentially yielding more false-positive interactions [35], [52].
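Comparing interaction datasets, and recognizing reciprocal interactions, both depend on treating a binary PPI as an unordered pair, so that a bait–prey record (A, B) matches (B, A). The sketch below illustrates this under stated assumptions: the protein names and datasets are hypothetical placeholders, and the Jaccard index is our own choice of overlap measure, not necessarily the statistic used in the cited studies.

```python
from itertools import combinations


def edge_set(pairs):
    """Collapse bait->prey records into undirected interactions (frozensets),
    so that (A, B) and (B, A) count as the same PPI."""
    return {frozenset(pair) for pair in pairs}


def pairwise_overlap(datasets):
    """For every pair of datasets, return (number of shared PPIs, Jaccard index).
    The Jaccard index is an assumed overlap measure chosen for illustration."""
    edges = {name: edge_set(pairs) for name, pairs in datasets.items()}
    result = {}
    for a, b in combinations(sorted(edges), 2):
        shared = edges[a] & edges[b]
        union = edges[a] | edges[b]
        jaccard = len(shared) / len(union) if union else 0.0
        result[(a, b)] = (len(shared), jaccard)
    return result


def reciprocal_pairs(pairs):
    """Interactions seen in both orientations: A as bait with B as prey, and vice versa."""
    seen = set(pairs)
    return {frozenset((a, b)) for a, b in seen if a != b and (b, a) in seen}


# Hypothetical bait -> prey records; names are placeholders, not real data.
datasets = {
    "screen_1": [("P1", "P2"), ("P2", "P3"), ("P4", "P5")],
    "screen_2": [("P2", "P1"), ("P3", "P6")],
    "literature": [("P1", "P2"), ("P3", "P2"), ("P5", "P6")],
}
for (a, b), (n_shared, jac) in pairwise_overlap(datasets).items():
    print(f"{a} vs {b}: {n_shared} shared, Jaccard = {jac:.2f}")
# literature vs screen_1: 2 shared, Jaccard = 0.50
# literature vs screen_2: 1 shared, Jaccard = 0.25
# screen_1 vs screen_2: 1 shared, Jaccard = 0.25
```

Using frozensets as keys makes orientation irrelevant for overlap counting, while `reciprocal_pairs` deliberately works on the ordered records, because reciprocity is a property of how an interaction was observed rather than of the interaction itself.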
We focused on mapping the PPIs of the small subunit (SSU) processome, a very large ribonucleoprotein complex composed of ~72 proteins and the U3 small nucleolar RNA (snoRNA). This biochemically well-defined complex guides the endonucleolytic processing events at sites A0, A1 and A2 that liberate the mature 18S rRNA from the pre-rRNA transcript [53]–[55]. The SSU processome is also believed to chaperone the folding of the pre-18S rRNA and its assembly with ribosomal proteins into the mature SSU of the ribosome.
The SSU processome was originally identified by tandem affinity purification followed by mass spectrometry (TAP/MS) [53]–[54], [56]. Subsequent TAP/MS studies expanded the list of SSU processome protein components and provided some of the first evidence for subcomplexes [57]–[59]. In all, nearly 70% of SSU processome proteins have been identified by TAP/MS [53]–[54], [57]–[59], with the remaining proteins identified by other biochemical or genetic methods. TAP/MS studies have therefore contributed substantially to our current, nearly complete list of the protein constituents of the SSU processome [53]–[54], [57]–[59]. Typically, SSU processome protein components meet the following criteria:
(i) they reside in the nucleolus, the site of ribosome biogenesis; (ii) their genetic depletion results in an 18S rRNA processing defect; and (iii) they co-immunoprecipitate the U3 snoRNA and/or another SSU processome protein component. There are currently 46 confirmed SSU processome proteins and 26 potential candidates supported by partial data (Table S1). Some of these proteins have been assigned to the t-Utp/UtpA, UtpB, UtpC, Mpp10, Rcl1/Bms1 and U3 snoRNP subcomplexes by TAP-tag co-complex purifications and small-scale Y2H studies [38]–[39], [46], [50], [57]–[59]. However, most SSU processome proteins remain unassigned to a specific subcomplex because of a lack of interaction data, and some may even belong to subcomplexes that have yet to be identified (Table S1). Identifying the protein-protein interactions of the SSU processome is therefore the next step in elucidating its assembly, its mechanism of function and its regulation in pre-rRNA processing.
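The three membership criteria lend themselves to a simple per-protein checklist. The following is a minimal sketch, assuming a hypothetical `Evidence` record with one boolean per criterion; the rule that maps "all three met" to `"all"` and "some met" to `"partial"` is our own simplification for illustration and is not necessarily how the confirmed and candidate lists in Table S1 were compiled.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical evidence record for one protein against the three criteria."""
    nucleolar: bool    # (i) resides in the nucleolus
    rrna_defect: bool  # (ii) depletion causes an 18S rRNA processing defect
    co_ip: bool        # (iii) co-IPs U3 snoRNA and/or another SSU processome protein


def classify(ev: Evidence) -> str:
    """Assumed rule: all three criteria -> 'all', some -> 'partial', none -> 'none'."""
    met = sum((ev.nucleolar, ev.rrna_defect, ev.co_ip))  # bools sum as 0/1
    if met == 3:
        return "all"
    return "partial" if met else "none"


# Placeholder proteins with made-up evidence, for illustration only.
proteins = {
    "ProteinA": Evidence(True, True, True),
    "ProteinB": Evidence(True, False, True),
    "ProteinC": Evidence(False, False, False),
}
for name, ev in proteins.items():
    print(name, classify(ev))
# ProteinA all
# ProteinB partial
# ProteinC none
```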
Given the well-characterized and nearly complete component list of the SSU processome, we sought to generate an up-to-date, comprehensive yeast SSU processome PIN by extracting and pooling protein interaction data from existing datasets. After retrieving both high-throughput and literature-curated binary protein interaction data, we drew an interaction map using Cytoscape. The result is the most up-to-date protein interaction map of the yeast SSU processome, from which we identify additional interactions within the subcomplexes and some of the first potential interactions linking the various subcomplexes.
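A pooling workflow of this kind can be sketched in a few steps: merge binary interactions from several sources, keep only pairs in which both partners are SSU processome components, write the edges in a format Cytoscape can import, and flag edges that connect different subcomplexes. The sketch below is illustrative only and is not the pipeline used here: the protein names, source labels and subcomplex assignments are hypothetical placeholders, and it assumes Cytoscape's tab-separated simple interaction format (SIF), with the conventional `pp` label for protein-protein edges.

```python
def build_network(sources, members):
    """Merge binary PPIs from several sources, keep only pairs in which both
    proteins are SSU processome members, and record the supporting sources."""
    support = {}
    for source, pairs in sources.items():
        for a, b in pairs:
            if a in members and b in members:
                key = tuple(sorted((a, b)))  # undirected: (A, B) == (B, A)
                support.setdefault(key, set()).add(source)
    return support


def to_sif(support, interaction_type="pp"):
    """Render edges as tab-separated SIF lines: node, interaction type, node."""
    return "\n".join(f"{a}\t{interaction_type}\t{b}" for a, b in sorted(support))


def inter_subcomplex_edges(support, subcomplex):
    """Edges whose two proteins are both assigned, but to different subcomplexes."""
    edges = []
    for a, b in sorted(support):
        sa, sb = subcomplex.get(a), subcomplex.get(b)
        if sa is not None and sb is not None and sa != sb:
            edges.append((a, b))
    return edges


# Hypothetical inputs; "Y9" is a non-member and is filtered out.
members = {"X1", "X2", "X3", "X4"}
sources = {
    "high_throughput": [("X1", "X2"), ("X3", "X2"), ("X1", "Y9")],
    "literature": [("X2", "X1"), ("X3", "X4")],
}
subcomplex = {"X1": "SC1", "X2": "SC1", "X3": "SC2"}  # X4 left unassigned

network = build_network(sources, members)
print(to_sif(network))
print(inter_subcomplex_edges(network, subcomplex))  # [('X2', 'X3')]
```

Keeping the set of supporting sources for each edge makes it straightforward to distinguish interactions seen only in high-throughput screens from those also reported in the literature, for example by exporting the source labels as an edge attribute alongside the SIF file.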