Measuring mutant growth under 21 conditions
To facilitate the generation of large mutant phenotype profiles, we developed a simple, cost-effective method for measuring the growth of a comprehensive set of yeast mutants under a relatively large number of conditions. Our strategy uses commercial microarray software (GenePix, Axon Instruments) to derive spot size and intensity information from digital images of cells replica pinned on conventional agar plates. Data are processed and normalized using a series of freely available Perl and Visual Basic scripts (
Supplementary information) that assign a growth value corresponding to no growth, slow growth, or full growth to each strain under each condition. To distinguish general slow growth from condition-specific growth defects, we normalize the growth values of each strain under an experimental condition by its value under the YPD control condition (Materials and methods). Using this system, we assayed the growth of the 4710 strain homozygous diploid yeast deletion set (
Giaever et al, 2002) under 21 environmental conditions (Materials and methods) in duplicate, a total of >10^5 data points. The homozygous deletion set was chosen in an attempt to minimize the effects of unlinked mutations documented in the haploid deletion strains (
Hughes et al, 2000b;
Bianchi et al, 2001) that could confer unrelated phenotypes or suppress true phenotypes. Experimental conditions were selected to cover a variety of cellular processes that could be measured in the context of rich media, allowing the use of the same control condition and permitting the inclusion of auxotrophic mutants unable to grow on minimal media. Each measurement was performed twice and only phenotypes that were consistent between both replicates were studied further. Of the 4710 mutants screened, 767 displayed significant growth defects, with either a slow growth or no growth phenotype relative to the control, under at least one of the 21 conditions.
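The normalization and replicate-consistency rule described above can be sketched as follows. This is only an illustration and not the Perl and Visual Basic scripts referred to in the text: the numeric encoding of the three growth classes (0, 1, 2), the use of a simple ratio for normalization, and all function names are assumptions made for the example.

```python
from typing import Optional, Tuple

# Assumed numeric encoding of the three growth classes (not given in the text).
NO_GROWTH, SLOW_GROWTH, FULL_GROWTH = 0, 1, 2


def normalized_growth(condition: int, control: int) -> Optional[float]:
    """Growth under a condition relative to the YPD control for the same strain."""
    if control == NO_GROWTH:
        return None  # no growth on the control, so no condition-specific call
    return condition / control


def has_defect(condition: int, control: int) -> bool:
    """True when the strain grows worse under the condition than on YPD."""
    ratio = normalized_growth(condition, control)
    return ratio is not None and ratio < 1.0


def consistent_defect(rep1: Tuple[int, int], rep2: Tuple[int, int]) -> bool:
    """Keep a phenotype only if both replicates, each (condition, control), show it."""
    return has_defect(*rep1) and has_defect(*rep2)


# A generally slow-growing strain (slow on YPD, slow on the condition) is not flagged.
print(consistent_defect((SLOW_GROWTH, SLOW_GROWTH), (SLOW_GROWTH, SLOW_GROWTH)))
# A strain that is slow only under the condition, in both replicates, is flagged.
print(consistent_defect((SLOW_GROWTH, FULL_GROWTH), (SLOW_GROWTH, FULL_GROWTH)))
```

Dividing by the control value is what separates general slow growth from a condition-specific defect: a strain that is equally slow on YPD and on the test plate has a ratio of 1 and is not called.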
We assessed the accuracy of our results in two ways. First, we compared our data to published data sets generated using the homozygous diploid yeast deletion set that assayed similar experimental conditions by a competitive growth/Affymetrix bar-code hybridization method (
Winzeler et al, 1999) (
Supplementary information). One such comparison is with the results of Birrell et al (2001), who screened the same deletion collection for UV sensitivity. The comparison shows a high degree of overlap between our data, the Birrell et al results, and a set of UV-sensitive (UV^S) mutants described in the literature (
Birrell et al, 2001). In the Birrell et al study, six of the UV^S mutants not identified by our study were annotated as having mild UV^S growth defects (Supplementary Table 1), consistent with the greater sensitivity proposed for the competitive growth assay (
Winzeler et al, 1999). In contrast, our study identified three UV-sensitive mutants that the Birrell et al study failed to detect due to poor hybridization of the DNA barcodes to the Affymetrix chip (Supplementary Table 2), highlighting an advantage of the plate-based growth method. Neither our study nor the Birrell et al study detected UV^S phenotypes for 13 mutants described in the literature (Supplementary Table 2), suggesting strain-dependent differences in phenotype or errors in the deletion set. Our study also identified an additional 14 UV^S mutants not present in either set, including ctf4, rpb9, sgs1, and two genes of unknown function (
Supplementary Table 4). To confirm the results of the high-throughput assay, we tested the UV sensitivity of each strain individually (
Supplementary Figure 1). With the exception of one strain, cdc40, whose growth defects were too severe to permit a reliable assay, all strains showed a detectable UV^S phenotype, including 10 strains that exhibited strong UV sensitivity. In addition, all strains except mrpl3 contained the correct gene deletion as determined by PCR (Dutta, Dudley, and Church, unpublished results), a result that highlights the errors that strain-tracking mistakes or contamination can introduce. Second, we assessed the accuracy of our data through a statistical analysis of experimental replicates (
Supplementary Methods 1). From these estimations, we conclude that the probability of erroneously assigning a growth defect is 0.0037. Thus, growth defects observed in both replicates agree well with published results and are predicted to be highly accurate.
Phenotype profiles define functional classes
To test the hypothesis that grouping genes by common phenotype profile can be used to discover a set of genetically defined functional classes, we compared our results to independent data types. One method of determining the functional coherence of a group of genes is to measure the enrichment of independently derived functional categories (
Tavazoie et al, 1999). We assessed the degree to which our clustering methods grouped genes of common function by testing the statistical significance of the overlap between our clusters and members of the Gene Ontology (GO) functional categories (
Ashburner et al, 2000).
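The overlap test described above can be illustrated with a hypergeometric tail probability, the distribution named later in this section for the vacuolar enrichment. The sketch below is a minimal version under stated assumptions: the function name and the toy numbers are hypothetical, and any multiple-testing correction the authors may have applied is not shown.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           cluster: int, overlap: int) -> float:
    """P(X >= overlap) when `cluster` genes are drawn without replacement from
    `population` genes, of which `annotated` belong to the functional category."""
    total = comb(population, cluster)
    upper = min(annotated, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers only: 10 genes in total, 4 annotated, a cluster of 3 genes
# of which 2 carry the annotation.
print(hypergeom_enrichment_p(population=10, annotated=4, cluster=3, overlap=2))
```

Exact integer binomial coefficients are used so that very small tail probabilities are not distorted by rounding before the final division.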
Phenotype profile clusters derived from the low-pleiotropy mutants showed statistically significant enrichment for a number of GO functional categories. Some examples of well-characterized conditions and functions identified by this analysis include enrichment for galactose metabolism in the ‘galactose only’ cluster (P=3.8 × 10^−18), response to DNA damage in the ‘UV only’ cluster (P=1.8 × 10^−17), and cellular respiration in the glycerol and lactate cluster (P=2.1 × 10^−18). For less well-characterized combinations of conditions, functional enrichment results offer insights into how the cell responds to these perturbations. Such results identified in this study include the enrichment for transcription from RNA polymerase II (Pol II) promoters (P=6.7 × 10^−4) in the calcium and cycloheximide cluster and for cell cycle regulation (P=1.2 × 10^−3) in the caffeine and rapamycin cluster. Another set of clusters that offers potential for the discovery of new cellular functions is the set of clusters with no significant enrichment for any of the GO functional categories (Supplementary Figure 2). An interesting example is the cluster defined by a ‘cycloheximide only’ phenotype, which contains 25 genes, including eight of unknown function.
Biclustering the set of highly pleiotropic genes produced groups with more complex phenotype profiles, but with functional enrichments as specific as those of the gene sets constructed from low-pleiotropy mutants. Consistent with recently published results (Parsons et al, 2004), many of the clusters that include conditions with drugs added to the media are enriched for Golgi, vacuole, and intracellular transport functions. In fact, our entire set of highly pleiotropic genes is significantly enriched for genes annotated with vacuolar organization and biogenesis in the GO database (P=7 × 10^−19 by hypergeometric distribution). In addition to its role in intracellular protein transport and degradation, the yeast vacuole serves to maintain intracellular pH through the transport of hydrogen and other cations (Jones et al, 1997). Several biclusters were enriched for this function exclusively (Supplementary Figure 3). Within the set of highly pleiotropic genes, we also identified clusters enriched for functions unrelated to the vacuole and intracellular transport. One large class involved functions related to transcription by RNA Pol II, with several clusters enriched for transcriptional categories exclusively (Supplementary Figure 4). Other functional categories included sporulation, ergosterol biosynthesis, phosphate metabolism, and DNA replication. Thus, similar to the grouping of genes required for growth in only a single condition, our biclustering of highly pleiotropic genes was able to provide further information about general responses such as multidrug resistance and identify more specific responses that may be obscured by these large, general effects.
The functional enrichment results (Supplementary information) also support the hypothesis that additional functions can be discovered for a group of genes that share one phenotype by further clustering these members with respect to their phenotype profiles across many conditions. For example, the combination of sensitivity to benomyl, cycloheximide, hydroxyurea, and hygromycin B in cluster 1 groups genes enriched for two functional categories, transcription from RNA Pol II promoters (P=1.6 × 10^−5) and RNA elongation from Pol II promoters (P=2.7 × 10^−5). In contrast, clusters derived from profiles containing any of these phenotypes individually show enrichment for categories distinct from those of cluster 1 and from each other: the ‘benomyl only’ cluster is enriched for functions related to the mitotic cell cycle and microtubule organization; the ‘hydroxyurea only’ cluster is enriched for functions related to DNA recombination and repair; the ‘hygromycin B only’ cluster is enriched for functions related to Golgi and vesicle transport; and the ‘cycloheximide only’ cluster does not show significant enrichment for any GO functional category. Thus, clustering mutants with a wide range of pleiotropies by phenotype profile successfully groups genes with common biological functions.
The fact that both condition-specific and highly pleiotropic genes can be grouped by common phenotype profiles into gene sets that show significant enrichment for known biological processes suggests that such a method can be used to identify such functional classes de novo. To test this hypothesis further, we compared the results of our phenotypic clustering to other genetic and biochemical methods of assessing common gene function. These include synthetic lethal interactions, membership within the same protein complex, and associations between members of different protein complexes.
For example, bicluster 26 contains components of three large, multiprotein complexes: SAGA, Swi/Snf, and Ino80. We hypothesized that these complexes, and more specifically these complex members, share functions required under the environmental conditions associated with bicluster 26 (cadmium, cycloheximide, hydroxyurea, and glycerol). This hypothesis is supported by several lines of genetic and biochemical evidence. First, these complexes are known to have similar biochemical activities, modifying chromatin structure to facilitate transcriptional activation. In addition, genetic data, including synthetic lethal interactions, have suggested common functions for several members of bicluster 26. Synthetic lethal interactions between SAGA components (including spt20) and Swi/Snf components (including snf2) were used to suggest common, parallel functions of those complexes (
Roberts and Winston, 1997). Synthetic lethal interactions have also been reported between other members of bicluster 26, including spt20–swi4 (Dror and Winston, unpublished results) and swi4–
rvs161 (
Tong et al, 2004). Thus, the common phenotype profile shared by members of bicluster 26 can be used to group together genes that share common functions as defined by other forms of genetic and biochemical evidence.
To compare our phenotypically defined functional classifications with other genetic and biochemical data in a more comprehensive manner, we examined our data in relation to protein complexes cataloged from the literature in the MIPS database (
Mewes et al, 2004), complexes identified by TAP purification and mass spectrometry (
Gavin et al, 2002), and synthetic lethal data available in the GRID database (
Breitkreutz et al, 2003). Of the 266 complexes annotated in MIPS, 107 contained members that displayed a growth defect in at least one of our conditions, with 14 of these also containing synthetic lethal interactions between protein complex members. Similarly, 132 of the 232 protein complexes described by Gavin et al contained members with growth defects and 23 of these also contained members with synthetic lethal interactions. To visualize the results of this analysis, we graphed all genetic interactions (both membership in the same phenotypic cluster and synthetic lethality) observed within or between protein complex members (Materials and methods and
Supplementary information).
A sample result from this analysis comprises the interactions defined using the common phenotype profile data for Gavin complex 113 (the Paf1/Cdc73 transcriptional elongation complex) and complex 137 (the Sap30 histone deacetylase complex). As expected, several members of the same complex, for example, Paf1 and Cdc73, have common phenotypic profiles, suggesting that these components share functions similar enough to produce a common effect across a large number of conditions. This analysis also highlights the fact that groups of proteins within a complex may belong to different phenotypic classes, for example, the Cti6–Sap30–Ume1 and Dep1–Pho23 groups, suggesting that the complexes also contain distinct groups of functions required under different sets of conditions. Interestingly, these results are complemented by synthetic lethal interactions, which make distinct predictions about protein functions within and between complexes. For example, the cdc73–leo1 and cdc73–rtf1 synthetic lethal interactions support the hypothesis that Cdc73 has functions distinct from and parallel to those of Leo1 and Rtf1. In addition, cdc73 synthetic lethal interactions with members of the Sap30 complex, sap30, dep1, and pho23, suggest that components of these two complexes share common (parallel) functions. These results support the functional classes defined by phenotype cluster membership and underscore the value of both types of large-scale genetic analyses.
To assess the overlap between common phenotype and protein complex membership more quantitatively, we developed a simple measure of phenotype similarity between members of the same protein complex. Briefly, we measured the similarity of phenotypes by calculating the average distance between the phenotype profiles of all pairs of subunits within that complex (Materials and methods). Results for the 52 MIPS complexes with two or more members displaying phenotypes in our data set demonstrate that complexes span the range of similarity from homogeneous to heterogeneous, with two-thirds of the complexes scoring in the range of greater phenotype similarity (score >0.5). These results are in sharp contrast to a randomly generated distribution, which is biased toward greater phenotypic heterogeneity. The fact that well-characterized multiprotein complexes contain members with a greater degree of phenotype similarity than would be predicted by chance provides evidence for the relationship between common phenotype and functional prediction at the level of protein–protein interaction. These results strengthen our assertion that phenotype profiles are suitable for use as functional classifiers.
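A within-complex similarity score of this kind can be sketched as below. The distance metric and scaling are defined in Materials and methods, which are not reproduced here, so the choices in this example are assumptions: profiles are treated as binary vectors (1 = growth defect), distance is the fraction of conditions at which two profiles differ, and the score is one minus the mean pairwise distance so that higher values mean greater similarity. The subunit names and profiles are hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of conditions at which two equal-length binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same conditions")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complex_similarity(profiles):
    """One minus the mean pairwise distance over all subunit pairs in a complex."""
    pairs = list(combinations(profiles.values(), 2))
    if not pairs:
        raise ValueError("need at least two subunits with phenotype profiles")
    mean_distance = sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_distance


# Hypothetical complex with three subunits scored over four conditions.
toy_complex = {
    "subunitA": (1, 1, 0, 0),
    "subunitB": (1, 1, 0, 0),
    "subunitC": (1, 0, 0, 0),
}
print(complex_similarity(toy_complex))
```

Comparing such scores against those of randomly assembled gene groups of the same size is one way to obtain the kind of chance distribution the text refers to.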
Classifying pleiotropic gene functions
For a given pleiotropic gene, it is possible that all phenotypes observed result from the loss of a single function required under multiple conditions or that different sets of phenotypes result from the loss of separate functions, each required under different conditions. Conventional genetic analysis cannot distinguish between these two possibilities without identifying distinct mutant alleles that exhibit different subsets of phenotypes, demonstrating that the functions are genetically separable. Our phenotypically derived functional classes have the potential to provide such information from the analysis of a single mutant allele, such as the complete gene deletions examined in this study. In a theoretical example, functional classes are assigned to each pleiotropic gene based on common phenotype profile. Genes belonging to a single profile cluster, for example, gene1, are hypothesized to carry out a single function under the conditions included in that profile, while genes with membership in multiple clusters, for example, gene3, are hypothesized to have multiple functions required under different subsets of conditions. An example from this study is the snf1 protein kinase mutant. In our data set, the snf1 mutant is assigned to two biclusters with partially overlapping sets of phenotypes. The hypothesis that these two biclusters define distinct functional classes is supported by the fact that these clusters contain different genes and are enriched for different GO functional categories. Multiple functions of Snf1 are also consistent with information from the literature, demonstrating that the kinase can act interchangeably with any of three β-subunits (Sip1, Sip2, or Gal83) to target different substrates (
Schmidt and McCartney, 2000) and has been implicated in a number of diverse cellular processes, including response to glucose depletion (
Carlson, 1999), response to some genotoxic stresses (
Dubacq et al, 2004), and regulation of filamentation and invasive growth (
Cullen and Sprague, 2000;
Kuchin et al, 2002). Our observations on the functions of pleiotropic genes may be validated and refined with direct experiments to enhance our understanding of important biological processes in yeast.
To examine the degree to which our functional classifications divided the phenotypes of pleiotropic genes into separate sets of phenotypes, we graphed the number of biclusters per gene. From this analysis, we find that 23% of the pleiotropic genes that could be assigned to a bicluster were assigned to only one functional classification, suggesting that all of the phenotypes associated with each of these mutants reflect a single gene function. As more conditions are examined, it is possible that additional phenotypes will be added to this class of genes, producing one of two possible results. The addition of a new phenotype could divide the phenotypes assigned to a mutant into multiple functional categories by now assigning it to multiple biclusters. Alternatively, the gene may still remain in a single cluster defined by a larger number of phenotypes, suggesting a single functional classification. The remaining pleiotropic mutants were assigned between two and 15 functional classifications. The partial overlap between phenotypes associated with some of the biclusters has two possible implications for the genes assigned more than one function. One possibility is that these sets of conditions do in fact define multiple functions that are each required under multiple conditions; for example, both functions proposed for SNF1 may be required for growth in cadmium and caffeine. Alternatively, some of these significantly overlapping clusters, while passing the statistical criteria for distinct clusters, may be biologically redundant and therefore not sufficient to define separate biological functions. The use of additional information, such as the enrichment for distinct functional categories, may help to distinguish between these two possibilities.
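Counting bicluster memberships per gene, as in the analysis above, amounts to a simple tally. The sketch below assumes biclusters are available as a mapping from a bicluster label to its member genes; the labels, gene names, and function names are hypothetical.

```python
from collections import Counter


def biclusters_per_gene(biclusters):
    """Number of biclusters each gene belongs to."""
    counts = Counter()
    for members in biclusters.values():
        counts.update(set(members))  # count a gene once per bicluster
    return counts


def fraction_single_class(counts):
    """Fraction of assigned genes that fall in exactly one bicluster."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)


# Hypothetical memberships for three biclusters.
toy_biclusters = {
    "bicluster_1": ["geneA", "geneB"],
    "bicluster_2": ["geneB", "geneC"],
    "bicluster_3": ["geneB"],
}
counts = biclusters_per_gene(toy_biclusters)
print(dict(counts))
print(fraction_single_class(counts))
```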
Estimating the degree of pleiotropy in yeast
The availability of phenotype data generated under a large number of conditions also permits initial explorations of more global properties of the yeast genetic network, such as an estimation of the overall degree of pleiotropy in yeast. To assess the degree of pleiotropy in the set of 767 mutants that displayed a phenotype in at least one of our 21 conditions, we counted the number of phenotypes observed for each gene deletion. The results show that most genes (~70%) that display growth defects under these conditions have a relatively low degree of pleiotropy, with phenotypes in only one or two conditions. To test the statistical significance of this amount of pleiotropy, we generated a random distribution of phenotypes per gene that preserved the frequency of growth defects observed in each of the 21 conditions in the original data set (Materials and methods). This random distribution was significantly different from the experimental distribution by Kolmogorov–Smirnov goodness-of-fit test (P=9 × 10^−70), with double the percentage of genes assigned only a single phenotype and a maximum of six phenotypes per gene. Thus, the genes with phenotypes in this data set appear to have significantly more pleiotropy than would be predicted by chance.
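One way to build such a null distribution is to shuffle each condition independently across genes, which keeps the number of growth defects per condition fixed while breaking the association between conditions. The sketch below illustrates that idea together with a two-sample Kolmogorov–Smirnov statistic. It is an assumption-laden illustration rather than the procedure in Materials and methods: the binary gene-by-condition matrix, the function names, and the toy data are hypothetical, and only the KS statistic is computed, not its P value (a library routine such as `scipy.stats.ks_2samp` could supply one).

```python
import random


def shuffle_conditions(matrix, rng):
    """Shuffle each condition (column) across genes independently, so every
    condition keeps its number of growth defects."""
    n_genes = len(matrix)
    n_conditions = len(matrix[0])
    columns = []
    for j in range(n_conditions):
        column = [matrix[i][j] for i in range(n_genes)]
        rng.shuffle(column)
        columns.append(column)
    return [[columns[j][i] for j in range(n_conditions)] for i in range(n_genes)]


def phenotypes_per_gene(matrix):
    """Number of conditions with a growth defect for each gene (row)."""
    return [sum(row) for row in matrix]


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical cumulative distributions of two samples."""
    values = sorted(set(sample_a) | set(sample_b))

    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)

    return max(abs(ecdf(sample_a, x) - ecdf(sample_b, x)) for x in values)


# Toy matrix: rows are genes, columns are conditions, 1 marks a growth defect.
rng = random.Random(0)
observed = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shuffled = shuffle_conditions(observed, rng)
print(ks_statistic(phenotypes_per_gene(observed), phenotypes_per_gene(shuffled)))
```

In practice the shuffle would be repeated many times, and genes left with no phenotype after shuffling would need to be handled in the same way as in the experimental data set.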
While the analysis based on the data collected in this study provides an initial estimate of the degree of pleiotropy in yeast, there are several other factors that could influence these results. One factor that could artificially inflate the difference observed between the experimental and random data sets is biological dependency between conditions. To address this issue, we repeated the analysis with a subset of conditions that are significantly different from each other, that is, conditions with relatively few genes in common, and found a similar difference between the experimental and random distributions (
Supplementary Figures 5 and 6). Other factors that may affect our estimate for the degree of pleiotropy are limited coverage of the phenotype space and the reported aneuploidy and secondary mutations present in the mutant collection (
Hughes et al, 2000b;
Bianchi et al, 2001). We expect that as more phenotype data are generated, possibly with cleaner mutant libraries, our estimations may be revised.