Synthetic lethal interactions provide evidence for compensating gene function. This compensation has been rationalized as buffering within a single pathway, or as buffering between two parallel or compensating pathways (Tong et al, 2001; Wong et al, 2004). We find that the parallel pathway model permits successful inference of protein complex membership from synthetic lethal data. The parallel pathway model, but not the single pathway model, yields successful predictions for phenotypes including nuclear migration defect rates and drug sensitivity. The parallel pathway model is also consistent with known pathways comprising genes identified in synthetic lethal screens. The model motivated our confirmation, by phenotypic analysis, of YLL049W as a participant in the dynein–dynactin nuclear migration pathway; permitted identification of benomyl-sensitive strains based on congruence to landmark genes; and yielded a novel prediction of Sin3/Rpd3 histone deacetylase as a new module for mitotic exit that acts in parallel with MEN.
Using a different analysis strategy, Kelley and Ideker (2005) recently reported that synthetic lethal interactions are typically ‘between pathway’, whereas ‘within-pathway’ interactions occur infrequently. For their purposes, all subsets of proteins that are densely connected by physical interactions in non-mutant cells were considered ‘within pathway’. If a pathway is defined strictly by its components, however, the view that null allele synthetic lethality must always occur between parallel pathways can be enforced, precluding ‘within-pathway’ explanations. In such a view, members of a protein complex that functions in the absence of either of two subunits, but not both, would participate in three parallel pathways: one that includes all possible components, and one for each ‘incomplete’ complex (all of which might function in non-mutant cells). More generally, methods that summarize synthetic lethal relationships are often more useful than raw synthetic lethal pairs.
This recent analysis also predicted that Yll049wp associates with dynactin during spindle orientation (Kelley and Ideker, 2005), consistent with our observation from congruence analysis that YLL049W is functionally related to the dynein–dynactin pathway. Our characterization includes experimental validations that support the prediction, and provides evidence from congruence score and detailed phenotype that the function of YLL049W is more similar to JNM1. Confirmation of a physical interaction between YLL049W further suggests that the prediction will be useful in future detailed analysis of the molecular role of YLL049W.
The congruence score metric compares favorably with other methods for inferring functional associations from synthetic lethal data. First, it produces stronger inference of gene function than the underlying direct genetic interactions. For example, direct interactions are unable to predict benomyl sensitivity, whereas congruence is a strong predictor of similar sensitivity. Second, the congruence metric naturally provides a P-value and can give improved performance relative to the raw count of the number of shared interaction partners. Finally, the P-values provided by the congruence score can provide an advantage over methods such as hierarchical clustering, which continue to depend on visual inspection of clusters and definition of cluster boundaries.
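To make concrete how a P-value can arise naturally from shared interaction partners, the following is a minimal sketch. It assumes the score is the negative log10 of a hypergeometric tail probability for the number of partners two genes share; that form, and the names `shared_partner_pvalue` and `congruence_score`, are illustrative assumptions rather than the exact definition used in this study.

```python
from math import comb, log10


def shared_partner_pvalue(n_total, n_a, n_b, n_shared):
    """Probability of sharing at least n_shared partners by chance.

    Assumed null model: the n_b partners of gene B are drawn at random,
    without replacement, from n_total candidate genes, n_a of which are
    partners of gene A (hypergeometric upper tail).
    """
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    return tail / comb(n_total, n_b)


def congruence_score(n_total, n_a, n_b, n_shared):
    """Assumed score: -log10 of the tail probability (higher = more congruent)."""
    upper = min(n_a, n_b)
    if n_shared > upper:
        raise ValueError("n_shared cannot exceed the smaller partner count")
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_shared, upper + 1)
    )
    # Work with logs of exact integers to avoid floating-point underflow.
    return log10(comb(n_total, n_b)) - log10(tail)


if __name__ == "__main__":
    # Toy numbers only: 1000 candidate genes, two genes with 30 and 25 partners.
    for shared in (0, 5, 10):
        print(shared, round(congruence_score(1000, 30, 25, shared), 2))
```

Unlike a raw count of shared partners, a score of this kind accounts for how many partners each gene has overall, so two highly connected genes are not rated congruent merely because some overlap is expected by chance.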
Because each congruent pair carries a quantitative score, interactions can be filtered at a chosen threshold, allowing experimentalists to see which network features reflect the most significant evidence in the data set, and to include less significant observations for evaluation when desired. Importantly, a congruence summary at any significance level quantitatively relates genes according to the functional similarity of their interaction profiles, not according to individual synthetic lethal pairings. To identify congruent gene pairs with greater or lesser significance, the interaction linkages can be annotated, or the map can be redrawn at differing congruence score cutoffs. For example, Supplementary Figure S8, …, and Supplementary Figure S9 each show the target gene congruence network at a different congruence score cutoff (…, …, and 15, respectively). This aspect of network analysis will become increasingly important as the information summarized within it grows. Some biologically important relationships may inherently be present in the genetic congruence network only at relatively low overall significance. These can be viewed by extracting a local network containing first-degree congruence relationships, in much the same way as the current large-scale interaction network is commonly viewed in subsections (Tong et al, 2001; Ooi et al, 2003).
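The two views described above, redrawing the map at a given cutoff and extracting a local first-degree network around one gene, can be sketched as follows. The data structure (a dictionary keyed by unordered gene pairs), the function names, and the gene labels and scores are all hypothetical placeholders chosen for illustration.

```python
def filter_by_cutoff(scores, cutoff):
    """Keep only gene pairs whose congruence score is at least `cutoff`."""
    return {pair: s for pair, s in scores.items() if s >= cutoff}


def local_network(scores, gene, cutoff=0.0):
    """First-degree congruence relationships of `gene` at or above `cutoff`."""
    return {
        pair: s
        for pair, s in scores.items()
        if gene in pair and s >= cutoff
    }


if __name__ == "__main__":
    # Made-up scores for placeholder genes; not results from this study.
    toy_scores = {
        frozenset({"geneA", "geneB"}): 18.2,
        frozenset({"geneA", "geneC"}): 9.5,
        frozenset({"geneB", "geneC"}): 4.1,
        frozenset({"geneC", "geneD"}): 15.0,
    }
    for cutoff in (8, 15):
        kept = filter_by_cutoff(toy_scores, cutoff)
        print(cutoff, sorted(tuple(sorted(p)) for p in kept))
    # A low-significance neighbourhood that a global cutoff would hide.
    print(local_network(toy_scores, "geneC", cutoff=4))
```

Keeping the full scored pair list and filtering on demand means a stringent global map and a permissive local view can both be derived from the same underlying table.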
A possible limitation of our analysis is the low coverage of the synthetic lethal network, with only ~2% screened by high-throughput methods using query genes selected on the basis of specific biological themes (Tong et al, 2004). To assess the sensitivity of our analysis to missing data, and also to possible false positives, we repeated our analysis with data sets modified to contain up to 30% false positives (random interactions added to the data) and 30% false negatives (observed interactions removed from the data) (Supplementary Figure S10). Note that the false-positive rate is quite low for the SGA data owing to confirmation by tetrad or random spore analysis; false negatives are estimated in the range of 17–41% (Tong et al, 2004). Although the congruence scores shift to lower values, the overall performance is similar to that obtained with the original data set (Supplementary Figure S10). These observations suggest that the congruence score method is robust to noisy and incomplete data.
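A perturbation of this kind can be sketched as below. The sketch assumes that both rates are expressed as fractions of the number of observed interactions and that added interactions are drawn uniformly from tested but non-interacting pairs; the source does not specify either detail, and the function name `perturb_interactions` is hypothetical.

```python
import random


def perturb_interactions(observed, all_pairs, fp_rate, fn_rate, seed=0):
    """Return a noisy copy of an interaction set.

    observed  : iterable of (query, target) pairs scored as synthetic lethal
    all_pairs : iterable of every (query, target) pair that was tested
    fp_rate   : fraction of len(observed) to add as random false positives
    fn_rate   : fraction of len(observed) to remove as false negatives
    """
    rng = random.Random(seed)
    observed = set(observed)
    n_remove = round(fn_rate * len(observed))
    n_add = round(fp_rate * len(observed))

    # Simulated false negatives: drop a random subset of observed pairs.
    removed = set(rng.sample(sorted(observed), n_remove))

    # Simulated false positives: add pairs that were tested but not observed.
    candidates = sorted(set(all_pairs) - observed)
    added = set(rng.sample(candidates, min(n_add, len(candidates))))

    return (observed - removed) | added


if __name__ == "__main__":
    # Toy screen: 5 placeholder queries x 20 placeholder targets.
    tested = [(f"q{i}", f"t{j}") for i in range(5) for j in range(20)]
    observed = tested[::5]  # 20 arbitrary "interactions"
    noisy = perturb_interactions(observed, tested, fp_rate=0.3, fn_rate=0.3)
    print(len(observed), len(noisy), len(noisy & set(observed)))
```

Recomputing congruence scores on the output at several rates, and comparing predictive performance against the unperturbed data, reproduces the shape of the robustness check described above.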
Continuing genetic interaction screens will generate increasing volumes of data. A critical challenge is to develop computational approaches for integrating these data and, eventually, understanding gene function. Several hurdles will need to be surmounted. Essential genes are missing from the synthetic lethal network, although they may eventually be probed using non-null mutant alleles. Certain higher-order redundant processes may also require deletion of more than two genes to be observed. The most promising way to ease these limitations may be to combine different types of networks for improved inference. We have performed a joint analysis of the genetic and physical networks to argue that the correct functional links between genes should be orthogonal to the synthetic lethal interaction (see Supplementary information). Future studies that combine other types of heterogeneous network data, such as gene expression and phylogenetic information, should further improve our inference of biological systems.
This work in budding yeast, made possible by the development of the comprehensive deletion collection, massively parallel phenotyping techniques, and quantitative analysis of synthetic lethal interaction data within a statistical framework, will create a template for testing and improving our understanding of biological buffering and genetic robustness in many systems as researchers gather similar data sets from other organisms. Genome-wide synthetic lethality screens using RNAi are becoming available in other organisms (van Haaften et al, 2004) and may eventually allow analysis similar to the one we have performed in yeast. Full-genome RNAi screens have been conducted for Caenorhabditis elegans and Drosophila melanogaster (Kamath et al, 2003; Boutros et al, 2004), and genome-wide screens in other metazoans are in progress. In instances where RNAi knockdown is complete, the congruence score method should provide a quantitative metric for shared gene function by calculating the probability of a gene pair sharing phenotypic defects in the RNAi screens. Therefore, the methodology we have applied to predict gene functions from yeast genomic synthetic lethality can certainly be extended to analogous RNAi screens for the discovery of novel gene tasks in higher organisms.