Currently, most genome-wide datasets, including expression, protein-protein and synthetic genetic interaction data, have been extensively analyzed to help illuminate cell function. As data continue to be generated, the predictive power of these large-scale approaches grows. In this study, we present the first large-scale, systematic analysis of co-fitness, highlighting its novelty and implications for functional genomics. Specifically, we quantified the ability of co-fitness (the correlation of fitness profiles of all genes across all drugs) to predict gene functions that are not evident in other large-scale assays; quantified the degree to which co-inhibition (the correlation of fitness profiles of all drugs across all genes) correlates with both chemical structure and therapeutic action; and demonstrated that a machine-learning model derived from these data predicts drug-target interactions.
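As a minimal sketch of the two definitions above, co-fitness and co-inhibition can be computed as the row-wise and column-wise correlation matrices of a single gene-by-drug fitness matrix. This assumes Pearson correlation (the text says only "correlation") and uses a random, hypothetical toy matrix; real fitness data would also need missing-value handling, which is omitted here.

```python
import numpy as np

def co_fitness(fitness: np.ndarray) -> np.ndarray:
    """Gene-by-gene correlation of fitness profiles across drugs.

    `fitness` has shape (n_genes, n_drugs); each row is one deletion
    strain's fitness profile across all compounds.
    """
    # np.corrcoef treats each row as one variable by default.
    return np.corrcoef(fitness)

def co_inhibition(fitness: np.ndarray) -> np.ndarray:
    """Drug-by-drug correlation of fitness profiles across genes."""
    # Transpose so that each row is one compound's profile across all genes.
    return np.corrcoef(fitness.T)

# Hypothetical toy data: 6 deletion strains x 5 compounds, arbitrary values.
rng = np.random.default_rng(0)
fitness = rng.normal(size=(6, 5))

cf = co_fitness(fitness)     # shape (6, 6): one entry per gene pair
ci = co_inhibition(fitness)  # shape (5, 5): one entry per compound pair
```

The same matrix thus yields both views: reading it by rows relates genes, and reading it by columns relates compounds.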
We first showed that, overall, co-fitness data identify gene function better than co-expression data but not as well as physical interaction data, when compared with a gold standard [13]. When we examined specific functions, co-fitness predicted certain functions much better than other large-scale datasets did. These functions, which are underrepresented in other large-scale datasets, include amino acid and lipid metabolism, meiosis, and signal transduction (Figure ; Supplementary Figure 2 in Additional file 1). This finding suggests that different biological processes are better suited to different genome-wide approaches. The relatively good prediction of signal transduction by co-fitness, for example, may reflect the fact that signaling is often a rapid response occurring on the order of milliseconds, a time frame too short to allow expression and translation of the required proteins [53]. It is not surprising, therefore, that co-expression performed poorly for this function. Functions for which co-fitness performed worse than either expression or protein-protein interaction data include ribosome biogenesis, cellular respiration and carbohydrate metabolism. This may be due to a high degree of functional redundancy or to the fact that these functions are not involved in the response to drug perturbation.
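One common way to quantify how well a pairwise score such as co-fitness recovers known function is the precision among the top-ranked gene pairs. The sketch below assumes a boolean gold-standard matrix marking gene pairs that share an annotation; it illustrates the general idea and is not necessarily the evaluation performed against the gold standard of [13]. The function name `precision_at_k` and the toy matrices are hypothetical.

```python
import numpy as np

def precision_at_k(similarity: np.ndarray, same_function: np.ndarray, k: int) -> float:
    """Fraction of the k highest-scoring gene pairs that share a function.

    similarity:    (n, n) symmetric score matrix, e.g. co-fitness.
    same_function: (n, n) boolean gold-standard matrix (hypothetical here).
    """
    n = similarity.shape[0]
    # Count each unordered pair once: upper triangle, diagonal excluded.
    i, j = np.triu_indices(n, k=1)
    if not 0 < k <= i.size:
        raise ValueError("k must be between 1 and the number of gene pairs")
    scores = similarity[i, j]
    labels = same_function[i, j]
    # Pair indices sorted by decreasing score.
    order = np.argsort(-scores, kind="stable")
    return float(labels[order[:k]].mean())

# Hypothetical toy example: genes 0 and 1 share a function, as do genes 2 and 3.
sim = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.8, 0.3],
    [0.1, 0.8, 1.0, 0.7],
    [0.2, 0.3, 0.7, 1.0],
])
groups = np.array([0, 0, 1, 1])
same = groups[:, None] == groups[None, :]
```

Here the top-ranked pairs are (0, 1), (1, 2) and (2, 3), so precision is 1.0 at k = 1, 0.5 at k = 2 and 2/3 at k = 3. Restricting `same_function` to a single functional category gives a per-function comparison of the kind discussed above.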
Two other findings arose from the functional analysis. First, duplicated genes were co-fit with their duplicate partners, and the degree of co-fitness for these pairs was independent of their sequence similarity. This finding supports the hypothesis of partial, rather than strict, redundancy [35]. Second, we demonstrated the prevalence of conditionally essential complexes, suggesting that essentiality is often a property of complexes rather than of individual genes [37].
We also provide a first systematic analysis of co-inhibition and show that it identifies both structural and therapeutic relationships between compounds. While the correlation of co-inhibition with co-structure was statistically significant, it was modest. This may be due in part to our library having been chosen for maximum diversity, so that few compound pairs are structurally similar. The correlation of co-inhibition with therapeutic use was somewhat surprising, because the therapeutic classes of the compounds reflect their use in humans, whereas the co-inhibition results are based on yeast fitness measurements. The correlation between co-inhibition and therapeutic use might, in fact, be an underestimate, because our current analysis is limited by the quality and quantity of the available therapeutic data. Our representations of chemical structure and therapeutic use rely on public databases, which will undoubtedly improve over time.
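Comparing co-inhibition with co-structure amounts to correlating two drug-by-drug similarity matrices. The sketch below assumes binary structural fingerprints compared by Tanimoto similarity and uses a Mantel-style permutation test; both are swapped-in choices for illustration, not necessarily the procedure used in this study. Because the entries of a similarity matrix are not independent, permuting drug labels gives a fairer null distribution than a standard correlation test. The fingerprints and the co-inhibition matrix are hypothetical toy data.

```python
import numpy as np

def tanimoto(fingerprints: np.ndarray) -> np.ndarray:
    """Pairwise Tanimoto similarity of binary fingerprints (n_drugs, n_bits)."""
    fp = fingerprints.astype(float)
    shared = fp @ fp.T                      # bits set in both fingerprints
    counts = fp.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared
    # Define similarity as 0 where both fingerprints are empty (union == 0).
    return np.divide(shared, union, out=np.zeros_like(shared), where=union > 0)

def upper(m: np.ndarray) -> np.ndarray:
    """Off-diagonal upper-triangle entries: one value per drug pair."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]

def matrix_correlation(a: np.ndarray, b: np.ndarray, n_perm: int = 999, seed: int = 0):
    """Pearson r between two drug-by-drug matrices, with a one-sided
    Mantel-style permutation p-value."""
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(upper(a), upper(b))[0, 1]
    n = a.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Relabel drugs in one matrix, permuting rows and columns together.
        r = np.corrcoef(upper(a[np.ix_(perm, perm)]), upper(b))[0, 1]
        hits += int(r >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 4 compounds with 4-bit fingerprints.
fps = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
co_structure = tanimoto(fps)
co_inhib = np.array([
    [1.0, 0.8, -0.1, 0.0],
    [0.8, 1.0, 0.1, 0.2],
    [-0.1, 0.1, 1.0, 0.6],
    [0.0, 0.2, 0.6, 1.0],
])
r, p_value = matrix_correlation(co_inhib, co_structure)
```

With real data, the fingerprint type and similarity measure strongly affect how co-structure is represented, which is one reason the observed correlation depends on the structural databases used.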
Importantly, we showed that fitness profiling can help identify the most likely target of a given compound from a candidate group of sensitive yeast deletion strains. Traditional drug discovery efforts often focus on the activity of a purified protein target in isolation. These in vitro approaches are useful for maximizing the potency of a given inhibitor, but they invariably ignore factors critical for understanding drug action, including cell permeability and the potential interaction with, or inhibition of, other proteins in a cellular context. In vivo chemical genomic assays address these limitations and can provide a more comprehensive view of drug-protein interactions. Such results can play an invaluable role in understanding and predicting a compound's clinical effects and in guiding its use, including predicting secondary, unwanted drug targets. New methods for target identification are of enormous value because the coverage of current methods is limited. Traditional computational approaches to drug-target prediction require the three-dimensional structure of the protein to predict binding, often by 'docking' the ligand into the protein's binding pocket [16]. The success of these methods to date has been variable: some studies predicted known interactions with significant enrichment, while others performed worse than random [55]. These methods are also limited to proteins with solved three-dimensional structures. Other computational methods use protein sequence rather than three-dimensional structure, but these methods are applicable only to individual proteins or to small sets of proteins that share a high degree of similarity [58]. We compared our results with a sequence-based method by testing our gold standard against the interaction model described in [58], but the model was unable to make predictions for any of these known interactions, presumably because the target proteins lack sequence similarity to the available training sets.
Thus, new sources of data and accompanying computational methods can be of significant value. Our study of genome-wide fitness experiments suggests that fitness profiling offers a new, complementary approach for generating quantitative, testable predictions of drug-target interactions, including predictions that may be outside the scope of previous computational approaches. Using this approach, we predicted both known and novel interactions and provided independent experimental evidence for two of the novel interactions. Our algorithm predicted that the Exo84 protein interacts with nocodazole and that the Cox17 protein interacts with clozapine. Gene-dose modulation experiments supported these predictions: overexpression of each gene rescued the corresponding drug-induced fitness defect in wild-type cells, providing independent experimental evidence of each predicted interaction.
The first validated prediction is the interaction of Exo84 with nocodazole. Exo84 is a subunit of the well-conserved exocyst complex, first identified for its role in the secretory pathway in Saccharomyces cerevisiae. The mammalian homolog is essential for development and participates in multiple biological processes, including vesicle targeting to the plasma membrane, protein translation, and filopodia extension [62]. Filopodia are cytoplasmic projections that extend from the leading edge of migrating cells and are important for cell motility. Like nocodazole, the exocyst complex inhibits tubulin polymerization in vitro. The microtubule depolymerizer nocodazole is known to distort the filamentous localization of Exo84 in cultured mammalian cells [64]. Furthermore, exocyst localization depends on microtubules in normal rat kidney (NRK) cells, and the filamentous distribution of Exo84 (as well as of two other exocyst subunits, Sec8 and Exo70) is disrupted by nocodazole. Accordingly, it is possible that in yeast, nocodazole treatment causes mislocalization of Exo84, preventing the protein from performing its essential role in the exocyst.
A second intriguing finding is our prediction of an interaction between clozapine and both yeast Cox17 and its human homolog. Clozapine's primary targets are thought to be neurotransmitter receptors, but the drug also alters cytochrome c oxidase (COX) activity through an unknown mechanism [65]. This COX alteration has been linked to clozapine's side effects [66]. Our preliminary genetic data indicating a novel interaction between Cox17 and clozapine are tantalizing given the renewed interest in this drug [61] and deserve further investigation.
Our statistical and experimental results demonstrate the ability of our novel algorithm to produce high-quality, testable hypotheses regarding drug-target interactions. Our model is, however, limited by the number of full-genome chemogenomic profiles obtained and will likely improve as we collect additional data with diverse compounds. Nonetheless, these results may shed new light on the mechanisms by which a drug exerts its primary or secondary effect. Combining this predictive method with other computational and experimental data sources should improve these predictions and expand the number of potential compound-protein pairs for subsequent testing. This technology can easily be implemented in a high-throughput manner and should have a positive impact on the early stages of drug discovery, both by identifying potential new drug targets and as a filter to prune less promising ones.