Modeling interaction degree in the S. cerevisiae genetic interaction network
Highly connected genes in the S. cerevisiae genetic interaction network are often associated with slow-growing single mutants, protein products with disordered structure, gene pleiotropy as indicated by multiple Gene Ontology (GO) annotations, high connectivity in the physical interaction network, slower rates of evolution, and low expression variation (Figure ; Materials and methods) [5], as well as a number of other sequence- and experimental-based gene features (Table ). We reasoned that these correlations could serve as the basis for predictive modeling of interaction degree, enabling the prediction of interaction degrees for genes that have not yet been screened.
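As a rough illustration of the correlation analysis described above, the following sketch computes a Pearson correlation coefficient between one gene feature and negative genetic interaction degree. The helper name `pearson` and the toy values are hypothetical and are not data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)

# Hypothetical toy data: one entry per gene.
fitness_defect = [0.05, 0.10, 0.30, 0.45, 0.60]
negative_degree = [12, 20, 55, 70, 110]

r = pearson(fitness_defect, negative_degree)
print(f"Pearson r between fitness defect and degree: {r:.2f}")
```

In practice the same calculation would be repeated for each gene feature, with a significance test applied to each coefficient.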
Figure 1 Physiological and evolutionary gene features are predictive of genetic interaction degree. (a) Gene features are significantly correlated with negative genetic interaction degree. We measured the Pearson correlation coefficients between gene feature values ...
Pearson correlations between features and negative genetic interaction degree in S. pombe (pom) and S. cerevisiae (cer) are observed to be significant in many cases
To this end, we applied a regression tree approach to model combinations of 16 gene features that are predictive of negative genetic interaction degree (Figure ). Regression trees are built by repeatedly splitting sets of training genes, according to the values of gene features, until genes are sorted into small sets that each contain genes with similar genetic interaction degrees. The hierarchy of gene sets produced is visualized as a binary tree, and each final set of genes is associated with a linear regression model that assigns predictions to query genes (Figure ). Bootstrapped subsets of the training data were used to build an ensemble of regression trees; this use of multiple models, known as bootstrap aggregation, is a standard method for building a robust predictive model [23] (Materials and methods).
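The procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: leaves predict the mean degree of their genes rather than fitting a linear regression model, only two features are used, and all names and toy values are hypothetical.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth=3, min_leaf=2):
    """Split genes on the feature threshold that most reduces squared error."""
    leaf = sum(targets) / len(targets)  # assumption: mean-valued leaves
    if depth == 0 or len(rows) < 2 * min_leaf:
        return leaf
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return leaf
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], depth - 1, min_leaf),
            build_tree([r for r, _ in hi], [y for _, y in hi], depth - 1, min_leaf))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's prediction."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def bagged_trees(rows, targets, n_trees=25, seed=0):
    """Train one tree per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx]))
    return trees

def predict_bagged(trees, row):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_tree(t, row) for t in trees) / len(trees)

# Hypothetical toy genes: [fitness defect, physical interaction degree].
rows = [[0.05 * i, 2 * i] for i in range(1, 13)]
degree = [8 * i + (i % 3) for i in range(1, 13)]

trees = bagged_trees(rows, degree)
low = predict_bagged(trees, [0.0, 0])
high = predict_bagged(trees, [1.0, 30])
print(f"predicted degree: low-defect gene {low:.1f}, high-defect gene {high:.1f}")
```

Averaging over bootstrap samples reduces the variance of any single tree, which is the motivation for the ensemble.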
To validate our approach, we used our model to predict negative genetic interaction degree for all genes in the S. cerevisiae genetic interaction network (Figure ; Materials and methods). A high correlation (r = 0.80, P < 10^-324) was observed between predicted and actual genetic interaction degrees of genes not used in training the models, indicating that our model accurately reflects topological features of the S. cerevisiae genetic interaction network (Figure ). A strength of this type of model, in addition to providing degree predictions for previously unseen genes, is that the learned tree structures highlight rules consisting of combinations of gene features that explain variation in degree (Figure ).
Predicting genetic interaction degree in a distantly related species
If the rules governing genetic network topology are conserved, then a model based on S. cerevisiae gene features should be predictive of genetic interaction degree in other organisms. To test this, we examined the same gene features of S. pombe genes that we found to be predictive of S. cerevisiae interaction degree, including a quantitative measurement of single mutant fitness defects across the genome (Materials and methods). Surprisingly, comparative analysis of the various features between pairs of orthologs revealed that a number of non-sequence-based features are only modestly conserved between the two yeast species [24] (Figure ; Materials and methods). For example, we found a significant but relatively weak correlation in single mutant fitness (Pearson's r = 0.20, P ) between 1,100 one-to-one orthologous gene pairs for which we could derive fitness measurements in both yeasts. The lack of strong conservation of deletion mutant fitness is somewhat surprising given that approximately 80% of S. pombe orthologs of S. cerevisiae essential genes have conserved essentiality [18]. Thus, while S. cerevisiae and S. pombe share a common set of genes that are indispensable for viability, our findings suggest that the severity of fitness defects imposed by the deletion of orthologous non-essential genes for growth under standard laboratory conditions is not well conserved. Other gene properties, including protein-protein interaction degree, dN/dS, and multifunctionality, exhibit a similar lack of conservation (Figure ).
Figure 2 Cross-species analysis of the predictive model for genetic interactions. (a) Pearson correlations between one-to-one S. cerevisiae and S. pombe orthologs for their values of gene features. Note that a number of features are sequence-based and are thus ...
Despite the low conservation of single mutant fitness and the varying correlations between individual gene properties for orthologs, we found that relationships between S. pombe gene features and genetic interaction degree were strikingly similar to those observed in S. cerevisiae (Figure , Table ). Consistent with S. cerevisiae trends (Figure , Table ), fitness defect was the strongest predictor of S. pombe genetic interaction degree. That is, S. pombe strains with severe fitness defects often exhibit high numbers of genetic interactions. The observed trends suggested that in addition to correlation with individual gene features, higher-level combinations of features that are predictive of connectivity in the S. cerevisiae genetic interaction network [5] (Figure ) may also be informative of S. pombe genetic interaction degree.
To test this hypothesis, we built a predictive model relating the combination of available gene features to genetic interaction degree in S. cerevisiae and then applied the resulting model to predict genetic interaction degree in S. pombe (Materials and methods). Interestingly, we observed significant correlation (r = 0.51, P ) between interaction degree predicted by our model and the number of interactions previously determined [10] for 548 S. pombe genes (Figure , left side, light blue bar).
Our ability to predict genetic interaction degree from a small set of gene-specific properties is evidence that rules governing genetic interaction network topology are conserved across a large evolutionary distance (Figure ). Importantly, there was no significant decrease in correlation between predicted and actual interaction degree when predictions were restricted to genes unique to S. pombe (Figure , left side, dark blue bar), indicating that the model performs comparably well when applied to genes lacking orthologs in the species used to learn relationships in the model.
As a baseline comparison for our cross-species predictive model, we built a model from S. pombe gene features and genetic interaction degrees instead of from S. cerevisiae data. Within-species predictions for S. pombe interaction degrees are not significantly more accurate than predictions made by the cross-species model (Figure , left side, compare red and light blue bars). We also note that although a simplistic predictor that maps the degree of an S. cerevisiae gene directly to its S. pombe ortholog provides reasonable performance (Pearson correlation approximately 0.4), this strategy is outperformed by our cross-species model and is limited to conserved genes. Strikingly, the models trained on S. pombe interactions and features were also able to predict interaction degree in the S. cerevisiae network with high accuracy (Figure , right side, compare red and light blue bars). In general, interaction degree predictions for S. pombe genes were weaker than S. cerevisiae interaction degree predictions, which may reflect the limited functional diversity of available S. pombe genetic interaction studies [9]. Nonetheless, the ability to predict interaction degree using features measured in either yeast species is evidence that relationships between genetic interactions and fundamental physiological and evolutionary properties are generally conserved.
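The simplistic ortholog-mapping predictor mentioned above can be sketched as a dictionary lookup. The gene identifiers and degrees below are hypothetical placeholders; the sketch mainly shows why this baseline cannot make a prediction for genes without an ortholog.

```python
def ortholog_baseline(cer_degree, orthologs):
    """Predict an S. pombe gene's degree as the degree of its S. cerevisiae ortholog.

    cer_degree: observed degree per S. cerevisiae gene.
    orthologs:  one-to-one map from S. pombe gene to S. cerevisiae gene.
    Genes without a mapped, screened ortholog receive no prediction.
    """
    return {pom: cer_degree[cer]
            for pom, cer in orthologs.items()
            if cer in cer_degree}

# Hypothetical placeholder identifiers and degrees.
cer_degree = {"cerA": 120, "cerB": 15}
orthologs = {"pomA": "cerA", "pomB": "cerB", "pomC": "cerC"}

predicted = ortholog_baseline(cer_degree, orthologs)
print(predicted)
```

A feature-based model, by contrast, needs only the features of the query gene, so it can also score species-specific genes.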
The strong correlation between single mutant fitness defect and negative genetic interaction degree has the unsurprising consequence that the models are considerably influenced by this feature. To explore the reliance of our model on fitness defect, we constructed two types of bootstrapped regression tree models that were trained on all features except fitness defect. The first of these additional models was trained to predict negative genetic interaction degrees and successfully made both within- and cross-species predictions (Figure S1 in Additional file 1). The second model was trained to predict the residual negative genetic interaction degree that remained after subtracting degree predictions made from a regression tree model that was trained on the single feature single mutant fitness defect. The prediction of these residuals by the remaining features was also significant (Figure S2 in Additional file 1). We therefore consider the inclusion of many other features to be a worthwhile part of our model, since they capture aspects of genetic interaction degree that fitness defect alone does not.
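The residual analysis described above can be illustrated as a two-stage fit. For brevity this sketch swaps in ordinary least-squares lines for the regression trees used in the study, and the feature names and toy values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y ~ x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def residuals(xs, ys):
    """Part of y left unexplained after fitting y ~ x."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical toy data: degree depends on fitness defect and on a second feature.
fitness_defect = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
other_feature = [3, 1, 4, 1, 5, 9]
degree = [100 * f + 2 * o for f, o in zip(fitness_defect, other_feature)]

# Stage 1: remove what fitness defect alone predicts.
resid = residuals(fitness_defect, degree)
# Stage 2: ask whether another feature explains the remainder.
slope2, intercept2 = fit_line(other_feature, resid)
print(f"slope of residual degree on the second feature: {slope2:.2f}")
```

A non-zero second-stage fit indicates that the remaining feature carries information that fitness defect does not.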
Identifying network rewiring suggested by cross-species predictions
Although many individual genes are conserved, yeast genetic interaction networks may have undergone substantial rewiring, as only approximately 30% of the interactions are conserved [9]. Similarly, a low conservation of genetic interactions has also been observed between S. cerevisiae and C. elegans ]. To examine the extent of network rewiring, we first inferred interaction degrees for the entire S. pombe genome using our cross-species model. Because the predictions do not depend on sequence orthologs (Figure ), they can be used to compare the topologies of the S. cerevisiae and S. pombe networks even though only a small fraction of the S. pombe network has been screened.
We found several instances where the predicted interaction degree for a given S. pombe gene was quite different from the observed degree of its S. cerevisiae ortholog, suggesting that the gene acquired or lost interactions differentially as the species diverged. To identify larger functional modules that were targets of this rewiring, we grouped functionally related genes according to a catalog of 65 annotated protein complexes [6] and 545 GO biological process annotations [26] (Materials and methods), and compared the median interaction degrees determined for orthologous protein complexes and functional groups (Figure ; Figure S3 in Additional file 1). Many groups of functionally related genes and several complexes were statistically indistinguishable in terms of network connectivity, indicating that these modules act either as network hubs in both species or non-hubs in both species.
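One way to picture the module-level comparison described above is to convert degrees to within-species percentile ranks and take the median per module. This is a minimal sketch under assumptions: the percentile definition, function names, and toy values are hypothetical, and the statistical test applied in the study is not reproduced here.

```python
from bisect import bisect_left
from statistics import median

def percentile_ranks(degree):
    """Map each gene to the percentage of genes with a strictly lower degree."""
    values = sorted(degree.values())
    n = len(values)
    return {g: 100.0 * bisect_left(values, d) / n for g, d in degree.items()}

def module_medians(ranks, modules):
    """Median percentile rank of the member genes of each module."""
    out = {}
    for name, genes in modules.items():
        members = [ranks[g] for g in genes if g in ranks]
        if members:
            out[name] = median(members)
    return out

# Hypothetical toy degrees for one species and two modules.
degree = {"g1": 10, "g2": 20, "g3": 30, "g4": 40}
modules = {"complexA": ["g1", "g2"], "complexB": ["g3", "g4"]}

ranks = percentile_ranks(degree)
medians = module_medians(ranks, modules)
print(medians)
```

Running the same two functions on observed S. cerevisiae degrees and on predicted S. pombe degrees gives one value per orthologous module in each species, which can then be compared.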
Figure 4 Global analysis of rewiring based on whole-genome predictions in S. pombe. (a) Points in the scatter plot each represent groups of between 2 and 22 genes whose protein products are in the same protein complex (Materials and methods). Darker color represents ...
However, we also identified many examples of possible rewiring, in which a significant difference in network connectivity, observed in S. cerevisiae and inferred in S. pombe, was found for orthologous modules (Figure ; Figure S3 in Additional file 1; Materials and methods). These predicted-rewired groups represent complexes or biological processes that may have evolutionarily diverged in terms of their importance in the genetic interaction network, acting as hubs in one species but not in the other. In particular, we found that 11 of 65 (17%) protein complexes and 44 of 545 (8%) GO biological processes may have undergone significant rewiring (Figure ; Figure S3 in Additional file 1) at a level of significance expected to identify only 3 and 27 (5%) rewired modules, respectively. For example, components of the dynactin complex are hub genes in the S. cerevisiae genetic interaction network (complex average of 85th percentile; Figure ) whereas the orthologous genes were predicted to exhibit average connectivity in the S. pombe genetic interaction network (complex average of approximately 50th percentile; Figure ). Dynactin, a multi-subunit protein complex known for interacting with dynein and enabling long-range movement along microtubules (reviewed in [27]), has been implicated in an S. cerevisiae cell cycle checkpoint pathway that arrests cell cycle progression in response to perturbations in cell wall synthesis [28]. A similar checkpoint has not been reported in S. pombe, suggesting that the difference in the number of genetic interactions observed across species may reflect a dynactin-specific role in monitoring S. cerevisiae cell wall integrity.
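The expected counts quoted above follow from multiplying the number of modules by the 5% significance level. The sketch below reproduces that arithmetic and, as an additional illustration that is not part of the original analysis, computes a binomial tail probability under the simplifying assumption that modules are tested independently (GO terms share genes, so this is only approximate).

```python
from math import comb

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.05
expected_complexes = 65 * alpha    # about 3 complexes expected by chance
expected_processes = 545 * alpha   # about 27 GO processes expected by chance

# Assumption: independent tests, each with false-positive rate alpha.
tail_complexes = binomial_tail(65, 11, alpha)
tail_processes = binomial_tail(545, 44, alpha)

print(f"expected by chance: {expected_complexes:.2f} complexes, "
      f"{expected_processes:.2f} processes")
print(f"P(>= 11 of 65): {tail_complexes:.2e}; P(>= 44 of 545): {tail_processes:.2e}")
```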
In addition to S. cerevisiae-specific genetic interaction hubs, we also identified gene groups predicted to be hubs in the S. pombe genetic network but not observed as such in the S. cerevisiae genetic network. One such case is the calcineurin-associated protein complex (Figure ). A difference in network connectivity might reflect a unique role for calcineurin in the regulation of bi-polar growth activation in S. pombe ]. Unlike an S. cerevisiae cell, which grows predominantly via an actin-dependent budding mechanism, an S. pombe cell grows in a highly polarized bi-polar manner from its two ends. Following cell division, cell growth is initiated from the old end first, and later, after completion of S phase, from the newer end that forms at the site of cell septation (referred to as new end take off, or NETO). Calcineurin has been shown to play an important role in the delay of NETO by directly dephosphorylating critical targets involved in microtubule dynamics at the site of cell growth. This mechanism is dependent on activation of Cds1 kinase, best known for its role in the intra-S phase DNA replication checkpoint [30]. A connection between the intra-S phase checkpoint and inhibition of bipolar growth activation is so far unique to S. pombe and distinct from the checkpoint controls operating in S. cerevisiae. Additionally, calcineurin is dispensable for growth in S. cerevisiae ]; in S. pombe, its deletion leads to defects in cell growth, cytokinesis, cell polarity, mating, and spindle pole body positioning, which are widespread effects consistent with its hub-like activity [32].
While our method of identifying rewired modules reports several statistically significant differences, we note two caveats in interpreting these results. First, since degrees of genes within functional modules may be systematically poorly predicted, our procedure may incorrectly identify modules as significantly rewired in cases where our test statistic would also have indicated that the within-species difference between predicted and observed degree was significant. Therefore, as a control, a version of this rewiring experiment that compares observed and predicted S. cerevisiae degrees enables identification of cases that do not reflect true cross-species rewiring (Figure S4a, b in Additional file 1). Second, due to variations in the experimental protocol for measuring genetic interactions, there are differences in the media on which fitness defects were measured in S. cerevisiae and S. pombe, which may also contribute to apparent rewiring [33].
Functional properties of genes can be captured by many types of biological networks, so we turned to an independent dataset for confirmation of our rewiring predictions. To enable a comparative analysis of gene expression profiles across the two yeasts, we constructed a species-specific S. pombe co-expression network using a previously published approach [34] and large collections of publicly available expression data (Materials and methods), and obtained a previously published S. cerevisiae co-expression network ]. Each species' network contains 832 genes that are one-to-one orthologs between the two yeasts, and connected genes are those pairs whose co-expression values surpass a threshold of the 95th percentile. At our selected density of 0.05, there are approximately 17,000 edges in each network. In general, we found evidence of conservation between the S. cerevisiae and S. pombe networks: co-expression edges between two genes occurred in both networks for 9.2% of the gene pairs that were co-expressed in at least one network. This is about twice the background conservation rate of approximately 4.3%, as determined through comparison to a randomized network produced by a degree-preserving procedure.
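The quantities in this paragraph can be made concrete with a short sketch. It checks the edge count implied by 832 genes at density 0.05, computes the conservation rate as shared edges over edges present in at least one network, and randomizes a network with double edge swaps. The swap procedure is one common degree-preserving method and is an assumption here, since the text does not specify which procedure was used; the toy network is hypothetical.

```python
import random
from collections import Counter

def n_pairs(n_genes):
    """Number of unordered gene pairs."""
    return n_genes * (n_genes - 1) // 2

def conservation_rate(edges_a, edges_b):
    """Edges in both networks divided by edges in at least one network."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

def degrees(edges):
    """Number of edges touching each gene."""
    return Counter(g for e in edges for g in e)

def degree_preserving_shuffle(edges, n_swaps, seed=0):
    """Randomize by double edge swaps: (a,b),(c,d) -> (a,d),(c,b)."""
    rng = random.Random(seed)
    edge_list = sorted(tuple(sorted(e)) for e in edges)
    current = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Skip swaps that would create a self-loop or a duplicate edge.
        if len(new1) < 2 or len(new2) < 2 or new1 in current or new2 in current:
            continue
        current.discard(frozenset((a, b)))
        current.discard(frozenset((c, d)))
        current.update((new1, new2))
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return current

# 832 genes at density 0.05 gives roughly 17,000 edges.
print(round(0.05 * n_pairs(832)))

# Hypothetical toy network with genes numbered 1 to 6.
toy = {frozenset(p) for p in
       [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 4), (2, 5)]}
shuffled = degree_preserving_shuffle(toy, n_swaps=100)
print(conservation_rate(toy, shuffled))
```

Comparing the observed conservation rate with the rate obtained against shuffled networks gives a background estimate that accounts for each gene's degree.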
To explore the connection between genes predicted to be rewired in the genetic interaction networks and differences between the co-expression networks, rewiring predictions were overlaid on the co-expression networks. Specifically, all non-essential one-to-one orthologs were classified as either rewired or non-rewired based on our prediction of genetic interaction degree (Figure ). Using this rewiring labeling, we measured the conservation rate of three types of co-expression edges: co-expression edges connecting two non-rewired genes, connecting two rewired genes, and connecting rewired and non-rewired genes.
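The labeling and edge classification described above can be sketched as follows. The difference threshold, the use of S. pombe identifiers as the shared key for both networks, and all gene names and values are hypothetical choices made for illustration.

```python
def rewired_genes(predicted_pombe, observed_cer, orthologs, threshold):
    """Label an ortholog pair as rewired when its degrees differ by >= threshold."""
    return {pom for pom, cer in orthologs.items()
            if pom in predicted_pombe and cer in observed_cer
            and abs(predicted_pombe[pom] - observed_cer[cer]) >= threshold}

def classify_edge(edge, rewired):
    """Classify an edge by how many of its two endpoints are rewired."""
    k = sum(1 for g in edge if g in rewired)
    return ("non-rewired", "mixed", "rewired")[k]

def conservation_by_class(edges_a, edges_b, rewired):
    """Per edge class: fraction of edges present in both networks."""
    counts = {}
    for e in edges_a | edges_b:
        cls = classify_edge(e, rewired)
        total, kept = counts.get(cls, (0, 0))
        counts[cls] = (total + 1, kept + (e in edges_a and e in edges_b))
    return {cls: kept / total for cls, (total, kept) in counts.items()}

# Hypothetical toy data; both edge sets are keyed by S. pombe identifiers.
orthologs = {"p1": "c1", "p2": "c2", "p3": "c3", "p4": "c4"}
predicted_pombe = {"p1": 10, "p2": 100, "p3": 50, "p4": 120}
observed_cer = {"c1": 80, "c2": 95, "c3": 55, "c4": 40}
rewired = rewired_genes(predicted_pombe, observed_cer, orthologs, threshold=55)

def E(a, b):
    return frozenset((a, b))

pombe_edges = {E("p1", "p4"), E("p2", "p3"), E("p1", "p2")}
cer_edges = {E("p2", "p3"), E("p1", "p2"), E("p3", "p4")}
rates = conservation_by_class(pombe_edges, cer_edges, rewired)
print(rates)
```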
We found that co-expression edges involving predicted rewired genes are consistently less conserved than edges with exclusively non-rewired endpoints (Figure ), a trend that is robust over different co-expression thresholds used for network sparsification (Figure S5 in Additional file 1). For example, when genes whose degrees differ by 55 interactions or more are considered rewired, 6.9% of the co-expression relationships connecting rewired genes are conserved (107 of 1,659), in contrast to the significantly higher 10.1% of co-expression relationships that are conserved between non-rewired genes (1,238 of 12,472, Fisher's exact test P ). This trend grows stronger when considering genes that were predicted to have even larger differences between S. pombe and S. cerevisiae. This analysis independently confirms predictions of highly rewired genes between the two species and suggests that changes at the level of gene expression regulation are at least one mechanistic factor that contributes to these differences.
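For readers who want to see how a comparison of two conservation rates can be tested, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table. The counts are taken from the sentence above, but how the original analysis arranged them into a contingency table is an assumption, and the function is an illustrative implementation rather than the software used in the study.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x):
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    total = sum(exp(lp) for lp in (log_p(x) for x in range(lo, hi + 1))
                if lp <= observed + 1e-7)
    return min(1.0, total)

# Assumed table layout: rows = edge class, columns = conserved / not conserved.
rewired_conserved, rewired_total = 107, 1659
stable_conserved, stable_total = 1238, 12472

p_value = fisher_exact_two_sided(
    rewired_conserved, rewired_total - rewired_conserved,
    stable_conserved, stable_total - stable_conserved)
print(f"two-sided Fisher's exact P = {p_value:.2e}")
```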