There is a wide gap between the generation of large-scale biological datasets and more detailed, structural and mechanistic studies. However, recent studies that explicitly combine data from systems and structural biological approaches are having a profound effect on our ability to predict how mutations and small molecules affect atomic-level mechanisms, disrupt systems-level networks and ultimately lead to changes in organismal fitness. In fact, we argue that a shared framework for analyzing non-additive genetic and thermodynamic responses to perturbations will accelerate the integration of reductionist and global approaches. A stronger bridge between these two areas will allow for a deeper and more complete understanding of complex biological phenomena and ultimately provide needed breakthroughs in biomedical research.
As sequencing efforts reveal unprecedented levels of genetic diversity in populations, key challenges remain in linking heritable variation to organismal fitness (Lander, 2011). The functional effects of sequence changes are commonly predicted starting from the top-down (using a network from systems biology) or the bottom-up (using an atomic structure from structural biology). However, biologists now consciously (and unconsciously) straddle the line between global, systems and reductionist, structural approaches. Here, we review how the growing synergies between these two tactics are advancing our understanding of the mechanisms of phenotypic change. A theme emerging from both systems and structural biology is that multiple perturbations can function non-additively. We anticipate that further parallels between non-additive effects on both network and macromolecular dynamics will emerge as scientists working at the systems-to-structure interface probe complex biological phenomena. Learning how mutations and small molecules affect both atomic interactions and network organizations will advance our abilities to predict the phenotypic responses to perturbations and to design new therapies that restore homeostasis.
A major goal of systems biology is to determine the abundance of each protein, nucleic acid, or metabolite component and all the interactions that exist between them. In protein-protein interaction networks, an edge represents more than just a construct of graph theory - a connection between nodes implies a direct atomic contact between proteins. Aside from using a co-crystal structure to provide the ultimate “true-positive” confirmation of an interaction, what can structural biology contribute to our knowledge of systems-level organization?
The number of interactions discovered by proteomics experiments (Havugimana et al., 2012) can dwarf the total number of protein structures available in the PDB (Rose et al., 2011). To bridge this gap, homology models can be used to approximate the interfaces of many interactions (Pieper et al., 2011; Zhang et al., 2012a). Using these principles, Kim and Gerstein developed a systems-level structural interaction network in yeast (Kim et al., 2006). The atomic-level details afforded by this structural network representation allowed them to distinguish interactions that can simultaneously assemble on a single receptor from those that are mutually exclusive (Figure 1A). For example, the GTPase Ras interacts with multiple proteins that compete for an overlapping interaction surface (Figure 1B). Only one of these proteins can therefore bind Ras at any given time (Wittinghofer and Vetter, 2011). In contrast, macromolecular machines often assemble many interaction partners simultaneously. Indeed, the SCF Ubiquitin Ligase complex must assemble Rbx1, Skp1 and a member of the F-box protein family along a Cullin scaffold to properly ubiquitinate target proteins (Zheng et al., 2002). Systematically characterizing overlapping and simultaneous binding within a protein-protein interaction network has resolved long-standing ambiguities about the relationship between a protein's evolutionary rate and its degree (i.e. number of interaction partners), by establishing a central role for the amount of accessible surface area buried upon protein-protein association (Kim et al., 2006). This approach can also be inverted: structural information can be leveraged to predict new protein-protein interactions. Remote structural alignments that consider geometric relationships between interacting proteins have recently been used to predict protein-protein interactions on a proteomic-scale (Zhang et al., 2012a).
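The distinction between mutually exclusive and simultaneously possible interactions can be illustrated with a minimal sketch. It assumes that each partner's interface on a shared hub protein is summarized as a set of hub residue numbers and that any shared residue implies competition. The function name, the `min_shared` threshold, and the residue numbers are hypothetical and are not taken from the cited studies.

```python
from itertools import combinations


def classify_partner_pairs(interfaces, min_shared=1):
    """Label each pair of hub partners by interface overlap.

    interfaces: dict mapping partner name -> set of hub residue numbers
                contacted by that partner (an assumed representation).
    min_shared: number of shared hub residues at which two partners are
                treated as competing (an illustrative threshold).
    """
    labels = {}
    for a, b in combinations(sorted(interfaces), 2):
        shared = interfaces[a] & interfaces[b]
        if len(shared) >= min_shared:
            labels[(a, b)] = "mutually exclusive"
        else:
            labels[(a, b)] = "possibly simultaneous"
    return labels


# Hypothetical residue numbers, chosen only for illustration.
hub_interfaces = {
    "partner_A": {25, 26, 33, 37, 38},
    "partner_B": {33, 37, 38, 40},
    "partner_C": {101, 104, 108},
}

pair_labels = classify_partner_pairs(hub_interfaces)
for pair, label in pair_labels.items():
    print(pair, label)
```

In this toy example, partner_A and partner_B share hub residues and are labeled as competing, whereas partner_C uses a separate surface. A real analysis would need a structural criterion for steric clash rather than residue overlap alone.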
Similarly, structural insights can be used to assess the evolution and functional role of posttranslational modifications, such as phosphorylation, ubiquitination, and acetylation (Beltrao et al., 2012). The increased sensitivity of mass spectrometry to identify posttranslational modifications in proteomic samples has been a blessing and a curse: while new hypotheses can be generated based on the identification of modified sites, the functional significance of the vast majority of these modifications remains unknown. By examining posttranslational modifications in a structural context, Beltrao et al. found phosphosites located at protein-protein interaction interfaces are more highly conserved (Beltrao et al., 2012). Additionally, this study proposed and tested the hypothesis that posttranslational hotspots for controlling protein function are structurally conserved across family members. However, these data suggest that most post-translational modifications likely have no biological role. They proposed the near-neutrality of modifications probably increases the evolvability of interaction networks. Structural principles, therefore, can be applied to proteomic studies to pinpoint only those modifications that are highly conserved and likely to be functional.
The recent construction of a human-viral structural interaction network revealed an exception to the principle that highly conserved surfaces are the most interesting regions (Franzosa and Xia, 2011). Consistent with the earlier work, these studies confirmed that interfaces involved in host protein-protein interactions evolve more slowly. However, interfaces exploited by pathogens tend to evolve much faster (Figure 2A). This result provides structural rationale for an evolutionary arms race between host and pathogen: there is selective pressure on host interfaces to evolve incompatibilities with their pathogen interaction partners. For example, the human kinase PKR has evolved under intense positive selection to avoid interactions with viral proteins such as poxvirus K3L, while maintaining interactions with endogenous interaction partners such as eIF2alpha (Elde et al., 2009) (Figure 2B). In the case of PKR, co-evolving mutations on multiple structural elements are likely required to decrease the affinity of binding the pathogenic protein while maintaining the proper connections to the host protein-protein interaction network.
The idea that co-evolution of interface residues across interacting proteins plays an important role in the evolution of network structure has also been tested in model systems. Using the yeast proliferating cell nuclear antigen (PCNA) DNA clamp as a model for studying protein-protein interaction network evolution, Aharoni and colleagues discovered that disrupting the delicate balance of affinities between different interaction partners can yield more severe phenotypes than gene deletions (Fridman et al., 2010). These studies suggest that large-scale genetic interaction studies, using deletion libraries or RNAi screens, should be supplemented with targeted mutations on specific components to tease apart the roles of multifunctional proteins. Because these mutations can, in principle, affect only one edge of a protein-protein interaction network, they have been termed “edgetic” mutations to contrast with gene deletions, which remove all edges associated with a given node (Amberg et al., 1995; Charloteaux et al., 2011). Similarly, structurally guided mutants could be used to dissect the roles of multifunctional macromolecular machines, such as RNA polymerase II (Braberg et al., submitted), which interact with different protein partners during complex functional cycles.
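The contrast between an edgetic mutation and a gene deletion can be made concrete with a small graph sketch. The network is assumed to be stored as a dictionary of neighbor sets. The helper names and the node labels (`hub`, `P1`, `P2`, `P3`) are hypothetical and serve only to illustrate the two kinds of perturbation.

```python
def delete_node(network, node):
    """Model a gene deletion: drop the node and every edge touching it."""
    return {
        protein: {p for p in partners if p != node}
        for protein, partners in network.items()
        if protein != node
    }


def delete_edge(network, a, b):
    """Model an edgetic mutation: drop only the a-b edge, keep both nodes."""
    perturbed = {protein: set(partners) for protein, partners in network.items()}
    perturbed[a].discard(b)
    perturbed[b].discard(a)
    return perturbed


# Hypothetical undirected network: one hub with three partners.
network = {
    "hub": {"P1", "P2", "P3"},
    "P1": {"hub"},
    "P2": {"hub"},
    "P3": {"hub"},
}

after_deletion = delete_node(network, "hub")
after_edgetic = delete_edge(network, "hub", "P1")

print(after_deletion)        # the hub and all of its edges are gone
print(after_edgetic["hub"])  # the hub keeps its other two partners
```

The deletion removes every interaction of the hub at once, while the edgetic perturbation leaves the remaining edges intact, which is why the two can produce different phenotypes.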
Mutations that tune the affinity of specific interactions or introduce new, unwanted interactions may have more severe phenotypic consequences than can be accessed with traditional gene knockouts. Mapping human disease-associated mutations onto structures has provided a structural rationale for pleiotropic genes, where distinct mutations are implicated in different diseases (Wang et al., 2012a). Mutations on a single interface are more commonly associated with a single disease, whereas mutations at different interfaces within the same protein tend to be associated with distinct diseases. Furthermore, mutations at interfaces with known disease implications provide a guilt-by-association indication for new genes, which may be more therapeutically tractable. For proteins with multiple domains or protein-protein interaction interfaces, these analyses can greatly narrow the scope of candidate genes to those most likely to be functionally relevant for the disease of interest. As both structural coverage of the human proteome and disease associated mutations discovered by exome sequencing studies increase, applying structural insights will likely increase the leverage of network-level observations to understand the genetic basis for disease (Aloy and Russell, 2003; Wang and Moult, 2001).
As structural biology shifts increasingly to characterizing complex, heterogeneous macromolecular assemblies, datasets generated from unbiased, global systems biology studies are playing a prominent role. The influence of systems biology has already begun changing the normal course for identifying members of protein complexes from a “fishing trip” to a “mining expedition”. Because identifying all members of a protein complex is often a necessary prerequisite for structural characterization, a common fishing approach is to use a yeast two-hybrid screen with a single defined bait against a large, proteomic-scale prey library. An early example of how this approach can facilitate a deeper structural and mechanistic understanding of an important biological problem involves BRCA2, a DNA binding protein with important roles in breast cancer (Zheng et al., 2000). When structural studies of BRCA2 were initiated, baculovirus expression of BRCA2 in the absence of any interaction partners yielded insoluble protein (Yang et al., 2002). A two-hybrid screen based on cDNA derived from HeLa cells and mammary tissues identified DSS1 as a new interaction partner for BRCA2 (Marston et al., 1999). Co-expression of DSS1 and BRCA2 eventually yielded a co-crystal structure and the structure of a ternary complex with DNA, demonstrating the importance of fishing out new interaction partners to enable structural studies (Yang et al., 2002) (Figure 3A). Interestingly, DSS1 has been shown to be part of multiple complexes, including an evolutionarily conserved component of the 26S proteasome (Wilmes et al., 2008).
In contrast to the BRCA2-focused yeast two-hybrid study that identified DSS1, most systems biology studies are performed with limited bias, systematically focusing on functionally related sets of proteins (Babu et al., 2012; Behrends et al., 2010; Krogan et al., 2004; Sowa et al., 2009) or even the entire proteome within an organism (Gavin et al., 2006; Havugimana et al., 2012; Hu et al., 2009; Krogan et al., 2006). These comprehensive approaches facilitate “mining expeditions” through a large database of interactions to identify multiprotein complexes that can be reconstituted biochemically and structurally (Figure 3B) (Aloy et al., 2004; Brooks et al., 2010). For example, in a recent global survey of the HIV proteome, Jager et al. identified interactions between human proteins and all 18 HIV-1 proteins and polyproteins (Jager et al., 2012a). The resulting 497 high confidence interactions provide many new opportunities to observe the atomic details of how a pathogen utilizes and subverts the normal function of host proteins. Examination of the interaction maps across two different cell lines revealed a surprising association of the transcription cofactor CBF-β, which heterodimerizes with Runx to regulate transcription in T-cells (Wong et al., 2011), with the HIV protein Vif. Although Vif was previously known to hijack an endogenous CUL-5 ubiquitin ligase complex to target the host restriction factor APOBEC3G for ubiquitination and degradation (Yu et al., 2003), the architecture of this complex had remained elusive. Similar to DSS1 with BRCA2, co-expression of CBF-β provided the missing puzzle piece, enabling biochemical reconstitution and structural characterization of the Vif-Cul-5 complex (Jager et al., 2012b; Zhang et al., 2012b).
Interestingly, Vif may be facilitating a dual hijack, one that affects a cullin-containing ubiquitination pathway as well as the rewiring of the transcriptional landscape regulated by the CBF-β-RUNX1 complex (Kim et al., Mol Cell, in press). Consistent with these results, a knockdown of CBF-β adversely affected HIV infectivity (Jager et al., 2012b; Zhang et al., 2012b). Future studies will investigate if and how both functional hijackings are required for efficient HIV infection. As further atomic-level details of this interaction emerge, it will likely serve as a paradigmatic example of how unexpected interaction partners can emerge from unbiased global studies to enable mechanistic studies.
Hybrid strategies that build on “-omics” techniques normally associated with systems biology are being used to uncover new binding partners and to accelerate biophysical studies for intrinsically disordered proteins (Dyson, 2011), macromolecular machines (Alber et al., 2007b) and transient complexes (Herzog et al., 2012). A particular challenge faced by these studies is to integrate and weight diverse data types into a self-consistent structural model (Alber et al., 2007a). For example, when performed on purified proteins, chemical cross-linking coupled with mass spectrometry provides distance restraints (Rappsilber, 2011). Crosslinking restraints can be incorporated into structure determination using well established methods similar to those used routinely in NMR (Havel et al., 1983); however, additional complications are encountered as crosslinking studies are extended to even more heterogeneous samples. In these studies, network context inferred from previous systems-level studies is essential for separating signals resulting from sterically incompatible complexes, allosteric distortion of individual subunits and entirely novel assemblies. The complexity of deconvoluting signals emerging from proteomic studies is exemplified by the Protein Phosphatase 2A (PP2A) holoenzymes, which consist of regulatory, catalytic, and scaffold subunits that provide combinatorial specificity to substrate proteins (Herzog et al., 2012). By purifying endogenous PP2A, Aebersold and colleagues not only revealed new players in the PP2A network but also provided insights into the structural basis for how specific PP2A holoenzymes achieve interactions with a wide array of cellular processes. Proteomic-based crosslinking methods are a valuable source of structural data that can be obtained directly from heterogeneous mixtures and complement methods such as X-ray crystallography, NMR, and electron microscopy that rely on purified samples.
In addition to the natural links between structural biology and proteomics, other “-omics” technologies are beginning to generate interesting mechanistic problems. Synergies between genomics, metabolomics and structural studies are exemplified by an interesting mutation (R132H) in the metabolic enzyme isocitrate dehydrogenase-1 (IDH1). This mutation was first uncovered in a large genomics study, where it was present in 12% of Glioblastoma multiforme brain cancer patients (Parsons et al., 2008). Initial studies suggested that the mutation led to a dominant negative effect on the catalytic formation of the metabolite α-ketoglutarate (Zhao et al., 2009). Surprisingly, subsequent metabolomics profiling revealed no significant changes in levels of α-ketoglutarate between cells expressing mutant or wild type versions of the enzyme (Dang et al., 2009). However, levels of a different metabolite, R(-)-2-hydroxyglutarate, were significantly elevated. A crystal structure of the mutant enzyme revealed the key changes in active site geometry that enable this novel catalytic activity. The discovery of a new activity has motivated further systems-level studies into the role of R(-)-2-hydroxyglutarate in epigenetics (Sasaki et al., 2012) and the structural information provides additional hope that the mutant IDH1 enzyme represents a new target for therapeutic intervention in glioblastomas and acute myeloid leukaemias.
The examples discussed above represent only the initial cross-pollination between reductionist and global approaches. By recognizing the importance of non-additive interactions in genetics and biophysics, a common framework is being developed to more fully integrate systems and structural biology (Lehner, 2011). These interactions are commonly referred to as “epistatic” effects because the extent of the interaction depends on its genetic context (Phillips, 2008). In both genetics and biophysics, most perturbations act independently, yielding purely additive effects on fitness (Mani et al., 2008) or thermodynamic parameters (Wells, 1990). However, many functionally important perturbations may be benign in one context and deleterious in another (Cadwell et al., 2010). The non-additive effects of mutations may underlie “missing” genetic heritability - the limitation in our ability to explain functional consequences of sequence changes (Zuk et al., 2012). Exciting new efforts are underway to correlate genetic interaction measurements used in systems biology (Mani et al., 2008) with thermodynamic coupling analyses used in structural biology (Horovitz, 1996) to establish general principles of non-additivity and robustness that scale from systems to structure.
The most complete analyses of non-additive effects at a systems-level are from genetic interaction screens in the budding yeast, Saccharomyces cerevisiae. High throughput measurements, such as Synthetic Genetic Array (SGA) (Costanzo et al., 2010) or Epistatic Mini-Array Profile (E-MAP) experiments (Collins et al., 2010), compare quantitative measures of the fitness of individual gene knockouts to double knockouts (Figure 4). Positive genetic interactions occur when the fitness of the double mutant is higher than would be expected based on growth rates of the individual mutants. These alleviating interactions are enriched in gene products that physically interact or are part of a linear signaling pathway (Beltrao et al., 2010; Collins et al., 2007; Roguev et al., 2008). In contrast, cases where the double mutant is less fit than expected based on the individual mutations are termed negative genetic interactions. These aggravating interactions have often been used to identify proteins that function in parallel pathways of a given process (Tong et al., 2001). The extension of high-throughput genetic interaction methods to other organisms (Horn et al., 2011; Lin et al., 2012; Tischler et al., 2006; Typas et al., 2008) (Roguev et al., submitted) has created great interest in mining these datasets to generate combination therapies for human diseases. In general, genetic interactions are most conserved between species within protein complexes, less conserved within biological processes, and least conserved across distinct biological processes. This suggests a hierarchical modularity that governs how non-additive connections between proteins, complexes, and processes are functionally rewired and repurposed during evolution (Ryan et al., 2012). These genetic interaction approaches can extend across species to reveal dependencies between organisms in disease and symbiotic contexts (Fischbach and Krogan, 2010).
A recent integration of insertional mutagenesis and depletion (iMAD) in the bacteria Legionella combined with RNAi in the host Drosophila melanogaster identified interesting cross-species synthetic lethal interactions that are relevant for pathogenesis (O'Connor et al., 2012). The fundamental principles of these genetic conflicts between species will likely be revealed by careful studies in classic host-pathogen systems such as bacteriophages and bacteria (Bondy-Denomy et al., 2012).
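The comparison of double-mutant fitness to single-mutant expectations described above can be sketched numerically. The sketch assumes a multiplicative null model, in which the expected double-mutant fitness is the product of the single-mutant fitnesses relative to wild type. This is one common definition among several, and the text does not specify which model the cited screens use. The function names, the tolerance `tol`, and the fitness values are hypothetical.

```python
def interaction_score(w_a, w_b, w_ab):
    """Deviation of observed double-mutant fitness from the multiplicative
    expectation (an assumed null model): epsilon = W_ab - W_a * W_b.
    All fitness values are relative to wild type (wild type = 1.0).
    """
    return w_ab - w_a * w_b


def classify_interaction(epsilon, tol=0.05):
    """Call an interaction positive, negative, or neutral.

    tol is an illustrative noise threshold, not an experimentally derived one.
    """
    if epsilon > tol:
        return "positive (alleviating)"
    if epsilon < -tol:
        return "negative (aggravating)"
    return "neutral (no interaction)"


# Hypothetical single-mutant fitnesses; the expected double mutant is 0.8 * 0.5.
w_a, w_b = 0.8, 0.5

for observed in (0.7, 0.4, 0.1):
    eps = interaction_score(w_a, w_b, observed)
    print(f"observed={observed:.1f} epsilon={eps:+.2f} {classify_interaction(eps)}")
```

With these illustrative numbers, a double mutant that grows better than the product of the single-mutant fitnesses is scored as alleviating, and one that grows worse is scored as aggravating.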
The hierarchical design principles revealed by high-throughput genetic interaction studies have many parallels in the principles revealed by biophysical studies of protein function. Non-additive effects in biophysics have been studied in the context of protein folding (Horovitz and Fersht, 1990; Perry et al., 1989), enzyme mechanism (Carter et al., 1984), allostery (Sadovsky and Yifrach, 2007) and protein-ligand interactions (Baum et al., 2010; Schreiber and Fersht, 1995). As with genetic interaction studies, biophysical double mutant cycles reveal that most perturbations act independently (Wells, 1990) (Figure 4). However, both bioinformatics (Breen et al., 2012) and experimental (Bershtein et al., 2006; Tenaillon et al., 2012) studies suggest that non-additive (epistatic) effects play a dominant role in shaping the evolution of proteins. In the context of protein structure, non-additive effects of mutations often occur between residues that are in direct contact (Horovitz, 1987). For example, mutating two Alanine residues to form a new Lysine-Aspartate salt bridge is only beneficial when both mutations are made, as neither the Lysine-Alanine nor the Aspartate-Alanine pair is likely to be stabilizing. This “contact-based mechanism” of interactions parallels the positive genetic interactions observed between gene products that physically interact (Collins et al., 2007). A similar analogy of “parallel pathway mechanisms” exists between negative genetic interactions (Tong et al., 2001) and mutations that form new stabilizing intraprotein contacts or alternative intermolecular interaction binding modes. Promiscuous proteins with multiple binding modes may mask the negative effects of substitutions that only affect one interaction mechanism.
Ancestral reconstruction of a nuclear hormone receptor revealed how two potential binding modes can be interconverted through conformational epistasis, in which a mutation at one site indirectly repositions a remote second site in three dimensions and changes the functional effects of mutations at that second site (Ortlund et al., 2007). Similar to negative genetic interactions, conformational epistasis can lead to double mutations that affect both binding modes in ways that are not apparent from the individual mutations.
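The double mutant cycles mentioned above are conventionally summarized by a coupling free energy. The expression below is the standard textbook form rather than a formula given in this review, and the symbols are assumptions introduced here: A and B denote the two single mutations, AB the double mutant, and each ΔΔG is the change in a free energy (for example of folding or binding) relative to wild type.

```latex
\Delta\Delta G_{\mathrm{int}}
  = \Delta\Delta G_{\mathrm{wt}\rightarrow AB}
  - \Delta\Delta G_{\mathrm{wt}\rightarrow A}
  - \Delta\Delta G_{\mathrm{wt}\rightarrow B}
```

When the two mutations act independently, the double-mutant effect is the sum of the single-mutant effects and the coupling term is zero. A non-zero value indicates a non-additive (epistatic) interaction between the two sites, as in the salt bridge example, where neither single substitution is expected to be stabilizing but the pair is.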
However, despite general trends across atomic- and network-scales for positive and negative interactions, it remains difficult to predict the strength of non-additive interactions from network, structural, or evolutionary principles (Lehner, 2011). The simple contact-based or parallel pathway models can break down when long-range compensatory interactions occur (Lunzer et al., 2010), consistent with the fact that the correlation between direct protein-protein interactions and positive genetic interactions is not absolute, especially for complexes containing essential proteins (Costanzo et al., 2010). Mapping the genotype-phenotype-fitness relationship (Lunzer et al., 2005) can be additionally complicated by the ability of molecular chaperones to act as buffers that may enhance non-additive interactions as a protein nears its stability limit (Tokuriki and Tawfik, 2009) or a network is challenged by an external perturbation (Geller et al., 2007). To address the significance of these trends, comprehensive studies of protein mutations – on a scale similar to the genetic interaction studies – must be carried out. The advent of new sequencing methods now makes it possible to survey comprehensive sets of mutations under different selective pressures (Hietpas et al., 2011). One recent survey of all possible mutations in a PDZ domain revealed a strongly co-evolving set of amino acids within which two mutations act non-additively to change peptide binding specificity (McLaughlin Jr et al., 2012). Studies on this newly accessible scale have the potential to reveal new principles of non-additivity underlying protein stability and function (Araya et al., 2012). The extension of these principles to the co-evolution of protein-protein interactions will address the similarities in molecular mechanisms underlying epistatic interactions between and within individual gene products (Fridman et al., 2010; Skerker et al., 2008). 
Therefore, quantitative measures that intrinsically calibrate thermodynamic quantities to fitness measurements present new opportunities to discover unifying principles of buffering, non-additivity and adaptation across the atomic and network scales.
As these more comprehensive surveys of protein sequence and genetic interaction space continue to emerge, an additional challenge will be to adapt the common analytical models used in both systems and structural biology to include the fourth dimension: time. Much of the work we have highlighted here has focused on integrating multiple static representations, as exemplified by incorporating crystal structure information in a protein-protein interaction network diagram. Recently, structural interaction networks have been adapted to incorporate multiple protein conformations (Bhardwaj et al., 2011). Disease mutations that have only a minor effect on the dominant structure of a protein can exert a deleterious function by altering the ability to dynamically switch between near-native conformations (Shan et al., 2012). Indeed, mutations remote from (Fraser et al., 2009; Wang et al., 2012b) and within (Bhabha et al., 2011) the active sites of enzymes can have large functional consequences without dramatically altering protein structure. Conservation of protein motions has been proposed to be a major constraint in protein evolution and may provide a mechanistic explanation for non-additive effects between residues that are not in direct contact (Reynolds et al., 2011). Microsecond-millisecond protein conformational dynamics can be exploited in the development of new protein therapeutics (Levin et al., 2012) and in tuning the biophysical properties of small molecule inhibitors (Carroll et al., 2012).
At longer timescales, measuring the dynamics of non-additive responses in network contexts will also likely reveal new principles of modularity (Alexander et al., 2009). Indeed, profiling the changes in protein-protein interactions of the scaffold protein GRB2 after perturbations designed to activate different receptor tyrosine kinases revealed that the time-dependent rewiring and elaboration of new complexes centered around a conserved set of core interactions (Bisson et al., 2011). A comparison of genetic interactions in yeast in the presence and absence of DNA damaging agents revealed that although relationships within protein complexes are largely conserved, there is substantial rewiring in the functional relationships between complexes (Bandyopadhyay et al., 2010). These results suggest that the concept of a network “module” may become more diffuse as a time dimension and additional perturbations are incorporated. As with genetic interactions, efforts are underway to discover non-additive effects in drug combinations (Yeh et al., 2009). Multiplex profiling of approved and clinically tested drugs uncovered new interactions that synergize to inhibit HIV replication (Tan et al., 2012). Varying the timing and dynamics of different perturbations may uncover additional synergistic combinations. Methods, such as mass cytometry, that can dynamically read out the response of the cell across many parameters simultaneously can be used to guide the optimization of perturbation dynamics (Bodenmiller et al., 2012). Indeed, new therapeutic regimens are currently being developed around non-additive effects on the order and time course of drug combinations by exploiting network level principles (Lee et al., 2012). This study found that EGFR inhibition was ineffective when combined simultaneously with genotoxic drugs; however, staggered administration of the two perturbations dynamically rewires the cells, leading to a more efficient potential therapy.
Collectively, these studies point to emerging parallels of the role of non-additivity on the sub-second timescale in protein dynamics, on the hours-weeks timescale in the cellular response to perturbations, and on the million-year timescale in the adaptations that shape interaction networks (Ideker and Krogan, 2012).
In this review, we highlighted how systems and structural biological approaches are truly synergistic when used in combination to understand complex biological phenomena. For example, unbiased, systems approaches can identify key components that facilitate structural analysis whereas structural information can provide insight into and even guide large-scale biological studies. Furthermore, we argue that there are striking similarities between the frameworks used to interpret quantitative genetic interactions and thermodynamic responses. Finally, detailed “small-scale” systems and synthetic biology efforts, which examine network motifs and quantitatively model signaling networks, are complementary to the large-scale studies since they can be used to define core interactions and network principles (see review by Lim et al. in this issue). A challenge for future work will be to determine and integrate the quantitative relationships that exist at these different levels, including cell biology (Slack et al., 2008), pharmacology (Lounkine et al., 2012), and ecology (Hekstra and Leibler, 2012). In the near future, we expect methods that explicitly investigate these parallels from “systems to structure” will play an increasingly large role in efforts to predict the phenotypic response to mutation, small molecules, pathogens, and other perturbations.
We thank Michael Shales for help preparing figures. JSF was supported by NIH (DP5OD009180) and QB3. JDG was supported by NIH (GM081879 and GM078360). NJK was supported by NIH (GM084448, GM082250, GM084279, GM081879, GM098101, AI090935, AI091575) and DARPA (HR0011-11-C-0094). NJK is a Searle Scholar and a Keck Young Investigator.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.