We previously found that HCV sequences from patients who responded well to pegylated IFN-α and ribavirin were more variable than those from poor responders in genes implicated in counteracting the type 1 IFN response (17). We interpreted this to mean that viral isolates with a relatively tight genetic distribution around an optimum sequence were more able to withstand the pressures induced by therapy, and those that were more distant from this optimum were less able to survive, presumably as a result of the presence of multiple variations that each reduced the overall efficacy of the viral proteins. Here, we found that genome-wide networks of covarying amino acids existed, that the connections within the networks (connectivity) were different in the responders and nonresponders, and that the nonresponder networks had many more hydrophobic amino acid pairs than did the responder networks.
The covariance networks covered all 10 HCV proteins and all had hub-and-spoke architectures, which indicates that a few residues covaried with many other residues but that most covaried with only a few other positions. The network connectivity was very different between the Marked and Poor and between the SVR and Non-SVR response classes. Therefore, the genetic and functional interactions represented by the covariances in the response-specific networks may represent HCV genetic differences that affect the ability of the viruses to withstand the pressures of therapy. There was a large overlap in the covariances in the networks from the responder Marked and SVR classes, and a similar overlap was found in the nonresponder Poor and Non-SVR classes, for both genotypes (Figure ). Therefore, the viral variables reflected in these networks that affect the day-28 response to therapy were similar to those affecting the outcome of therapy.
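The two network properties discussed above, hub-and-spoke connectivity and overlap between response classes, can be illustrated with a minimal sketch. It is not the method used in this study: the function names `node_degrees` and `edge_overlap`, the use of the Jaccard index as the overlap measure, and the toy position labels are all assumptions made for illustration.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many covariance partners each residue position has."""
    degrees = Counter()
    for pos_a, pos_b in edges:
        degrees[pos_a] += 1
        degrees[pos_b] += 1
    return degrees


def edge_overlap(edges_a, edges_b):
    """Jaccard index of two networks' covariances (an assumed overlap measure)."""
    # frozenset makes each covariance undirected: (a, b) equals (b, a).
    set_a = {frozenset(edge) for edge in edges_a}
    set_b = {frozenset(edge) for edge in edges_b}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical toy networks; the position labels are invented, not real data.
marked = [("NS3:10", "E2:5"), ("NS3:10", "NS5A:7"),
          ("NS3:10", "core:3"), ("E1:2", "P7:4")]
svr = [("E2:5", "NS3:10"), ("NS3:10", "NS5A:7"), ("NS5B:9", "NS2:1")]

degrees = node_degrees(marked)
hub, hub_degree = degrees.most_common(1)[0]
overlap = edge_overlap(marked, svr)
print(hub, hub_degree)  # the most connected position and its partner count
print(overlap)          # shared covariances / all covariances in either network
```

In a hub-and-spoke network, a few positions have a high partner count and most have a count of 1 or 2, which is the pattern the degree tally exposes in the toy data.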
Genome-wide covariance analysis has very recently been used by Campo and colleagues to assess coordinated evolution of residues throughout the HCV genome (37). This work was performed independently of our analysis and used a different method to identify the covariances, but the results from the 2 studies were very similar. The algorithm used by Campo et al. assessed the physicochemical properties of residues at the most variable 10% of positions in an alignment of 114 genotype 1b HCV amino acid sequences. Similar to our results, the covariances they identified linked into a hub-and-spoke network that encompassed all 10 of the proteins encoded in the HCV polyprotein; this network was analogous to our 1b All network. Furthermore, many of the most highly connected hubs in the Campo network were also found in our 1b networks that were generated without regard to the physicochemical properties of the amino acids. Campo and colleagues concluded that the network was a tightly coordinated unit that was functionally and/or structurally connected (37), in full agreement with our present conclusions. Although the Campo sequences were not stratified by outcome of antiviral therapy, and thus their network cannot be used to evaluate differential sensitivity to IFN-α–based therapy, their results are important to our work because they provide an independent validation of the existence of an All network. By extension, they also support the validity of the response-specific networks, because every covariance and node in our All network was also found in one or more of the response-specific networks.
Genetic covariance indicates a functional interaction between the covarying residues, but it does not identify the nature of the interaction. The functional linkages could involve direct binding between the covarying residues, compensatory allosteric changes within a protein, and/or compensatory changes on the surface of the HCV proteins where they interact with host or other viral proteins. Examples of all of these mechanisms are likely to be present among the large number of covariances we identified, but a major mechanism by which the differences between the responder and nonresponder networks may contribute to differential response to IFN-α–based therapy was revealed by the chemical nature of the covarying residues. The covariance networks from the nonresponder Poor and Non-SVR classes had greater than 3-fold more hydrophobic residue pairs than did sequences from the responder Marked and SVR classes (Table ). In contrast, the responders had many more hydrogen bond donors or acidic-basic residue pairs. Hydrophobic interactions contribute much more to protein stability in an aqueous environment than do hydrophilic interactions. Therefore, the potential for greater stability provided by the higher hydrophobic nature of the interactions may allow some of the viruses in the population to better survive the pressures introduced by therapy. However, because the covariant residues were rarely close enough to bind to each other directly, we predict that in most cases the increased hydrophobicity provided by the covariant pairs would stabilize multiprotein complexes rather than the structure of a given protein.
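The comparison of residue-pair chemistry between networks can be sketched as follows. This is an illustrative sketch only: the residue groupings (`HYDROPHOBIC`, `ACIDIC`, `BASIC`) are a common textbook classification assumed here and may differ from the one used in this study, the function names are hypothetical, and the toy pairs are invented.

```python
from collections import Counter

# Assumed residue groupings (one-letter codes); not taken from this study.
HYDROPHOBIC = set("AVLIMFWP")
ACIDIC = set("DE")
BASIC = set("KRH")


def classify_pair(res_a, res_b):
    """Assign a covarying residue pair to a coarse chemical class."""
    if res_a in HYDROPHOBIC and res_b in HYDROPHOBIC:
        return "hydrophobic"
    if (res_a in ACIDIC and res_b in BASIC) or (res_a in BASIC and res_b in ACIDIC):
        return "acidic-basic"
    return "other"


def class_fraction(pairs, label):
    """Fraction of pairs in a network that fall into the given class."""
    if not pairs:
        raise ValueError("network has no residue pairs")
    counts = Counter(classify_pair(a, b) for a, b in pairs)
    return counts[label] / len(pairs)


def fold_difference(pairs_x, pairs_y, label="hydrophobic"):
    """Ratio of the class fraction in network x to that in network y."""
    fraction_y = class_fraction(pairs_y, label)
    if fraction_y == 0:
        raise ValueError("class absent from the reference network")
    return class_fraction(pairs_x, label) / fraction_y


# Hypothetical toy networks, invented for illustration.
nonresponder_pairs = [("L", "V"), ("I", "F"), ("A", "M"), ("D", "K")]
responder_pairs = [("L", "V"), ("D", "R"), ("E", "K"), ("S", "T")]

fold = fold_difference(nonresponder_pairs, responder_pairs)
print(fold)  # ratio of hydrophobic-pair fractions, nonresponder over responder
```

Comparing fractions rather than raw counts keeps the ratio meaningful when the two networks contain different numbers of covariances.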
IFN-α activates a multitude of host barriers that limit the spread of infection (10), and ribavirin has at least 3 proposed effects against HCV (38). Therefore, it is highly unlikely that the generalized increase in the hydrophobic nature of the covarying residue pairs in viruses from nonresponders acts through a few discrete intermolecular interactions. Rather, the simplest explanation is that the sum of these interactions strengthened complexes involving viral proteins. In the structural proteins that form the virion (core, E1, and E2), the greater number of hydrophobic interactions would be predicted to stabilize the virus particle and to somehow increase its infectivity and/or resistance to degradation by IFN-induced mechanisms. The predicted increase in the stability of complexes including the nonstructural proteins (P7, NS2, NS3, NS4A, NS4B, NS5A, and NS5B) would presumably both stabilize the replicase complex to reduce its sensitivity to effectors of the IFN-α response and improve the ability of the viral proteins to interdict the cellular type 1 IFN response (e.g., the ability of NS3/NS4A to block sensing of double-stranded RNA by TLR3 and RIG-I; ref. 10). The mechanisms by which the structural and nonstructural proteins function during viral replication and the lower number of hydrophobic residue pairs in the responder networks together imply that clearance of the virus during IFN-α–based therapy may be aided by both lower cell-to-cell infectivity of the virions and higher sensitivity of the viral components within cells to the drugs.
Furthermore, the majority of the covariances probably reflect compensatory variations among multiprotein complexes composed primarily of viral proteins. The justification for this prediction is that host proteins do not vary with the high frequency observed among HCV sequences, with the exception of the antigen-binding regions of the immunoglobulins and the T cell receptors. However, escape from cell-mediated immunity is unlikely to be the dominant force driving development of the covariance network because sets of T cell epitopes would need to coevolve, but T cell epitopes are short linear peptides, and very few of the covariances were adjacent to one another in the linear amino acid sequence. Effective antibody-mediated selective pressures would also be unable to generate the genome-wide covariance networks because these pressures would be largely limited to the E1 and E2 surface glycoproteins that form the exterior of the virion, but covariances were found among all 10 HCV proteins. Therefore, the high degree of variation — and covariation — among the HCV sequences is not needed to accommodate the limited sequence diversity present in the vast majority of human genes. However, some of the covariances between different HCV proteins could represent compensatory adaptations between HCV proteins to maintain a common interaction with a third partner that may be of host origin.
The presence of amino acid covariance networks in the HCV genome specific to the outcome of antiviral therapy has 3 practical implications for personalized medicine. First, the nonoverlapping regions of the covariance networks from the Marked and Poor response classes (Table and Figure ) may provide a basis for a sequence-based test that could predict the susceptibility of individual HCV isolates to IFN-α–based therapies. Our initial assessment of such biomarker positions (Figure ) is very promising, because hundreds of subnetworks providing 100% accuracy and greater than 90% coverage for prediction of both SVR and Non-SVR were found.
However, the precision with which the networks may be able to predict the outcome of therapy must be viewed with some reserve, because although the All networks have been validated in external data sets by us and others (37), we cannot yet externally validate the response-based networks. This is because no non–Virahep-C sequence set exists for which the outcome of IFN-based therapy is available. We anticipate that the ability of the networks to predict treatment outcome will be less robust outside of this training set because the relatively small number of sequences available could have led to overestimation of the degree of separation between the treatment outcome networks. However, the large genetic diversity differences between the response classes (17), the extensive overlap between the congruent response-based networks (Marked with SVR and Poor with Non-SVR), and the largely nonoverlapping nature of the contrasting networks (Marked versus Poor and SVR versus Non-SVR) all imply that HCV sequence variation has a major role in determining the outcome of therapy. Therefore, the large number of potential covariance biomarkers available and the ability to simultaneously consider multiple subnetworks strongly imply that a clinically useful predictive test could be designed.
The differences between the 1a and 1b networks indicate that predictive tests based on the covariance networks will need to be customized for each subtype. However, even with customization, such tests would be highly cost-effective because chip-based assays could be designed for about $100 per sample, whereas the drugs used in a failed course of therapy can cost up to $30,000 (39). We anticipate that the ability to predict nonresponse would be the most practical form of the assay, because treatment of a susceptible HCV isolate could still result in Non-SVR through drug intolerance or noncompliance. In this context, physicians could counsel against IFN-α–based therapy, avoiding tens of thousands of dollars in expenses and painful side effects for the patient. For example, more than 250 HCV patients are treated at Saint Louis University Hospital per year, and if futile treatment of just half of the nonresponders (approximately 62 patients) were eliminated at a cost of about $25,000 for the screening assay, the savings could be up to $1.8 million in drug costs alone.
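The arithmetic behind this estimate can be checked with a short sketch. The figures are those quoted above (250 patients, about $100 per assay, 62 futile courses avoided, up to $30,000 in drugs per course); the function name `net_savings` and the assumptions that every patient is screened once and that the upper-bound drug cost applies to every avoided course are illustrative.

```python
def net_savings(patients_screened, assay_cost, courses_avoided, drug_cost_per_course):
    """Drug costs avoided minus the cost of screening every patient once."""
    screening_cost = patients_screened * assay_cost
    drug_cost_avoided = courses_avoided * drug_cost_per_course
    return drug_cost_avoided - screening_cost


# Figures from the text; upper-bound drug cost assumed for every avoided course.
screening_cost = 250 * 100          # about $25,000 for the screening assay
savings = net_savings(
    patients_screened=250,
    assay_cost=100,
    courses_avoided=62,
    drug_cost_per_course=30_000,
)
print(screening_cost)  # 25000
print(savings)         # 1835000
```

The result, $1,835,000, matches the stated figure of up to $1.8 million once the screening cost is subtracted from the $1,860,000 in avoided drug costs.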
The second medical implication of these networks is that the highly connected hub residues have a large number of functional interactions with other residues; hence, disrupting a hub would be predicted to weaken this web of interactions. Therefore, the hubs may be valuable antiviral drug targets. This is an attractive concept because knockout of hubs in interaction networks has previously been shown to be lethal in several different organisms (40). Targeting variable sites for drug design is counterintuitive, but it should be feasible for anti-HCV therapy, because new anti-HCV drugs are likely to be used in conjunction with IFN-α. Therefore, an anti-hub drug would be designed to inhibit the IFN-resistant hub configuration, leaving variant viruses with the IFN-sensitive configuration to be eliminated by IFN-α. Targeting the hubs would be especially attractive because evolution of resistant mutants should be slow, as a result of the high genetic cost of mutating a highly interconnected residue without simultaneously mutating many of its covarying partners.
Finally, the high error rate of RNA synthesis that is a fundamental feature of RNA virus replication leads to high genetic diversity among these viruses. Therefore, covariance network analysis should be applicable to essentially all RNA viruses. If similar networks correlating with virulence or drug sensitivity exist in other viruses, covariance network analysis should open a wide range of diagnostic and therapeutic options in medical, veterinary, and agricultural settings.