Two important features of HCV infection, persistence following primary infection and resistance to IFN-based therapy, have been related to the extensive HCV genetic variability (39). Although HCV has developed a very efficient capacity to escape from adaptive (15) and innate immune responses (12), ~20% to 30% of all HCV infections are cleared by the host (23) and 50% to 70% of chronic infections can be successfully treated with IFN-RBV (48). The variation in response to therapy among HCV strains remains poorly understood. However, differential sensitivity of HCV genotypes to IFN therapy (52) suggests that viral genetic factors play an important role in determining therapy outcomes. Despite a low degree of response to treatment during chronic infection, 80% to 98% of patients with acute HCV infection can achieve complete virological response to IFN therapy (51), suggesting that HCV acquires a significant degree of IFN resistance during chronic infection. Taken together, these observations indicate a strong connection between the intrahost HCV evolution and success of the IFN-RBV therapy.
In the current study, an integrative approach was implemented for the evolutionary analysis of the HCV genome. This approach was based on modeling interrelationships between polymorphic sites along the entire HCV polyprotein and relating the modeled coordination among amino acid substitutions to the UR/NR outcomes of therapy. Models constructed here showed an extensive interdependence of all polymorphic sites within the HCV polyprotein, suggesting a significant coevolution among individual HCV proteins. The data indicate that all HCV proteins contain sites coordinating their polymorphism with sites in all other proteins (Fig. ). A similar observation has been recently made using a correlation network analysis of the HCV genotype 1a full-genome sequences from untreated patients (17) and patients on therapy (7). Among all connections identified using the polyprotein BN in this study, only 17.5% were among sites within individual proteins. It is interesting to note that E2 shows the most extensive coordination among its sites, with all other proteins having ~2 to 13 times fewer connections among intraprotein sites than E2. With 82.5% of all connections in the network being among proteins, HCV evolution is evidently defined by coadaptation among many phenotypic traits encoded by different HCV proteins.
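The split between intraprotein and interprotein connections described above can be illustrated with a minimal sketch. It assumes the commonly used H77 reference numbering for protein boundaries, which the text does not state explicitly, and it uses only a handful of links named in this section rather than the full network; the names `protein_of` and `intraprotein_fraction` are hypothetical.

```python
from bisect import bisect_right

# Assumption: first residue of each mature protein in H77 polyprotein numbering.
PROTEIN_STARTS = [
    (1, "core"), (192, "E1"), (384, "E2"), (747, "p7"), (810, "NS2"),
    (1027, "NS3"), (1658, "NS4A"), (1712, "NS4B"), (1973, "NS5A"), (2421, "NS5B"),
]
POLYPROTEIN_LENGTH = 3011  # assumed length of the reference polyprotein
_STARTS = [start for start, _ in PROTEIN_STARTS]


def protein_of(position):
    """Return the name of the protein that contains a polyprotein position."""
    if not 1 <= position <= POLYPROTEIN_LENGTH:
        raise ValueError(f"position {position} is outside the polyprotein")
    return PROTEIN_STARTS[bisect_right(_STARTS, position) - 1][1]


def intraprotein_fraction(links):
    """Fraction of links whose two sites fall within the same protein."""
    if not links:
        raise ValueError("no links supplied")
    intra = sum(1 for a, b in links if protein_of(a) == protein_of(b))
    return intra / len(links)


# Illustrative subset only: five links mentioned in this section.
toy_links = [(90, 110), (47, 29), (482, 642), (612, 233), (642, 1756)]
print(intraprotein_fraction(toy_links))
```

Applied to the complete list of links in the polyprotein BN, the same calculation would be expected to yield the 17.5% intraprotein share reported above; the toy subset is not representative of that value.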
Although all HCV proteins contribute to the network, the topological properties of sites differed among proteins. The core protein contributes fewer sites (n = 11) relative to its size than any other HCV protein. However, each core site forms ~2 to 4 times more links in the network than any site from other proteins (Table ). This protein has 12.4 times more outgoing than incoming links in the polyprotein BN, while the ratio between outgoing and incoming links for all other proteins varies from 0.8 to 2.2 (Table ). Another important feature of core connectivity in the polyprotein BN is that 98.6% of all core links are with other proteins. The presence of only two intraprotein links (polyprotein positions 90→110 and 47→29) makes the core protein the least intraconnected protein, indicating minimal direct coordination among core polymorphic sites. Thus, the contribution of core to the network topology differs considerably from those of all other proteins, suggesting that this protein has a unique role in coordinating substitutions and defining heterogeneity at many sites of the HCV polyprotein.
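The ratio of outgoing to incoming links per protein can be tallied from a list of directed edges, as in the following hedged sketch. The site labels, the edge list, and the function name `out_in_ratio` are hypothetical and chosen only to show the bookkeeping; they are not taken from the polyprotein BN.

```python
from collections import Counter


def out_in_ratio(directed_links, site_to_protein):
    """Return {protein: outgoing links / incoming links} for a directed network.

    A protein with no incoming links gets a ratio of infinity.
    """
    outgoing = Counter()
    incoming = Counter()
    for source, target in directed_links:
        outgoing[site_to_protein[source]] += 1
        incoming[site_to_protein[target]] += 1
    ratios = {}
    for protein in set(outgoing) | set(incoming):
        n_in = incoming[protein]
        ratios[protein] = outgoing[protein] / n_in if n_in else float("inf")
    return ratios


# Hypothetical toy network: one core site, two E2 sites, one NS5A site.
site_to_protein = {"c1": "core", "a1": "E2", "a2": "E2", "b1": "NS5A"}
directed_links = [("c1", "a1"), ("c1", "a2"), ("c1", "b1"),
                  ("a1", "c1"), ("a1", "b1")]
print(out_in_ratio(directed_links, site_to_protein))
```

In this toy network the core site sends three links and receives one, giving a ratio of 3; a ratio well above 1, such as the 12.4 reported for core, marks a protein whose sites act mainly as parents of sites elsewhere in the network.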
This observation is in agreement with the multitude of functions performed by the core protein and emphasizes its important role in HCV infection. In addition to forming the nucleocapsid (105), this protein was shown to interfere with many cellular signaling pathways involved in apoptosis (134), transcription (60), and transformation (21). The core protein is also involved in lipid metabolism (10). It inhibits the microsomal triglyceride transfer protein, binds to apolipoprotein AII, and induces accumulation of cytoplasmic lipid droplets (2). Core and NS5A are key factors for assembly of infectious particles. Both colocalize on the surface of lipid droplets, a proposed site for HCV particle assembly (4). With lipid droplets playing a crucial role in the assembly and release of infectious HCV particles (83), interactions involving domain 2 of core and domain 3 of NS5A (5) are essential for virion production and, therefore, have a strong impact on infectivity and viral fitness. Mutation at position 147 in domain 2 of the core protein was found to affect adherence of core to lipid droplets and virus production (107). Our data show that this site has direct links in the polyprotein BN to sites in E1, E2, NS2, and domain 3 of NS5A. Another site from domain 2 of core, at position 161, is linked to P7 in addition to these four proteins. All of these proteins play a role in the membrane-associated viral replication (86). These observations suggest coordination of heterogeneity across the HCV polyprotein related to viral production and the important role played by the core protein in this coordination.
Two proteins, E2 and NS5A, together contribute ~40% of all sites and ~62% of all links to the polyprotein BN and, therefore, essentially define the state of this entire network. In combination with E1, these three proteins contribute ~50% of all sites and ~77% of all links to the polyprotein BN. It is interesting that E2 and NS5A also mutually coordinate their heterogeneity (Fig. ). Although coordination between sites from any two HCV proteins is a common feature of the polyprotein BN, this coordination is most extensive between sites of E2 and NS5A, owing to the large number of sites contributed by these two proteins to the network. Thus, the states of many sites in one of these two proteins reflect the states of many sites in the other protein, suggesting a high degree of coevolution between these two proteins. Additionally, it was observed that sites from E2 formed the strongest links with many other sites in the polyprotein as determined by CI testing (Fig. ), among which were links between sites 482 and 642 in E2 (P = 2 × 10⁻⁹), 612 in E2 and 233 in E1 (P = 3 × 10⁻⁸), and 642 in E2 and 1756 in NS4B (P = 2 × 10⁻⁸). Taking into consideration that site 482 is from the CD81-binding region (45) and site 612 from one of two E2 regions proposed to be involved in the viral fusion process (72), we speculate that the tight coordination between sites 482 and 642 as well as that between sites 612 and 233 is associated with viral entry.
Another important observation made in this study is that all HCV proteins have association with the UR/NR outcome of IFN-RBV therapy. Taking into consideration the aforementioned extensive linkage among polymorphic sites from different proteins, this observation, although not surprising, reveals that the HCV response to immunomodulatory therapy is a very complex trait involving numerous viral functions that require coordination. All networks constructed for individual proteins included the UR/NR outcome as a variable (Table ). However, this observation cannot be unequivocally interpreted in terms of equal contribution of each protein to the IFN-RBV response. Nevertheless, it suggests that the genome-wide coordination among sites is important for this response, with some proteins possibly playing accessory roles and reflecting the IFN-RBV-related changes in other proteins that are mainly responsible for resistance. The analysis conducted here revealed that sites substantially associated with the outcome are scattered along the entire HCV polyprotein. Among the sites with relevant significance of >0.5 (Fig. ) are sites in E1 (n = 1), E2 (n = 8), p7 (n = 2), NS2 (n = 1), NS5A (n = 9), and NS5B (n = 4). Two proteins, E2 and NS5A, shared 68% of these 25 sites, suggesting their strong connection to IFN-RBV resistance. E2, NS5A, and P7 have, respectively, 6.8%, 9.0%, and 11.7% of their polymorphic sites being highly relevant to the therapy outcome, while all other proteins have only 1.5% to 3.1% of these sites.
One surprising finding was that five among the eight sites most relevant to therapy outcome are located in HVR1 of the E2 protein (aa 384 to 410), emphasizing a strong connection of HVR1 heterogeneity to IFN-RBV resistance. Association of HVR1 sites with outcomes of therapy can also be found in the correlation networks (7). However, the significance of these observations is not apparent. Analysis of HVR1 connectivity in the polyprotein BN showed that polymorphic HVR1 sites have a total of 140 links to all HCV proteins, with each HVR1 site being connected to three to nine sites in the HCV polyprotein. Such an extensive interdependence of HVR1 sites with many sites across the entire HCV polyprotein (Fig. ), in conjunction with the earlier similar observations using network analysis (17), suggests that the HVR1 substitutions are not random and that HVR1 evolution is substantially coordinated with all HCV proteins. Coordination of HVR1 heterogeneity is especially noticeable with E1, E2, and NS5A, which share, respectively, 15%, 26.4%, and 14% of all HVR1 links in the polyprotein BN, while any other HCV protein shares 3.6% to 9.3% of HVR1 links.
HVR1 contains antigenic epitopes (66) with HCV neutralizing activity (40). Rapid HVR1 evolution is associated with immune escape (70). However, the conservation of the HVR1 physicochemical properties and conformation (94) argues that this region is significantly functionally constrained despite its extensive heterogeneity. The observation that compensatory mutations in the ectodomain of E2 (46) and the I347L mutation in E1 compensate for HCV fusion impairment (9) in HCV mutants whose HVR1 has been excised suggests potential functional relationships of this region with other parts of the HCV genome. HVR1 was shown to be involved in the SR-B1-facilitated entry of HCV pseudoparticles in cell culture (11). It was suggested that HVR1 plays an important role in HCV entry by modulating receptor recognition and affects lipoprotein composition and infectivity of viral particles (9). HVR1 heterogeneity was also associated with the development of resistance to therapy (74). We hypothesize that complex functional relationships of HVR1 are reflected in coordinated evolution with other HCV proteins and that HVR1 mirrors the evolution of the entire HCV genome, including evolution toward the IFN-RBV resistance.
There are many sites from different HCV proteins strongly linked to the IFN-RBV resistance (Fig. and ). However, consideration of individual sites allows only for the identification of connections to the therapy outcome in the form of a trend and does not have a strong predictive power. Correlation of the IFN-RBV therapy outcomes has been reported with site polymorphisms in the core (36), E2 (87), and NS5A (88) proteins. Although these observations revealed numerous associations between the HCV genetic polymorphism and evolution toward IFN-RBV resistance, these associations were never explored in terms of their interrelationships and formulated into an integrative model capable of revealing accurate quantitative connections between HCV genetic changes and therapy outcomes.
The current report presents several probabilistic models connecting the UR/NR outcome to coordinated changes at polymorphic sites across the entire HCV polyprotein as well as from individual HCV proteins. Analysis of individual sites without consideration of their relationships seems inefficient in detecting a reliable connection to the outcomes. Only 3 among 25 sites having the highest value of mutual information with the outcome (Fig. ) were found to be directly linked to the outcome in the polyprotein BN (Fig. ). The same 3 sites, 2280, 2283, and 2633, are among the 14 most relevant sites extracted from the HCV polyprotein using correlation-based feature selection (Table ) and among 18 sites that have the strongest connections to outcome in the undirected dependence graph (Fig. ). All computational techniques used in this study ranked the contribution of various sites differently. For example, only 12 sites were shared by 18 sites shown in Fig. and 25 sites shown in Fig. . Although sites 2280 and 2283 from NS5A and site 2633 from NS5B were frequently identified as most relevant to the IFN-RBV response, analysis of states at these sites is not sufficient for an accurate prediction of the therapy outcome (data not shown). Such a prediction requires the use of a combination of sites selected for their collective contribution to the outcome.
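The single-site ranking by mutual information with the outcome, referred to above, can be sketched as follows. This is a plain plug-in estimate from observed frequencies, written under the assumption that each sequence contributes one amino acid state per site and one UR/NR label; the estimator actually used in the study may differ, and the function name and toy data are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(states, outcomes):
    """Plug-in estimate of mutual information (in bits) between the amino acid
    states observed at one site and the therapy outcome labels."""
    if len(states) != len(outcomes) or not states:
        raise ValueError("states and outcomes must be non-empty and equal in length")
    n = len(states)
    joint = Counter(zip(states, outcomes))
    p_state = Counter(states)
    p_outcome = Counter(outcomes)
    mi = 0.0
    for (state, outcome), count in joint.items():
        p_xy = count / n
        # p(x, y) / (p(x) * p(y)), written with raw counts
        ratio = count * n / (p_state[state] * p_outcome[outcome])
        mi += p_xy * log2(ratio)
    return mi


# Hypothetical toy data: four sequences, two per outcome class.
outcomes = ["UR", "UR", "NR", "NR"]
informative_site = ["A", "A", "T", "T"]    # state tracks the outcome exactly
uninformative_site = ["A", "T", "A", "T"]  # state is independent of the outcome
print(mutual_information(informative_site, outcomes))
print(mutual_information(uninformative_site, outcomes))
```

A score of this kind evaluates each site in isolation, which is why a high value does not guarantee a direct link to the outcome once dependencies among sites are modeled jointly in the BN.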
For that purpose, we conducted a series of experiments for selection of site sets most relevant to the therapy outcome from the entire HCV polyprotein and individual proteins (Table ). Two proteins, E2 and NS5A, were explored in detail. As mentioned earlier, both proteins have many polymorphic sites and contributed many links to the polyprotein BN. These two proteins consistently made substantial contributions of the most relevant sites identified using different feature selection techniques (Fig. and ). Probabilistic mapping of UR and NR outcomes in 2D physicochemical space showed an equally representative distribution of the outcome probabilities for E2, NS5A, and the polyprotein (Fig. ). All these findings strongly suggest that these two proteins have a strong connection to therapy response and can be used for the accurate prediction of therapeutic outcomes. However, as can be seen in Fig. , the 10-fold CV experiments showed that the NS5A BN outperforms the E2 BN constructed using complete sets of polymorphic sites (82.5% versus 90% accuracy) or feature-selected sites (85% versus 97.5% accuracy). These results, taken together with the observation that NS5A contains two of six sites directly connected to the therapy outcome in the polyprotein BN while E2 has no direct links to the outcome, suggest that NS5A has a very strong relevance to evolution toward the IFN-RBV resistance.
Two sites, at positions 2376 and 2414 in NS5A, have experimentally been associated with the development of resistance to RBV (97). It is important to note that these two sites were consistently selected as being relevant to the therapy outcome (Table and Fig. ), indicating that the NS5A BN as well as polyprotein BN constructed using all or feature-selected sites includes links that reflect the contribution of RBV to therapy. Site 2414 located in domain 3 of NS5A is linked to site 161 in domain 2 of core in the polyprotein BN. As mentioned earlier, both domains are involved in protein-protein interactions between these two proteins, association with lipid droplets, and assembly and release of viral particles (81). There seems to be a linkage between coevolution of the core and NS5A proteins and RBV resistance, and this resistance is associated with interaction between these two proteins. The final validation of the two predictive NS5A Virahep-C models using the HALT-C data strongly confirms a robust connection between coordination among the NS5A sites and IFN-RBV resistance. Additionally, it shows that a small number of features from NS5A alone may be sufficient for the prediction of therapy outcomes (Fig. ). This finding suggests that analysis of very few sites from a small HCV genomic region, such as NS5A, may be used for monitoring sensitivity to the IFN-RBV therapy.
A general interconnectivity among HCV proteins was comparable for the 40 Virahep-C sequences and the 298 HCV genotype 1a full-genome sequences obtained from GenBank (Fig. and ), indicating that the modeled coordination among substitutions is essentially similar for all HCV variants from treated and treatment-naïve patients. This observation additionally suggests that the development of resistance during immunomodulatory therapy is generally shaped by selection pressures similar to the HCV evolution in untreated patients. However, there are some important differences between the polyprotein BNs generated using sequences from treated and treatment-naïve patients. The GenBank sequences from untreated patients contain more polymorphic sites (n = 1,296) than the Virahep-C sequences (n = 551). Despite this fact, the Virahep-C sequences contain 25 polymorphic sites that are conserved in the GenBank sequences. These sites are distributed within E1 (n = 3), E2 (n = 4), P7 (n = 1), NS2 (n = 2), NS3 (n = 6), NS4A (n = 1), NS4B (n = 3), NS5A (n = 3), and NS5B (n = 2). Among them, sites at positions 230 in E1, 768 in P7, and 1461 and 1592 in NS3 are the most relevant to the IFN-RBV response (Table ). Furthermore, the two BNs had topological differences in the number of interprotein links, most notably the 1.7- and 2-fold proportional increase in the number of links between E1 and E2 and between E2 and NS5A in the Virahep-C BN compared to those in the GenBank BN (Fig. ). These observations suggest that despite the similarity of these two networks, there are distinct differences in coordination among substitutions in HCV from treated and treatment-naïve patients.
IFN is a major component of innate immunity (19). Several HCV proteins are involved in modulation of the host IFN response (12). RBV used as a component of combined therapy seems to facilitate early response to IFN (43) rather than playing a strong independent role. Resistance to IFN is not clearly linked to any specific mutation within the HCV genome. As shown in this study, HCV adaptation to IFN is a complex trait encoded in the interrelationships among many sites along the entire HCV polyprotein. The extensive coevolution among HCV amino acid sites leads to a significant integration among the HCV IFN-response-related phenotypic traits. Each HCV protein contributes to the IFN resistance, albeit to a different degree. With E2 and NS5A contributing many polymorphic sites to the network and generating a broad epistatic connectivity to sites in other HCV proteins, intrahost HCV evolution toward the IFN resistance is essentially defined and, therefore, can be accurately predicted using a carefully selected combination of sites from these two proteins.
Unlike treatment using direct-acting antiviral compounds, treatment with IFN does not exert an unusual selection pressure on HCV but rather generates an unusually strong selection pressure of the innate immune system. Thus, HCV strains capable of resisting or evolving toward resistance to immunomodulatory therapy are most efficient in overcoming the host immune system. With the entire HCV genome being responsible for the response to IFN, there is no single IFN resistance mutation. Once established, the wide-ranging epistatic connectivity among sites involved in the IFN response may not be rapidly reverted even with reduction of the selection pressure in the absence of treatment, thus locking the HCV genome into the state of resistance to IFN. Without being eliminated by IFN-RBV therapy, these variants can continue to circulate among human hosts. In contrast, IFN-RBV-sensitive strains are being removed from circulation. This consideration implies that the current widespread adoption of IFN-based therapy, although extremely beneficial for individual patients with SVR, may affect the composition of the circulating HCV population and enlarge the reservoir of IFN-resistant HCV, a potentially alarming public health issue that warrants further investigation.