Pegylated alpha interferon and ribavirin therapy for hepatitis C virus (HCV) genotype 1 infection fails for half of Caucasian American (CA) patients and more often for African American (AA) patients. The reasons for these low response rates are unknown. HCV is highly genetically variable, but it is unknown how this variability affects response to therapy. To assess effects of viral diversity on response to therapy, the complete pretreatment genotype 1 HCV open reading frame was sequenced using samples from 94 participants in the Virahep-C study. Sequences from patients with >3.5 log declines in viral RNA levels by day 28 (marked responders) were more variable than those from patients with declines of <1.4 log (poor responders) in NS3 and NS5A for genotype 1a and in core and NS3 for genotype 1b. These correlations remained when all T-cell epitopes were excluded, indicating that these differences were not due to differential immune selection. When the sequences were compared by race of the patients, higher diversity in CA patients was found in E1 and NS2 but only for genotype 1b. Core, NS3, and NS5A can block the action of alpha interferon in vitro; hence, these genetic patterns are consistent with multiple amino acid variations independently impairing the function of HCV proteins that counteract interferon responses in humans, resulting in HCV strains with variable sensitivity to therapy. No evidence was found for novel HCV strains in the AA population, implying that AA patients may be infected with a higher proportion of the same resistant strains that are found in CA patients.
Chronic infection with hepatitis C virus (HCV) is a major cause of cirrhosis, liver disease, and hepatocellular carcinoma (reviewed in reference 37). About 3.1 million Americans are chronically infected with HCV, causing 8,000 to 10,000 deaths annually (3). Due to the slow progression of hepatitis C virus infections and the increasing prevalence of HCV in the American population, HCV-associated deaths are expected to more than triple over the next two decades, eventually exceeding those from AIDS (45).
HCV is a hepatotropic Flavivirus (reviewed in reference 40). The virion contains a lipid envelope with two envelope proteins surrounding a capsid. Within the capsid is a positive-polarity RNA genome about 9,600 nucleotides long that contains an open reading frame (ORF) that encodes a polyprotein of ~3,000 amino acids (Fig. 1). The structural proteins include the core protein that forms the capsid and the E1 and E2 surface glycoproteins. The nonstructural proteins include P7 (ion channel), NS2 (protease), NS3 (protease and helicase), NS4A (cofactor for NS3), NS4B (putative organizer of the viral replicase complex), NS5A (implicated in viral replication and pathogenesis), and NS5B (RNA polymerase). An 11th viral protein, the alternate reading frame (ARF) protein, is encoded in the +1 frame within the core region and is of unknown function (7).
Six HCV genotypes that are less than 72% identical at the nucleotide level have been identified, and within these genotypes, subtypes with 75% to 86% nucleotide identities may occur (8, 55, 60-62). HCV replicates as a quasispecies rather than as a clonal population; hence, multiple HCV variants are observed within individual patients that differ from each other by a few percent at the nucleotide level. These variants are in competition with each other, and at any given time one or a few sequences are dominant because they are most fit for the prevailing conditions (35, 74). The quasispecies distribution can vary with time, either through adaptive or neutral evolution (60). Adaptive changes are due to emergence of fitter variants as conditions facing the virus change, and neutral changes result from replacement of sequences with others of equivalent fitness.
The currently recommended therapy for chronic HCV infection is a combination of pegylated alpha interferon (peginterferon) and ribavirin for 24 to 48 weeks. Interferon (or peginterferon) provides the primary antiviral effect and can eradicate HCV even when used alone (25, 38, 42, 50, 75). Ribavirin is ineffective by itself in eliminating viremia (6, 11, 13), but in combination with peginterferon it increases the clearance rate and decreases the risk of relapse (42, 50). Sustained viral response (SVR; undetectable HCV RNA for at least 24 weeks posttherapy) is the primary goal of therapy. SVR rates with peginterferon and ribavirin therapy differ between racial groups; most notably, African Americans (AA) respond to therapy only about half as well as Caucasian Americans (CA) and Asians (9, 27, 29, 44, 53).
The reasons that peginterferon and ribavirin therapy is effective in only about half of cases are unknown, as are the reasons for the exceptionally poor SVR rate for AA patients. Host factors such as differences in immune responses to the virus and environmental factors such as alcohol use and drug tolerance can contribute to failure of therapy (9). HCV genetic variation can have a major effect on efficacy of therapy, because patients infected with HCV genotype 1 have a much lower response rate (40% to 50%) than patients with genotype 2 or 3 (75% to 80%) (18, 21, 41, 63). However, the role of genetic variation within a genotype in response to therapy is less clear, because there is little evidence that response to therapy differs between subtypes of a given genotype (e.g., patients infected with genotype 1a and 1b both respond to therapy in 40% to 50% of cases, despite the subtypes differing by 10% to 12% at the amino acid level) (5, 9). This observation is equally consistent with the possibility that there is no contribution of the variation between 1a and 1b to response to therapy or that “sensitive” and “resistant” HCV variants are found with equal frequencies in genotypes 1a and 1b.
Therefore, to determine the role of HCV genotype 1 genetic variation between infected individuals and between CA and AA patients in responses to therapy, we conducted a comprehensive analysis of the association of HCV genetic variability in the viral ORF with outcome of therapy. This analysis was part of the Study of Viral Resistance to Antiviral Therapy of Chronic Hepatitis C (Virahep-C), a multicenter clinical study designed to assess rates of response of CA and AA patients to peginterferon and ribavirin therapy and to assess reasons for nonresponse (9).
Virahep-C was a multicenter trial of peginterferon and ribavirin therapy administered to treatment-naïve participants chronically infected with HCV genotype 1 (9). Virahep-C enrolled 205 CA and 196 AA participants; the Institutional Review Boards of all centers approved the protocols, and all patients gave informed, written consent for therapy and for investigations of viral, immunological, and cell-signaling responses. Per protocol, all participants were treated with peginterferon alpha-2a (Pegasys; Roche Pharmaceuticals) (180 μg weekly self-administered by subcutaneous injection) and ribavirin (Copegus; Roche Pharmaceuticals) (1,000 mg/day for those who weighed <75 kg or 1,200 mg/day for those who weighed ≥75 kg; administered orally); treatment duration was 24 to 48 weeks for nonresponders and 48 weeks for responders. Serum HCV RNA levels were quantified using a COBAS AMPLICOR hepatitis C virus monitor test, version 2.0 (Roche Molecular Diagnostics). The primary outcome was the SVR. The SVR rate was higher among CA (52%) than AA (28%) patients, and this racial difference could not be explained by clinical factors associated with response such as age, gender, body weight, obesity, severity of the underlying hepatitis, pretreatment viral levels, or amount of drug taken. A full description of Virahep-C is provided in reference 9.
Consensus sequences for the full HCV ORF were obtained by directly sequencing overlapping nested reverse transcriptase PCR amplicons as described previously (72). Briefly, HCV RNA was isolated from plasma and nested reverse transcriptase PCR was performed. Both strands of the amplified DNAs were sequenced using ABI dye-terminator technology; sequence depth averaged over four. Sequences were assembled using Vector NTI software, and base-calling errors were corrected following inspection of the chromatograms. Mixed-base positions due to the HCV quasispecies were resolved by identifying the predominant base at each position. The extreme 3′ end of the ORF could not be amplified in 13 of the 94 samples; hence, the C-terminal 56 amino acids of NS5B were excluded from the analyses to ensure that all sequences were represented equally.
The genotype 1a and 1b samples were analyzed separately, because the sequence variation between the genotypes was anticipated to be larger than differences associated with the response class or race of the patient. Amino acid sequences were deduced from the nucleotide sequences, and all analyses were performed at the amino acid level.
Sequence alignments were done with ClustalW software. Positions that differed relative to the genotype 1a or 1b population consensus sequence were identified with Mutation Master software (70). The genotype 1a consensus sequence (data not shown) was derived from all 12 full-length ORFs in the Los Alamos National Laboratory and the European HCV databases (33) in April 2005 that were from different patients and from 5 genotype 1a ORFs that we sequenced using samples from members of non-Virahep-C cohorts. The genotype 1b population consensus sequence (data not shown) was generated from all 126 full-length ORFs from different patients in the Los Alamos and European HCV databases in January 2006. The known and predicted CD4+ and CD8+ T-cell epitope sequences were obtained from the HCV Immunology Database at the Los Alamos National Laboratory in March 2006 (73). The Shannon's entropy value (64) was calculated with Bioedit software (23). Phylogenetic trees were generated using the neighbor-joining algorithm in PHYLIP software (16). The mean genetic distance value for each group was calculated using the p-distance algorithm in the MEGA DNA analysis package (34).
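The two diversity measures named above can be illustrated with a minimal Python sketch. This is not the Bioedit or MEGA implementation: it assumes equal-length, gap-free aligned amino acid sequences, uses the natural logarithm for the entropy, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column):
    """Shannon entropy (natural log, an assumption) of one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropy(seqs):
    """Per-position entropy for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*seqs)]


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(seqs):
    """Mean p-distance over all unordered pairs of sequences in a group."""
    pairs = list(combinations(seqs, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical, not study data).
toy = ["MSTNPK", "MSTNPK", "MSANPK", "MSTNPR"]
entropies = alignment_entropy(toy)   # 0 at invariant positions, > 0 at variable ones
mean_dist = mean_pairwise_distance(toy)
```

An invariant column has an entropy of zero, and the entropy grows as the residues at a position become more evenly distributed, which is why the measure can be compared position by position between two groups of sequences.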
Race and sex distributions were compared across response groups by use of Pearson's chi-square test for association. Normally distributed characteristics were compared across response groups by use of analysis of variance, whereas the Kruskal-Wallis nonparametric test was used for characteristics that were not normally distributed. Fisher's exact test was used to compare the proportions of unique variations between marked- and poor-response groups and between CA and AA patients. Unique variations across all three response groups were compared using the Freeman-Halton exact test. To determine whether there were racial or response differences in the number of variations (unique or total) or whether any response relationship to the number of variations differed by race, we used Poisson regression models with the number of variations as outcome and race, response groups, and their interactions as independent variables. The likelihood ratio test was used to assess statistical significance of the race and response interaction. Shannon's entropy values were compared between races and response groups separately using the Mann-Whitney rank sum test, and the average genetic distances between the groups were compared using an independent-sample t test. The significance level (α) was 0.01. Statistical analyses were carried out using SAS software (SAS Institute, Inc.) or SPSS version 13.0 (SPSS, Inc.).
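As an illustration of how a comparison of unique and nonunique variations between two groups can be tested, the following Python sketch computes a two-sided Fisher's exact P value for a 2x2 table from the hypergeometric distribution. It is not the SAS or SPSS procedure used in the study, the function name is hypothetical, and the counts in the example are invented for demonstration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than the
    # observed one (small relative tolerance guards against rounding).
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Invented counts: rows are marked and poor responders,
# columns are unique and nonunique variations.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
significant = p_value <= 0.01  # alpha of 0.01, as stated in the text
```

With these invented counts the P value is about 0.035, which would not meet the 0.01 threshold used in this study.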
The sequences determined in this work have been deposited in GenBank (EF407411 to EF407504).
We hypothesized that HCV genetic variation results in viruses with variable sensitivity to peginterferon and ribavirin. We further hypothesized that AA patients may be infected with HCV strains that are somewhat more resistant to therapy than are those infecting CA patients. Therefore, we analyzed the pretreatment HCV sequences from 96 Virahep-C patients evenly distributed by race (CA or AA), genotype (1a or 1b), and day 28 response to therapy (marked, intermediate, or poor). The samples were stratified by day 28 response to minimize effects on suppression of viremia from factors other than biologic response to the drugs (such as amount of drug taken), to limit the effects of viral genetic adaptation to selective pressures induced by therapy, and most importantly, because not all non-SVR patients are the same. There are patients who respond well and then relapse, those who have a partial response to the drugs but never eliminate HCV, those who stop therapy because of side effects, and “null” patients who fail to respond to therapy in the first place (our “poor” category). Defining response at day 28 distinguishes the null patients from the other types of non-SVR patients, although it may not distinguish SVR patients from relapse patients. This is important because relapse patients respond to the drugs and almost achieve SVR, whereas the biological reason for the failure of therapy in the null patients is the central focus of these analyses. In this study, “marked” responders had a decline in HCV titers of >3.5 log10 or to undetectable levels between baseline and day 28 of therapy, “intermediate” responders had declines of 1.4 to 3.5 log, and “poor” responders had declines of <1.4 log. To eliminate the effect of drug reductions on the day 28 outcome, all 96 patients received full doses of both peginterferon and ribavirin for the first 28 days.
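The day 28 response classes defined above can be expressed as a short Python sketch. The function name, the argument names, and the handling of undetectable RNA through a flag are assumptions made for illustration; the thresholds are those stated in the text.

```python
import math


def classify_day28(baseline_rna, day28_rna, undetectable=False):
    """Classify day 28 response from serum HCV RNA levels (same units for both).

    marked:       decline > 3.5 log10, or RNA undetectable at day 28
    intermediate: decline of 1.4 to 3.5 log10
    poor:         decline < 1.4 log10
    """
    if undetectable:
        return "marked"
    decline = math.log10(baseline_rna) - math.log10(day28_rna)
    if decline > 3.5:
        return "marked"
    if decline >= 1.4:
        return "intermediate"
    return "poor"


# Hypothetical titers, not study data.
examples = [
    classify_day28(1e6, 1e2),                     # 4.0 log decline
    classify_day28(1e6, 1e4),                     # 2.0 log decline
    classify_day28(1e6, 5e5),                     # about 0.3 log decline
    classify_day28(1e6, 0, undetectable=True),    # below the detection limit
]
```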
One genotype 1b CA marked responder and one genotype 1a AA intermediate responder proved to be coinfected with both genotype 1a and 1b viruses and were dropped from the analyses, yielding a final group of 94 patients.
The baseline characteristics of these 94 patients are shown in Table 1. There were no significant (P ≤ 0.05) differences among the three response groups except in baseline HCV RNA levels, platelet counts, and alpha-fetoprotein (AFP) levels. The lower platelet counts and higher AFP levels reflect slightly more severe liver disease, which is known to affect response to therapy, as do lower serum HCV RNA levels at baseline (17, 63). Lower pretreatment HCV titers in responders were also seen in the whole Virahep-C cohort (9), and although viral titer differences were associated with response to therapy, they did not account for the lower response rate in the AA patients in the full Virahep-C cohort.
HCV replicates as a quasispecies rather than as a clonal population. Therefore, the virus can be represented genetically either by characterizing the quasispecies distribution or by using the consensus sequence of the quasispecies to reflect the center of the genetic distribution in each individual. We used the consensus sequence as determined by directly sequencing HCV DNA amplified from plasma (72) for three reasons. First, detailed analysis of quasispecies variation within an individual does not assess genetic variation between different individuals undergoing antiviral therapy, which is the goal of this project. Second, interpatient genetic variation is much larger than intrapatient quasispecies variation and would overwhelm intrapatient variation. Finally, characterizing the quasispecies spectrum of the entire ORF for each isolate was not feasible on the scale needed for this study.
Directly comparing the consensus sequence determined by the method employed here with a near full-length quasispecies analysis of variants found in the same patient sample explicitly demonstrated that the consensus sequence was representative of the common quasispecies variants (76). The average amino acid difference between the consensus sequence and the individual quasispecies variants was 1.8%, which was smaller than the average distance between the quasispecies variants themselves (2.3%). Phylogenetic analyses revealed that the consensus sequence was either in the center or near the base of the phylogenetic tree for each HCV gene.
We first tested the hypothesis that distinct viral strains were found in marked compared to poor responders or in AA compared to CA patients. We generated phylogeny trees from alignments of all 32 1a or 31 1b marked- and poor-responder amino acid sequences (intermediate responders were excluded to focus on the biological extremes of response), and similar trees were generated for race categories (AA versus CA) for all samples. No clustering of sequences by day 28 response or by race was found for any viral gene for either genotype 1a or 1b (data not shown). Therefore, the response and racial groups were not infected with distinctly different HCV strains.
To determine whether any amino acid positions differed consistently between the response and race groups, we aligned the sequences by response or race class and identified positions that were conserved in at least 60% of the sequences in a given group. Eight positions at which the respective biological classes had different conserved residues were found (Table 2). In all cases, the prevalence of the dominant amino acid in the alignment was ~60% to ~70%, and in all but one case, the predominant amino acid in the alignment of the other biological group was the second most prevalent amino acid. These variations were largely conservative, with the biggest differences occurring at S or P at position 401 for genotype 1a NS5A by race. The conservative nature of these variations and frequent presence of the most prevalent residue from the other biological class indicates that the variations were unlikely to play a dominant role in causing the limited reduction in viral titers in the poor responders or the low rate of response to therapy in the AA patients.
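The search for positions at which two groups have different conserved residues can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used: sequences are taken to be aligned, gap free, and of equal length, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def conserved_residues(seqs, threshold=0.6):
    """Map 1-based position -> residue found in at least `threshold` of seqs."""
    conserved = {}
    for pos, column in enumerate(zip(*seqs), start=1):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            conserved[pos] = residue
    return conserved


def discordant_positions(group_a, group_b, threshold=0.6):
    """Positions conserved in both groups but with different residues."""
    a = conserved_residues(group_a, threshold)
    b = conserved_residues(group_b, threshold)
    return {p: (a[p], b[p]) for p in a.keys() & b.keys() if a[p] != b[p]}


# Toy groups (hypothetical): position 3 is mostly P in one group and mostly S
# in the other, and the minority residue in each group is the other group's
# dominant residue, as described for most positions in the text.
group_a = ["MSP", "MSP", "MSS", "MAP", "MSP"]
group_b = ["MSS", "MSS", "MSS", "MSP", "MTP"]
differences = discordant_positions(group_a, group_b)
```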
Because discrete differences at a few key residues were unlikely to explain the differences in response to therapy, we next assessed diversity differences between the response or race groups. These diversity analyses were designed to detect genes or regions of genes at which sequences of one group were more variable than sequences in another group (e.g., genotype 1a marked versus poor responders). The simplest way to analyze genetic diversity differences between groups was to identify the amino acids in each sequence that differed from an external reference sequence and then compare the variations in each group. As reference sequences, we employed genotype 1a and 1b population-wide consensus sequences derived from all full-ORF sequences available in the public databases that were from different individuals because they represent “typical” HCV sequences without host-specific adaptations (52). The ARF protein was omitted from this analysis due to variability in its start/stop sites, leading to difficulty in establishing a reference sequence consistent with those employed for the other genes. The intermediate responders were excluded to focus on the biological extremes of response to therapy.
Each Virahep-C sequence was compared to the 1a or 1b reference sequence, positions of nonidentity were identified, and the numbers of variations were compared by response group. The total number of variations was generally higher in the marked than in the poor responders for both genotype 1a and genotype 1b, but only sequence variations in genotype 1a NS3 and genotype 1b NS2 and NS3 were significant at the P ≤ 0.01 level by a Poisson analysis. Next, to focus on differences that may be more strongly associated with response to therapy, we eliminated variations common to both the marked- and poor-responder groups because they were likely to be neutral. The number of unique variations in the sequences from the marked responder patients was generally higher than in those from the poor responders; these differences were statistically significant for genotype 1a E2, NS3, and NS5A (Fig. 2A) and genotype 1b core, E2, NS2, and NS3 (Fig. 2B).
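The comparison to a reference sequence and the removal of shared variations can be sketched in Python. This is a simplified illustration, not the Mutation Master procedure: it assumes aligned, equal-length sequences, treats a variation as a (position, residue) pair, and counts each distinct variation once per group; these choices, the function names, and the toy sequences are assumptions.

```python
def variations(seq, reference):
    """Set of (1-based position, residue) pairs where seq differs from reference."""
    return {
        (pos, aa)
        for pos, (aa, ref) in enumerate(zip(seq, reference), start=1)
        if aa != ref
    }


def group_variations(seqs, reference):
    """All distinct variations seen in any sequence of a group."""
    found = set()
    for seq in seqs:
        found |= variations(seq, reference)
    return found


def unique_variations(group_a, group_b, reference):
    """Variations seen only in group_a and only in group_b, respectively."""
    in_a = group_variations(group_a, reference)
    in_b = group_variations(group_b, reference)
    return in_a - in_b, in_b - in_a


# Toy data (hypothetical, not study sequences).
reference = "MSTNPKPQRK"
marked = ["MSANPKPQRK", "MSTNPRPQRK", "MSTNPKPQKK"]
poor = ["MSANPKPQRK", "MSTNPKPQRK"]
unique_marked, unique_poor = unique_variations(marked, poor, reference)
```

In this toy example the T-to-A change at position 3 occurs in both groups and is discarded as shared, leaving two variations unique to the marked group and none unique to the poor group.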
We extended these analyses of viral diversity between HCV-infected patients by employing other measures of genetic diversity, including the proportion of unique relative to nonunique variations assessed by the Fisher's exact test (data not shown), Shannon's entropy (a quantitative measure of diversity in an alignment [59, 64; Fig. 3]), the number and location of focal Shannon's entropy differences (Table 3), and the mean pair-wise protein distance for all proteins in an alignment (Fig. 4). For genotype 1a, samples from marked responder patients were consistently more diverse than those from poor responders in NS3 and NS5A, with E2 often approaching significance. For genotype 1b, samples from marked responders were consistently more diverse than those from poor responders in core and NS3. Together, these analyses indicate that HCV sequences from marked responders were significantly more diverse than sequences from poor responders in a subset of viral genes.
We next asked whether the intermediate samples had a distinct genetic diversity pattern or were intermediate in diversity between the marked- and poor-responder groups. We identified variations relative to the external reference sequences in the marked-, intermediate-, and poor-responder groups and then compared the numbers of these variations among all three response classes. In general, the marked responders had the most unique variations, the poor responders had the fewest, and the results for the intermediate responders were in between (Fig. 5). This pattern was supported by assessing the proportion of unique compared to nonunique variations by use of the Freeman-Halton exact test (data not shown). Therefore, there was a continuum in the degree of variation in the three response classes, with variation being highest in the marked-responder samples, lower in the intermediate-responder samples, and lowest in the poor-responder samples.
We next repeated the diversity analyses by comparing the sequences by race of the patient rather than by response to therapy to ask whether there were race-specific viral diversity differences. These analyses included all 94 sequences. When all variations were considered, there was a trend of greater diversity for the CA patients, but this was significant only for genotype 1b E1 at P ≤ 0.01. When variations unique to either the AA or CA sequences were considered to minimize neutral genetic variation, no significant differences were observed for genotype 1a (Fig. 6A). For genotype 1b the number of variations was significantly higher for CA patients in E1, E2, NS2, and NS5B (Fig. 6B).
We extended these analyses of viral diversity by analyzing the proportion of unique relative to nonunique variations (data not shown), Shannon's entropy values (Fig. 7), the numbers and locations of focal Shannon's entropy differences (Table 3), and the mean pair-wise protein distance for all proteins in an alignment (Fig. 8). No significant differences were found for genotype 1a, but differences were found most consistently for genotype 1b in E1 and NS2, with the CA sequences being more diverse than the AA sequences. Therefore, in contrast to the large number of significant diversity differences observed in isolates from patients with divergent responses to therapy, significant differences by race of the patients were found exclusively in genotype 1b.
We next asked whether the race of the patient influenced variation associated with response to therapy. We first examined all variations relative to the reference consensus sequences in samples from the marked and poor responders and asked whether the AA and CA samples had the same pattern of variation within each response class. For genotype 1a, no gene results were significant at P ≤ 0.01, but those for NS5B approached significance (P = 0.014), with the marked-responder CA samples being more diverse than the marked-responder AA samples. For genotype 1b, only NS4A results achieved significance, with the marked-responder CA samples having fewer variations than the marked-responder AA samples. When this analysis was limited to variations unique to either the marked- or poor-responder samples to minimize neutral genetic variation, genotype 1a NS5B results approached significance (CA marked responders > AA marked responders), and for genotype 1b E1 the CA poor-responder samples were more diverse than the AA poor-responder samples (Fig. 9). Finally, we stratified the marked-responder and poor-responder samples by both response and race and identified variations that were unique to each of the four race/response groups. Using a log-likelihood ratio test to determine whether the variations were distributed differently among the groups, we found no significant differences for either genotype 1a or 1b in any of the genes (data not shown). Therefore, the effect of race on variations associated with response to therapy was relatively small. However, we may have underestimated the effect of race on variation associated with response because the power of this analysis was lower than when race and response were assessed separately.
Interpatient HCV diversity differences associated with response could be due to preexisting variation leading to resistant and sensitive isolates, and/or they could be due to stronger immune pressures in the marked responders inducing more escape mutations (10, 20, 52, 67, 68). The latter possibility is plausible because anti-HCV CD4+ T-cell responses were stronger in Virahep-C patients who had sustained clearance of the virus than in those in whom HCV infection was not eradicated (Rosen et al., manuscript in press). We therefore asked whether immune pressures were sufficient to produce the diversity differences associated with response to therapy. We removed the E1 and E2 glycoproteins from the analysis because they are subject to humoral selective pressures and then used a Poisson analysis to ask whether there were diversity differences in the remaining genes outside of all known or predicted T-cell epitopes. This strategy overestimates the number of T-cell epitopes because HLA restriction is not considered, but it eliminates almost all targets of cell-mediated selective immune pressures in the nonepitope sequences. When all variations outside T-cell epitopes were considered, NS3 and the full polyprotein from marked responders were significantly more diverse than those from poor responders for both genotypes 1a and 1b (data not shown). When variations unique to the marked or poor responders outside of the T-cell epitopes were considered, the diversity differences were significant in the nonepitope regions of genotype 1a NS3 and NS5A (Fig. 10A). For genotype 1b, the diversity differences remained significant in the core, NS2, and NS3 (Fig. 10B). Very similar results were obtained when the proportion of unique versus nonunique variations in the nonepitope sequences from the marked- and poor-responder patients was assessed using Fisher's exact test (data not shown).
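The exclusion of epitope positions before counting variations can be sketched in Python. This is an illustration only: epitopes are assumed to be supplied as inclusive 1-based (start, end) coordinate ranges on the same numbering as the aligned sequence, and the function names, the epitope range, and the toy sequences are hypothetical rather than taken from the HCV Immunology Database.

```python
def in_any_epitope(position, epitopes):
    """True if a 1-based position lies inside any inclusive (start, end) range."""
    return any(start <= position <= end for start, end in epitopes)


def nonepitope_variations(seq, reference, epitopes):
    """Variations relative to the reference that fall outside every epitope."""
    return [
        (pos, aa)
        for pos, (aa, ref) in enumerate(zip(seq, reference), start=1)
        if aa != ref and not in_any_epitope(pos, epitopes)
    ]


# Toy data (hypothetical): one epitope covering positions 2 to 4.
reference = "MSTNPKPQRK"
sample = "MSANPRPQKK"
epitopes = [(2, 4)]
kept = nonepitope_variations(sample, reference, epitopes)
```

Because overlapping known and predicted epitopes are all masked regardless of HLA restriction, this kind of filter errs on the side of discarding too many positions, which matches the conservative intent described in the text.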
Therefore, significant diversity differences were present in sequences that were very unlikely to be under immune selection; hence, these differences are most likely due to allelic variation. The correlation of these allelic variations with response to therapy implies that genetic diversity differences modulated sensitivity of the isolates to therapy with peginterferon and ribavirin.
We next examined the pattern of genetic variation in NS3 more closely because of its strong association with response to therapy for both genotypes 1a and 1b. The N-terminal domain of NS3 contains a protease that cleaves NS3-NS5B from the polyprotein and also degrades cellular signal transduction molecules to block the type 1 interferon response to double-stranded RNA (reviewed in reference 30). The C-terminal domain contains an RNA helicase activity that presumably functions during RNA replication. We predicted that variations correlating with response to therapy would be concentrated in the protease domain because of its role in blocking induction of alpha interferon. Unexpectedly, significantly higher diversity in the marked responders was found exclusively in the helicase domain for both genotype 1a (Fig. 11A and B) and genotype 1b (Fig. 11C and D). Similar results were obtained when the proportion of unique versus nonunique variations was assessed using Fisher's exact test (data not shown).
Variation in NS5A correlated highly with response to therapy for genotype 1a. NS5A has three proposed structural domains (I, II, and III) separated by two low-complexity sequences (LCS I and II) (66), and within these structural domains are many putative functional motifs (39). Domain I contains the membrane anchor and a Zn++ binding site, domain II contains the interferon-sensitivity-determining region (ISDR) and an overlapping protein kinase R (PKR) binding site, and domain III contains a polyproline cluster and a variable region (V3) for which higher diversity has been correlated with response to therapy (14, 36, 46). Genetic diversity associated with response to therapy was widely spread throughout genotype 1a NS5A, and consequently the differences in the numbers of total or unique variations between the marked and poor responders did not achieve significance for any of the individual domains at P ≤ 0.01 (data not shown). However, genotype 1a marked-responder sequences had a significantly higher proportion of unique variations relative to total variations than the poor-responder sequences for domain I, domain III, and V3 by Fisher's exact test (data not shown). Finally, the ISDR has been widely studied as a potential genetic modulator of interferon sensitivity (47, 71). In this study, higher variation in the ISDR/PKR binding site correlated with response to therapy only for genotype 1a CA patients (Fig. 12).
The effects of genetic diversity of HCV on the success of antiviral therapy are complex for many reasons. First, two drugs are used, both drugs are likely to have more than one effect, and peginterferon at least does not act directly on any of the HCV proteins. Second, viral genetic variability is only one factor in response; host genetics and immune function are also important. Third, even closely related HCV subtypes such as 1a and 1b differ by ~10% to ~12% at the amino acid level; hence, isolates from different genotypes or subtypes may not be affected the same way at the molecular level by the drugs even when overall response rates to therapy are similar. Finally, all HCV isolates from treated individuals are wild-type strains that established chronic infections; hence, stark differences between isolates are not expected in parameters essential for viral replication or immune evasion. Rather, differences in susceptibility to therapy between isolates from different individuals are likely to be due to quantitative differences in viral functions, with some isolates being somewhat more sensitive than others. Therefore, variations at simple genetic motifs are unlikely to correlate highly with response. Consistent with this prediction, we found only three positions at which variation showed ≥60% correlation with response to therapy but found many diversity differences that were highly correlated with day 28 virological response.
The diversity data summarized in Table 4 led to three conclusions. First, there were strong associations between response to therapy and genetic diversity of the consensus sequences from infected individuals for a few viral genes but little to no association for the other genes. Higher variation was found by multiple analytical methods in sequences from the marked compared to poor responders in genotype 1a NS3 and NS5A and in genotype 1b core and NS3. Associations with response to therapy were also seen with E1, E2, and NS2, depending on the analytical method used. Second, the patterns of association between genetic diversity and response to therapy were not the same for genotypes 1a and 1b. This implies that the two genotypes may be affected by therapy through somewhat different molecular mechanisms. Finally, diversity differences correlating with the race of the patient were seen only for genotype 1b.
The diversity differences between infected individuals correlating with response to therapy may be due to allelic variation leading to differential sensitivity to therapy, to differential accumulation of escape mutations due to differing immune pressures in the marked and poor responders, or to both. Immune escape mutations can reduce viral fitness (10, 52, 68), and our data do not preclude such effects in this cohort. However, genetic variability was also significantly higher for the marked compared to poor responders in sequences that are very unlikely to be targets of cellular or humoral immune selective pressures (Fig. 10). These include nonepitope regions in core, NS3, and NS5A where the correlation between genetic diversity and response to therapy was strongest. Therefore, selection of immune escape variants cannot explain all of the correlation of pretreatment viral genetic variations with response to therapy. This implies that the higher diversity in the marked-responder samples was due to preexisting allelic variation directly associated with response to therapy. It is well accepted that HCV isolates can differ in sensitivity to alpha interferon-based therapy at the level of the major genotypes (e.g., genotypes 2 and 3 are more sensitive than genotype 1 [18, 21, 41, 63]). However, the idea that interferon-sensitive and -resistant isolates exist within the major genotypes is controversial. The strong association of viral diversity with early response to therapy within both genotypes 1a and 1b reported here indicates that even within the major genotypes, viral genetic variability leads to differential sensitivity to peginterferon and ribavirin therapy.
A possible confounding issue concerning the relevance of pretherapy HCV sequences to the outcome of therapy is that HCV replicates as a quasispecies and hence can evolve rapidly in response to the selective pressures induced by antiviral therapy. Viral evolution during the first month of therapy could skew our results by misclassifying sensitive HCV sequences into the poor-responder group or resistant sequences into the marked-responder group. Misclassifying resistant sequences into the marked-responder group was not a major problem in this study because 28 of the 31 marked responders achieved SVR; hence, the large majority of HCV quasispecies variants in the marked responders were sensitive to therapy. It is possible that some of the pretreatment sequences in the poor-responder group were sensitive to the drugs and were rapidly supplanted by resistant strains during therapy. To evaluate the degree of viral evolution during therapy, we are sequencing the full ORFs at 24 weeks posttherapy from the 1a patients who did not achieve SVR. Posttherapy sequence data have been obtained for the full NS3 gene and for NS5A amino acids 1 to 328 (of 448) from all 24 1a non-SVR patients for whom a follow-up week 24 blood sample is available. In NS3 the median numbers of variations relative to the reference sequence were 18 prior to therapy and 19 after therapy. When the pre- and posttherapy NS3 sequences from the same patient were compared, the median number of amino acid changes per patient was only 1. In the first 328 amino acids of NS5A, the median number of total variations relative to the reference sequence in both the pre- and posttherapy sequences was 14, and the median number of amino acid changes between the pre- and posttherapy sequences from the same patient was again just 1. Therefore, the pre- and posttherapy sequences were very similar in NS3 and NS5A, the two genotype 1a genes in which high pretherapy diversity correlated with suppression of viremia. This indicates that viral evolution during therapy does not alter the observation that higher sequence diversity correlates with successful suppression of HCV during therapy.
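The paired pre- and posttherapy comparison described above can be illustrated with a minimal sketch. It assumes that each patient's two sequences have already been aligned to the same length and simply counts mismatched positions; the function names and the short toy peptides are hypothetical and are not study data or the analysis pipeline actually used.

```python
from statistics import median


def count_changes(pre: str, post: str) -> int:
    """Count aligned positions at which two equal-length sequences differ."""
    if len(pre) != len(post):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(pre, post))


def median_changes(pairs) -> float:
    """Median number of per-patient changes over (pre, post) sequence pairs."""
    return median(count_changes(pre, post) for pre, post in pairs)


# Hypothetical toy pairs (pretherapy, posttherapy); not real patient sequences.
pairs = [
    ("MSTNPKPQRK", "MSTNPKPQRK"),  # no changes
    ("MSTNPKPQRK", "MSTNPKPQKK"),  # one change
    ("MSTNPKPQRK", "MATNPKPQKK"),  # two changes
]

print(median_changes(pairs))
```

A simple mismatch count of this kind treats alignment gaps as ordinary characters, which may or may not be appropriate for a given data set.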
Alpha interferon causes the large majority of the decline in viral titers by day 28 during combination therapy (12); hence, our marked- and poor-response categories were defined primarily by the response to peginterferon. Alpha interferon is a key player in the innate immune system, and HCV has evolved strategies to inhibit its effects (reviewed in references 22, 26, 48, 51, 54, and 60). Therefore, the variable sensitivity to peginterferon of HCV isolates from different individuals is likely due to variations in the sequences and functions of viral proteins that dampen host antiviral responses. The strongest correlations between response to therapy and viral variation were in NS3 and NS5A for genotype 1a and in core and NS3 for 1b. Importantly, all of these genes can actively block the action of alpha interferon in vitro (reviewed in reference 19). Therefore, the genetic data reported here provide strong support for a role for core, NS3, and NS5A (and, to a lesser extent, E2) in antagonizing the type 1 interferon response induced pharmacologically during treatment of HCV infections in humans.
Core residues R70 and M91 have been associated with response to therapy (2), but few other studies have examined the role of diversity in the core in the outcome of therapy. We observed a similar association of R70 with marked response for genotype 1b but not 1a; however, M91 was highly dominant in both the marked- and poor-responder sequences. E2 has been more thoroughly studied, because it has a PKR-phosphorylation homology domain (PePHD) which inhibits PKR in vitro (65). Increased diversity in the PePHD has been correlated with response to therapy in studies of patients infected with genotype 3 (58) but not in other studies that included genotypes 1 to 3 (1, 4, 49, 56, 57). We found that by some measures, E2 from marked responders was more diverse than E2 from poor responders for genotype 1a, and this difference approached significance for genotype 1b. However, these differences clustered in the N-terminal third of E2, and there were almost no variations among the isolates in the PePHD for either genotype 1a or 1b.
We observed a strong correlation between HCV diversity in NS3 between infected individuals and response to therapy, and to our knowledge variation in NS3 has not previously been associated with response to alpha interferon-based therapy in humans. The NS3 protease can block induction of the type 1 interferon response in vitro by cleaving signal transduction molecules downstream of the double-stranded RNA sensors Rig-I and TLR3 (reviewed in reference 30). In our sequences, diversity differences in NS3 correlating with response to therapy mapped exclusively to the helicase domain rather than the protease domain (Fig. 11). This implies that variation in NS3 correlating with response to therapy may affect the helicase activity rather than the protease activity or that the helicase domain may modulate the activity of the protease domain allosterically; interdomain communication within NS3 has been reported previously (32). These results also raise the possibility that variability in the viral replication rate modulated by variation in helicase activity may correlate with sensitivity to antiviral therapy and/or that the helicase may limit accumulation of double-stranded RNA that can trigger innate immune responses.
We found significant differences between viral sequences from marked and poor responders in NS5A, but only in genotype 1a. The functions of NS5A during HCV infection are unknown, but NS5A can attenuate the type 1 interferon response in vitro (19, 24, 39). Variation within a short region of NS5A termed the ISDR (15) has been associated with interferon sensitivity during antiviral therapy in some populations (47, 71). We observed higher variability in the ISDR of marked responders relative to the poor responders, but only in genotype 1a CA patients (Fig. 12). The role of genetic variation in NS5A outside the ISDR or the overlapping PKR binding site in response to therapy has been investigated in relatively few studies (14, 36, 46, 69). These studies found greater diversity in responders to alpha interferon-based therapy, especially in the carboxy-terminal half of the protein which includes the V3 hypervariable region (28). The Virahep-C samples were more diverse in the marked responders primarily in the carboxy-terminal half of the protein, including the V3 region (data not shown), so our results agree with previous studies of the full NS5A gene. Because most of the variability associated with response in NS5A in our samples was outside the ISDR/PKR binding site, variation in the ISDR may have modulated sensitivity to therapy in some patients, but this is not the only contribution that diversity in NS5A could have made to response of HCV to therapy.
Two recent studies examined diversity differences in E2 and NS5A between CA and AA patients (31, 36). In E2, sequences from CA patients were more diverse than those from AA patients for both genotype 1a and genotype 1b. This diversity was very significant in genotype 1b, with the majority of the difference clustering in the middle third of the protein. In NS5A, higher viral V3 variability was found for CA than for AA patients, and this increased variability correlated with response to therapy (36). We found a trend to higher diversity in the CA compared to the AA sequences throughout the viral ORF. For genotype 1a, these differences did not achieve statistical significance for any gene, but for 1b significant differences in E1, NS2, and NS5B were found by at least three analytical methods. The association between diversity and race for NS2 is novel, but the implications of this association are not clear because the role(s) of NS2 in HCV biology is not well known.
This study is the most comprehensive analysis to date of HCV genetic variation in the AA population. Our data do not support the idea that unusual genotype 1a HCV strains infecting AA patients exist, but there may be minor strain differences for genotype 1b between the AA and CA populations that could contribute to the poor response of AA patients to therapy. However, this equivocal conclusion does not preclude a role for viral diversity contributing to the unusually poor response of AA patients to therapy. We selected equal numbers of marked, intermediate, and poor responders in the AA and CA populations for sequence analysis, despite significantly lower response rates to therapy for the Virahep-C AA patients (28% versus 52% SVR rates) (9). The genetic similarity of HCV strains in the AA and CA patients across the three response classes and the intermingling of the AA and CA sequences in phylogenetic analyses (data not shown) indicate that very similar HCV strains circulate in the two racial groups; hence, the higher proportion of poor responders in the AA population implies that a higher proportion of relatively resistant HCV variants may circulate in the AA population, even for genotype 1a. Therefore, viral sequence variation may contribute to the unusually low response of AA patients to therapy, but the mechanism would be circulation in the AA population of a higher proportion of the same resistant strains found in the CA population rather than circulation of novel resistant strains that are absent from the CA population. Genotype 1b is more common in the AA population than in the CA population (53), so there is a precedent for differential circulation of HCV strains in the two racial groups.
The goal of antiviral therapy for HCV is SVR, the clearance of detectable viral RNA from serum for at least 6 months posttherapy. Large declines in viral titers by day 28 (such as occurred in the marked responders) correlate well with SVR (17, 43), with a major confounding factor being the failure or inability of the patients to take sufficient drug amounts during the demanding 48-week treatment regimen. A complete analysis of how viral genetic diversity is related to SVR and other clinical variables will be presented elsewhere, but we have performed an initial characterization of how pretreatment HCV genetic diversity correlates with SVR. The pretherapy sequences were stratified as SVR or non-SVR, and then the total and unique numbers of variations relative to the population consensus reference sequence, the total Shannon's entropy values, and the mean genetic distances in the two groups were compared. For both genotype 1a and genotype 1b, the polyproteins of the SVR sequences were significantly more diverse than those of non-SVR samples at P ≤ 0.01. For genotype 1a, NS3 and NS5A were significantly more diverse in the SVR samples by at least three of the four analytical methods. For genotype 1b, core was significantly more diverse in SVR patients by all four measures, and E2, NS2, and NS3 were more diverse by three of the four measures. Therefore, our basic observation that pretherapy sequence diversity was higher in the day 28 marked responders than in the poor responders for a few select viral genes is also true when the sequences are analyzed with respect to the ultimate goal of therapy, SVR.
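As an illustration of one of the diversity measures named above, the following minimal sketch computes a total Shannon entropy for a group of aligned sequences by summing the per-column entropies. It assumes equal-length aligned sequences and natural logarithms (the logarithm base is an assumption made here); the function names and the toy "SVR" and "non-SVR" alignments are hypothetical and do not reproduce the study data or its statistical tests.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (natural log) of the residues in one alignment column."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def total_entropy(alignment) -> float:
    """Sum of per-column Shannon entropies over equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    return sum(column_entropy(col) for col in zip(*alignment))


# Hypothetical toy alignments; not real patient sequences.
svr_group = ["MKTA", "MRTA", "MKSA", "MRTV"]
non_svr_group = ["MKTA", "MKTA", "MKTA", "MRTA"]

print(total_entropy(svr_group))
print(total_entropy(non_svr_group))
```

In this toy example the first group varies at three of four positions and the second at only one, so the first group has the higher total entropy. Whether such a difference is significant would require a separate statistical comparison, which this sketch does not attempt.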
Taking the data together, the focal distribution of viral genetic diversity at the patient level correlating with response to therapy and the inability of immune selection to account for all of this diversity lend strong support to the hypothesis that HCV genetic variation modulates efficacy of antiviral therapy in human patients. Because most of the reduction in viral levels at day 28 was due to peginterferon therapy, these data imply that the genes for which diversity differences correlate with early response to therapy function at least in part to blunt the type 1 interferon response in humans. This is consistent with the in vitro immune evasion activities reported for core, NS3, and NS5A (19). The consistently higher diversity in isolates from the marked responders (for whom the virus failed to counteract the drugs) compared to the poor responders (for whom the drugs were ineffective) implies that the functions of the proteins which counteract the effects of therapy can be impaired by amino acid differences at multiple locations. In essence, diversity would correlate with clearance because there are only a few ways to optimize activity of the viral proteins but many ways to interfere with their function.
We thank the participants of Virahep-C for their invaluable commitment of time and effort. The members of the Virahep-C study group are listed in reference 9.
Virahep-C was funded by the National Institute of Diabetes and Digestive and Kidney Diseases and the National Cancer Institute with further support under a Cooperative Research and Development Agreement with Roche Laboratories, Inc. Funding for this viral genetics project was through U01 DK60345; other grants supporting Virahep-C were U01 DK60329, U01 DK60340, U01 DK60324, U01 DK60344, U01 DK60327, U01 DK60335, U01 DK60352, U01 DK60342, U01 DK60309, U01 DK60346, U01 DK60349, and U01 DK60341. Support through the National Center for Research Resources General Clinical Research Centers Program was through grants M01 RR00645, M02 RR000079, M01 RR16500, M01 RR000042, and M01 RR00046.
Published ahead of print on 23 May 2007.