Accurate genotyping of hepatitis C virus (HCV) is important for determining the optimal regimen, dose, and duration of antiviral therapy for chronic HCV infection, as well as for estimating the response rate. The 5′ untranslated region (UTR) of HCV RNA is used in commercial genotyping, but the probes and the lengths of the amplicons are proprietary and vary among the assays. In this study, factors involved in the reliable determination of HCV genotypes utilizing the 5′ UTR were evaluated. Serum samples from four subjects with chronic HCV infection and disparate results on commercial genotyping and four controls were analyzed. HCV RNA was extracted from serum samples, and the 5′ UTR and NS5B region were sequenced. Ten clones from each region were compared to prototype sequences and analyzed for genotype assignment using five programs. The results were compared to those from commercial assays. 5′ UTR sequences were sequentially shortened from either the 5′ end, the 3′ end, or both ends, with genotyping of the resultant fragments. Sequences were obtained for the 5′ UTR in all eight subjects and for the NS5B region in five subjects. The genotype assignments were identical between the two regions in the five subjects with complete sequencing. Genotyping by sequencing gave different results than those from the commercial assays in the four experimental samples but agreed in the four controls. Shortening of the sequences affected the results, and the results for sequences of <200 bases were inaccurate. Neither the Hamming distance nor the quasispecies affected the results. Sequencing of the HCV 5′ UTR provided reliable genotyping results and resolved discrepancies identified in commercial assays, but genotyping by sequencing was highly dependent upon sequence length.
Hepatitis C virus (HCV), a member of the Flaviviridae family, possesses a single-stranded positive-sense RNA genome approximately 9.6 kb in length. The HCV genome is characterized by 5′ and 3′ untranslated regions (UTRs) that abut a single large open reading frame. The single polyprotein translated from the HCV genome includes an initial structural region comprising envelope (E1 and E2) and core regions and a nonstructural region that is posttranslationally processed into at least 4 polypeptides (NS2 to NS5). In a manner similar to that of other RNA viruses, HCV circulates in infected individuals as a population of closely related but diverse viral sequences, referred to as quasispecies. Heterogeneity of the HCV genome is a consequence of mutations during viral replication, which is mainly due to the error-prone viral RNA polymerase and the lack of an associated repair mechanism. Different HCV isolates display significant nucleotide sequence variability depending on the genomic region. The E1 and E2 regions are highly variable, whereas the 5′ UTR is the most conserved (1, 2).
The accumulation of mutations in the viral genome over time has led to the emergence of multiple HCV genotypes. Phylogenetic analysis of full-length or partial sequences of HCV strains from various regions of the world has identified 6 HCV genotypes numbered 1 to 6 and multiple subtypes (1a, 1b, etc.) (3–5). Members of different genotypes vary by over 30% at the nucleotide sequence level, while the subtypes vary by 20 to 23% (6). Genotype variation has substantial clinical implications in terms of treatment response, duration, dosage, and sensitivity to different HCV protease (NS3) inhibitors. Patients chronically infected with HCV genotype 1 have a lower response rate to combination therapy (pegylated interferon and ribavirin) than patients infected with genotype 2 or 3 (~45% versus 75%) and require a longer course of treatment (48 versus 24 weeks) with higher ribavirin doses for optimal response rates. Patients with genotype 4 infection have an intermediate rate of response to pegylated interferon and ribavirin (7, 8). Studies with new protease inhibitors such as telaprevir and boceprevir have shown an increase in the response rate for combination therapy in patients with genotype 1, which might be more effective in those with genotype 1b than in those with genotype 1a (9, 10).
Due to the clinical importance of genotyping, there is a demand for fast, easy, cost-effective, and reliable assays. While nucleotide sequencing of the coding region (such as NS5B or core/E1) of the HCV genome followed by phylogenetic analysis is the gold standard of genotyping, this method is both expensive and time-consuming. Almost all commercially available genotyping assays use direct hybridization and probes that are directed against the 5′ UTR of HCV (8). Genotyping errors might occur because variable lengths and regions of the 5′ UTR are analyzed. In some instances, the hybridization methods do not give subtype information. Data on specific determinants of accuracy in 5′ UTR-based genotyping are lacking.
The present study grew out of discrepancies in the results of commercial assays for genotyping of four patients done both inside and outside the Clinical Center of the National Institutes of Health (NIH). Using four additional controls, we explored the question of whether these patients had mixed genotypes or had a high degree of viral diversity that made them difficult to genotype by hybridization. The aim was to evaluate the HCV 5′ UTR for critical factors involved in determining genotypes by sequencing the 5′ UTR and NS5B region of HCV RNA in these patients. Using five different software programs, we analyzed the sequences and compared the results to those of the commercial assays.
Serum samples from eight different subjects were selected for postamplification cloning and sequencing of the 5′ UTR and the NS5B region of HCV RNA. Four experimental subjects were chosen due to problems with results of commercial assays. Three subjects had genotype assignments from genotyping performed at other institutions prior to their arrival at the NIH. However, repeated genotyping at the NIH failed to assign a genotype or the results conflicted with the genotype assigned at the outside institutions. One patient did not have a genotype from an outside institution and could not be genotyped at the NIH (see Table 2). Genotyping of samples at the NIH was done using the Versant HCV genotype 2.0 assay (LiPA) (Siemens, Tarrytown, NY). If a sample could not be genotyped, then genotyping was repeated using the same assay with a new serum sample. Samples were considered untypeable only if both specimens could not be genotyped. Four additional samples were randomly selected from the same studies and the same time points to be used as controls. None of the eight patients had received antiviral therapy. All patients had given informed consent for institutional review board (IRB)-approved studies.
Serum samples for sequencing were stored at −80°C before use and thawed on one occasion only. The experimental and control samples were coded, and the PCR and sequence analysis were done without knowledge of which samples were linked. RNA was extracted from the serum using a QIAamp viral RNA mini kit (Qiagen, Valencia, CA) according to the instructions provided. With this kit, RNA was extracted from 140 μl of serum and finally eluted into 30 μl of RNase-free water.
Reverse transcription (RT) was performed with SuperScript III (Invitrogen, Carlsbad, CA). Each reaction mixture contained 50 pmol of antisense primer 9410R (5′-CTCAGGCCTATTGGCCTGGAG-3′), which binds to the 3′ UTR, 4 μl of buffer, 1 μl of deoxynucleoside triphosphate (dNTP) (Invitrogen), 0.5 μl of RNaseOUT (Invitrogen), and 8 μl of RNA. The mixture was incubated for 2 min at 42°C, 1 μl of SuperScript III was added, and incubation was continued for 60 min at 42°C.
PCR with Pfu DNA polymerase (Stratagene, La Jolla, CA) was used to amplify the 5′ UTR (including a part of the core region) and the NS5B region. The 5′ UTR was amplified in one round of PCR (40 cycles) with 50 pmol of primers 45F (5′-CTGTGAGGAACTACTGTCTTC-3′) and 394R (5′-GCGGTTGGTGTTACGTTT-3′). The NS5B region was amplified with primers 7539F (5′-CTCAGTGACGGGTCTTGGTC-3′) and 9365R (5′-GGGGAGCAGGTAGATGCCTAC-3′) in the 1st-round PCR. Primer 7589F (5′-SGTSTGCTGCTCNATGTC-3′) or 8401F (5′-ATTCAAAAGGGCAGAACTGCG-3′) and primer 8622R (5′-CTACGAGTCTTCACGGAGG-3′) or 9359R (5′-CAGGTAGATGCCTACCCCTAC-3′) were used for nested PCR. PCR was run on a PTC-200 DNA Engine thermal cycler (Bio-Rad Laboratories, Inc., Hercules, CA) under the following conditions: 95°C for 5 min, followed by 5 cycles of 30 s at 95°C, 45 s at 58°C, and 3 min at 72°C, followed by another 35 cycles of 30 s at 95°C, 45 s at 55°C, and 3 min at 72°C, with a final extension at 72°C for 10 min. PCR fragments were electrophoresed and then purified with a QIAquick gel extraction kit (Qiagen, Valencia, CA).
The purified PCR products were cloned with a TOPO TA cloning kit (Invitrogen, Carlsbad, CA) according to the instructions provided. Ten clones from the 5′ UTR and ten from the NS5B region were analyzed for each patient. The viral genotype was determined from the sequences of the 5′ UTR (including a part of the core region) and the NS5B region by phylogenetic tree analysis against prototype sequences representing all 6 genotypes from GenBank (http://www.ncbi.nlm.nih.gov/GenBank/) (see Table S1 in the supplemental material for a complete list of the reference sequences). The following genotyping programs were used: CLC Workbench 6 (CLC bio, Cambridge, MA), Geneious 5.0 (Biomatters Ltd., Auckland, New Zealand), MEGA 4 (Center for Evolutionary Functional Genomics, Tempe, AZ), and MacVector 11.1 (MacVector, Inc., Cary, NC). Genotyping software on the National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi) was also used. Default settings were used in all the programs to assign genotypes. The quasispecies diversity was evaluated using the Hamming distance (Hd) calculated from 10 clones as follows: Hd = (1 − s) × 100, where s is the fraction of shared sites in the 2 aligned sequences.
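The Hamming distance formula above can be illustrated with a minimal Python sketch. The function names are hypothetical. Averaging over all pairs of clones is an assumption, because the text does not state how the 10 clones were combined. Gap characters are treated as ordinary symbols, which is also an assumption.

```python
from itertools import combinations


def hamming_distance_pct(seq_a, seq_b):
    """Return Hd = (1 - s) * 100, where s is the fraction of shared sites.

    Both sequences must already be aligned to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    shared = sum(a == b for a, b in zip(seq_a, seq_b))
    s = shared / len(seq_a)
    return (1 - s) * 100


def mean_pairwise_hd(clones):
    """Average Hd over every pair of clones (assumed summary statistic)."""
    pairs = list(combinations(clones, 2))
    if not pairs:
        raise ValueError("at least two clones are required")
    return sum(hamming_distance_pct(a, b) for a, b in pairs) / len(pairs)
```

For example, two aligned 10-base clones that differ at one site share 9 of 10 sites, so s = 0.9 and Hd is approximately 10.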
Sequence shortening was performed in silico. The 5′ UTRs were then sequentially shortened from either the 5′ end, the 3′ end, or both ends to determine whether the sequence length affected the accuracy of genotyping. These sequences were then analyzed as described above using the same software. Genotyping accuracy was determined by calculating the percentage of samples whose genotypes remained unchanged after sequence shortening.
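The shortening procedure and the accuracy measure can be sketched as follows. This is an illustration under stated assumptions, not the workflow used in the study. `shorten`, `accuracy_after_shortening`, and the `assign_genotype` callable are hypothetical names; `assign_genotype` stands in for whichever genotyping program is being evaluated.

```python
def shorten(seq, n, end):
    """Trim n bases from the 5' end ('5p'), the 3' end ('3p'),
    or n bases from each end ('both')."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if end == "5p":
        return seq[n:]
    if end == "3p":
        return seq[:max(len(seq) - n, 0)]
    if end == "both":
        return seq[n:max(len(seq) - n, 0)]
    raise ValueError("end must be '5p', '3p', or 'both'")


def accuracy_after_shortening(samples, assign_genotype, n, end):
    """Percentage of samples whose genotype is unchanged after shortening.

    assign_genotype is a hypothetical callable mapping a sequence to a
    genotype label; the full-length call is taken as the reference.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    unchanged = 0
    for seq in samples:
        full_call = assign_genotype(seq)
        short_call = assign_genotype(shorten(seq, n, end))
        if short_call == full_call:
            unchanged += 1
    return 100 * unchanged / len(samples)
```

Calling `accuracy_after_shortening` over a range of `n` values for each of the three `end` modes would give one accuracy curve per mode, which is the kind of comparison described above.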
The clinical features of the patients are shown in Table 1. Experimental subjects included 2 men and 2 women, ages 42 to 57 years. Their levels of HCV RNA at the initial visit were 2,040,000, 492,000, 1,140,000, and 1,070,000 IU/ml. The four control subjects were all men, ages 51 to 63 years, with initial HCV RNA levels of 1,890,000, 331,000, 2,150,000, and 6,140,000 IU/ml. Three experimental subjects genotyped outside the NIH were reported to have genotypes 6c, 2b, and 1b. One experimental subject genotyped at the NIH had genotype 6. Patients in the control group genotyped at the NIH had genotypes 1a, 1, 1b, and 1b. Although the commercial assay was not able to subtype patient C2, our sequence analysis revealed that this patient had genotype 1a.
Nucleotide sequencing was successful in the 5′ UTR for 8 samples and in the NS5B region for 5 samples (Table 2). The genotype assignments were assessed using different programs in order to rule out the software program as a factor in genotyping discrepancies. Experimental and control viral sequences were aligned to reference sequences from GenBank, and neighbor-joining trees were made to show evolutionary relationships (see Fig. S1 and S2 in the supplemental material). In the 5′ UTR sequences, all of the genotypes were concordant except for the NCBI program with patient E1. There were some differences in subtypes among the programs. Only one patient sample (C1) had complete agreement in the genotype and subtype assignments from all five software programs in both the 5′ UTR and the NS5B region. In the NS5B region sequences, there was complete agreement with all the genotypes and subtypes assigned by the different software programs.
A possible reason for genotyping discrepancies was the existence of multiple genotypes in the same sample. We therefore looked for evidence of this. There were no mixed genotypes among the 10 clones for any of the eight samples. The average Hamming distances of the experimental and control sequences were not different (0.002 versus 0.0015). When analyzed individually, quasispecies in the 5′ UTR and the NS5B region did not affect the genotype analysis (data not shown).
The viral sequence length was assessed as a possible source of the genotyping discrepancies. Consecutive base pair (bp) shortening reduced the accuracy in all 5 programs for the experimental, control, and prototype sequences (Fig. 1). In the experimental and control subjects, genotyping remained more accurate when the 5′ UTR was shortened from the 5′ end than when it was shortened from the 3′ end or from both ends simultaneously. Genotyping of all sequences of <200 bp was inaccurate in both the experimental and the control subjects. Only MEGA showed more than 80% accuracy when sequences were shortened from either the 5′ or the 3′ end but not from both ends simultaneously. Prototype sequences representing all major genotypes were then analyzed to assess whether the findings from the experimental and control sets were universally applicable. Shortening of the prototype sequences showed patterns that were similar to those with the experimental and control sequences, with the greatest drops in accuracy at lengths of <200 bp.
Phylogenetic analysis of sequences from the experimental samples revealed that mixed genotype, quasispecies, and a high degree of viral diversity did not contribute to problems with genotyping. As expected, the different software programs used yielded similar results, thereby eliminating the software program as a reason for differences in genotype assignments. Although there was variation in the subtypes assigned by the different programs (most from the NCBI program), the genotypes were in complete concordance, except for that of patient E1, even when the 5′ UTR and the NS5B region were compared. Moreover, the sequence analysis was able to identify genotypes that were previously untypeable by commercial assays. For patient E3, the genotype obtained outside the NIH was later proved incorrect by 5′ UTR and NS5B region sequencing. It was also found that one of the samples that was randomly chosen as a control (from patient C4) in fact had genotype 1a instead of 1b. These results show that even though the commercial assays are generally viewed as reliable, errors might occur, and there are factors that must be considered for faithful assignment of genotypes.
In evaluation of the factors critical to accurate genotype assignment, sequence length variation was identified as a source of genotyping discrepancies. As sequences were shortened, the accuracy in genotyping dropped markedly, particularly for sequences of <200 bp, regardless of which end of the sequence was shortened. This might be the threshold sequence length, and any genotype assigned using a sequence that is less than this length might be viewed as unreliable.
Our small data set is a limitation of our study. To address this, we analyzed prototype sequences to determine whether sequence shortening produced similar effects. Shortening of the prototype sequences mirrored the results from our experimental and control sequences, which supports the validity and generalizability of our findings. When the different software programs were compared, MEGA appeared to give the most accurate results after shortening, while the NCBI program had the greatest drops in accuracy. Furthermore, our data suggest that genotyping accuracy might be affected more by shortening at the 3′ end of the 5′ UTR than by shortening at the 5′ end or symmetrically. Overall, the results of our study indicate that sequence length and, to a lesser degree, the specific region used (i.e., the 3′ end of the 5′ UTR) are critical factors in determining the genotype. The failure of the Versant assay to correctly genotype these samples might have been due to these factors.
The 5′ UTR is more conserved than the other regions of the virus, making PCR amplification straightforward; thus, it has been used mainly for the detection of HCV infection (11, 12). Since the 5′ UTR is the most common target in diagnostic HCV RNA assays, it has also been the substrate used for most genotyping assays. Although the 5′ UTR is well conserved, there are generally sufficient nucleotide differences to discriminate between most genotypes (11, 13–15). However, the same high sequence conservation makes it difficult to distinguish among all genotypes and subtypes. HCV genotype 6 variants from southeast Asia have a 5′ UTR sequence that is identical to that of HCV genotype 1b and can be mistyped (16, 17). This might be a reason for the conflicting genotype assignments for patient E3. In our study, most of the disagreement among the different software assignments occurred when the 5′ UTR subtypes were compared. Genotyping assays that use sequence information from both the 5′ UTR and the core region have allowed improved and more accurate distinction between genotypes and subtypes (18–20). Our results suggest that adding the core region to the sequence analysis might alleviate some of the 5′ UTR genotyping problems.
Genotyping by sequence analysis of the NS5B region has been shown to correlate well with genotyping using the 5′ UTR, with the added advantage of superior subtyping (21–24). To some extent, the NS5B region is considered the gold standard for genotyping and subtyping. In our analysis, there was complete agreement in the genotype assignments and overall subtype assignments with the 5′ UTR and the NS5B region. While NS5B region-based genotyping methods are preferred for the precise identification of subtypes, amplification of the NS5B region is quite challenging, as demonstrated by the 37.5% failure rate in our study. Subtyping appears to be more important for treatment outcome and development of resistance to the new direct-acting antivirals (10, 25). However, genotype determination by 5′ UTR-based assays seems to be satisfactory for most clinical applications (11, 23).
In conclusion, the accuracy of genotyping might be determined mainly by the amount of 5′ UTR sequence analyzed and less by the region within the 5′ UTR analyzed or other factors. While rare, discrepancies in genotyping of the 5′ UTR, which can lead to suboptimal treatment, might exist. If clinical outcomes are inconsistent with genotypes, then further investigation should be undertaken.
This work was supported by the intramural research program of the National Institute of Diabetes and Digestive and Kidney Diseases (Z01 DK054514-02 LDB) and the Clinical Center, National Institutes of Health, Bethesda, MD.
We declare no conflicts of interest.
Published ahead of print 6 March 2013
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JCM.03344-12.