Phylogenetic analysis of sequences from the experimental samples revealed that mixed genotype, quasispecies, and a high degree of viral diversity did not contribute to problems with genotyping. As expected, the different software programs used yielded similar results, thereby eliminating the software program as a reason for differences in genotype assignments. Although there was variation in the subtypes assigned by the different programs (most from the NCBI program), the genotypes were in complete concordance, except for that of patient E1, even when the 5′ UTR and the NS5B region were compared. Moreover, the sequence analysis was able to identify genotypes that were previously untypeable by commercial assays. For patient E3, the genotype obtained outside the NIH was later proved incorrect by 5′ UTR and NS5B region sequencing. It was also found that one of the samples that was randomly chosen as a control (from patient C4) in fact had genotype 1a instead of 1b. These results show that even though the commercial assays are generally viewed as reliable, errors might occur, and there are factors that must be considered for faithful assignment of genotypes.
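The distinction drawn above between genotype-level and subtype-level agreement can be illustrated with a minimal Python sketch. It is not the comparison procedure used in the study; the label format (a number followed by an optional lowercase letter, such as "1a") and the function names `split_assignment` and `concordance` are assumptions made for illustration only.

```python
import re


def split_assignment(label):
    """Split a label such as '1a' into (genotype, subtype).

    Assumed label format: digits followed by optional lowercase letters.
    The subtype is an empty string when the program reported none.
    """
    match = re.fullmatch(r"(\d+)([a-z]*)", label.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")
    return match.group(1), match.group(2)


def concordance(labels):
    """Report whether a set of assignments agrees at each level."""
    parsed = [split_assignment(label) for label in labels]
    genotypes = {genotype for genotype, _ in parsed}
    full_assignments = set(parsed)
    return {
        "genotype": len(genotypes) == 1,
        "subtype": len(full_assignments) == 1,
    }
```

Under these assumptions, the hypothetical assignments "1a", "1b", and "1" from three programs would count as concordant at the genotype level but discordant at the subtype level.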
In evaluating the factors critical to accurate genotype assignment, we identified sequence length variation as a source of genotyping discrepancies. As sequences were shortened, genotyping accuracy dropped markedly, particularly for sequences of <200 bp, regardless of which end of the sequence was shortened. A length of 200 bp might therefore represent a threshold, and any genotype assigned using a sequence shorter than this might be viewed as unreliable.
Our small data set is a limitation of our study. To address this, we examined prototype sequences to determine whether sequence shortening produced similar effects. Shortening of the prototype sequences mirrored the results from our experimental and control sequences, providing further validation and supporting the generalizability of our results. When the different software programs were compared, MEGA appeared to give the most accurate results after shortening, while the NCBI program showed the greatest drops in accuracy. Furthermore, our data suggest that genotyping accuracy might be affected more by shortening at the 3′ end of the 5′ UTR than by shortening at the 5′ end or symmetrically. All in all, the results from our study demonstrate that sequence length and, to a lesser degree, the specific region used (i.e., the 3′ end of the 5′ UTR) are critical factors in determining the genotype. The failure of the Versant assay to correctly genotype these samples might have been due to these factors.
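The three shortening schemes compared above (from the 5′ end, from the 3′ end, and symmetrically) can be sketched in Python as follows. This is a minimal illustration rather than the procedure used in the study: the function names, the `mode` strings, and the way an odd number of excess bases is split in the symmetric case are all assumptions. The 200-bp cutoff is taken from the tentative threshold discussed in the text.

```python
# Tentative threshold from the text; sequences shorter than this
# were associated with markedly lower genotyping accuracy.
MIN_RELIABLE_LENGTH = 200


def shorten(seq, target_len, mode):
    """Return seq trimmed to target_len bases.

    mode (hypothetical names):
      '5prime'    - remove bases from the 5' end (start of the string)
      '3prime'    - remove bases from the 3' end (end of the string)
      'symmetric' - remove bases from both ends; if the excess is odd,
                    the extra base is removed from the 3' end (assumption)
    """
    if not 0 < target_len <= len(seq):
        raise ValueError("target_len must be between 1 and len(seq)")
    excess = len(seq) - target_len
    if mode == "5prime":
        return seq[excess:]
    if mode == "3prime":
        return seq[:target_len]
    if mode == "symmetric":
        left = excess // 2
        return seq[left:left + target_len]
    raise ValueError(f"unknown mode: {mode!r}")


def is_reliable_length(seq):
    """Flag whether a sequence meets the tentative 200-bp threshold."""
    return len(seq) >= MIN_RELIABLE_LENGTH
```

Each shortened sequence would then be submitted to the genotyping programs under comparison; that step is not shown because it depends on the specific tool.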
The 5′ UTR is more conserved than the other regions of the virus, making PCR amplification straightforward; thus, it has been used mainly for the detection of HCV infection (11). Since the 5′ UTR is the most common target in diagnostic HCV RNA assays, it has also been the substrate used for most genotyping assays. Although the 5′ UTR is well conserved, there are generally sufficient nucleotide differences to discriminate between most genotypes (11). Ironically, the high sequence conservation of the 5′ UTR makes it difficult to distinguish between all genotypes and subtypes. HCV genotype 6 variants from southeast Asia have a 5′ UTR sequence that is identical to that of HCV genotype 1b and can be mistyped (16). This might be a reason for the conflicting genotype assignments for patient E3. In our study, most of the disagreement among the different software assignments occurred when the 5′ UTR subtypes were compared. Genotyping assays that use sequence information from both the 5′ UTR and the core region have allowed more accurate distinction between genotypes and subtypes (18). Our results further support the view that adding the core region to the sequence analysis might alleviate some of the 5′ UTR genotyping problems.
Genotyping by sequence analysis of the NS5B region has been shown to correlate well with genotyping using the 5′ UTR, with the added advantage of superior subtyping (21). To some extent, the NS5B region is considered the gold standard for genotyping and subtyping. In our analysis, there was complete agreement in the genotype assignments and overall agreement in the subtype assignments between the 5′ UTR and the NS5B region. While NS5B region-based genotyping methods are preferred for the precise identification of subtypes, amplification of the NS5B region is quite challenging, as demonstrated by the 37.5% failure rate in our study. Subtyping appears to be more important for treatment outcome and development of resistance to the new direct-acting antivirals (10). However, genotype determination by 5′ UTR-based assays seems to be satisfactory for most clinical applications (11).
In conclusion, the accuracy of genotyping might be determined mainly by the amount of 5′ UTR sequence analyzed and less by the region within the 5′ UTR analyzed or by other factors. Although rare, discrepancies in 5′ UTR-based genotyping can occur and can lead to suboptimal treatment. If clinical outcomes are inconsistent with the assigned genotype, then further investigation should be undertaken.