Identification of HCC-specific mutations is a challenging endeavor. HCV’s great diversity makes it difficult to perform a comparative analysis among different HCV genotypes or subtypes. The existence of ethnically or geographically specific mutations is also a concern.8
More importantly, even if putative HCC-associated mutations are observed, it is not known whether these mutations are responsible for HCC or are simply a result of evolutionary adaptation. The current study was designed to focus on a single HCV genotype (4a) in a single geographical region (Egypt). All three datasets have explicit sampling dates and patterns and adequate sample sizes, providing a unique opportunity to explore a possible epidemiological relationship between HCV mutations and HCC incidence.
Initial comparative analysis identified four statistically significant nucleotide substitutions in the HCC and cirrhosis groups. However, in repeated experiments, these four mutations disappeared completely, and four different mutations consistently appeared instead (). In sequence chromatograms, almost all eight mutations showed single peaks, suggesting that they are not located at highly variable sites. Experimental contamination is unlikely because all other sites in the same HCV isolates were identical between experiments (). Over 70 PCR cycles, four putative false mutations across the 387-bp domain give an error rate of $1.5 \times 10^{-4}$ substitutions per base pair, which is well within the range of the Taq DNA polymerase misincorporation rate of $2.1 \times 10^{-4}$ to $2.0 \times 10^{-5}$ errors per base pair.33–38
Thus the eight nucleotide substitutions observed are most likely not authentic. Because the experimental procedure was the same, their appearance at different positions, in a non-random pattern, across repeated experiments may be attributable to batch-to-batch variation in AmpliTaq DNA polymerase. Another possible factor is a subtle change in template heterogeneity during the additional year of storage. The contribution of template heterogeneity to the error rate of DNA polymerase has been largely ignored.39–41
Because sequences were completely identical after 70-cycle PCR when plasmid DNA was used as the template, template heterogeneity may be the more likely explanation for our observation. Finally, the four nucleotide mutations from the initial and repeated experiments are also present in healthy volunteers from datasets 2 and 3, respectively (). Thus, even if they are genuine, these mutations may simply be a result of adaptive evolution, with no relationship to end-stage liver disease, whether HCC or cirrhosis.
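The error rate quoted in this paragraph appears to be the number of putative false mutations divided by the product of domain length and cycle number: $4/(387 \times 70) \approx 1.48 \times 10^{-4}$, which rounds to the reported $1.5 \times 10^{-4}$. The short sketch below reproduces that arithmetic and checks it against the quoted Taq range. The per-base-pair-per-cycle normalization is inferred from the numbers rather than stated in the text, and `per_base_error_rate` is a hypothetical helper.

```python
def per_base_error_rate(n_errors: int, length_bp: int, n_cycles: int) -> float:
    """Errors per base pair per PCR cycle.

    Assumption: the rate is normalized by both amplicon length and cycle
    number, which is what reproduces the reported 1.5e-4.
    """
    return n_errors / (length_bp * n_cycles)


# Taq misincorporation range quoted in the text (errors per base pair).
TAQ_LOW, TAQ_HIGH = 2.0e-5, 2.1e-4

rate = per_base_error_rate(n_errors=4, length_bp=387, n_cycles=70)
print(f"{rate:.2e}")                # 1.48e-04
print(TAQ_LOW <= rate <= TAQ_HIGH)  # True
```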
At the phylogenetic level, BaTS analysis revealed no apparent clustering of HCC and cirrhosis sequences by disease trait. However, including dataset 1 (blood donors) resulted in a strong association between phylogeny and disease trait (HCC/cirrhosis vs. blood donors) (). Since dataset 1 was sampled in 1993, this association may largely reflect different sampling dates rather than disease traits. Because dataset 2 contains only a small number (n=6) of HCV sequences from blood donors, an unequivocal answer may require analyses that include more contemporaneously collected HCV sequences from patients without HCC or cirrhosis.
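BaTS assesses phylogeny-trait association with statistics such as the parsimony score, computed over a posterior sample of trees and compared with a null distribution obtained by randomizing trait labels across tips. The sketch below is a deliberately reduced illustration of that idea, not BaTS itself: it computes the Fitch parsimony score on a single fixed tree and builds the null by shuffling tip labels. The nested-tuple tree encoding and the function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name; an internal node is a (left, right) tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_score(tree: Tree, traits: Dict[str, str]) -> int:
    """Minimum number of trait changes on a rooted binary tree (Fitch parsimony)."""

    def visit(node: Tree) -> Tuple[FrozenSet[str], int]:
        if isinstance(node, str):
            return frozenset([traits[node]]), 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # Disjoint child state sets force one extra change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_permutation_test(tree: Tree, traits: Dict[str, str],
                               n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed score and one-sided p-value P(null score <= observed).

    A low score means tips sharing a trait cluster together; the null
    shuffles the same multiset of trait labels across the tips.
    """
    rng = random.Random(seed)
    taxa = list(traits)
    labels = [traits[t] for t in taxa]
    observed = fitch_score(tree, traits)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if fitch_score(tree, dict(zip(taxa, labels))) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)
```

On an eight-tip tree in which all "HCC" tips form one clade and all "donor" tips the other, the score is 1 and the permutation p-value is small; BaTS would additionally average such statistics over many sampled trees.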
The HCC group, the cirrhosis group, and dataset 1 all have significantly negative Tajima's D values, indicating an excess of low-frequency mutations and therefore positive selection pressure. Among the datasets, the HCC group has the most negative Tajima's D value, corresponding to its highest $d_N/d_S$ ratio (). Indeed, although $d_S$ values are similar, the HCC and cirrhosis groups have significantly higher $d_N$ values than dataset 1 (p<0.001) (). Taken together, these data suggest strong evolutionary pressure on HCV in patients with end-stage liver disease, which is consistent with previous reports in HCC patients infected with HCV genotype 1b.8,13
An important implication of this observation is a theoretically increased chance of detecting putative HCC- or cirrhosis-specific mutations; such findings require cautious interpretation, since the mutations identified may simply be a consequence of adaptation.
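Tajima's D compares the mean number of pairwise differences, $\pi$, with the estimate $S/a_1$ based on the number of segregating sites $S$. An excess of rare variants pulls $\pi$ below $S/a_1$ and makes D negative. The sketch below computes D with the standard Tajima (1989) constants; it assumes aligned, equal-length sequences and treats every column (including any gap characters) as an ordinary state. The section does not say which software produced the reported values, and `tajimas_d` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for aligned, equal-length sequences (all columns counted)."""
    n = len(seqs)
    if n < 4:
        # With n < 4 the variance term vanishes, so D is undefined.
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) sites.
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Normalizing constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

A sample whose only variant is a singleton (for example, one "T" among four otherwise identical sequences) gives a negative D, whereas two sequences carrying each of two alleles gives a positive D.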
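The $d_N$ versus $d_S$ comparison can be illustrated with a pairwise estimate in the spirit of Nei-Gojobori, followed by a Jukes-Cantor correction. This is a swapped-in illustration, not the study's pipeline, since the method and software used are not stated in this section. As a labeled simplification, codons that differ at two or three positions are skipped entirely instead of being averaged over mutational pathways, and changes to stop codons count as nonsynonymous. All function names are hypothetical.

```python
import math
from itertools import product
from typing import Tuple

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites of one codon: each synonymous single-base change adds 1/3."""
    aa = CODE[codon]
    sites = 0.0
    for pos in range(3):
        for base in _BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == aa:
                sites += 1 / 3
    return sites


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor multiple-hit correction of a p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Tuple[float, float]:
    """Simplified pairwise (dN, dS) for two aligned, in-frame coding sequences.

    Simplification: codons differing at more than one position are skipped.
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = syn_diffs = nonsyn_diffs = 0.0
    codons = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codon in sequence")
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            continue  # skipped, see docstring
        codons += 1
        syn_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if codons == 0:
        raise ValueError("no comparable codons")
    nonsyn_sites = 3 * codons - syn_sites
    ds = jukes_cantor(syn_diffs / syn_sites) if syn_sites else float("nan")
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return dn, ds
```

For ten GCT (Ala) codons, changing one to GCC yields $d_N = 0$ and $d_S > 0$, while changing one to ACT (Thr) yields $d_S = 0$ and $d_N > 0$. Group-level values would require aggregating such pairwise estimates, which this sketch does not attempt.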
It should be noted that our analysis was based on a short HCV domain, the 387-bp partial Core/E1 region. A comprehensive understanding of HCC-specific mutations and/or strains may require full-length HCV genome scanning, as well as an adequate number of samples collected both simultaneously and longitudinally. In this context, the current study represents a proof-of-concept investigation, in terms of experimental approaches and phylogenetic techniques, for addressing this elusive but clinically important issue.