We analyzed the viral dynamics and viral diversification of HCV very early in acute infection. The early diversity of HCV is very low, and the inter-sequence Hamming distances follow a Poisson distribution, as would be expected when the mutations occur approximately at the same rate at all positions and the sequences are not selected for diversity
[27],
[44]. Given this observation, the number of mutations at early times should depend on the time since infection, the mutation rate and the biology of viral replication. This idea has been used before in the context of primary HIV infection to estimate the time of infection, assuming a given mutation rate
[26],
[27]. In the present study, the time of infection is known to within a short time window, with the first HCV positive sample within 5 days of the last negative sample. With this information, we could use our data to estimate the
in vivo HCV mutation rate. By developing a model of HCV replication that takes into account the details of the viral lifecycle, we found the estimated mutation rate varied among subjects between 1.6×10⁻⁵ and 6.2×10⁻⁵ mutations per nucleotide per replication cycle, with a median of 2.5×10⁻⁵ (, 5 h genome). This estimate was very robust to different assumptions about model parameter values (see
Text S1). Moreover, for the less well-known parameter values we systematically made conservative assumptions, that is, assumptions leading to higher estimates of the mutation rate. To further confirm our results, we estimated the mutation rate by a completely different approach based on the frequency of stop codons (nonsense mutations), corrected by the number of nonsense mutation targets, as proposed by Cuevas
et al.
[13]. With this calculation we obtained a mutation rate of 2.8×10⁻⁵ or 3.2×10⁻⁵ mutations per nucleotide per replication cycle depending on the calculation method (see
Text S1), which is consistent with the estimate from our more complex dynamical model and substantially less than the rate (~10⁻⁴) estimated by Cuevas
et al.
[13]. A likely explanation for the difference between the findings of our nonsense mutation analysis and that of Cuevas
et al. is that in our study
Taq polymerase errors are eliminated from the finished sequences by the SGA-direct amplicon sequencing method and thus do not enter into the error rate calculations; this was not the case for the previous analyses
[6],
[13]. We further note that estimates of the HCV mutation rate based on nonsense mutations are likely to be overestimates since we found that stop codons were not always lethal (see
Text S1). One explanation for this observation is that there are multiple HCV RNAs in an infected cell and another RNA may complement nonsense mutations. Indeed, we also found a case of a chronically infected patient who has a strain with a large deletion replicating in plasma at multiple time points
[23]. Moreover, for dengue virus (in the same
Flaviviridae family as HCV) there is a report of a viral strain with a stop codon that spread and attained a high frequency in the population, implying replication in both humans and mosquitoes
[50].
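The Poisson-like behaviour of the inter-sequence Hamming distances noted at the start of this discussion can be checked informally with a variance-to-mean ratio. The following is a minimal sketch, not the analysis used in this study: the function names and the toy alignment are hypothetical, and because pairwise distances are not independent, a ratio near 1 is only a rough indication of consistency with a Poisson distribution.

```python
from itertools import combinations
from statistics import mean, pvariance


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(seqs: list[str]) -> list[int]:
    """Hamming distance for every unordered pair of sequences."""
    return [hamming(a, b) for a, b in combinations(seqs, 2)]


def dispersion_index(distances: list[int]) -> float:
    """Variance-to-mean ratio; a Poisson distribution has a ratio of 1."""
    m = float(mean(distances))
    if m == 0:
        raise ValueError("all sequences are identical; the ratio is undefined")
    return float(pvariance(distances)) / m


# Hypothetical toy alignment (not data from this study).
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]

distances = pairwise_hamming(toy_alignment)
print("pairwise distances:", distances)
print("variance/mean:", dispersion_index(distances))
```

With real single-genome sequences, the same two functions would be applied to the full alignment for each subject and time point; a formal goodness-of-fit test would be needed for anything beyond a quick check.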
In addition, our analysis does not account for mutational errors resulting from the cDNA synthesis step of the sequencing process, which again may lead to an overestimation of the mutation rate. However, we used SuperScript III™ Reverse Transcriptase (Cat. No. 18080-093, 2000 units, Invitrogen Life Technologies, Carlsbad, CA) that has been reported to have an error rate of ~2×10⁻⁶ mutations/nucleotide/replication
[23],
[51], which is at least 10-fold lower than our HCV mutation rate estimates, and hence should not significantly influence our estimates.
Our estimates of the mutation rate for the HCV RdRp of ~2.5×10⁻⁵ are notable because previous reports have suggested that the
in vivo mutation rate of HCV is of the order of 10⁻⁴ mutations per nucleotide per replication
[13], and that the
in vitro rate of the isolated RdRp could be as high as 10⁻³
[12]. One possible explanation for the latter discrepancy is that the mutation rates observed with purified RdRp enzymes are generally larger than those seen
in vivo, because
in vitro analyses cannot recapitulate the intracellular milieu of the replication or polymerase complex. For example, in the case of HIV reverse transcriptase, the errors measured with purified enzyme were found to be up to 20-fold higher than those measured in infected cells
[52]. Another possibility is that we may have missed some low prevalence strains. However, a detailed power calculation shows that with the number of sequences obtained per patient, we would only miss strains that are present at very low levels, below 2%
[23], which is much better than was possible before
[25],
[53] (see Li
et al.
[23] for a detailed discussion). Moreover, for the dynamical model we followed time courses and analyzed the fraction of virus identical to the T/F virus, and for the stop codon analyses we corrected for the mutational targets. Both of these reduce the impact of missing strains.
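The kind of reasoning behind such a power calculation can be illustrated with simple binomial sampling. This is a sketch under stated assumptions, not the calculation reported in [23]: it assumes each sequence is an independent draw from the viral population, and the function names and the 95% power threshold are chosen here purely for illustration.

```python
import math


def prob_detect(freq: float, n_seqs: int) -> float:
    """Probability that a variant at frequency `freq` appears at least once
    among `n_seqs` independently sampled sequences."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must lie strictly between 0 and 1")
    return 1.0 - (1.0 - freq) ** n_seqs


def min_seqs_for_power(freq: float, power: float = 0.95) -> int:
    """Smallest number of sequences giving at least `power` probability of
    sampling a variant present at frequency `freq`."""
    if not 0.0 < freq < 1.0 or not 0.0 < power < 1.0:
        raise ValueError("freq and power must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - power) / math.log(1.0 - freq))


# Illustrative question: how many sequences are needed to see a 2% variant
# at least once with 95% probability?
n_needed = min_seqs_for_power(0.02, 0.95)
print("sequences needed:", n_needed)
print("detection probability at that depth:", prob_detect(0.02, n_needed))
```

The detection probability rises steeply with sequencing depth, which is why only strains at very low frequency are likely to be missed once enough single-genome sequences are obtained per subject.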
Given the low level of diversity observed in early infection and the relatively low mutation rate, the enormous diversity of HCV
[14],
[15],
[18] and its high substitution rate (i.e., substitutions/site/year) have to be understood in light of HCV's replication mechanism
[16]. Relatively long-lived infected cells with multiple replication complexes allow for the accumulation of diversity in the virions produced. At the same time, the turnover of both replication complexes and infected cells, which must surely ensue as the immune response develops, allows for renewed generation of diversity throughout the course of infection (compare 10062 in and ). Indeed, it could be that these details of the life cycle are responsible for the large diversity of HCV. We note that HIV and influenza, which are thought to have similar mutation rates to the one estimated here
[6],
[52], also have high substitution rates
[54]. In this context, we see that accumulation of diversity is not only dependent on mutation rate, but also to a great extent on the particular processes of the viral life cycle
[7],
[8],
[16]. Clearly, the pressure of the immune response, once established, will be important in determining relative fitness of many of the mutations and in determining the spectrum of mutations observed. That we see only scarce evidence of positive selection in our dataset indicates that there is a window of several weeks before the effects of the immune response can be detected.
Another important parameter that we estimated was the fraction of infected cells during the early plateau in viral load, which ranged between 1.7% and 22% of hepatocytes. This fraction is in reasonable agreement with other studies of HCV
[41],
[42]. In our model, this fraction depends on the value assumed for the maximum number of replication complexes (
RCM). The larger the number of replication complexes in an infected cell, the more viruses this cell can produce per unit of time, and thus the fewer the number of infected cells needed to maintain a given steady state viral load. However, increasing
RCM has little effect on our estimate of the mutation rate (see
Text S1).
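The inverse relationship between the number of replication complexes per cell and the number of infected cells needed at the plateau can be sketched with a steady-state balance. This is a simplified illustration and not the full model of this study: it assumes every infected cell carries the maximum number of replication complexes, that each complex produces virions at a constant rate, and that production equals clearance at the plateau. All names and numerical values below are hypothetical.

```python
def infected_cells_at_plateau(
    viral_load: float,
    clearance_rate: float,
    virions_per_rc_per_day: float,
    rc_max: int,
) -> float:
    """Infected cells required so that production balances clearance.

    Steady state: virions_per_rc_per_day * rc_max * I = clearance_rate * V,
    hence I = clearance_rate * V / (virions_per_rc_per_day * rc_max).
    """
    if rc_max <= 0 or virions_per_rc_per_day <= 0:
        raise ValueError("production parameters must be positive")
    return clearance_rate * viral_load / (virions_per_rc_per_day * rc_max)


# Hypothetical values, chosen only to show the scaling with rc_max.
total_virions = 1e10      # total virions in the body at the plateau
clearance_per_day = 20.0  # virion clearance rate (per day)
production_per_rc = 2.0   # virions made per replication complex per day

for rc_max in (10, 20, 40):
    cells = infected_cells_at_plateau(
        total_virions, clearance_per_day, production_per_rc, rc_max
    )
    print(f"rc_max={rc_max}: infected cells needed = {cells:.3g}")
```

Under these assumptions, doubling the maximum number of replication complexes halves the number of infected cells required, and dividing by the total number of hepatocytes would give the infected fraction.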
In this study, we constructed a simple model of HCV replication that tried to capture the most salient features of the viral life cycle. Moreover, we were careful to choose parameters consistent with the literature a priori, so that only two parameters had to be adjusted to fit the data on viral growth and diversity increase. We tested variation in the model assumptions and found that the results were quite robust. Still, it is clear that many complexities could be added to the model. For example, instead of having a fixed RCM, we could allow it to vary from cell to cell and possibly even over time; or we could allow for a distribution of generation times for RNA synthesis. These and other processes are easy to include in the model; however, we opted to keep to the essential aspects of the replication process, so that we did not have to make further assumptions, which would complicate the interpretation of the results. In essence, this is akin to choosing a simple experimental system that is amenable to easy manipulation and interpretation of results, even if it does not fully represent all the details of the in vivo system.
Altogether, the unique dataset presented here, including HCV viral kinetics and genomic diversification very early in infection, revealed that the initial exponential expansion of HCV RNA is followed by a plateau in viral load that lasts up to a few weeks
[30]. The initial viral expansion is accompanied by a fast early increase in sequence diversity, whereas during the viral plateau viral diversity remains approximately constant. During the plateau, viral production continues but is simply balanced by the rate of viral clearance. In order to understand why viral diversity did not continue to increase during this period, we developed a novel stochastic model of HCV infection. The basic idea behind the model is that during the early exponential expansion of the virus, new cells are being infected and multiple replication complexes are being generated in each infected cell. This involves multiple copying events of (+)RNA to (−)RNA to (+)RNA, etc., with errors potentially being generated at each stage. We postulate that once the viral plateau is reached, a stable population of long-lived infected cells has been generated, which then produces the plateau virus without any need for new RC generation. If no new replication templates are made, then there is little opportunity for mutations to accumulate, though each virus can still mutate in relation to its parent RC due to the (−)RNA to (+)RNA copying event. We found that our model, based on this idea, agreed with both the viral load kinetic data and the sequence diversity data if we assumed that the
in vivo mutation rate of HCV is ~2.5×10⁻⁵ per nucleotide per replication cycle. This is about 5-fold lower than previously reported, but still high enough that, coupled with the long-lasting nature of HCV infection and the very high turnover of virus in chronic infection, it leads to substantial HCV diversity in an individual and in the population.
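The qualitative argument above, that diversity grows while new replication templates are being made and stalls once they are not, can be illustrated with a deterministic toy calculation. This is a deliberately crude sketch and not the stochastic model of this study: it assumes a uniform per-nucleotide error probability at every copying event, two copying events per round of new RC generation ((+)RNA to (−)RNA to (+)RNA), no selection, and no back mutation. The function names, the number of rounds, and the genome length of 9600 nucleotides are illustrative assumptions.

```python
import math


def frac_identical(mu: float, genome_len: int, n_copy_events: int) -> float:
    """Expected fraction of genomes with no mutation after `n_copy_events`
    copying events, given error probability `mu` per nucleotide per copy."""
    return math.exp(n_copy_events * genome_len * math.log1p(-mu))


def identical_fraction_over_time(
    mu: float, genome_len: int, expansion_rounds: int, plateau_steps: int
) -> list[float]:
    """Fraction of circulating virions identical to the T/F genome.

    Expansion: each round founds new RCs, adding two copying events.
    Plateau: no new RCs are made, so virions stay at the same number of
    copying events from the T/F genome and the fraction stops falling.
    """
    if expansion_rounds < 1:
        raise ValueError("need at least one round of expansion")
    trajectory = [
        frac_identical(mu, genome_len, 2 * g)
        for g in range(1, expansion_rounds + 1)
    ]
    trajectory.extend([trajectory[-1]] * plateau_steps)
    return trajectory


# Illustrative run: median rate from this study, hypothetical round counts.
for step, frac in enumerate(identical_fraction_over_time(2.5e-5, 9600, 3, 4), 1):
    print(f"step {step}: fraction identical to T/F = {frac:.3f}")
```

In this toy version the fraction of virions identical to the T/F genome falls with each round of RC generation and is flat during the plateau, mirroring the observed pattern of diversity; the actual model additionally tracks stochastic infection events and viral load.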