The present study provides new quantitative and qualitative insights into HCV transmission and early diversification in humans. Previous reports documented a virus population bottleneck associated with HCV transmission, but none of those studies, including ones based on 454 deep sequencing, captured the broad range in multiplicity of infection or the full spectrum of genetic diversity that exists among transmitted viruses. In our study of 17 acutely infected subjects, we could unambiguously identify and determine the exact nucleotide sequences of one or more T/F virus genomes in each subject. This was true for all subjects whose HCV genomes were sequenced within the initial ~6–8 weeks of infection; beyond that, there were examples of immune selection that confounded the identification of T/F virus genomes (Figures S16 and S17). We estimated the multiplicity of infection (the number of T/F viruses leading to productive clinical infection) to range from 1 to as many as 37 or more, with a median of 4. These are minimum estimates given our sampling limitations, although we note that our median sampling depth of 151 sequences afforded us a 95% likelihood of detecting variants present at 2% prevalence
[15]. In subjects productively infected by lower numbers of viruses (<10), where the progeny of each transmitted virus is repeatedly sampled, our estimates ( and
S1) are likely to be an accurate and precise measure of the number of viruses that result in productive infection. In subjects infected by higher numbers of viruses, especially in the setting of acute-to-acute transmission where transmitted viruses are expected to differ by as few as one nucleotide, the accuracy of our estimates is necessarily lower. This is because we could not sample deeply enough, owing to the practical constraints of single genome sequencing of quarter and half genomes, and because we could not distinguish transmitted viruses that differ by one or a few nucleotides from single-variant transmission followed by early stochastic mutations. However, based on the striking differences in diversity patterns that we observed between subjects with chronic-to-acute versus apparent acute-to-acute transmission, we suspect that the actual numbers of T/F viruses in subjects 10016, 10020, 10003 and 106889 approximate or exceed our estimates of 15, 10, 37 and 30 T/F genomes, respectively ( and
S2).
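The sampling-depth statement in the preceding paragraph can be checked with a simple calculation. The sketch below is illustrative only: it assumes that sequences are drawn independently and at random from the plasma virus population, which is an idealization, and the function name `detection_probability` is ours and does not come from the study's methods.

```python
def detection_probability(n_sequences: int, prevalence: float) -> float:
    """Probability that at least one of n independently sampled sequences
    belongs to a variant present at the given prevalence.

    Assumes independent random sampling (an idealization, not a claim
    about the study's actual sampling process).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if n_sequences < 0:
        raise ValueError("n_sequences must be non-negative")
    # P(miss the variant on every draw) = (1 - prevalence) ** n
    return 1.0 - (1.0 - prevalence) ** n_sequences


# Median sampling depth and variant prevalence quoted in the text.
p_detect = detection_probability(151, 0.02)
print(f"Probability of detecting a 2% variant in 151 sequences: {p_detect:.3f}")
```

Under these assumptions the probability comes out slightly above 0.95, in line with the 95% likelihood quoted above.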
The broad range in numbers of T/F viruses responsible for acute HCV infection in our cohort must reflect the different transmission routes and risk practices of source plasma donors. Ostensibly, such individuals should be at low risk of acquiring HCV infection since they are qualified as regular source plasma donors only after extensive pre-enrollment screening that consists of medical histories, physical examinations and behavioral questionnaires designed specifically to eliminate from the donor pool individuals at risk for HCV, HBV or HIV infection (
http://www.fda.gov/BiologicsBloodVaccines/GuidanceComplianceRegulatoryInformation/default.htm). However, self-reporting of risk behaviors among paid plasma donors is admittedly imperfect
[60]. Thus, it is likely that the subjects in the present study represent the broad clinical spectrum of community-acquired HCV infection in the United States, which includes injection drug users, men who have sex with men, heterosexuals, and possibly, household contacts of HCV infected individuals.
Our findings regarding the multiplicity of human infection by HCV are quite different from those obtained by 454 pyrosequencing of seven acutely infected subjects, reported by two different investigative groups, in which the number of T/F viruses ranged from one to four with a median of one
[19],
[43]. Our findings are also substantially different from estimates from other reports that employed reverse transcription, bulk PCR amplification, population sequencing or molecular cloning followed by sequencing
[20],
[23],
[25],
[30],
[38],
[41],
[42]. The latter studies showed that acutely infected subjects exhibited a spectrum in HCV sequence diversity that could at best be interpreted qualitatively as reflecting ‘few-variant’ versus ‘multi-variant’ transmission. We also note a recent study that used conventional bulk PCR amplification, cloning and sequencing to analyze acute and early HCV sequences consisting of a 225 bp hypervariable region of
env from 10 acute infection subjects following IDU, sexual or nosocomial exposures
[28]. This report described perplexing findings: 7 of 10 acutely infected subjects seemed to harbor more than one HCV genotype, and sequential sequences obtained from these subjects a median of 17.5 days apart throughout the acute infection period suggested fluctuations in the prevalence of different HCV genotypes, subtypes and clades. These findings are at odds with our results (Figures S2–S9 and S14–S16) and those of most other studies [24].
The SGA-direct amplicon sequencing strategy used in the present study represents a substantial advance in sensitivity and molecular resolution for distinguishing closely and distantly related T/F HCV genomes and their evolving progeny. Studies of HCV specific CTL recognition and escape
[20],
[25],
[61],
[62], neutralizing antibody recognition and escape
[41],
[42], and DAA drug resistance development
[7] have previously been performed without precise identification of T/F viral genomes, and future studies may benefit from such an approach. We could readily distinguish evolving viral lineages that differed from the T/F genome by just 1 nucleotide in 5,000 (0.02%) at sites under selective pressure (Figures S16 and S17). This discriminating power further revealed evidence of acute-to-acute virus transmission in three subjects (Figure S9) and DAA drug-induced viral genetic bottlenecking in a donor to a fourth acutely infected subject. This exquisite sensitivity in distinguishing T/F virus genomes and their progeny stands in contrast to the 454-based approaches, which were unable to distinguish between T/F viruses that differed by less than 2.5%
[43] and bulk PCR-clone-sequencing methods that used a cutoff of 3% diversity to distinguish homogeneous from heterogeneous virus transmission
[23]. Both of the latter methods are further confounded by the potential for
Taq polymerase-mediated strand transfers leading to recombination artifacts in finished sequences
[15],
[19],
[63].
The ability to identify actual T/F viral sequences and to track virus diversification from these sequences with single nucleotide resolution provided a unique opportunity to assess HCV sequence evolution
in vivo. Virus diversification from discrete T/F viruses was generally star-like and conformed well to our HCV adapted model of early virus diversification. The finding of only a single instance of potential CTL escape or reversion among 17 acutely infected subjects at the last sampling time point is consistent with previous reports indicating substantial delays in the onset of adaptive immunity to HCV
[9],
[10]. The overall nucleotide substitution frequency that we observed among all subjects and including all sampling time points was 1.4×10⁻⁴. This substitution frequency is different from the mutation rate since it does not account for time, numbers of replication cycles, or different modes of HCV replication (linear versus geometric)
[3],
[57],
[64], nor does it account for nucleotide substitutions introduced by the MuLV polymerase (SuperScript III) during cDNA synthesis. The latter is estimated to occur at a frequency as low as 2×10⁻⁶ [65] and thus likely contributes negligibly to the mutation frequencies observed in the present study. Consistent with this interpretation were our results of single genome sequencing performed on the earliest vRNA positive plasma sample from subject 10051, where we found that 43 of 46 5′ quarter 1 genome sequences were identical (Figure S2). Among all 46 sequences, there were only four nucleotide substitutions in 100,050 nucleotides. This corresponds to a combined substitution frequency for the HCV polymerase and the SuperScript III MuLV polymerase of 4×10⁻⁵. Again, this result does not account for the number of HCV replication cycles occurring between the moment of virus transmission and the time point of sampling, which in this case was very early during the viral ramp-up period when the plasma virus load was approximately 10,000 vRNA molecules/ml. In an accompanying report
[50], we describe a new stochastic model of HCV replication and diversification that provides for a more precise estimation of the
in vivo HCV RdRp error rate, which was found to be ~2.5×10⁻⁵ per base per generation. This is lower than previous reports for HCV
[56],
[57] and comparable to the RT error rate of HIV-1
[57],
[66].
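The combined substitution frequency quoted above for subject 10051 follows directly from the counts given in the text. The sketch below simply reproduces that arithmetic; the per-sequence amplicon length is not stated in this section and is inferred here as 100,050 / 46, and the function name `substitution_frequency` is ours.

```python
def substitution_frequency(n_substitutions: int, n_nucleotides: int) -> float:
    """Substitutions per nucleotide sequenced. This is a raw frequency, not a
    mutation rate: it ignores time and the number of replication cycles."""
    if n_nucleotides <= 0:
        raise ValueError("n_nucleotides must be positive")
    return n_substitutions / n_nucleotides


# Counts reported in the text for the earliest sample from subject 10051.
N_SEQUENCES = 46
TOTAL_NUCLEOTIDES = 100_050
N_SUBSTITUTIONS = 4

# Inferred, not stated in the text: nucleotides per 5' quarter-genome sequence.
amplicon_length = TOTAL_NUCLEOTIDES // N_SEQUENCES  # 2175

freq = substitution_frequency(N_SUBSTITUTIONS, TOTAL_NUCLEOTIDES)
print(f"{amplicon_length} nt per sequence, frequency = {freq:.1e}")
```

The result rounds to 4×10⁻⁵, matching the figure given above. Converting such a frequency into a per-generation error rate requires a replication model, which is the subject of the accompanying report.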
We found a low dN/dS ratio consistent with early negative or purifying selection and a strong 18 to 1 mutational bias for transitions over transversions in acute infection. The latter finding is consistent with a recent report by Gotte and colleagues
[67] who studied sequence evolution in chronically infected subjects and
in vitro where a strong preference for G:U/U:G mismatches was observed for recombinant HCV RdRp. A mutational bias favoring transitions may be a factor besides RdRp error rate that influences the rate of development of DAA resistance mutations
[67]. Importantly, we found no evidence of viral recombination in any subject; recombination would have been plainly evident in those subjects infected by multiple genetically diverse viral genomes (Figures S3–S9). The absence of recombination distinguishes HCV from HIV-1, where early recombination is widespread
[12],
[16],
[46], but is consistent with molecular epidemiological data that suggest that HCV recombination is rare
[68]–
[71]. In addition, the failure to find plus-plus strand recombination in any of the sequences in the present report shows that strand switching by the MuLV reverse transcriptase (RT)
in vitro must be extremely rare. This is important because it demonstrates that MuLV RT-mediated recombination does not confound single genome sequence analyses of HCV or other RNA viruses including HIV-1 and SIV
[15],
[18],
[51]. On the other hand, we did observe seven examples of template switching between plus and minus strands of double-stranded HCV RNA templates (
Figure S18). We could not determine if this resulted from strand switching by MuLV RT
in vitro or by HCV RdRp
in vivo. We note that Branch and colleagues
[72] recently reported high levels of double-stranded HCV RNA in hepatic tissue, thus providing a plausible source of dsRNA for the observed template switching events.
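The transition-to-transversion bias reported at the start of this paragraph rests on classifying each substitution relative to the T/F sequence. The sketch below shows one minimal way to make that classification for a pair of aligned sequences. It is not the analysis pipeline used in the study: the function names are ours, and the two short sequences are hypothetical toy data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}
VALID_BASES = PURINES | PYRIMIDINES


def classify_substitution(ref_base: str, alt_base: str) -> str:
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine) for two differing bases."""
    ref, alt = ref_base.upper(), alt_base.upper()
    if ref == alt:
        raise ValueError("bases are identical; no substitution to classify")
    if ref not in VALID_BASES or alt not in VALID_BASES:
        raise ValueError("only unambiguous bases A, C, G, T/U are supported")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def count_ti_tv(founder: str, derived: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences of
    equal length. Gaps and ambiguous bases are skipped."""
    if len(founder) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    ti = tv = 0
    for ref, alt in zip(founder.upper(), derived.upper()):
        if ref == alt or ref not in VALID_BASES or alt not in VALID_BASES:
            continue
        if classify_substitution(ref, alt) == "transition":
            ti += 1
        else:
            tv += 1
    return ti, tv


# Hypothetical toy sequences, not data from the study.
toy_founder = "ACGTACGT"
toy_derived = "GCGTATGA"  # A->G and C->T (transitions), T->A (transversion)
print(count_ti_tv(toy_founder, toy_derived))
```

Summing such counts over all sequences derived from each T/F genome gives the ratio of transitions to transversions; how the study pooled counts across subjects is not specified in this section.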
A surprising finding of the current study was evidence of acute-to-acute HCV transmission in a relatively high proportion (3 of 17) of subjects. The acute infection period of HCV, like that of HIV-1, is characterized by very high plasma virus loads, absence of neutralizing antibodies, and rapid expansion of biologically fit virus populations that are homogeneous relative to the respective T/F virus genomes
[22],
[26],
[27]. For HIV-1, the acute and early infection period has been shown to be associated with hyper-transmissibility, with epidemiological studies and epidemic modeling indicating substantial enhancement in spread of the virus for as long as six months post-transmission
[73]–
[76]. In the simian immunodeficiency virus (SIV) – Indian rhesus macaque transmission model, virus from acute infection plasma is up to 750-fold more transmissible on a per virion basis than is virus from chronic infection plasma
[77]. To our knowledge, a clinical predilection for acute-to-acute HCV transmission has not previously been reported. In addition to the three subjects whom we identified with putative acute-to-acute HCV transmission, an argument can be made for an additional potential case in subject 10017 in whom distinct subsets of closely related T/F sequences were found within a context of high overall sequence diversity (e.g., see lineages v1 and v3;
Figure S8). In this example, a plausible scenario is that a virus ‘donor’ to subject 10017 was acutely infected by multiple genetically diverse viruses and that multiple progeny representing several of these lineages were transmitted. The implication of these findings is that if the acute period of HCV infection is characterized by hyper-infectiousness, as is the case for HIV-1, it could be a previously unrecognized but important contributing factor to the spread of HCV, potentially contributing to a recently described emerging HCV ‘epidemic’ in HIV-1 positive men who have sex with men
[78],
[79]. A limitation in our evidence supporting ‘acute-to-acute’ infection is that our study design did not allow us to identify paired donors and recipients of virus in order to analyze virus transmission directly. Future viral sequencing studies involving social networks of HCV transmission partners
[80], or analyses of cryopreserved plasma specimens from previously conducted acute-to-acute human-to-chimpanzee HCV transmission studies
[81], can provide corroborative evidence. We note that there is precedent for phylogenetic linkage of HCV sequences in a human-to-human transmission case where clinical epidemiologic linkage between donor and recipient was established
[42].
Still another surprising observation in this study was transmission of what we estimated to be as many as 30 NS3 protease inhibitor-resistant viruses to subject 106889. The mutations carried by these viruses (V36M and R155K) confer high-level resistance to both boceprevir and telaprevir, which were used in clinical trials near the time when the 106889 samples were collected. Recently, we performed single genome sequencing of plasma viral RNA from subjects before and after treatment with a next-generation investigational HCV protease inhibitor and observed viral genetic bottlenecking closely resembling that found in subject 106889 (unpublished data). To our knowledge, the data from subject 106889 are the first example of high-multiplicity transmission of DAA drug-resistant virus, and the findings here illustrate how transmission of DAA-resistant mutants can be deciphered with single genome specificity and sensitivity.
The identification of T/F genomes of HCV, HIV-1
[18], SIV
[51] and potentially other RNA viruses by single genome sequencing is an enabling experimental strategy that captures molecular entities that are wholly sufficient and responsible for productive clinical infection and disease causation. In an accompanying report
[50], we use sequences derived by this approach to analyze and mathematically model the early dynamics of HCV replication and diversification in acutely infected humans and derive new estimates of the
in vivo mutation rate of HCV. A second application of the single genome sequencing method is to reveal, through enumeration of T/F genomes, the challenge that vaccine candidates face in attempting to prevent or constrain HCV transmission. In a third application of the method, we previously demonstrated for HIV-1 that single genome sequencing allows for the molecular identification, cloning and biological characterization of full-length T/F genomes and a comprehensive proteome-wide analysis of autologous, strain-specific patterns of cytotoxic T-cell and neutralizing antibody responses
[13],
[18],
[82]–
[84]. By demonstrating that early HCV diversification generally conforms to a model of essentially random virus evolution where sequences coalesce to distinct, unambiguous T/F genomes, the present study has taken the first critical steps to demonstrate the feasibility of similar genome-wide analyses for HCV. An intriguing possibility is that full-length T/F HCV genomes, which by definition possess nucleotide and amino acid sequences sufficient for efficient
in vivo replication in humans, can be identified, molecularly cloned and expressed for biological analyses in cell culture and animal models.