HIV-1 transmission and viral evolution in the first year of infection were studied in 11 individuals representing four transmitter-recipient pairs and three independent seroconverters. Nine of these individuals were enrolled during acute infection; all were men who have sex with men (MSM) infected with HIV-1 subtype B. A total of 475 nearly full-length HIV-1 genome sequences were generated, representing on average 10 genomes per specimen at 2 to 12 visits over the first year of infection. Single founding variants with nearly homogeneous viral populations were detected in eight of the nine individuals who were enrolled during acute HIV-1 infection. Restriction to a single founder variant was not due to a lack of diversity in the transmitter as homogeneous populations were found in recipients from transmitters with chronic infection. Mutational patterns indicative of rapid viral population growth dominated during the first 5 weeks of infection and included a slight contraction of viral genetic diversity over the first 20 to 40 days. Subsequently, selection dominated, most markedly in env and nef. Mutants were detected in the first week and became consensus as early as day 21 after the onset of symptoms of primary HIV infection. We found multiple indications of cytotoxic T lymphocyte (CTL) escape mutations while reversions appeared limited. Putative escape mutations were often rapidly replaced with mutually exclusive mutations nearby, indicating the existence of a maturational escape process, possibly in adaptation to viral fitness constraints or to immune responses against new variants. We showed that establishment of HIV-1 infection is likely due to a biological mechanism that restricts transmission rather than to early adaptive evolution during acute infection. 
Furthermore, the diversity of HIV strains coupled with complex and individual-specific patterns of CTL escape did not reveal shared sequence characteristics of acute infection that could be harnessed for vaccine design.
While some HIV-1 infections result in the initial outgrowth of multiple viral variants, in most cases a single variant establishes infection (1, 4, 14, 19, 22, 24, 33, 34, 37, 54, 55, 61, 65, 73). Seroconversion generally occurs within 3 to 12 weeks (9); then, within 6 to 12 months, plasma viral levels typically reach a quasi-stable set point that is prognostic for disease progression (43, 46–48, 64). Symptoms of acute retroviral syndrome, when they are noted, coincide with emerging or peak viral loads, which then decline sharply as HIV-1-specific CD8+ cytotoxic T lymphocyte (CTL) responses emerge (6, 35).
Although viral populations early in HIV infection have been known for 2 decades to be typically nearly homogeneous (14, 75, 77), recent studies have better characterized HIV-1 sequences in the earliest weeks of infection, including sequences obtained before the nascent immune response of the newly infected individual imposes selective pressure (1, 22, 34, 61). The low genetic variability of viruses in early HIV-1 infection and the rapid viral population expansion and contraction that occur in the first weeks of infection underline the potential importance of stochastic processes in the earliest phases of HIV-1 adaptation to a new host. Analyses of both env (34) and whole-genome sequences (62) showed that, before peak viremia, HIV-1 evolution proceeds randomly under a star-like phylogeny, conforming to a model of exponential HIV-1 population growth without selective pressure (elaborated by Lee and colleagues for single founder strains). In the week(s) after peak viremia, major changes occur in viral sequences, with clear signs of adaptive evolution reflected in the occurrence of new mutations. Most early mutations are selected by cellular immune responses (21), corroborating the view that CTL responses are a major force acting on the viral population within a single host (2, 41) as well as at the interhost population level (5, 32).
To better understand the interplay between stochastic and selective processes and how this affects initial HIV-1 viral outgrowth and adaptation to a new host, we studied transmitter-recipient transmission pairs and acutely infected individuals (five individuals first sampled in Fiebig stage I and four first sampled in stage V). We examined the evolutionary patterns observed in their HIV-1 subtype B genomes, using 475 nearly full-length (~9,100 nucleotides) HIV-1 sequences derived at multiple time points for up to 350 days after the onset of clinical symptoms of primary HIV-1 infection. Genomic sequences were obtained prior to and following peak plasma viral load for three individuals, allowing us to assess how viral population dynamics and selection impact HIV-1 evolution in very early stages of infection.
Eleven adult subjects were recruited through the University of Washington Primary Infection Clinic (PIC) and gave informed consent under clinical protocols approved by the University of Washington Institutional Review Board. All were men who have sex with men (MSM), and nine were enrolled in primary HIV-1 infection (Fiebig stages I to V). All were antiretroviral therapy naïve during the study period. Blood samples were collected weekly for the first month and every 1 to 3 months thereafter. Plasma specimens were tested for HIV-1 RNA, p24 antigen, and virus-specific antibodies to determine Fiebig stages. HLA class I genotyping was performed by sequence-specific primer PCR (absolute resolution to two digits and high-probability resolution to four digits) (8). The duration of infection was estimated as the number of days after the onset of symptoms of an acute retroviral syndrome; among PIC enrollees for whom a determination could be made, a median of 12 days elapsed between transmission and symptom onset (J. D. Stekler et al., unpublished data).
Viral RNA extraction from plasma samples, cDNA synthesis, genome (~9.1 kb) amplification, cloning, and sequencing were performed as described previously (59). We amplified viral genomes from single viral RNA templates. PCR amplification followed an endpoint dilution approach to avoid template resampling bias and used primers 1.U5 (TGAGTGCTTCAAGTAGTGTGTGCCCGTCTGT; HXB2 coordinates 541 to 571) and 1.3′3′pl (GGGTGAAGCACTCAAGGCAAGCTTTATTG; HXB2 coordinates 9611 to 9636) for the first-round PCR and primers 2.U5 (GGCCGCGGATCCAGTAGTGTGTGCCCGTCTGTTGTGTGACTC; HXB2 coordinates 552 to 581) and 2.3′3′pl (GGCCGCGCGGCCGCTGAAGCACTCAAGGCAAGCTTTATTGAGGCTTA; HXB2 coordinates 9604 to 9636) for the second-round PCR. For transmitters, we obtained 10 genomes from a single time point near the estimated time of HIV-1 transmission (two time points for the subject designated transmitter 4 [T4]). For individuals followed longitudinally, we obtained 9 to 15 genome sequences at up to 12 time points (up to 350 days after the onset of symptoms). Genome and subgenomic sequences (encompassing ~60% of the genome) for two individuals (T1 and recipient 1 [R1]) were reported previously (41).
Gamma interferon (IFN-γ) enzyme immunospot (ELISPOT) assays were done on cryopreserved peripheral blood mononuclear cells (PBMC) using a panel of HIV-1 subtype B peptides (9- to 11-mers) (see Table 1 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/); we report spot-forming cells per million PBMC (SFC/M) when >50 SFC/M after background is subtracted (41). IFN-γ ELISPOT assays were performed on PBMC from the individual designated seroconverter 1 (S1) at days 7 and 127 using 54 peptides (10 Gag, 15 Pol, 10 Env, 15 Nef, 2 Vif, and 2 Rev), on PBMC from subject R3 at days 25 and 144 using 47 peptides, and on PBMC from R4 at day 14 using 17 peptides.
Nucleotide sequences were aligned with Clustal W, version 1.8 (70), and manually edited with MacClade, version 4.08 (44). Alignments are available at http://mullins.lab.microbiol.washington.edu/publications/herbeck_2011/. Alignments of phylogenetically informative nucleotide sites omit mutations that occur only once, because these may have been introduced by polymerase errors during PCR. This informative-sites (InSites) approach (http://indra.mullins.microbiol.washington.edu/DIVEIN/insites.html) yields slightly lower estimates of nucleotide diversity than single-template amplification methods (61), although standard PCR/cloning methods have been shown to produce measures of population structure and genetic diversity equivalent to those obtained with single-genome amplification (31). An insertion or deletion spanning multiple sites was counted as a single informative site. APOBEC3G/APOBEC3F (APOBEC3F/G)-induced mutations were evaluated with Hypermut, version 2.0 (http://www.hiv.lanl.gov/content/sequence/HYPERMUT/hypermut.html), in intrahost data sets, using the visit 1 consensus sequence as the reference; one putative APOBEC-induced G-to-A hypermutated sequence from subject S1 was identified and excluded from subsequent analyses. Maximum-likelihood phylogenetic trees were reconstructed in PhyML (version 2.4.5) (23) under the general time-reversible substitution model with gamma-distributed rate heterogeneity. Potential N-linked glycosylation sites (PNGS) in Env were predicted using N-GLYCOSITE (76). All Env sequences were evaluated for CCR5 or CXCR4 coreceptor usage using the position-specific scoring matrix (PSSM) web tool (30) (http://indra.mullins.microbiol.washington.edu/webpssm). For each individual with five or more sequenced time points, the rate of increase in nucleotide diversity was estimated by univariate linear regression.
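The informative-sites filter described above can be illustrated with a minimal Python sketch. It assumes the standard parsimony-informative definition (a column is kept only if at least two different characters each occur in two or more sequences), treats gap characters like any other character, and does not collapse multi-site indels into a single site. It is not the InSites tool itself, and the function names are hypothetical.

```python
from collections import Counter

def informative_columns(alignment):
    """Indices of parsimony-informative columns: at least two distinct
    characters, each carried by two or more sequences. Columns whose only
    variation is a singleton (seen in a single sequence) are dropped."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    keep = []
    for i in range(width):
        counts = Counter(seq[i] for seq in alignment)
        # characters observed in at least two sequences at this column
        shared = [char for char, k in counts.items() if k >= 2]
        if len(shared) >= 2:
            keep.append(i)
    return keep

def informative_sites_alignment(alignment):
    """Reduce each aligned sequence to its informative columns only."""
    cols = informative_columns(alignment)
    return ["".join(seq[i] for i in cols) for seq in alignment]

aln = ["ACGTA", "ACGTT", "AGGAA", "AGGAT", "ACCTA"]
print(informative_columns(aln))          # [1, 3, 4]
print(informative_sites_alignment(aln))  # ['CTA', 'CTT', 'GAA', 'GAT', 'CTA']
```

Column 2 in the example (G in four sequences, C in one) is dropped because its only variant is a singleton.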
Overall rates of diversity increase were calculated both by pooling all data points and by averaging the rates estimated separately for each individual. Intrahost phylogenies were reconstructed to identify distinct lineages replicating within each individual; the presence of distinct lineages was taken to indicate multiple founders. Using sequences from the first time point examined (visit 1), we examined the distribution of pairwise genetic diversity and Hamming distances (HD; the uncorrected count of nucleotide differences between two sequences) for all nucleotide sites and for phylogenetically informative sites.
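Hamming distances and mean pairwise diversity, as used for the visit 1 distributions, can be sketched as follows. This assumes uncorrected p-distances over aligned sequences of equal length, with gaps counted as ordinary differences. Whether the original analysis applied a substitution-model correction or excluded gapped positions is not stated here, and the function names are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Uncorrected count of nucleotide differences between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming_distribution(seqs):
    """Sorted Hamming distances for all unordered pairs of sequences."""
    return sorted(hamming(a, b) for a, b in combinations(seqs, 2))

def mean_pairwise_diversity(seqs):
    """Mean uncorrected p-distance (differences / alignment length) over all pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) / len(a) for a, b in pairs) / len(pairs)

visit1 = ["AAAA", "AAAT", "AATT"]
print(hamming_distribution(visit1))     # [1, 1, 2]
print(mean_pairwise_diversity(visit1))  # mean of 0.25, 0.5 and 0.25, i.e. 1/3
```

Applying the same functions to the output of an informative-sites filter gives the corresponding distributions restricted to informative sites.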
Two statistical tests of neutral evolution implemented in the DnaSP software (60) were used. Tajima's D (69) is based on the difference between two estimates of θ (θ = 2Neμ in a haploid population, where Ne is the effective population size and μ is the mutation rate per generation). One estimate is based on the number of segregating nucleotide sites (θW), and the other is based on the average pairwise distance (π, θπ). In a population of constant size at neutral equilibrium, the two estimates of θ are expected to be statistically indistinguishable, and values of D are near zero. Deviations from zero (the null hypothesis of neutral evolution) can reflect selective or demographic processes. The D* of Fu and Li (18) compares θW to θ based on the total number of mutations on a genealogy. D and D* were calculated for genome nucleotide alignments at each time point. P values were estimated from null distributions created from 10,000 simulations under a neutral coalescent model with no recombination, conditioned on the sample size and level of variation in the observed data (60), and a Bonferroni correction for 36 tests (P = 0.05) was then applied.
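The following Python sketch shows Tajima's D with a simulated null distribution. It follows the spirit of the DnaSP analysis described above but is not DnaSP's implementation. It uses the standard constants from Tajima's 1989 paper and simulates neutral coalescent genealogies without recombination. It conditions on the observed sample size and number of segregating sites by dropping exactly S mutations onto the branches in proportion to their lengths. Several choices are assumptions: reporting the lower-tail probability (the analyses here focus on negative deviations), using add-one smoothing for the Monte Carlo estimate, and the function names themselves. Fu and Li's D* is not reproduced.

```python
import math
import random
from itertools import combinations

def seg_sites_and_pi(seqs):
    """Segregating sites S and mean number of pairwise differences pi."""
    s = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    return s, pi

def tajimas_d(n, s, pi):
    """Tajima's D for n sequences with s segregating sites and diversity pi."""
    if n < 4:
        raise ValueError("the variance of D is zero for n < 4")
    if s == 0:
        return float("nan")  # undefined without segregating sites
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    # theta_pi minus theta_W, scaled by its standard deviation under neutrality
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

def simulated_pi(n, s, rng):
    """pi under a neutral coalescent (no recombination) with s mutations placed
    on branches in proportion to branch length, i.e. conditioned on S."""
    sizes, lengths, branches = [1] * n, [0.0] * n, []
    while len(sizes) > 1:
        k = len(sizes)
        t = rng.expovariate(k * (k - 1) / 2)  # waiting time to next coalescence
        lengths = [x + t for x in lengths]
        i, j = sorted(rng.sample(range(k), 2))
        branches += [(sizes[i], lengths[i]), (sizes[j], lengths[j])]
        merged = sizes[i] + sizes[j]
        for idx in (j, i):  # delete the higher index first
            del sizes[idx], lengths[idx]
        sizes.append(merged)
        lengths.append(0.0)
    hits = rng.choices([m for m, _ in branches],
                       weights=[length for _, length in branches], k=s)
    # a mutation on a branch subtending m of n leaves separates m*(n-m) pairs
    return sum(m * (n - m) for m in hits) / (n * (n - 1) / 2)

def lower_tail_p(seqs, n_sims=10_000, seed=1):
    """Observed D and Monte Carlo P(D_sim <= D_obs) under the conditioned null."""
    n = len(seqs)
    s, pi = seg_sites_and_pi(seqs)
    d_obs = tajimas_d(n, s, pi)
    rng = random.Random(seed)
    hits = sum(tajimas_d(n, s, simulated_pi(n, s, rng)) <= d_obs
               for _ in range(n_sims))
    return d_obs, (hits + 1) / (n_sims + 1)

bonferroni_alpha = 0.05 / 36  # per-test threshold for the 36 whole-genome tests
```

An excess of singletons, as expected under a star-like expansion, pushes D below zero. For example, `lower_tail_p(["AAAAAA", "TAAAAA", "ATAAAA", "AATAAA", "AAATAA", "AAAATA"])` returns a negative D together with its P value, which would then be compared with `bonferroni_alpha`.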
D and D* were also calculated separately for each gene, under the heuristic assumption of free recombination among genes (67) and no recombination within them (i.e., to test for deviations from neutrality among genes, we assumed that genes are independent and that selection operating on Gag does not affect Env). D and D* values were then compared across env, gag, nef, and pol, with correction for 144 tests after estimating P values as above (60). Quantitative evidence of distinct evolutionary processes among loci was evaluated using a Hudson-Kreitman-Aguadé (HKA) test (28), treating each time point and gene alignment (env, gag, nef, and pol) as a separate population. The HKA test relies on the neutral expectation that Ne is equivalent across loci. Selection acting on a specific locus violates this condition, and statistical significance is assessed with a χ² goodness-of-fit test (28).
We tested for evidence of positive selection in all nine HIV-1 genes in each individual with five or more time points. First, we measured the ratio of nonsynonymous (dN) to synonymous (dS) substitutions, dN/dS or ω (20, 56), using HyPhy (53) via the Datamonkey server (http://www.datamonkey.org/). The fixed-effects likelihood (FEL) method with the general reversible nucleotide substitution model (REV) was used, and sites with ω > 1 and P < 0.1 were considered to be under positive selection. Second, we tested for directional positive selection using the method of Liu et al. (41). This method compares the observed accumulation rate of amino acid mutations with the rate expected if accumulation were due to genetic drift alone (determined by simulation).
HLA-specific HIV-1 epitopes were predicted in all protein sequences using Epipred (25; http://atom.research.microsoft.com/bio/epipred.aspx) and NetMHC (10, 49). Epipred identifies known and potential CTL epitope motifs using 2-digit HLA information; we accepted all epitope motifs with a posterior probability of >0.5. NetMHC predicts binding of peptides to 4-digit HLA alleles; we accepted both strong and weak binders.
For each individual, we derived a consensus from sequences found at visit 1 (in the event of two founder viruses, two respective consensus sequences were derived). Each sequence from later visits was compared to the visit 1 consensus, and we tracked the frequency of all amino acid mutations longitudinally.
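Deriving the visit 1 consensus and tracking amino acid mutations relative to it can be sketched as below. Majority-rule consensus with ties resolved by the first-seen residue is an assumption, since the text does not describe tie handling. The function names are also assumptions. For an individual with two founder lineages, the sketch would simply be run once per lineage consensus. The example sequences are invented.

```python
from collections import Counter

def consensus(seqs):
    """Majority-rule consensus of aligned sequences; ties go to the residue seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def mutation_frequencies(reference, visits):
    """Map each visit label to {(position, ref_aa, mut_aa): fraction of sequences}.
    Positions are 1-based relative to the aligned reference."""
    table = {}
    for label, seqs in visits.items():
        freqs = {}
        for pos, ref_aa in enumerate(reference):
            counts = Counter(seq[pos] for seq in seqs)
            for aa, k in counts.items():
                if aa != ref_aa:
                    freqs[(pos + 1, ref_aa, aa)] = k / len(seqs)
        table[label] = freqs
    return table

ref = consensus(["MKTA", "MKTA", "MRTA"])  # 'MKTA'
later = {21: ["MRTA", "MRTA", "MKTA", "MKTV"]}
print(mutation_frequencies(ref, later))
# {21: {(2, 'K', 'R'): 0.5, (4, 'A', 'V'): 0.25}}
```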
For each amino acid mutation, we calculated the frequency of the mutant and consensus amino acids in circulating HIV-1 sequences, using a data set of independent HIV-1 clade B sequences (comprising 200 sequences from Env, 125 from Gag, 227 from Pol, 514 from Nef, 184 from Rev, 286 from Tat, 327 from Vif, 225 from Vpr, and 203 from Vpu) (57). We defined an amino acid mutation as a forward (putative escape) mutation when there was a decrease of >50% between the database frequencies of the visit 1 consensus amino acid and the mutated amino acid. Conversely, a reversion corresponded to an increase of >50% between the database frequencies of the visit 1 consensus amino acid and the mutated amino acid.
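The forward/reversion rule can be sketched as a comparison of database frequencies. Reading the 50% threshold as an absolute difference in frequency (percentage points) is an assumption. It is consistent with the Gag example in the results, where the drop from 87% to 11% corresponds to a 76-point difference. The function names are hypothetical, and the database frequencies in the example are those quoted in the text.

```python
def database_frequency(db_residues, aa):
    """Fraction of database sequences carrying `aa` at one aligned position."""
    return sum(r == aa for r in db_residues) / len(db_residues)

def classify_mutation(freq_consensus, freq_mutant, threshold=0.5):
    """Label a mutation by the change in database frequency (fractions in [0, 1]).
    The threshold is read as an absolute difference (0.5 = 50 percentage points)."""
    delta = freq_mutant - freq_consensus
    if delta < -threshold:
        return "forward"      # putative escape: mutant much rarer than consensus
    if delta > threshold:
        return "reversion"    # mutant much more common than consensus
    return "intermediate"

print(classify_mutation(0.94, 0.03))     # forward (K331R in Gag)
print(classify_mutation(0.00005, 0.96))  # reversion (D to N in Env, subject S2)
print(classify_mutation(0.60, 0.40))     # intermediate
```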
We analyzed HIV-1 subtype B evolution in 11 individuals, including four transmitter-recipient pairs (Fig. 1) and three independent seroconverters. At the time of transmission, three transmitters were chronically infected while one was acutely infected (Table 1). Sequences from transmission pair 1 (subjects T1 and R1 in Table 1) have been described previously (41); from the nine other individuals, 475 nearly full-length viral genome sequences (“genomes”) were generated, representing an average of 10 genomes at up to 12 serial time points (Table 2). No evidence of dual infection was found. All viruses were predicted by PSSM (30) to use the CCR5 coreceptor, as expected for early HIV-1 infection (63).
Eight of the nine acutely infected individuals had infections founded by a single HIV-1 lineage, while one individual (R4) harbored two replicating lineages. Founder viral populations were remarkably homogeneous (Fig. 1) on the basis of Hamming distances and pairwise diversity measures (Table 1). For single-founder infections, the mean pairwise diversity among genomes at visit 1 was 0.32% (range, 0.17 to 0.44%) (Table 1). For R4, in whom two founder lineages were identified at 13 days after the onset of symptoms of acute retroviral syndrome (hereafter referred to as “days”), the two lineages differed by a mean of 1.12%, whereas each lineage was nearly homogeneous (Table 1).
Transmission pair 2 consisted of two individuals with acute infections. The transmitter, T2, carried a single variant with little diversity among sequences, and infection in the recipient partner (R2) was founded by a single strain (Fig. 1). Sequences from both individuals were intermingled in the tree, and interhost genome pairwise distances ranged from 0.18% to 0.43%, a range consistent with the variation seen with a single founder within one host at the earliest time point (Table 1). Transmissions in two other pairs (pairs 1 and 3) resulted in infections established by a single founder strain, even though each of the transmitting partners was chronically infected, with extensive diversity among their sequences (41). Transmitting partner T4 had been enrolled during primary infection, when little viral genetic variation was observed (Fig. 1 and Table 1). Nine years later, at the time of transmission to R4, genomes from T4 contained extensive variation, and two variants were found in primary infection in the recipient (see Fig. 1 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/).
For the four transmission pairs, we compared sequences from each recipient with sequences from the respective transmitter. When only the conserved genes gag or pol were considered, some transmitter and recipient sequences matched exactly (100% identity). Over the whole genome, however, no recipient sequence exactly matched a sequence from the transmitting partner; the divergence between the closest transmitter and recipient sequences ranged from 0.18% (T2-R2) to 1.69% (T3-R3).
An important question is whether the founder variant in the recipient can be distinguished from sequences in the transmitter due to properties advantageous for the establishment of infection. That is, is the founder variant rare or common (typical) in the transmitter? To address this question for each transmission pair, we compared the consensus of the recipient population to each sequence in the transmitter. From the resulting ranked pairwise distances, we identified an approximate transmitted variant, i.e., the transmitter sequence that is most closely related to the recipient consensus (founder variant). Next, we compared all transmitter sequences to the transmitter consensus sequence under the hypothesis that an approximate transmitted variant that is rare would have a greater distance to the transmitter consensus than most, if not all, other transmitter sequences. Figure 2 shows the distribution of genetic distances for all the sequences from the transmitter. For each transmission pair, the transmitter sequence that matches most closely the consensus sequence in the recipient at visit 1 (i.e., our best approximation of the transmitted/founder virus) was found to be representative of the sequences in the transmitter. Indeed, it was generally very close to the mean genetic distance corresponding to all the transmitter sequences. Thus, the approximate founder variants did not appear to be unusual or rare in the transmitter (Fig. 2).
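The comparison above can be sketched as follows. First, pick the transmitter sequence closest to the recipient consensus. Then ask how its distance to the transmitter consensus compares with that of the other transmitter sequences. Several choices here are assumptions for illustration: raw Hamming distances, majority-rule consensus, and the summary returned (mean distance and the share of sequences at least as far from the consensus). The function name is hypothetical, and the sequences are invented.

```python
from collections import Counter

def hamming(a, b):
    """Count of differing positions; assumes equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))

def consensus(seqs):
    """Majority-rule consensus; ties go to the character seen first."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def founder_representativeness(transmitter_seqs, recipient_consensus):
    """Return (index of approximate transmitted variant, its distance to the
    transmitter consensus, mean distance of all transmitter sequences, share of
    transmitter sequences at least as far from the consensus as that variant)."""
    to_recipient = [hamming(s, recipient_consensus) for s in transmitter_seqs]
    idx = min(range(len(transmitter_seqs)), key=to_recipient.__getitem__)
    t_cons = consensus(transmitter_seqs)
    to_t_cons = [hamming(s, t_cons) for s in transmitter_seqs]
    founder_d = to_t_cons[idx]
    mean_d = sum(to_t_cons) / len(to_t_cons)
    frac_as_far = sum(d >= founder_d for d in to_t_cons) / len(to_t_cons)
    return idx, founder_d, mean_d, frac_as_far

transmitter = ["AAAAAA", "AAAAAT", "AAAATT", "TTAAAA", "AAAAAA"]
print(founder_representativeness(transmitter, "AAAAAT"))  # (1, 1, 1.0, 0.6)
```

A rare transmitted variant would show a distance well above the mean and a small share of sequences at least as far from the consensus. A typical one, as in the example, sits near the mean.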
Viral genome diversity increased over time across all individuals at a yearly rate of 0.55% for all nucleotide sites (Fig. 3A); the rate of accumulation of selected sites corresponded to an average of 40 sites in each subject in the first year of infection (Fig. 3B). However, the evolutionary rate at the genome level masks decoupled rates in the different genes. When we examined individual genes, as expected, the average rates of diversification were lower in gag (0.33%) and pol (0.31%) and higher in env or C2V5 (1.07%) and nef (1.34%) (all values are pooled estimates for the four individuals chosen because they had five or more time points evaluated).
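The pooled and per-individual rates can be sketched with ordinary least squares. The text does not say whether an intercept was fitted, so fitting one is an assumption. The data layout (days since symptom onset paired with diversity in percent) and the function names are also assumptions. The example values are synthetic, not study data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (intercept fitted)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def yearly_diversity_rates(data, min_points=5):
    """data: {subject: [(days, diversity_percent), ...]}.
    Returns per-subject rates, the pooled rate, and the mean of per-subject
    rates, all in percentage points per year; subjects with fewer than
    min_points time points are excluded."""
    eligible = {k: v for k, v in data.items() if len(v) >= min_points}
    per_subject = {k: ols_slope(*zip(*v)) * 365 for k, v in eligible.items()}
    pooled_points = [p for v in eligible.values() for p in v]
    pooled = ols_slope(*zip(*pooled_points)) * 365
    mean_of_rates = sum(per_subject.values()) / len(per_subject)
    return per_subject, pooled, mean_of_rates

days = [0, 73, 146, 219, 292]
synthetic = {
    "A": [(d, 0.1 + 0.002 * d) for d in days],
    "B": [(d, 0.3 + 0.001 * d) for d in days],
    "C": [(0, 0.2), (30, 0.25)],  # too few time points, excluded
}
per, pooled, mean_rate = yearly_diversity_rates(synthetic)
```

Here subject A gains about 0.73 and subject B about 0.365 percentage points per year. Because both share the same sampling days, the pooled and averaged rates coincide; with unequal sampling the two estimates generally differ.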
At visits in the first month of infection, we observed a transient decrease (a dip) in nucleotide diversity for both genomes (Fig. 3C) and independent gene sequences (data not shown). This suggested a contraction in diversity following the establishment of infection. We also noted a decrease in APOBEC3F/G-mediated mutations that coincided with the dip in nucleotide diversity (see Fig. 2 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/), yet the dip in nucleotide diversity was of substantially larger magnitude and thus not due to the decrease in APOBEC-induced mutations (see Fig. 2 posted at the URL mentioned above).
To evaluate potential factors behind the dip in nucleotide diversity and assess the forces acting on the viral population very early in infection, we assessed how the data conformed to neutral theory. The dramatic, several-order-of-magnitude change in plasma viremia during acute infection suggests that changes in HIV-1 population size (i.e., demographic processes) might influence genetic diversity in this period. Trends in genome diversity and divergence are plotted along with viral load data in Fig. 4. We performed neutrality tests on genomes from the four individuals with five or more sequential visits (R3, S1, S2, and S3). Both Tajima's D (69) and Fu and Li's D* tests (18) revealed negative deviations from neutral evolution, suggesting positive selection, demographic events, or both (69) (see Table 2 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/). The most significant negative deviations (P < 0.001) were observed at the earliest time points after infection, specifically before ~50 days, coinciding with the rapid viral population growth and contraction during acute infection (shaded in Fig. 4). Next, to distinguish demographic from selective processes, we calculated D and D* separately for env, gag, nef, and pol. There was no evidence of selection acting specifically on a particular gene, as genomes and individual genes showed similar patterns, implying demographic processes acting uniformly across genomes. Significant negative deviations were again more common at the first time points, and the strongest P values in the gene-specific analyses coincided with negative deviations in the whole-genome analyses.
Since sequential visits are not independent due to shared evolutionary history, the number of independent tests can be reduced (compared to strict Bonferroni correction for 144 tests), thus revealing significant deviations from neutrality in the early time points (see Table 3 posted at the URL mentioned above). In addition, in pairwise comparisons of genes for each time point, the Hudson-Kreitman-Aguadé (HKA) tests (28) revealed no sign of adaptive positive selection.
The significant negative deviations observed for both genomes and separate genes persisted until the rapid decline in viral loads (Fig. 4). Importantly, negative D and D* values that are due to demographic processes can result from a founder effect or from a recent population expansion with a subsequent delay in the population reaching neutral equilibrium (69). Evolution of HIV-1 during acute infection is therefore marked by both a founder effect and subsequent population expansion. We conclude that the observed early dip in viral diversity is likely caused by rapid viral population expansion. Population expansion can decrease mean population diversity because most lineages in the growing population are descended from a limited number of ancestral lineages.
While demographic processes predominated at the earliest time points (before ~50 days), later visits revealed the role of positive selection in HIV-1 primary infection. Using a comparative dN/dS approach (53) and a simulation approach that identifies directional selection (41), we identified amino acid sites under positive selection for the four individuals whose data are shown in Fig. 3B. Over the whole proteome, an average of 24 sites were under positive selection for each individual (range, 20 in R3 with 222 days of follow-up to 37 in S2 with 346 days of follow-up) (see Table 4 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/). No significant change in the number of potential N-linked glycosylation sites (PNGS) was seen over these time periods or between transmitters and recipients (Fig. 3E). The mean number of PNGS ranged between 27 and 34 per sequence. However, only two to five PNGS had variation (of which only one site, in S1, had a positively selected mutation).
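PNGS counts such as those above rest on the N-X-S/T sequon, where X is any residue except proline. The sketch below scans for that canonical motif with a regular expression. It is not N-GLYCOSITE, it assumes an ungapped protein sequence (alignment gaps would need to be removed first, which shifts positions), and its function name is hypothetical.

```python
import re

# Canonical N-linked glycosylation sequon: N, then any residue except P, then S or T.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

def pngs_positions(protein):
    """1-based positions of the N in every N-X-S/T (X != P) motif."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]

print(pngs_positions("NGTNPSNNST"))  # [1, 7, 8]
```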
To assess T cell-mediated pressure on HIV-1 evolution, we analyzed CTL responses and predicted epitopes based on each individual's HLA type. As with the dip in viral diversity, the average number of predicted epitopes also decreased in the first ~50 days after infection (Fig. 3D). However, with the exception of subject S2, these dips occurred later and lasted longer than the dips in viral diversity for the same individuals. These data, along with CTL response data, are illustrated for four newly infected individuals: three enrolled in Fiebig stage I (Fig. 5A; see also Fig. 3 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/) and one in Fiebig stage V (see Fig. 4 posted at the URL mentioned above). Overall, mutations accumulated gradually across the genome over time. The first appearance of a mutation that later reached fixation, in the Tat protein, was detected at 7 days in subject S1 (Fig. 5A). The earliest fixations of mutant amino acids occurred at day 21 (in Tat from S1) (Fig. 5A) and day 33 (in Nef from R3) (see Fig. 4 posted at the URL mentioned above). The Tat mutation, however, was not identified as positively selected by either of the two algorithms used here, because of the extremely abrupt change in the population. By ~6 months after the onset of symptoms (181 to 210 days), positively selected mutations were much more frequent, ranging from 9 in subject S2 to 18 in S3. Selected loci were more frequent in the 3′ half of the genome, which includes the most variable HIV-1 genes.
When examining mutations in CTL epitopes (recognized and predicted), we noted several instances in which the initial mutations were replaced by secondary mutations located nearby, usually in mutually exclusive sequence patterns. These patterns were seen in each of the four individuals followed for more than 180 days, at one to eight sites across the proteome (Fig. 5A; see also Fig. 3 and 4 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/). As with the first amino acid mutations, the second mutations were most often to amino acids of low database frequency. These mutual exclusion patterns were seen in epitopes corresponding to three CTL responses against Env in S1 (outlined in green boxes in Fig. 5). In this complex case, the original Env epitopes had been replaced by day 68 by two to four variants harboring mutually exclusive mutations. A response was detected against the known epitope SFNCGGEFF (C04; residues 375 to 383) (SFC of 620 at day 127; not measured at day 7). By day 68, this epitope had been replaced by two mutually exclusive variants, SVNCGGEFF and SFNCRGEFF (F to V at epitope position 2 and G to R at position 5, respectively). The epitope RRGWEILKY (A01; residues 787 to 795) represented >90% of sequences until day 13 and only 27% at day 21 and was not detected afterwards, while the variant RRGWETLKY (I to T at position 6) became the consensus (the ELISPOT assay response was 15 SFC at day 7 and 715 at day 127). A stronger response at day 127 (SFC of 1,310; not detected at day 7) was elicited against RQGLERALL (B08; residues 848 to 856). This was the predominant variant until day 44, when it was replaced by RQGLERVLL/RQGLERAFL (A to V at position 7 and L to F at position 8, respectively). Five more CTL responses were detected by ELISPOT assay in subject S1; however, their targeted epitopes showed no sequence variation over 181 days of follow-up. Three other examples of mutually exclusive mutations were observed in this subject.
In Pol, two mutations 7 amino acids (aa) apart occurred in the predicted HLA-B*0801-restricted epitope RGRRKVVSL: an R-to-K mutation at position 1 was mutually exclusive with an S-to-P mutation at position 8. In Rev, the mutually exclusive mutations were 7 aa apart, but only one site was found within a predicted epitope. In one case, the known Gag B08-restricted epitope DCKTILKAL (residues 197 to 205) was transiently replaced between days 68 and 96 by DCRTILKAL (Fig. 5A). This corresponded to a switch in database frequency from 94% to 3% for the K331R mutation (a previously documented HLA-B08-associated polymorphism). The resurgence of the original amino acid at day 127 was accompanied by a mutually exclusive A-to-S mutation located 2 aa downstream of the epitope. This mutation corresponded to a decrease of 76 percentage points in database frequency (from 87% to 11%).
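Mutually exclusive patterns like those described above can be screened for in two steps. First, record which sequences carry a mutation at each position relative to the visit 1 consensus. Then keep nearby pairs whose carrier sets do not overlap. The distance window `max_gap` is a hypothetical parameter, because the text does not define "nearby". The function name is illustrative. The per-sequence compositions in the example are invented, using the epitope variants named above. With about 10 sequences per visit, mutations can also fail to co-occur by chance, so this is an exploratory screen rather than a statistical test.

```python
from itertools import combinations

def mutually_exclusive_pairs(reference, seqs, max_gap=10):
    """Pairs of mutated positions (1-based, at most max_gap apart) whose
    mutations each occur in the sample but never in the same sequence."""
    carriers = {}
    for i, ref_aa in enumerate(reference):
        who = {k for k, seq in enumerate(seqs) if seq[i] != ref_aa}
        if who:
            carriers[i + 1] = who
    return [(p, q) for p, q in combinations(sorted(carriers), 2)
            if q - p <= max_gap and not carriers[p] & carriers[q]]

env_b08 = ["RQGLERVLL", "RQGLERVLL", "RQGLERAFL", "RQGLERAFL"]
print(mutually_exclusive_pairs("RQGLERALL", env_b08))  # [(7, 8)]
pol_b08 = ["KGRRKVVSL", "RGRRKVVPL", "KGRRKVVSL"]
print(mutually_exclusive_pairs("RGRRKVVSL", pol_b08))  # [(1, 8)]
```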
We assessed the direction of mutations by comparing the conservation level of the founder and mutant amino acids in a database of circulating HIV-1 sequences (41). We defined forward (likely to be escape) mutations as those that reflected a decrease in database frequency of at least 50% (Fig. 5, shown in orange) and reverse (likely to be reversion) mutations as those that reflected an increase of at least 50% (Fig. 5, shown in turquoise; see also Fig. 3 and 4 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/). Amino acids with less substantial changes in database frequency are highlighted in green. A predominance of forward mutations was observed in all individuals. When we counted the mutations that became fixed, the majority corresponded to forward mutations with a drastic switch to amino acids with lower database frequencies. The ratio of forward to reversion mutations was 37/2 for subject S7, 6/0 for T4, 53/7 for S1, 91/7 for S2, 55/10 for R3, and 4/1 for R4 (for R4 for whom two founder variants were identified, we qualified mutations relative to the consensus corresponding to the respective founder variant).
Reverse mutations were also rare when we analyzed only mutations located in targeted/predicted epitopes. Regarding S2, only one reverse mutation among 11 potential epitopes was suggested (Env; C08-QFEDKTIIF replaced by QFENKTIIF at day 98); the original residue D was found in 0.005% of sequences in the HIV database, while N was found in 96%. For S3, two putative reversions were found, both in Env, including the mutation of IYAPPIQGL to MYAPPIQGL, corresponding to a switch from residues found in 1% (I) to 98% (M) of database sequences. In contrast, several possible escape mutations were seen in Env, Nef, and Gag, including some complex patterns with, for example, four different amino acid mutations in the known Nef epitope VLMWKFDSHL (A02); all were found in less than 8% of circulating sequences. Six independent ELISPOT assay responses were detected in R3, all against invariable epitopes, except for one Env response in which the original DPNPQEIRL epitope was replaced by DPNPQEIGL from day 144.
In this study, we described the evolution of HIV-1 genome sequences encompassing the entire viral proteome in 11 individuals, including four transmission pairs, and with viral sequencing prior to and following the peak viremia of acute infection in four individuals.
We reaffirmed that, in MSM, HIV-1 infection is typically established by a single founder variant. A recent report found a higher proportion of infections with multiple variants in MSM (36%; 10 of 28) than in heterosexual transmissions (40). In contrast, we have observed multiple-founder infections in about 20% of the MSM transmissions we have studied (1 of 9 in this study; 5 of 37 and 16 of 65 previously). It has been unclear, however, whether the presence of a single founder reflects a selective process and whether that process occurs at transmission or in the earliest stages of HIV-1 infection. A large fraction of transmissions are thought to occur during acute/early infection (12, 66, 72), when viral load is highest (27, 51). On this basis alone, one would expect to find single founder variants in recipients infected by transmitters in the early stages of infection (when viral populations are typically homogeneous). If the transmitting partner were in a later stage of infection with a diverse viral population, however, infections could be established by several variants. Moreover, the number of founder strains may reflect the network of HIV-1 transmissions; recent data showed that 25% of transmissions occurred in the first 6 months of infection in a cohort of MSM in the United Kingdom, as opposed to 1% for heterosexual transmissions (29). Importantly, while our cohort consisted of MSM, the restriction to a single founder virus was not due to a lack of variation in the transmitter viruses. We studied three transmission pairs in which the donor was chronically infected. Despite extensive variation in the transmitters, only one recipient contained as many as two founder variants, and the genomes within each of these two lineages were nearly homogeneous. Thus, establishment of infection by a single variant is likely not a result of lack of variation in the transmitter and must be related to a biological mechanism that restricts the establishment of multiple variants.
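The figure of about 20% follows from pooling the three series of MSM transmissions cited above. Below is a short check, with the series labels chosen only for illustration.

```python
# (multiple-founder infections, MSM transmissions studied), as reported in the text
series = {"this study": (1, 9), "earlier set of 37": (5, 37), "earlier set of 65": (16, 65)}

multi = sum(m for m, _ in series.values())
total = sum(n for _, n in series.values())
pooled = multi / total
print(multi, total, round(100 * pooled, 1))  # 22 111 19.8
print(round(100 * 10 / 28, 1))               # 35.7, the 36% cited for the other report
```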
The founder variant in the recipient did not appear to be rare in the transmitter but was, rather, a representative variant from the complex viral population in the transmitter. This corroborates data showing that viruses isolated during acute or chronic infection were not distinguishable in terms of viral fitness (7, 36). Although the founder variants were typical of the population of sequences found in the transmitter, the founder in the recipient differed from the variants predominating in the transmitter, and these differences warrant further analysis. This issue will need to be addressed with larger sampling and deep sequencing of the transmitter and recipient viral populations, the evaluation of viruses from semen in addition to plasma (because compartmentalization between blood plasma and semen has been reported [e.g., in references 3, 11, and 52], albeit not consistently [3, 13, 15]), and characterization of selective processes that could explain potential differences between transmitter and recipient viruses. Across transmission pairs, about 100 variable sites were found over the genome. We considered the possibility that the mutations distinguishing transmitter and recipient viruses represented reversions in the recipient of CTL escape mutations that had developed in the transmitter, yet we found little evidence of such a retrograde process over the time periods examined.
During the rapid viral population growth of primary infection, HIV-1 evolution appears to be stochastic before signs of adaptation emerge within 1 to 3 weeks after the onset of symptoms of acute HIV-1 infection. Our data conform to a model of exponential population growth with substitutions accumulating in a star-like phylogenetic pattern (68), consistent with the model proposed by Lee et al. (38) for infections established by a single variant. We observed star-like tree topologies as a result of both the single-variant founder effect (short distances to the most recent common ancestor [MRCA]) and the multiplicity of variable sites rapidly developing in the genomes (data not shown); sequences from these early time points (before ~50 days) showed no temporal clustering, in contrast to sequences sampled over more protracted periods of evolution (65). By including longitudinal sequence data starting before peak plasma viremia, we showed that demographic effects were dominant during the rapid population expansion and contraction of the first 50 days after the onset of symptoms, although positively selected sites were detected (in a sampling of an average of 10 viral genomes per time point) within 1 to 3 weeks postonset of symptoms. Whatever forces are responsible for the successful outgrowth of the founder strain, positive selection is minimal at this stage.
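One way to make the star-like, exponential-growth expectation concrete: if each sequence accumulates mutations independently from a single founder (no selection, no recombination), the number of differences between each sequence and the founder should be roughly Poisson distributed, so its variance should be close to its mean. The sketch below is a simplified check in that spirit, not the Poisson-fitting procedure of Lee et al. (38); it assumes aligned, equal-length sequences and uses the majority-rule consensus as a stand-in for the founder.

```python
from collections import Counter
from statistics import mean, pvariance


def consensus(seqs):
    """Majority-rule consensus of aligned sequences (proxy for the founder)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


def distances_to_consensus(seqs):
    """Number of sites at which each sequence differs from the consensus."""
    ref = consensus(seqs)
    return [sum(a != b for a, b in zip(seq, ref)) for seq in seqs]


def dispersion_index(seqs):
    """Variance-to-mean ratio of per-sequence mutation counts.

    A Poisson count has variance equal to its mean, so values near 1 fit a
    star-like pattern; values well above 1 suggest shared mutations (lineage
    structure, selection, or more than one founder). With ~10 sequences per
    time point this is only a rough indicator.
    """
    counts = distances_to_consensus(seqs)
    m = mean(counts)
    return pvariance(counts) / m if m > 0 else float("nan")


# Toy alignment: each sequence carries at most one private mutation.
toy = ["AAAAAAAA", "AAAAAAAT", "AAAAACAA", "AAGAAAAA", "TAAAAAAA"]
print(distances_to_consensus(toy))  # [0, 1, 1, 1, 1]
print(round(dispersion_index(toy), 2))
```

Private, unshared mutations like those in the toy alignment are what a star-like tree produces; a sample in which several sequences shared the same mutations would instead show inflated variance.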
Evidence of purifying selection was suggested by the slight contraction of genetic diversity in the first 20 to 40 days. We consider this contraction to be a qualitative observation consistent with (i) the results of the neutrality tests, (ii) the rapid population growth over the same time period, and (iii) the lack of positive selection during the same period. Although our sample size of 10 sequences per time point limits our ability to test the contraction of genetic diversity comprehensively (with 10 sequences, we have a 60% chance of missing a variant present at 5% frequency in the viral population), we consider this dip to be an important observation that should be tested with larger sampling. While this dip in diversity can be explained by the concurrent rapid viral population growth, we also found that the average number of APOBEC-specific G-to-A mutations per genome dipped during this time frame (see Fig. 2 posted at http://mullinslab.microbiol.washington.edu/publications/herbeck_2011/). This suggests that sequences found at the earliest time points (Fiebig stage I) may have been weakly enriched with APOBEC-induced mutations and that the dip in diversity may be due in part to the elimination of these variants via purifying selection. A previous study also highlighted that APOBEC-induced mutations may be an important phenomenon in early HIV-1 infection (74). In contrast, the number of predicted CTL epitopes, although it also decreased, generally did so over a more protracted period (Fig. 3).
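Two quantities in this paragraph can be sketched in a few lines. The first function reproduces the sampling argument: the chance that a variant at frequency f is absent from n independently sampled genomes is (1 − f)^n, which for f = 0.05 and n = 10 is about 0.60; the second inverts this to ask how many genomes would be needed to detect such a variant with 95% probability. The third counts G-to-A changes in APOBEC3G/F-preferred contexts using a simplified rule (the mutated G is followed by G or A in the reference); that rule is an assumption for illustration and is not the exact criterion used by tools such as Hypermut or by the authors.

```python
import math


def prob_missing(freq, n):
    """Chance that a variant at population frequency `freq` is absent from
    `n` independently sampled genomes (binomial probability of zero hits)."""
    return (1.0 - freq) ** n


def min_sample_size(freq, detect_prob=0.95):
    """Smallest n with P(at least one copy sampled) >= detect_prob."""
    return math.ceil(math.log(1.0 - detect_prob) / math.log(1.0 - freq))


def apobec_g_to_a(ref, seq):
    """Count G->A changes in `seq` relative to `ref` in GG or GA context.

    Simplified APOBEC3G (GG) / APOBEC3F (GA) rule: the mutated G must be
    followed by G or A in the reference. Assumes aligned, gap-free sequences
    of equal length.
    """
    return sum(
        1
        for i in range(len(ref) - 1)
        if ref[i] == "G" and seq[i] == "A" and ref[i + 1] in "GA"
    )


print(round(prob_missing(0.05, 10), 2))  # 0.6
print(min_sample_size(0.05))             # 59
print(apobec_g_to_a("GGAGCT", "AGAACT"))  # 1 (the G->A before C is not counted)
```

Averaging `apobec_g_to_a` over the genomes sampled at each visit would give a per-genome figure comparable in spirit to the time course described above.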
Amino acids were identified as positively selected (in a sampling of an average of 10 viral genomes per time point) at about 4 weeks postonset of symptoms, with some of these mutations emerging as early as 1 week postonset of symptoms. The replacement of the founder variant was more rapid in the 3′ half of the genome, with more mutated amino acids and higher rates of mutation at these amino acids, echoing the higher nucleotide evolutionary rates in env and nef than in gag and pol. The rate found in the C2V5 region of env is comparable to rates calculated from longitudinal sampling over 6 to 12 years in chronically infected individuals (65). When assessing selected sites over time, we focused on the relative importance of forward (putative CTL escape) and reversion mutations, as CTL have been found to be the major selective force acting on the virus population early in infection (2, 21, 41). We found more examples of mutations to rare amino acids than to conserved amino acids, suggesting that escape was more frequent than reversion. Conflicting reports about the relative prevalence of reversions in early HIV-1 infection have been published (16, 21, 26, 32, 33, 39, 45). The large number of reversions found in earlier studies may have been due to the experimental methods used, i.e., (i) consensus sequencing (39), (ii) a cross-sectional cohort (45), or (iii) HXB2 or another unrelated sequence, rather than a baseline sequence, as the reference for scoring reversions (16). In our study, we analyzed the database frequency of mutations relative to that of the consensus sequence at the first time point, using a possibly stringent criterion for assigning the direction of mutations: a 50% decrease or increase in database frequency for forward or reverse mutations, respectively (the reference database consists of independent circulating HIV-1 subtype B sequences from the Los Alamos National Laboratory HIV database [HIVDB]).
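The comparison of evolutionary rates across genome regions can be illustrated with a simple divergence-versus-time regression. The sketch below is only an assumption about how such a rate might be approximated, not the authors' method: it computes the mean p-distance from the founder at each sampling day for one region and takes the least-squares slope over time, ignoring multiple substitutions at a site and the non-independence of sequences that share ancestry.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def rate_per_site_per_year(founder, samples):
    """Divergence rate (substitutions/site/year) for one genomic region.

    `samples` maps sampling day (e.g., days after symptom onset) to a list of
    aligned sequences for that region; at least two distinct days are needed.
    """
    days = sorted(samples)
    mean_div = [
        sum(p_distance(founder, s) for s in samples[d]) / len(samples[d])
        for d in days
    ]
    return ols_slope(days, mean_div) * 365.25


# Toy region of 10 sites sampled on days 0, 100, and 200.
founder = "AAAAAAAAAA"
samples = {
    0: ["AAAAAAAAAA"],
    100: ["AAAAAAAAAT", "AAAAAAAAAA"],
    200: ["AAAAAAAATT", "AAAAAAAAAA"],
}
print(round(rate_per_site_per_year(founder, samples), 4))
```

Running the same function separately on env or nef and on gag or pol alignments from one subject would give region-specific rates that can be compared directly.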
Nonetheless, a less strict threshold of 30% also revealed many more forward than reversion mutations. A 50% threshold helps us avoid classifying changes at sites where variation appears to be well tolerated as forward or reversion mutations. For example, we did not consider the L80V mutation in Nef a reversion: the initial amino acid, L, was found in 39% of circulating sequences, while V, the consensus amino acid, was found in 45%. Based on these criteria, evidence for reversions was minimal during acute/early infection.
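The direction-scoring rule can be written as a small function. Because the text does not state whether the 50% change is relative or absolute, the sketch below assumes a relative change in database frequency (mutant frequency at most half, or at least 1.5 times, that of the baseline residue); the function name and interface are hypothetical. Applied to the Nef L80V example, the change from 39% to 45% falls short of both the 50% and 30% thresholds, so the mutation stays unclassified, as in the text.

```python
def classify_direction(freq_baseline, freq_mutant, threshold=0.5):
    """Score a mutation as 'forward', 'reversion', or 'unclassified'.

    freq_baseline: database frequency of the residue in the first-time-point
        consensus; freq_mutant: database frequency of the new residue.
    Assumed rule (relative change): forward if the mutant is at least
    `threshold` rarer than the baseline residue, reversion if it is at least
    `threshold` more common.
    """
    if not 0 < freq_baseline <= 1 or not 0 <= freq_mutant <= 1:
        raise ValueError("frequencies must be proportions; baseline must be > 0")
    if freq_mutant <= (1 - threshold) * freq_baseline:
        return "forward"
    if freq_mutant >= (1 + threshold) * freq_baseline:
        return "reversion"
    return "unclassified"


# Nef L80V: L in 39% of database sequences, V (consensus) in 45%.
print(classify_direction(0.39, 0.45))                 # unclassified
print(classify_direction(0.39, 0.45, threshold=0.3))  # unclassified
```

Lowering `threshold` widens both the forward and reversion bins symmetrically, which is the comparison made above between the 50% and 30% criteria.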
Initial forward mutations were often replaced by (or fluctuated in the population with) variants carrying one or more secondary mutations that were mutually exclusive. Given the substantial decreases in database frequency observed for the mutated amino acids, the first and, in some cases, every mutation may have been detrimental to HIV-1 viability (42, 71). These maturational escape patterns may be explained by selective pressure on CTL epitopes driving viral escape via multiple pathways, each through a single amino acid mutation. The simultaneous presence of two forward mutations may be too debilitating for variant survival (in almost all cases, the mutation corresponded to a sharp drop in database frequency), resulting in mutual exclusion. Each mutation may be detrimental to viral fitness, although the second may be sufficiently less so to provide a selective advantage over the primary mutation. Alternatively, the secondary mutations may confer escape from responses elicited by the initial escape mutant, although the substantial population shifts reflecting these changes occurred in less than 2 weeks. The A-to-S mutation observed in the region immediately 5′ of the Gag B08-restricted potential epitope DCKTILKAL in subject S1 could also be a processing mutation, as previously reported in R1 (41). Detailed analysis of the functional consequences of these changes and of CTL responses toward each of these variants will shed light on the selective forces driving these alternative mutations. We conjecture that this sequential, maturational pattern of linked, mutually exclusive mutations might be more evident in acute infection, because an unstable mutation might be rapidly removed by selection.
Later in infection, other new mutations might serve as compensatory sites for previously deleterious mutations, and mutually exclusive patterns may be harder to identify because a diverse viral population offers a ready pool of potential compensatory mutations that could be acquired via recombination.
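Mutual exclusion of the kind described in the preceding paragraphs can be screened for in a set of aligned sequences from one time point. The sketch below is a hypothetical helper, not the authors' analysis: it tabulates how many sequences carry one candidate mutation, the other, both, or neither, and compares the observed number of double mutants with the number expected if the two mutations occurred independently. With about 10 genomes per time point, an absence of double mutants is suggestive rather than conclusive.

```python
def co_occurrence(seqs, site_a, aa_a, site_b, aa_b):
    """Tabulate two candidate mutations across aligned protein sequences.

    site_a/site_b are 0-based alignment positions; aa_a/aa_b are the mutant
    residues. Returns counts of sequences with A only, B only, both, neither.
    """
    counts = {"A_only": 0, "B_only": 0, "both": 0, "neither": 0}
    for seq in seqs:
        has_a = seq[site_a] == aa_a
        has_b = seq[site_b] == aa_b
        if has_a and has_b:
            counts["both"] += 1
        elif has_a:
            counts["A_only"] += 1
        elif has_b:
            counts["B_only"] += 1
        else:
            counts["neither"] += 1
    return counts


def expected_double_mutants(counts):
    """Expected number of sequences with both mutations under independence."""
    n = sum(counts.values())
    p_a = (counts["A_only"] + counts["both"]) / n
    p_b = (counts["B_only"] + counts["both"]) / n
    return n * p_a * p_b


def looks_mutually_exclusive(counts):
    """Both mutations are present in the sample but never on the same genome."""
    return counts["both"] == 0 and counts["A_only"] > 0 and counts["B_only"] > 0


# Hypothetical peptide alignment with candidate mutations at sites 3 and 5.
seqs = ["MKTAYIA", "MKTSYIA", "MKTAYVA", "MKTAYIA"]
counts = co_occurrence(seqs, 3, "S", 5, "V")
print(counts)  # {'A_only': 1, 'B_only': 1, 'both': 0, 'neither': 2}
print(expected_double_mutants(counts))  # 0.25
print(looks_mutually_exclusive(counts))  # True
```

When the expected number of double mutants is well below 1, as in this toy case, their absence carries little evidence; repeated absence across several time points is more informative.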
The results reported here should influence the design of vaccine immunogens. For example, understanding the forces (selective or stochastic) acting on the establishment of the founder strain(s) in HIV-1 infections can help in the design of vaccines that take into account evolutionary pathways shared among founder viruses. Recognition of the dynamic evolution of CTL epitopes will assist efforts to develop antigen cocktails that seek to block escape pathways. However, our studies illustrate the difficulty in blocking such a dynamic repertoire of antigenic determinants.
Funding for this study was provided to J.I.M. by U.S. Public Health Service grants P01AI57005 and R37AI47734, to J.T.H. and J.I.M. by the University of Washington Center for AIDS Research (P30 AI27757), to J.T.H. by NIH T32 AI07140, and to M.R. by an amfAR Mathilde Krim Fellowship, 107005-43-RFNT.
Published ahead of print on 18 May 2011.