Identifying the specific genetic characteristics of successfully transmitted variants may prove central to the development of effective vaccine and microbicide interventions. Although human immunodeficiency virus transmission is associated with a population bottleneck, the extent to which different factors influence the diversity of transmitted viruses is unclear. We estimate here the number of transmitted variants in 69 heterosexual men and women with primary subtype C infections. From 1,505 env sequences obtained using a single genome amplification approach we show that 78% of infections involved single variant transmission and 22% involved multiple variant transmissions (median of 3). We found evidence for mutations selected for cytotoxic-T-lymphocyte or antibody escape and a high prevalence of recombination in individuals infected with multiple variants representing another potential escape pathway in these individuals. In a combined analysis of 171 subtype B and C transmission events, we found that infection with more than one variant does not follow a Poisson distribution, indicating that transmission of individual virions cannot be seen as independent events, each occurring with low probability. While most transmissions resulted from a single infectious unit, multiple variant transmissions represent a significant fraction of transmission events, suggesting that there may be important mechanistic differences between these groups that are not yet understood.
The development of a human immunodeficiency virus (HIV) vaccine remains a global priority. Since vaccines need to target viruses at, or close to, the time of infection, there is great interest in understanding both the genetic characteristics of viruses that are successfully transmitted and the genetic diversification that ensues during the earliest stages of infection. There is evidence from cross-sectional studies of individuals with acute and early HIV infections that transmission is associated with a population bottleneck (4, 9, 16, 35, 36, 39, 46-48). The best evidence of transmission bottlenecks is obtained from studies of HIV discordant couples, where infection is associated with transmission of one or a few genetic variants despite high degrees of viral diversity in the transmitting donors (13). A recent study was able to quantify this bottleneck by using methodologies that could identify precisely the transmitted, founder virus population (16). That study found that as many as 76% of transmission events involved productive clinical infection by only a single genetic variant in HIV-1 subtype B transmission by either heterosexual or homosexual routes. These findings are encouraging since in the majority of infections a vaccine-mediated response may have to protect against only a low-multiplicity infection event that is associated with low viral diversity in the initial days and weeks of infection.
However, it is also apparent that the transmission bottleneck can be overcome, as evidenced by the transmission of multiple viral variants (9, 10, 12, 16, 35, 36). The clinical implications of multiple variant transmission are potentially serious because coinfection with genetically divergent viral lineages has been associated with more severe disease progression (10, 12, 37). Current estimates of the frequency of multiple variant transmissions vary widely with between 0 and 50% of successful sexual transmissions estimated to involve the transfer of multiple genetic variants (9, 16, 35).
It is still an open question as to whether there are distinctive features of viruses that enhance their transmissibility and whether different risk behaviors are associated with different numbers of transmitted viruses. For example, intravenous infection has been associated with more heterogeneous infections than intravaginal infection in a rhesus macaque model (11). Besides the potential identification of previously unrecognized transmission risk factors, attempts to discover the variables influencing multivariant transmission event frequencies should also provide insights into natural barriers to HIV infection. Unfortunately, because of methodological variations between studies it has not been possible to directly compare results and to assess the impacts of biological factors such as gender, routes of transmission, the presence of sexually transmitted diseases and viral subtype on multiple variant transmission frequencies (39). Complex cofactors in a given risk category may impact rates of transmission of multiple variants just as they impact overall transmission rates (32).
In the present study, we investigated the genetic characteristics of subtype C variants transmitted via a heterosexual route in order to understand mechanisms involved in HIV transmission at genital sites. We used here the approach described by Keele et al. (16), which allowed us to directly compare multiple transmission frequencies between subtype B and C viruses. Our results indicate that similar proportions of subtype B and subtype C infections involve the transmission of multiple variants. Analysis of a total of 171 transmission events in both the subtype B and C studies showed that infection with multiple variants does not follow a Poisson distribution, suggesting that transmissions of multiple variants are not independent events in a setting of a low probability of infection. The question of the biological basis of the mucosal viral transmission bottleneck and factors associated with its breach, however, remains unanswered.
Plasma was obtained from 69 individuals experiencing early HIV-1 subtype C infection. Twenty-six were identified through prospective monthly monitoring of HIV-negative women using rapid HIV tests and RNA PCR (44) (CAPRISA 002 cohort, Durban, South Africa); 22 were identified (i) with negative rapid HIV or enzyme immunoassay (EIA) antibody test results but positive HIV RNA and/or p24 antigen test results or (ii) who were antibody positive but had been antibody negative or Western blot indeterminate within the previous 45 days (CHAVI 001 cohorts, South Africa and Kamuzu Central Hospital Sexually Transmitted Disease Clinic, Malawi). An additional 21 individuals enrolled from the Sexually Transmitted Disease Clinic in Malawi were identified as HIV PCR positive with a rapid HIV or EIA antibody-negative test or who had indeterminate HIV Western blot results (30).
This study was approved by institutional review boards, and all participants provided written informed consent.
The durations of HIV-1 infection were categorized into six stages based on evolving HIV-1 RNA or antibody profiles developed by Fiebig et al. (8). Plasma was tested for HIV RNA by using Roche Amplicor vRNA assays (Rotkreuz, Switzerland) and for antibodies by EIA (BEP 2000 [Dade Behring, Marburg, Germany] or Determine AntiHIV-1/2 3rd Generation EIA [Abbott, Illinois] and Uni-Gold Recombigen [Trinity Biotech, Ireland]) and a GS HIV-1 Western blot analysis kit (Bio-Rad, Washington). Individuals classified as being in stage I were viral RNA positive and p24 antigen and EIA antibody negative, those in stage II were RNA and p24 antigen positive but EIA antibody negative, those in stage III were EIA antibody positive but Western blot negative, those in stage IV had an indeterminate Western blot result, those in stage V were Western blot positive but without reactivity to the p31 integrase band, and those in stage VI were Western blot positive with a p31 band present.
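The staging rules above can be sketched as a small decision function. This is an illustrative reading of the criteria as stated in this section, not the classification procedure used by the authors; the function name, argument names, and the "V/VI" label for an undetermined p31 result are assumptions made for the example.

```python
from typing import Optional


def fiebig_stage(rna_pos: bool, p24_pos: bool, eia_pos: bool,
                 western_blot: str, p31_pos: Optional[bool] = None) -> str:
    """Return the laboratory stage (I to VI) implied by the assay results.

    western_blot must be 'negative', 'indeterminate', or 'positive'.
    p31_pos is None when reactivity to the p31 band is not yet determined
    (an assumption of this sketch, reported here as 'V/VI').
    """
    if not rna_pos:
        raise ValueError("staging assumes a viral RNA-positive sample")
    if not eia_pos:
        # Stage I: RNA only; stage II: RNA and p24 antigen, no antibody yet.
        return "II" if p24_pos else "I"
    if western_blot == "negative":
        return "III"
    if western_blot == "indeterminate":
        return "IV"
    if western_blot == "positive":
        if p31_pos is None:
            return "V/VI"
        return "VI" if p31_pos else "V"
    raise ValueError(f"unknown Western blot result: {western_blot!r}")
```

For example, `fiebig_stage(True, True, False, "negative")` returns `"II"`.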
HIV-1 RNA was extracted from 140 to 200 μl of ACD or EDTA plasma and eluted in a 50-μl final volume. The full volume of RNA was reverse transcribed to cDNA (100-μl reaction volume) by using a Superscript III reverse transcriptase (RT) system (Invitrogen, California) with an OFM-19 primer as described previously (16) or with oligo(dT). The cDNA was then serially diluted to obtain no more than 30% positive amplification reactions so that each amplicon would theoretically be amplified from a single template more than 80% of the time (39). This limiting dilution approach of env gene amplification was initially described by Simmonds et al. (41) and Edmonson and Mullins (7) and was thereafter modified by Palmer et al. (29) by sequencing of the amplicon directly and finally modified by Salazar-Gonzalez et al. (39), who showed that single genome amplification of the env gene with direct sequencing precludes recombination and Taq-induced error, and provides proportional representation of each viral sequence. PCR products were directly sequenced by using an ABI 3000 genetic analyzer (Applied Biosystems, Foster City, CA) and BigDye terminator reagents. To ensure that sequences reflected single templates from the viral populations in vivo, amplicons with sequence chromatograms with “double peaks,” indicative of coamplification of more than one template, were excluded. Sequences with deletions larger than 100 nucleotides compared to the intraparticipant consensus were excluded.
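The relationship between the 30% positivity threshold and the 80% single-template figure can be checked with a short calculation. The sketch below assumes that the number of amplifiable templates per reaction is Poisson distributed, which is the usual rationale for limiting dilution; the function name is hypothetical.

```python
import math


def single_template_fraction(positive_fraction: float) -> float:
    """Fraction of positive reactions expected to contain exactly one template.

    Assumes templates per reaction follow a Poisson distribution, so the
    fraction of negative reactions equals exp(-lam).
    """
    if not 0.0 < positive_fraction < 1.0:
        raise ValueError("positive_fraction must lie strictly between 0 and 1")
    lam = -math.log(1.0 - positive_fraction)   # mean templates per reaction
    p_exactly_one = lam * math.exp(-lam)       # Poisson P(k = 1)
    return p_exactly_one / positive_fraction   # condition on a positive reaction


print(round(single_template_fraction(0.30), 3))  # 0.832
```

Under this assumption, about 83% of positive reactions at 30% positivity derive from a single template, consistent with the "more than 80%" figure above; lower positivity rates raise that fraction further.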
Differences in sequences were visualized by using neighbor-joining trees (MEGA 3.1) (43) and Highlighter nucleotide transition and transversion plots (www.hiv.lanl.gov). Pairwise DNA distances were computed by using MEGA 3.1.
Conformance of intraparticipant sequence diversity to a mathematical model of random evolution was evaluated as described by Keele et al. (16) and Lee et al. (H. Y. Lee, E. E. Giorgi, B. F. Keele, B. Gaschen, G. S. Athreya, J. F. Salazar-Gonzalez, K. T. Pham, P. A. Geopfert, J. M. Kilby, M. S. Saag, E. L. Delwart, M. P. Busch, B. H. Hahn, G. M. Shaw, B. T. Korber, T. Bhattacharya, and A. S. Perelson, submitted for publication) whereby exponential viral replication from a single lineage is assumed to fix mutations at a constant rate, in the absence of positive selection, using the following parameters: an HIV-1 generation time of 2 days (24), a reproductive ratio of 6 (i.e., assuming that the virus replicates exponentially, for each currently infected cell six new cells will be infected in the next generation) (42), and a replication error rate of 2.16 × 10^-5 substitutions per site per replication cycle (23).
Time of divergence from the most recent common ancestor (MRCA) was estimated by using BEAST (i.e., Bayesian Evolutionary Analysis Sampling Trees, v1.4.7) (5, 6) with a relaxed (uncorrelated exponential) molecular clock and general time-reversible substitution model, with relative substitution rate parameters estimated by using HyPhy (31), as described previously (16). We used a gamma distribution with four categories and a proportion of invariant sites to model rate heterogeneity across sites. Substitution rates were unlinked across codon positions with a mean fixed at 2.16 × 10^-5 substitutions per site per generation (23).
Sequences were analyzed for evidence of APOBEC3G-induced hypermutation by using the Hypermut 2.0 tool (www.hiv.lanl.gov). Sequences with a P value of ≤0.1 were considered enriched for mutations consistent with APOBEC3G signatures. In sequence sets showing evidence of enrichment for APOBEC3G-driven G-to-A transitions but with no single significantly hypermutated sequence, hypermutation was tested for after superimposition of all mutations within that sequence set onto a single representative sequence.
We used randomization to test whether there was evidence of clustering of mutations within 10-amino-acid stretches putatively associated with cytotoxic-T-lymphocyte (CTL) immune responses (14). Sites were classified as mutated in a given patient if there was at least one sequence from the patient differing from the patient consensus at that site. For each mutated site, we calculated its nearest-neighbor distance as the distance between the site and the closest mutated site to the left or right of it on the intrapatient sequence alignment. We then compared the number of mutations with a nearest neighbor within 10 amino acids to the number expected by chance by randomizing the locations of the mutated sites 1,000 times. A P value was calculated as the fraction of randomized datasets for which the proportion of mutated sites within 10 amino acids of another mutated site was equal to or greater than the observed proportion. The null hypothesis here is random distribution of mutated sites, which is consistent with the model of neutral drift of the infecting virus in acute infection, proposed by Keele et al. (16). Rejection of the null hypothesis suggests clustering of the mutations on a scale consistent with escape from CTL responses.
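The randomization test described above can be sketched as follows. This is a minimal illustration, assuming mutated sites are given as distinct 0-based amino acid positions and that randomization draws the same number of distinct positions uniformly from the alignment; the function names, the fixed random seed, and the example positions are hypothetical and not taken from the study.

```python
import random
from typing import Sequence


def count_clustered(sites: Sequence[int], window: int = 10) -> int:
    """Count mutated sites whose nearest mutated neighbor is within `window`."""
    ordered = sorted(sites)
    clustered = 0
    for i, pos in enumerate(ordered):
        gaps = []
        if i > 0:
            gaps.append(pos - ordered[i - 1])      # distance to left neighbor
        if i < len(ordered) - 1:
            gaps.append(ordered[i + 1] - pos)      # distance to right neighbor
        if gaps and min(gaps) <= window:
            clustered += 1
    return clustered


def clustering_p_value(sites: Sequence[int], alignment_length: int,
                       n_random: int = 1000, window: int = 10,
                       seed: int = 0) -> float:
    """Fraction of randomized site sets at least as clustered as observed.

    The number of mutated sites is held fixed, so comparing counts is
    equivalent to comparing proportions.
    """
    rng = random.Random(seed)
    observed = count_clustered(sites, window)
    hits = 0
    for _ in range(n_random):
        shuffled = rng.sample(range(alignment_length), len(sites))
        if count_clustered(shuffled, window) >= observed:
            hits += 1
    return hits / n_random


# Hypothetical example: six mutated sites packed into one short stretch.
print(clustering_p_value([100, 102, 104, 106, 108, 110], alignment_length=850))
```

A small P value from this procedure indicates that the observed sites are more tightly grouped than expected under a random distribution of mutations.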
Multivariant transmission was considered if (i) within-patient env diversity was heterogeneous, with multimodal distribution of pairwise Hamming distances (HDs; that is, the number of differing sites between sequences) and structure within the phylogenetic trees, and (ii) if these deviations from the model of random evolution from a single founder virus could not be accounted for by APOBEC3G mutations, immune pressure, or stochastic events. For individuals infected with more than one variant, our expectation was that when multiple variants were transmitted, the sequences in the recipient should coalesce at a time predating the estimated time of infection. The number of infecting variants was enumerated, after accounting for recombination, by identification of distinct lineages on phylogenetic trees, together with examination of Highlighter transition and transversion plots (www.hiv.lanl.gov).
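As an illustration of the pairwise Hamming distance (HD) distribution referred to above, the following sketch tabulates HDs for a set of aligned sequences. It assumes the sequences are already aligned to equal length and counts any differing column, including gap versus base, as one difference; both choices, along with the function names and toy sequences, are assumptions of the example.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def hd_distribution(seqs: Sequence[str]) -> Counter:
    """Histogram of Hamming distances over all unordered sequence pairs."""
    return Counter(hamming(a, b) for a, b in combinations(seqs, 2))


# Toy alignment (hypothetical sequences, not study data).
print(hd_distribution(["AAAA", "AAAT", "AATT"]))  # Counter({1: 2, 2: 1})
```

A single-founder infection is expected to give one peak at low HD, whereas a multimodal histogram with a second peak at larger distances is the pattern that prompts further investigation for multiple transmitted variants.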
N-linked glycosylation sites were identified by using the N-glycosite program (www.hiv.lanl.gov). HXB2 env protein amino acid locations for putative epitopes were obtained by using the HIV sequence locator tool (www.hiv.lanl.gov). Motif Scan (www.hiv.lanl.gov) was used to detect HLA anchor residue motifs within putative epitopes and to search for matching potential epitopes.
The GARD (www.datamonkey.org/GARD/) (17, 18), RAP Beta version (www.hiv.lanl.gov), and RDP version 3.27 (http://darwin.uvigo.es/rdp/rdp.html) (25) tools were used to detect recombination in intraparticipant sequence sets.
We have assumed that established infection with a single genotypic variant signifies that a single virus particle was involved in the transmission event and that, in the setting of low probability of transmission, this represents the minimal infectious dose. The Poisson distribution was used to model the frequency of transmission of one, two, or more variants under the assumption that the transmission of each variant occurs with the same probability, i.e., as independent events. A left truncated Poisson model was used (since the zero events, i.e., no transmission, are not observed), and the model fitted to all of the data with a maximum-likelihood method; this then allowed the frequency of zero events to be estimated given the distribution of one, two, etc., variants being detected. A corresponding transmission probability was estimated as the sum of all probabilities for observing one or more variants.
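A minimal sketch of a maximum-likelihood fit for a left (zero) truncated Poisson model is given below. It assumes that the exact number of variants is known for every infected individual, in which case the likelihood is maximized when the truncated mean, lam / (1 - exp(-lam)), equals the observed mean number of variants; the equation is solved by bisection. The function names and the example counts are hypothetical and do not reproduce the fitting code or data used in the study.

```python
import math
from typing import Mapping


def truncated_mean(lam: float) -> float:
    """Mean of a zero-truncated Poisson distribution with rate lam > 0."""
    return lam / -math.expm1(-lam)


def fit_zero_truncated_poisson(counts: Mapping[int, int]) -> float:
    """Maximum-likelihood Poisson mean given {variants: individuals}."""
    n = sum(counts.values())
    if n == 0 or any(k < 1 for k in counts):
        raise ValueError("counts must cover at least one individual, all k >= 1")
    mean = sum(k * c for k, c in counts.items()) / n
    if mean <= 1.0:
        return 0.0  # only single-variant infections: estimate sits at the boundary
    lo, hi = 1e-12, mean  # truncated_mean(mean) > mean, so the root lies below it
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if truncated_mean(mid) < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical counts: 40 infections with 1 variant, 7 with 2, and 3 with 3.
lam = fit_zero_truncated_poisson({1: 40, 2: 7, 3: 3})
p_zero = math.exp(-lam)          # estimated frequency of unobserved zero events
p_transmission = 1.0 - p_zero    # probability of one or more variants
print(lam, p_transmission)
```

The estimated frequency of zero events and the corresponding transmission probability both follow directly from the fitted mean, as in the last lines of the sketch.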
Categorical variables were compared by using Fisher exact tests (two-tailed). P values of <0.05 were considered significant.
The GenBank accession numbers for the 1,505 env sequences are FJ443128 to FJ444362.
Blood plasma samples were obtained from 69 study participants (30 men and 39 women) from South Africa and Malawi who had recently acquired HIV-1 infection through heterosexual contact (Table 1). A total of 42 of the participants were in the very early stages of infection: 22 were in stage I/II (viral RNA positive, p24 antigen and EIA antibody negative, or RNA and p24 antigen positive but EIA antibody negative), 6 were in stage III (EIA antibody positive but negative by Western blotting), and 14 were in stage IV (EIA antibody positive with an indeterminate Western blot result). Individuals in these stages had high viral loads (mean of >6 million copies/ml) typical of acute infection (Fig. 1). Twenty-seven participants were in the later stages of primary infection, eighteen in stage V (Western blot positive but without reactivity to the p31 integrase band) and eight in stage VI (Western blot positive with a p31 band present), with lower viral loads (medians of 84,572 and 21,176 copies/ml, respectively); one individual was classified as being in stage V or VI, with reactivity to p31 integrase not yet determined.
A total of 1,505 env gene sequences encoding gp160 (with an average of 22 sequences per participant; range, 15 to 42) were generated from the 69 participants using single genome amplification and direct sequencing of amplicons. All sequences were classified as HIV-1 subtype C (Fig. 2). Fourteen participants clearly harbored highly heterogeneous viral populations (maximum intrapatient DNA distances of 0.73 to 8.49%) forming more than one phylogenetic lineage. These individuals were postulated to be infected with multiple variants and were further characterized to quantify the number of infecting variants. The remaining 55 participants harbored virus populations with lower diversity (maximum intrapatient DNA distances ranging from 0.08 to 0.64%) with sequences clustering as single lineages with no, or very little, structure within phylogenetic trees (see Tables S1A to S1C in the supplemental material). On further analysis, one of the 55 individuals with a low diversity env population, participant 1176, was subsequently found to be infected with three closely related viruses.
All sequences were initially analyzed for evidence of APOBEC3G-induced G-to-A hypermutation and, after removal of these mutations from the relevant sequence sets, all sequences were compared to a model of random evolution in which infection with a single founder virus is followed by neutral expansion characterized by exponential population growth (16). This is expected to result in a star-like phylogeny and a Poisson distribution for the intrapatient HD. Sequence sets that did not conform to the model after the removal of the G-to-A hypermutation were analyzed for evidence of immune pressure, selection, or evidence of scattered APOBEC3G-related hypermutation, although not detected as significant (P > 0.1), to determine possible reasons for the deviation from the model. Finally, the time of infection estimated from laboratory staging was compared to the estimated time to the MRCA of the sequences.
Two study participants were identified as being a donor-recipient pair based on their sexual history and the close phylogenetic linkage of their viruses (Fig. 2). Participant 703010131 was assumed to be the donor since he was classified as being in stage III infection, while the assumed recipient, participant 703010159, was found to be in stage I/II.
Of the 55 individuals harboring viruses with low diversity, sequences from 30 individuals displayed Poisson distributed HDs and a star-like phylogeny consistent with infection with a single founder virus and with the subsequent incorporation of randomly distributed mutations (Fig. 3A and see Table S1A in the supplemental material). Sequences from a further five individuals conformed to the model following the removal of G-to-A substitutions embedded in APOBEC3G signature patterns after showing significant enrichment of these substitutions within a single sequence and/or within the overall participant sequence set with no single overtly hypermutated sequence (P < 0.1) (see Table S1B in the supplemental material). In a sixth individual with significant APOBEC3G-driven hypermutation (P = 0.0067), participant CAP85, sequences did not conform to a star-like phylogeny even after removal of APOBEC-driven substitutions (data not shown). This was not investigated further. Sequence sets for an additional two individuals (CAP225 and 704810053) conformed to the model following removal of scattered G-to-A mutations, despite the fact that this hypermutation was not significant (P > 0.1) (see Table S1B in the supplemental material).
Thus, in seven of eight individuals, the model violation prior to the removal of G-to-A substitutions was likely due to an increased relative mutation rate at these sites. This result is similar to that reported for a small set of individuals identified in the acute subtype B-infected cohort of Keele et al. (16), where an overall enrichment of G-to-A substitutions in APOBEC motifs was scattered throughout the available sequences in 7 of 81 homogeneous infections, with an additional six subjects carrying one or more overtly hypermutated sequences. The balance between HIV-1 Vif and APOBEC3G (22) may be altered in such cases. In addition, a recent study suggests that this type of pattern may be associated with improved clinical outcome (19).
The remaining 17 of the 55 individuals with low-diversity env populations had no evidence of enrichment for G-to-A hypermutation but harbored virus populations that did not conform to the model due to their having either non-Poisson-distributed HDs and/or non-star-like phylogenies. Six of these individuals were classified as infected with a single founder virus based on divergence within a single lineage (n = 2) and/or on patterns of shared mutations between sequences (n = 4) (i.e., internal branching in the intrapatient tree inconsistent with the expected star-like phylogeny). These shared mutations were possibly due to stochastic accumulation of neutral mutant alleles, either as neutral mutations that occurred very early in the infection and could thus be established and retained in a high enough frequency to be sampled or due to very early CTL escape mutations.
A further 7 of these 17 individuals had evidence of early antibody pressure with sequences showing loss or gain of N-linked glycosylation sites or changes in envelope loop length; all 7 were recruited in stages IV to VI of infection (Fig. 4). In four of the seven participants, these changes occurred in the V1/V2 loop region, which may indicate that this area is an early target for immune pressure during primary infection.
Three of the seventeen individuals had evidence of cellular immune pressure. In one individual there was a mutation associated with a known epitope (CAP8, HLA-B*0801). In the other two individuals there was clustering of mutations within 10-mers (participants 70310054 and 705010015) (Fig. 4), a phenomenon previously found to be associated with CTL escape (14). The clustered mutations within these two participants' 10-mers fell within sequences found to contain HLA anchor residue motifs for a range of HLA genotypes and which were also matched to potential epitopes by Motif Scan (www.hiv.lanl.gov). However, since the HLA types for these two individuals are not known, the inference of putative CTL escape remains speculative, and the relevance of these mutations unknown.
Finally, the last individual harboring low diversity virus, subject 1176, had sequences that exhibited shared mutations between sequence subsets and an estimated time to MRCA of 88 days, which exceeded the estimated time of infection since this individual was in stage I/II infection; thus, we infer that this subject was infected with three closely related viruses (average DNA distance, 0.1%) (Fig. 3B).
In summary, of the 55 individuals with low diversity after transmission, 54 were classified as likely to be infected with a single variant, and 1 was classified as likely to be infected with three closely related variants. For participants identified as infected with a single infectious unit, the estimated time to the MRCA was consistent with the expected time of infection with one exception, participant CAP217, whose infecting virus displayed lower genetic diversity (mean, 0.02%) than expected, given this participant was in Fiebig stage IV of infection (see Table S1A in the supplemental material).
After correcting for recombination and hypermutation, phylogenetic analyses of the high diversity sequence sets indicated that all 14 were infected with more than one viral variant (see Table S1D in the supplemental material). Including participant 1176 infected with three closely related viruses brings the total number of individuals with multivariant infections to 15. Interlineage recombination between transmitted variants was observed in 10 of the 14 individuals, using the Recombination Analysis Program (www.hiv.lanl.gov) (Fig. 5). In 9 of these 10 cases, recombination was also detected by GARD or RDP3.27 (18, 19, 26) or both. However, since donor samples were not available for these individuals, the transmission of recombinants cannot conclusively be ruled out. In all 14 sequence sets, the estimated number of days since the MRCA significantly exceeded the period for which the associated individual could realistically have been infected (MRCA range, 605 to 5,998 days).
Therefore, including the individual infected with three closely related variants, a total of 15 individuals were inferred to be infected with multiple viruses. The median number of readily distinguishable infectious units per individual was three (range, two to five), with each viral lineage having a unique pattern of nucleotide variation (Fig. 5). This represents the minimum number of transmitted viruses in these individuals, given the limitations in sampling. While we cannot exclude the possibility that multiple variants were transmitted and a single variant grew out at the time of detection, there is no bias for the detection of multiple variants at earlier Fiebig stages (Table 1).
Two of the fourteen participants, CAP37 and 1335, were each infected with viruses whose env sequences differed from one another by more than 6% at the nucleotide level (Fig. 2). Although the complete env sequences from these individuals clustered as an outlier to the participant sequences, the region encoding gp41 separated into distinct phylogenetic branches separated by epidemiologically unlinked sequences. This suggests that these two subjects had dual infections, although it is not possible to determine whether these individuals were infected by two independent transmission events from different donors or if variants were cotransmitted from a donor who had a dual infection.
Recombinant genomes result from the dual infection of cells, followed by recombination in a subsequent round of infection and thereafter outgrowth, allowing detection. We sought to determine whether the detection of recombination was related to the Fiebig stage. To have a large enough data set, we pooled data from Fiebig stage I to III to compare them to Fiebig stage IV to VI and also included the data reported by Keele et al. (16). We detected the presence of recombinant viruses at a significantly higher rate in the later Fiebig samples (P = 0.0015). In most individuals, we detected each recombinant genome once. While we can conclude from these results that the detection of recombinants becomes more likely with later stages of infection, without corresponding donor samples the possibility that detected recombinants were transmitted cannot be ruled out. In particular, this could be the case with participant CAP69 (stage I/II of infection) in whom we saw the outgrowth of some recombinants (Fig. 5B).
We screened all sequences for evidence of the clustering of mutations. We identified tight clusters of mutations by visual inspection, as well as by means of a test based on randomizing the locations of mutated sites within each patient in order to determine whether these mutations were significantly more clustered than would be expected by chance. We focused specifically on clustering on a length scale of 10 amino acids (roughly the size of a CTL epitope), since the presence of clusters of mutations within a region of approximately this size is consistent with selective pressure resulting in evasion of early cellular immune responses such as CTL responses (14). We determined the number of mutations with a nearest neighbor within 10 amino acids and compared this to the number expected by chance, estimated by randomizing the locations of the mutated sites 1,000 times. Under a model of neutral divergence (16), we expect mutated sites to be distributed randomly throughout the sequence. In addition, tight clustering of mutations is unlikely even if purifying selection were to affect a proportion of mutations. This would thus suggest that mutations within these clusters would be favored by selection.
Significant clustering was identified in 16 individuals (Fig. 4). Nine of these individuals with clustered mutations had single variant infections; six of these subjects (CAP129, CAP217, 0626, 703010193, 703010217, and 706010164) harbored sequences consistent with the model of neutral evolution from a single founder virus (see Table S1A in the supplemental material), suggesting that the virus populations from these individuals were under early selection despite the failure to reject the model of neutral evolution. Clustered mutations were detected in seven individuals with multivariant transmission.
As expected, most participants harboring sequences with evidence of clustered mutations were in later stages of primary infection (stages IV to VI) although, interestingly, three individuals with very early infection (stages I to III) harbored sequences with clustered mutations, providing evidence for very early selective pressure. In the absence of sequence data from donor individuals, however, it is unknown whether any of these mutations were transmitted.
Transmission of HIV-1 is infrequent (32), and the homogeneity of sequences in acute infection suggests that most transmission events represent the minimal infectious dose. For both the subtype B (16) and the subtype C datasets we found that ca. 80% of transmissions result in infection by a single genotypic variant or a single virus particle. To determine whether transmission of multiple variants represents independent but concurrent infectious events where transmission of the first variant had no effect on the probability of transmission of the second (or third) variant, we modeled the number of transmitted variants using the Poisson distribution and estimated a putative transmission probability as the likelihood that one or more variants are transmitted (equal to 1 − the probability that no variants are transmitted). In Fig. 6 we show the expected number of infections with one, two, or more than two variants when the probability of transmission in a single exposure is 0.1, 0.25, and 0.4. Transmission probabilities in this range are needed to observe the transmission of multiple variants at significant levels. In contrast, a more realistic transmission rate of 0.01, for example, would result in the transmission of two variants once in 10,000 transmission events.
In Fig. 6 we also show the observed number of individuals infected with one, two, or more than two variants in the subtype B and subtype C primary infection cohorts. When these data are fitted to the Poisson model we estimate a transmission rate of 0.48 for the subtype C data set (Poisson mean of 0.65 with a standard error of 0.12) and a transmission rate of 0.55 for the subtype B data set (Poisson mean of 0.80 with a standard error of 0.05). These rates of transmission are unreasonably high compared to reported sexual transmission rates, which have a lower bound of ~0.001 (reviewed in Powers et al.). We therefore conclude that the transmission of multiple variants does not represent low probability independent events but rather results from either transiently high transmission rates or linked transmission of multiple virions.
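The conversion between a fitted Poisson mean and the implied per-exposure transmission probability can be verified with a few lines of code. The sketch below assumes the independent-events Poisson model described in this section; the function names are hypothetical. It also shows, under the same model, how small the share of multivariant infections would be at a transmission probability near the reported lower bound.

```python
import math


def transmission_probability(poisson_mean: float) -> float:
    """Probability that one or more variants are transmitted: 1 - P(0)."""
    return 1.0 - math.exp(-poisson_mean)


def multivariant_share(p_transmission: float) -> float:
    """Expected share of infections with two or more variants, given that
    infection occurred, when variants are transmitted independently."""
    lam = -math.log(1.0 - p_transmission)        # Poisson mean implied by p
    p_exactly_one = lam * math.exp(-lam)
    return 1.0 - p_exactly_one / p_transmission


print(round(transmission_probability(0.65), 2))  # 0.48 (subtype C fit)
print(round(transmission_probability(0.80), 2))  # 0.55 (subtype B fit)
print(multivariant_share(0.001))                 # roughly 0.0005
```

Under independence, a per-exposure probability of about 0.001 would make multivariant infections a fraction of a percent of all infections, far below the roughly one in five observed, which is the discrepancy the argument above rests on.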
Of the 24 individuals monitored for 1 year postinfection, we found no significant difference between viral RNA load set point or CD4+ T-cell counts in participants who were infected with a single variant (19 of 24) compared to those infected with multiple variants (5 of 24) (P = 0.3198 and P = 0.2232, respectively) (Table 1). However, 4 of 6 (67%) individuals with multiple variant infections had CD4+ T-cell counts consistently below 350 cells/μl and were classified as rapid progressors, whereas only 4 of 20 (20%) individuals that were infected with single variants fell into this category. This association between rapid disease progression and multiple variant transmission (P = 0.051) supports previous studies that have shown that high diversity after infection is associated with increased rates of disease progression (10, 12, 37).
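The P value for the rapid-progressor comparison can be reproduced with a two-tailed Fisher exact test on the 2 × 2 table given above (4 of 6 versus 4 of 20). The sketch below is a plain-Python implementation that sums the probabilities of all tables no more likely than the observed one; this is one common definition of the two-tailed test and is assumed here, since the software used by the authors is not stated. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table whose probability does not exceed the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1.0 + 1e-9))


# Rapid progressors vs. others: multiple variants (4, 2), single variant (4, 16).
print(round(fisher_exact_two_tailed(4, 2, 4, 16), 3))  # 0.051
```

The result, 0.051, matches the value reported for the association between rapid progression and multiple variant transmission.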
This study is the first to provide a comprehensive assessment of the multiplicity of HIV-1 subtype C infection in the context of heterosexual transmission in men and women. Our findings mirror observations of subtype B transmissions via heterosexual or homosexual routes using the same methodology (16). Together with the subtype B study, these data demonstrate that in 171 individuals a single virus is responsible for infection of 77% of individuals, with 23% of individuals infected with multiple variants. Based on the frequency of the transmission of multiple variants, we find that infection with more than one variant does not occur as independent events at low probability. This implies that transmission of the second variant is linked to transmission of the first variant. Understanding the frequency and cause of multivariant transmission is relevant since individuals infected with multiple variants would require a vaccine that protects against greater initial viral diversity instead of a single homogeneous virus population. In addition, it is clinically important since high diversity following transmission has been associated with faster disease progression (10, 12, 37). In keeping with this observation, the present study found an association between multivariant transmission and disease progression.
Discrepant results due to differences in methodological approaches have hindered a clear understanding of multivariant transmission. A key advantage of our study is that we used the same methodological and analytical approaches to define the founder virus population that were used recently to study subtype B acute infection (16), thus enabling us to clearly enumerate the infecting viruses and also directly compare results. Despite different infecting subtypes and routes of transmission, the frequencies of multivariant transmission were strikingly similar: we report 22% in subtype C heterosexually infected men and women compared to the 24% of participants infected with subtype B via homosexual and heterosexual transmission reported by Keele et al. (16). Phylogenetic analysis indicated that the multiple variants came from a single donor in 87% of the cases (13 of 15 subjects), and the time to the MRCA demonstrates that the variants diverged at times significantly before the transmission event.
This estimated frequency of multivariant transmissions should, however, be considered a minimum. Many infections in highly epidemic regions have been attributed to transmissions during the acute stage of infection (30). Since this stage is generally associated with a highly homogeneous viral population, multiple variant transmissions in these instances could be missed. In addition, we may miss variants present at a low frequency (with a sample size of 20 sequenced amplicons, there is 95% confidence of detecting sequences present at frequencies greater than 15%) (16).
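The sampling limit quoted above can be reproduced with a short sketch. It assumes that each sequenced amplicon is an independent draw from the viral population, so that a variant at frequency f is missed in all n sequences with probability (1 − f)^n. The function names are hypothetical.

```python
import math

def detection_probability(n_sequences, variant_frequency):
    """Probability of sampling a variant at least once in n independent sequences."""
    return 1.0 - (1.0 - variant_frequency) ** n_sequences

def min_sequences(variant_frequency, confidence=0.95):
    """Smallest number of sequences that detects the variant with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - variant_frequency))

if __name__ == "__main__":
    n, f = 20, 0.15
    print(f"P(detect variant at {f:.0%} with {n} sequences) = "
          f"{detection_probability(n, f):.3f}")
    print(f"Sequences needed for 95% confidence at {f:.0%}: {min_sequences(f)}")
```

Under this assumption, variants present at frequencies well below 15% have a substantial chance of being missed with 20 sequences, which is why the estimated frequency of multivariant transmission is best read as a minimum.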
Although we used a model which assumes neutral evolution (16), deleterious mutations will be lost through purifying selection, and early innate and adaptive host responses are likely to impact the apparent mutation rate, especially in participants sampled after peak viremia. We did in fact identify putative immune pressure in acute infection, with a third of the sequence sets containing evidence of putative CTL pressure (based on clustered mutations) or antibody pressure (based on changes in N-glycosylation sites or variable loop length). The rates of mutation were also influenced by APOBEC3G-mediated hypermutation observed in eight individuals with single variant infections. In addition, sequences were under purifying selection with a higher rate of synonymous (dS) compared to nonsynonymous (dN) substitutions (mean dN/dS ratio of 0.79; variance, 0.44). A mean dN/dS ratio of <1 suggests that the rate of diversification of the sequences could be slightly less than the rate estimated under a strict assumption of neutrality. However, the impact of a relatively small departure from neutrality on the estimated times to the last common ancestors of intrapatient sequence sets is likely to be minor.
The rate of HIV transmission is in the range of one transmission event per 1,000 exposures (34; reviewed in Powers et al.), although two studies reported rates of 31 per 1,000 exposures (26, 40) and 97 per 1,000 exposures (3). However, even rates as high as 0.03 to 0.1 cannot account for a frequency of multivariant transmission of 22 to 24%, if multiple variants are transmitted independently. This suggests that transmission of each variant is not an independent event in the context of a low transmission probability. One explanation of the frequency of multiple variant transmissions is that different cofactors transiently change the rate of multivariant transmission. The distribution of frequency of one, two, and more than two variants can be explained if two rates are incorporated: one rate would account for 70 to 75% of transmission events and have a low probability of transmission with only rare occurrences of the transmission of multiple variants; the second rate would account for 25 to 30% of transmission events. However, a probability of transmission of ~0.8 would be required to result in equal numbers of transmission events of two variants and more than two variants, which would approximate the observed data. It is likely that increased transmission occurs as a result of sexually transmitted infections (38) or traumatic breaks in the epithelium. However, it is also possible that the transmission of multiple variants represents a linked event, i.e., infection by one particle (or genome) is in some way linked to an increased probability of infection with a second particle (or genome). If the infectious unit were an infected CD4+ T cell, which can be infected with multiple viruses (15), this could account for at least some of the multivariant transmission events.
A recent report has shown the potential for infected cells to penetrate a disturbed epithelium (45), and the apparent need for infection via infected cells in the case of HTLV-1 provides further support for a cell-mediated mechanism of transmission (33). Alternatively, virus particles could be aggregated by biological molecules such as SEVI (for semen-derived enhancer of virus infection) (27) or tetherin during budding (28), potentiating infection with two particles in a single, rare transmission event.
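The two-rate explanation discussed above can be sketched as a simple mixture. This is an illustration rather than the fitted model: it assumes that each group of transmission events follows a Poisson distribution conditioned on at least one variant being transmitted, that the 25 to 30% figure is the mixture weight among transmission events, and that the low rate is 0.001. The function names and the chosen weight of 0.3 are hypothetical.

```python
import math

def variants_given_transmission(p_transmit):
    """Distribution of one, two, or more than two variants, given that infection occurred."""
    lam = -math.log(1.0 - p_transmit)   # Poisson mean with P(>=1 variant) = p_transmit
    p0 = math.exp(-lam)
    one = lam * p0 / p_transmit
    two = lam ** 2 / 2.0 * p0 / p_transmit
    return {"one": one, "two": two, "more_than_two": 1.0 - one - two}

def two_rate_mixture(weight_high, p_low, p_high):
    """Mix a low-rate and a high-rate group of transmission events."""
    low = variants_given_transmission(p_low)
    high = variants_given_transmission(p_high)
    return {key: (1.0 - weight_high) * low[key] + weight_high * high[key]
            for key in low}

if __name__ == "__main__":
    high = variants_given_transmission(0.8)
    print("high-rate group:", {k: round(v, 3) for k, v in high.items()})
    mixture = two_rate_mixture(weight_high=0.3, p_low=0.001, p_high=0.8)
    print("mixture:        ", {k: round(v, 3) for k, v in mixture.items()})
```

With a high rate of 0.8, infections with two variants and with more than two variants occur in similar proportions within the high-rate group, while the low-rate group contributes almost exclusively single-variant infections.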
A previous study from Kenya showed that women were generally infected with more heterogeneous virus populations than men (21). Although we did not find in our study that women were infected with higher-diversity viral populations than men (data not shown), there were differences in the frequency of sexually transmitted infections, with genital ulcerative disease being much more common in men from our study than in women (82% versus 13%). Thus, since genital ulcerative disease has an impact on transmission, this could confound our analysis. In addition, uncircumcised men are more susceptible to infection (1, 2); thus, this difference in results between studies may be due to the fact that most of the men in our study were from Malawi, where there is a very low frequency of circumcision, whereas the Kenyan study recruited from a cohort of individuals where 87% were circumcised (34).
In conclusion, infection with a single virus in the majority of individuals demonstrates the severity of the genetic bottleneck at transmission. These data, in conjunction with the subtype B analysis, suggest a universal observation that mucosal HIV-1 infection most frequently originates from a single infectious unit. Less frequently, multiple viral variants are transmitted, which not only increases the genetic diversity but also provides the virus with greater opportunity to escape early selective pressure through recombination. Although the biological basis for the transmission of multiple variants remains unknown, possible explanations include transiently high rates of transmission due to cofactors, transmission via a multiply infected cell, or transmission of viral aggregates. Since one in five individuals will become infected with multiple infectious variants, it is important to determine how this information affects the breadth and targeting needed for protective vaccination.
This study was funded by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, and the U.S. Department of Health and Human Services (AI51794, CAPRISA; DK49381 [M.S.C.], CHAVI), as well as by the National Research Foundation (no. 67385) (South Africa), the South African AIDS Vaccine Initiative, and amFAR grant 106997-43.
We thank the clinical staff and participants from the CAPRISA, CHAVI, and Malawi STI cohorts; Darren Marten for critical comments; and Leslie Arney for assistance with the graphics. We also thank the clinical staff from the CHAVI Lilongwe cohorts, including Francis Martinson, Gift Kamanga, Happiness Kanyamula, and Deborah Kamwendo, for their support.
Published ahead of print on 4 February 2009.
†Supplemental material for this article may be found at http://jvi.asm.org/.