We examined consecutive protease (PR) and reverse transcriptase (RT) sequences from human immunodeficiency virus (HIV) type 1-infected individuals, to distinguish changes resulting from sequence evolution from those due to possible superinfection. Between July 1997 and December 2001, ≥2 PR and RT samples from 718 persons were sequenced at Stanford University Hospital. Thirty-seven persons had highly divergent sequence pairs characterized by a nucleotide distance of >4.5% in PR or >3.0% in RT. In 16 of 37 sequence pairs, divergence resulted from the loss of mutations during a treatment interruption or from the gain of mutations with reinstitution of treatment. tat and/or gag sequencing of HIV-1 from cryopreserved plasma samples could be performed on 15 of the 21 divergent isolate pairs from persons without a treatment interruption. The sequences of these genes, unaffected by selective drug pressure, were monophyletic. Although HIV-1 PR and RT genes from treated persons may become highly divergent, these changes usually are the result of sequence evolution, rather than superinfection.
Human immunodeficiency virus (HIV) type 1 genotypic drug-resistance testing performed by sequencing the protease (PR) and reverse transcriptase (RT) genes has, in many countries, become part of the routine care of infected individuals receiving antiretroviral therapy [1, 2]. Individuals experiencing persistent viremia may have ≥2 of their isolates sequenced at different times. The extent to which different isolates from the same individual diverge has implications for sequence quality control and provides an opportunity to detect superinfection. We examined PR and RT sequences of viruses consecutively isolated from the same individuals, to determine whether sequence differences could have resulted from superinfection with a second HIV-1 isolate, rather than from the accumulation of changes in the original virus population.
Between 1 July 1997 and 31 December 2001, the Stanford University Hospital (SUH) Diagnostic Virology Laboratory sequenced the PR and RT genes of 4366 HIV-1 isolates obtained from 3155 individuals in northern California at the request of their physicians. The present study is based on the HIV-1 isolates of those patients who had ≥2 samples sent for sequencing at least 1 month apart. The study was approved by the Stanford University Panel of Human Subjects in Medical Research. Human experimentation guidelines of the US Department of Health and Human Services, the Stanford University institutional review board, and the University of California, San Francisco, institutional review board were followed in the conduct of this research.
The complete PR and RT positions 1–250 of plasma HIV-1 were sequenced as described elsewhere. In brief, RNA was extracted from 0.2 mL of plasma using the guanidine thiocyanate lysis reagent in the AMPLICOR HIV Monitor test kit (Roche Diagnostic Systems). Reverse-strand cDNA was generated from viral RNA, and first-round polymerase chain reaction (PCR) was done with Superscript One-Step RT-PCR (Life Technologies). Direct PCR (population based) cycle sequencing was performed using AmpliTaq DNA fluorescent sequencing polymerase and dRhodamine terminators (Applied Biosystems). Electropherograms were generated using an Applied Biosystems Model 377 sequencer, and sequences were assembled using the manufacturer’s Factura and Auto Assembler sequence analysis software.
For quality control purposes, each sequence was examined manually and analyzed by software developed in the laboratory. This software compared each new sequence with all sequences generated within the preceding 2–3 months. Sample sequences that differed from a previous sequence by an uncorrected nucleotide distance of <2.0% were flagged and examined for the possibility of laboratory contamination. Each sequence was also compared with all previous sequences from the same individual. Sequences that differed from a previous sequence from the same individual by an uncorrected nucleotide distance of >3.5% were flagged and examined for the possibility of a sample mix-up. Approximately 1–2 confirmed sample mix-ups per year were identified in this manner. These sample mix-ups were excluded from the analysis described here.
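The two quality control checks can be illustrated with a minimal sketch. It is not the laboratory's software: the function names are hypothetical, sequences are assumed to be pre-aligned strings of equal length, and the contamination check is assumed to apply only to sequences from other individuals (the text does not state this explicitly). The caller is assumed to pass in only the relevant prior sequences (e.g., those from the preceding 2–3 months).

```python
def p_distance(a, b):
    """Uncorrected nucleotide distance: fraction of aligned positions that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def qc_flags(new_seq, new_person, previous):
    """Flag a new sequence against prior (person_id, sequence) records.

    Assumption: near-identity (<2.0%) to another person's sequence suggests
    contamination; large divergence (>3.5%) from the same person's earlier
    sequence suggests a sample mix-up.
    """
    flags = []
    for person, seq in previous:
        d = p_distance(new_seq, seq)
        if person != new_person and d < 0.02:
            flags.append(("possible contamination", person, d))
        elif person == new_person and d > 0.035:
            flags.append(("possible sample mix-up", person, d))
    return flags
```

A flag here only marks a sequence for manual review; as described above, confirmation of a mix-up required further examination.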
We used 3 approaches to identify individuals whose viral sequences diverged the most over time (figure 1): measuring the nucleotide distance between consecutive isolates from the same individual; performing phylogenetic analyses of all the sequences produced by the laboratory, to determine whether sequences from the same individual were clustered with one another, rather than with the sequences of any other individual tested by the laboratory; and developing a genetic pattern similarity (GPS) score for determining whether 2 sequences shared uncommon amino acid substitutions. The 3 approaches were applied to PR and RT separately and were used to identify individuals whose treatment histories would be reviewed and whose isolates would undergo sequencing of tat and/or gag—genes that are not under direct antiretroviral selection pressure (figure 1).
Six different measures of nucleotide distance between 2 sequences were generated, depending on whether positions known to be associated with drug resistance were excluded and on how ambiguous nucleotides (indicative of mixtures) were handled. PR positions considered to be associated with drug resistance included codons 10, 20, 24, 30, 32, 33, 36, 46, 47, 48, 50, 53, 54, 63, 71, 73, 77, 82, 84, 88, 90, and 93. RT positions considered to be associated with drug resistance included codons 41, 44, 62, 65, 67, 69, 70, 74, 75, 77, 100, 101, 103, 106, 108, 115, 116, 118, 151, 179, 181, 184, 188, 190, 210, 215, 219, 225, 227, and 230.
Mixtures were handled in 1 of 3 ways: all differences between the aligned nucleotides in 2 sequences were counted as complete differences (Hamming distance); differences between the aligned nucleotides in 2 sequences were ignored if there was any overlap between the nucleotides at a single position (unweighted distance; e.g., if one sequence had a Y, indicating a mixture of C and T, and the second sequence had a C); or differences were scored according to the extent of overlap between the 2 nucleotides being compared (mixture-weighted distance). For example, an R (a mixture of A and G) and an A mismatch was assigned a 0.5 difference, whereas a Y and an A mismatch was assigned a 1.0 difference.
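The 3 ways of handling mixtures can be sketched as follows. The exact weighting formula is not given in the text; this sketch assumes the per-site difference is 1 minus the ratio of shared to total nucleotides in the two ambiguity codes, which reproduces both worked examples (R vs. A = 0.5; Y vs. A = 1.0) but may differ from the published method for other code pairs. Function and mode names are hypothetical.

```python
# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_difference(x, y, mode):
    """Difference between two aligned nucleotide codes under one mixture rule."""
    sx, sy = set(IUPAC[x]), set(IUPAC[y])
    if mode == "hamming":          # any non-identical code is a full difference
        return 0.0 if sx == sy else 1.0
    if mode == "ignore_overlap":   # any shared nucleotide means no difference
        return 0.0 if sx & sy else 1.0
    if mode == "weighted":         # assumed formula: 1 - |shared| / |total|
        return 1.0 - len(sx & sy) / len(sx | sy)
    raise ValueError(f"unknown mode: {mode}")


def distance(a, b, mode="weighted"):
    """Mean per-site difference between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(site_difference(x, y, mode) for x, y in zip(a, b)) / len(a)
```

Under these assumptions, a Y aligned with a C counts as 1.0 (Hamming), 0.0 (overlap ignored), or 0.5 (weighted), which shows why the weighted distance falls between the other two measures.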
Separate neighbor-joining trees of PR and RT were constructed from the matrices of mixture-weighted distances between all 4366 isolates. Isolates from the same individual were considered to be monophyletic if they were the sole descendants of their most recent common ancestor. Isolates from the same individual were considered to be paraphyletic if they shared their most recent common ancestor with at least 1 other isolate.
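The monophyly criterion can be expressed as a short check on a rooted tree. This is an illustrative sketch only: it assumes the tree has already been built and rooted, and it represents the tree as nested tuples with isolate identifiers (hypothetical strings such as "p1_a") at the tips. Tree construction itself is not shown.

```python
def leaves(node):
    """Set of tip labels below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def is_monophyletic(tree, isolates):
    """True if some clade of the rooted tree contains exactly `isolates`,
    i.e., they are the sole descendants of their most recent common ancestor."""
    target = set(isolates)
    tips = leaves(tree)
    if tips == target:
        return True
    if isinstance(tree, str) or not target <= tips:
        return False
    # If this clade holds extra tips, the answer depends on a child clade.
    return any(is_monophyletic(child, target) for child in tree)
```

For the tree ((("p1_a", "p1_b"), "p2_a"), ("p3_a", "p3_b")), the two p1 isolates are monophyletic, whereas "p1_a" and "p2_a" share their most recent common ancestor with "p1_b" and are therefore paraphyletic in the sense used here.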
We created position-specific profiles of the PR and the RT from the 4366 sequences generated by the SUH diagnostic laboratory. Each profile contained the proportion of sequences containing each amino acid at each position along both genes within the 4366 sequences. The profile, therefore, was based on sequences from both treated and untreated patients, as well as sequences from patients with different subtypes (although ~99% of sequences belonged to subtype B). The GPS score at a position was defined as 0 when 2 sequences had different amino acids at the same position. Otherwise, the GPS score at a position was assigned a score of −log10(p), where p is the proportion of sequences containing the shared amino acid at that position. Thus, the GPS score at a position was also 0 if 2 sequences shared an absolutely conserved amino acid (p = 1; −log10(1) = 0). The GPS score at a position was high if 2 sequences shared an uncommon amino acid (e.g., p = 0.001; −log10(0.001) = 3). The total GPS score between 2 sequences was calculated by adding the GPS scores at positions 1–99 in the PR and 1–250 in the RT.
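The GPS calculation described above can be sketched in a few lines. The function names are hypothetical, sequences are assumed to be aligned amino acid strings of equal length, and both sequences being compared are assumed to be among those used to build the profile (as in the study), so every shared residue has a nonzero proportion.

```python
import math
from collections import Counter


def build_profile(sequences):
    """For each position, the proportion of sequences carrying each amino acid."""
    n = len(sequences)
    length = len(sequences[0])
    return [
        {aa: count / n for aa, count in Counter(s[i] for s in sequences).items()}
        for i in range(length)
    ]


def gps_score(a, b, profile):
    """Sum of -log10(p) over positions where the two sequences share a residue.

    Positions with different residues contribute 0; shared residues that are
    absolutely conserved (p = 1) also contribute 0.
    """
    total = 0.0
    for x, y, freqs in zip(a, b, profile):
        if x == y:
            total += -math.log10(freqs[x])
    return total
```

In a toy profile built from 8 copies of "AKL" and 2 copies of "ARV", the pair ("ARV", "ARV") scores 2 × −log10(0.2) because it shares 2 uncommon residues, whereas ("AKL", "ARV") scores 0 because the only shared residue is conserved in every sequence.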
To identify GPS scores that strongly suggest that 2 sequences were derived from isolates from the same individual rather than from different individuals, we compared the distribution of GPS scores of pairs of sequences from different persons (3155 choose 2 possible pairs of persons, from which a bootstrap sample of 1,000,000 pairs was drawn) with the distribution of GPS scores of pairs of sequences from those individuals with >1 sequence (718 individuals). On the basis of the distribution of GPS scores from different persons for the PR and RT, we chose a threshold that strongly suggested that 2 isolates were obtained from the same, rather than a different, individual.
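One way to approximate the between-person score distribution is to draw random pairs and discard those from the same person. The sketch below is an assumption about how such sampling could be done, not the authors' procedure; the function names are hypothetical, and the scoring function is passed in by the caller. The input must contain sequences from at least 2 persons, or the loop will not terminate.

```python
import random


def sample_between_person_scores(records, score, n_pairs, seed=0):
    """Scores for randomly drawn pairs of sequences from different persons.

    records: list of (person_id, sequence); score: function of two sequences.
    Pairs are drawn independently, so the same pair may be drawn more than once.
    """
    rng = random.Random(seed)
    scores = []
    while len(scores) < n_pairs:
        (p1, s1), (p2, s2) = rng.sample(records, 2)
        if p1 != p2:  # keep only pairs from different persons
            scores.append(score(s1, s2))
    return scores


def fraction_above(scores, threshold):
    """Empirical fraction of sampled scores exceeding a candidate threshold."""
    return sum(s > threshold for s in scores) / len(scores)
```

A candidate threshold can then be judged by how rarely between-person pairs exceed it compared with same-person pairs.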
The treatment histories of persons whose sequences had a high mixture-weighted distance and a low GPS score or were paraphyletic were reviewed to determine whether a prolonged interruption of all PR inhibitors, all RT inhibitors, or both took place that could explain a large sequence change. If no such treatment interruption occurred and if cryopreserved plasma samples were available for both isolates, we sequenced fragments of the gag and/or tat genes of virus from these samples (figure 1).
Viral RNA was extracted from plasma using the Qiagen viral RNA kit (Qiagen). Eluted RNA was converted to cDNA by incubation with 50 μg of random 6-nt-long oligomers, 1 μL of 10 mmol/L dNTP, 1 μL of 200 U/μL Moloney murine leukemia virus-RT (Life Technologies), and a master mix for 1 h at 37°C. First-round primers for tat were TatED1 5′-GCAGGAGTGGAAGCCATAATAAG-3′ (HXB2 position, 5721–5743) and TatED2 5′-TTCTATGAATACTATGGTCCACACAACTAT-3′ (HXB2 position, 6119–6148). Second-round primers were TatED3 5′-GAATTCTGCAACAACTGCTGTTTAT-3′ (HXB2 position, 5743–5767) and TatED4 5′-ATTGCTGCTACTACTAATGCTACTATTGC-3′ (HXB2 position, 6083–6111). First-round primers for gag were described elsewhere. The second-round primers were p17EDHMA5 5′-GTGCGAGAGCGTCAGTATTAAGCG-3′ (HXB2 position, 794–817) and p17EDHMA3 5′-TTTCTTACTTTTGTTTTGCTCTTCC-3′ (HXB2 position, 1104–1128).
All RNA extractions, reverse transcription, and preparation of the first-round PCR tubes were performed in a preamplification room free of amplified HIV products. After purification, dideoxy terminator reactions of tat and gag PCR products were initiated using the second-round PCR antisense primer. Sequence products were resolved using an ABI 3100 capillary sequencer. Sequences of gag and tat genes were submitted to GenBank (accession nos. AY178912–AY178931 and AY178932–AY178961, respectively).
Between 1 July 1997 and 31 December 2001, 4366 HIV-1 PR and RT isolates from 3155 individuals were sequenced. Seven hundred eighteen individuals submitted isolates for sequencing more than once. The mean number of sequences per individual was 2.5 (range, 2–6 sequences/individual), and a total of 1061 pairs of sequences were examined. The mean time between sequences was 12.2 months (range, 1–46 months), and the total time between sequence pairs was 1072 person-years.
Figure 2 shows the distribution of nucleotide distances between consecutive pairs of sequences from the same individual, by gene (PR or RT) and method for measuring nucleotide distance. Figures 2A and 2C show the distribution of unadjusted nucleotide distances between sequence pairs through PR positions 1–99 and RT positions 1–250, respectively. Figures 2B and 2D show the distribution of distances between sequence pairs of PR and RT, excluding codons associated with drug resistance (mutation-adjusted distance). Each of the graphs in figure 2 shows the distribution of distances using 3 approaches for handling mixtures: scoring mixtures as complete differences (Hamming distance), calculating a weighted distance on the basis of the components of a mixture (mixture-weighted distance), and ignoring mixtures.
The gene, the decision to include or exclude drug resistance positions (mutation adjustment), and the method for handling mixtures all influenced the distribution of nucleotide distance. Table 1 shows the various medians and the 95% and 99% quantiles for the greatest distances between the 1061 sequence-pairs, stratified by gene, mixture handling technique, and mutation adjustment. As expected, mutation-adjusted distances were lower than unadjusted distances, and mixture-weighted distances were lower than Hamming distances but higher than distances that ignored mixtures. For all methods, the median distance and 95% and 99% quantiles were higher for PR than for RT sequences.
In the PR neighbor-joining tree, 962 PR sequence-pairs were monophyletic, and 99 were paraphyletic. In the RT neighbor-joining tree, 1054 sequence-pairs were monophyletic, and 7 were paraphyletic. The median number of weeks between PR (59 vs. 43 weeks; P < .001) and RT sequences (60 vs. 46 weeks; P = .2) and the median nucleotide distance between PR (3.3% vs. 0.8%; P < .001) and RT (3.0% vs. 0.9%; P < .001) were higher for paraphyletic than for monophyletic sequence pairs.
Figure 3 shows the distribution of GPS scores obtained by comparing all PR and RT sequences from different individuals and all PR and RT sequences from the same individual. The PR GPS scores of sequence pairs from different individuals were 0.3–9.6, whereas the scores of sequence pairs from the same individual were 0.9–17.3. The RT GPS scores of sequence pairs from different individuals were 0.9–17.9, whereas the scores of sequence pairs from the same individual were 2.5–25.5. On the basis of the empirical distribution of the GPS interindividual scores, we determined that 2 PR sequences with a GPS score >7 or 2 RT sequences with a GPS score >9 were ~10,000 times more likely to be from the same individual, rather than different individuals.
Figure 4 summarizes the nucleotide distances, phylogenetic analyses, and GPS scores for each pair of PR and RT sequences from the 718 individuals with >1 available sequence. Thirty-seven individuals had highly divergent sequence pairs (characterized by a nucleotide distance of at least 4.5% in PR or 3.0% in RT) and a low GPS score (<7 in PR or <9 in RT). Twenty-four of these individuals had sequence pairs that were paraphyletic (21 in the PR tree, 2 in the RT tree, and 1 in both trees). The remaining paraphyletic sequence pairs contained at least 1 sequence that was very close to the root of the tree (median, 2.5%), which makes the absence of clustering less meaningful.
In 16 of the 37 individuals with divergent sequence pairs, the sequence change occurred during a treatment interruption (14 individuals) or followed the resumption of therapy after a treatment interruption (2 individuals). Of the remaining 21 individuals, 18 had a treatment change but no interruption, and 3 had treatment histories that were unavailable or considered to be unreliable.
We sequenced the tat and/or gag genes of 15 of 21 divergent sequence-pairs from individuals without a treatment interruption from whom cryopreserved samples were available. Phylogenetic trees created from the tat genes of 14 individuals and from the gag genes of 15 individuals (along with 20 additional San Francisco Bay Area control subjects) showed that all sequence pairs were monophyletic. Figure 5 shows the phylogenetic tree of the tat sequences. The median nucleotide distance between the 14 pairs of consecutive tat genes was 0.9% (range, 0%–2.0%). The median nucleotide distance between the 15 pairs of consecutive gag genes was 2% (range, 0%–5.0%). Table 2 summarizes the nucleotide distances, GPS scores, phylogenetic clustering results, and amino acid mutation changes for these 15 isolates.
Sequence divergence between PR or RT sequences obtained at different times from HIV-1-infected individuals receiving antiretroviral treatment may result from the acquisition or loss of mutations at positions associated with drug resistance [7–10], from genetic bottlenecks in which a new therapy selects for preexisting rare variants with differences at positions not associated with drug resistance [11, 12], from mix-ups with a sample from a different person, and from superinfection with a virus from a different person [14–16]. Of these possibilities, superinfection is the most interesting to clinicians and researchers because of its implications for protective immunity, viral interference, and the development of recombinant virus strains. We have, therefore, focused our discussion on the ability of our data to detect superinfection. However, our study also provides new data on the extent to which PR and RT sequences of HIV-1 isolates from treated individuals may change over time. An understanding of these data is essential for maintaining the quality of PR and RT sequencing in laboratories performing genotypic resistance testing.
The present study shows that, although HIV-1 PR and/or RT genes from treated persons may become highly divergent, this divergence almost always results from intrahost sequence evolution, rather than from superinfection. We arrived at this conclusion by identifying 37 (5%) individuals with the most-divergent isolates from our sample of 718 individuals, reviewing their treatment histories, and then sequencing tat and/or gag (genes that are not under selective drug pressure) from these individuals. For 16 of these 37 individuals, the sequence divergence was consistent with a documented treatment interruption or with the resumption of therapy after a treatment interruption. For another 15 individuals, paired tat and/or gag genes were monophyletic, which argues against superinfection. Stored samples were not available for tat and gag sequencing for the remaining 6 individuals.
Although numerous cases of simultaneous infection with 2 distinct HIV-1 strains have been reported [17–20], there have been only 4 well-documented cases of superinfection, in which a second virus infected a person well after an initial infection [14–16]. All 4 cases occurred in persons who were monitored prospectively after the identification of primary infection. In 3 of the 4 cases, superinfection occurred with a virus belonging to a different subtype than the primary strain [14, 15].
Approximately 99% of isolates in our cohort belong to subtype B, which makes it more difficult to detect superinfection and requires the use of 2 new methods for assessing sequence divergence: the mixture-weighted distance allows the calculation of nucleotide distances between sequences containing nucleotide mixtures, whereas the GPS score allows the detection of signature polymorphisms within an infected individual that may be present even in the absence of phylogenetic clustering. The GPS method bears some similarity to the signature pattern analysis method of Korber et al. but differs in that it evaluates 2 sequences at a time from a large set of sequences for which the amino acid profile (distribution of variants) at each position is known.
Each of the previously published cases of superinfection was initially identified by population-based sequencing. We cannot, however, exclude the possibility that superinfection occurred in our cohort but was undetected because the superinfecting virus remained a minor variant. To detect these cases, it would be necessary to sequence multiple clones from each of the plasma samples, rather than restricting the analysis to those samples with the most genetic divergence. Such a strategy would be optimally suited to a small cohort of persons at high risk of superinfection, rather than a large cohort like ours in which exposure history was not available.
Because HIV-1 is likely to undergo recombination when 2 viruses infect the same cell, we also examined sequences for the possibility that a second virus may have recombined with virus present in the first plasma sample. We divided each of the PR sequences into 2 segments of ~150 nt and each of the RT sequences into 5 segments of 150 nt and calculated the divergence between matched gene segments from the same person. This analysis identified 61 additional persons with virus isolates having PR or RT segments with a mixture-weighted distance >4.5% (~7 nt) between at least 1 pair of matched 150-nt segments. In nearly all these additional cases, divergence appeared to be caused by a major change in treatment (usually an interruption) or by a few unexplained nucleotide changes (data not shown), which suggests that, even if recombination with a superinfecting strain had occurred, it was extremely rare.
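The segment-wise comparison can be illustrated with a brief sketch. It substitutes a simple uncorrected distance for the mixture-weighted distance used in the study, assumes pre-aligned nucleotide strings of equal length, and uses hypothetical function names. The final segment is shorter than 150 nt when the sequence length is not a multiple of 150 (as for the 297-nt PR gene).

```python
def p_distance(a, b):
    """Uncorrected distance: fraction of aligned positions that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def segment_distances(a, b, distance=p_distance, size=150):
    """Distance for each pair of matched, non-overlapping segments."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        distance(a[i:i + size], b[i:i + size])
        for i in range(0, len(a), size)
    ]


def has_divergent_segment(a, b, threshold=0.045):
    """True if any matched segment pair exceeds the distance threshold."""
    return max(segment_distances(a, b)) > threshold
```

The point of segmenting is that 8 differences concentrated in a single 150-nt segment of a 750-nt RT sequence give a whole-gene distance of only ~1.1% but a segment distance of ~5.3%, which exceeds the 4.5% threshold.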
Surveillance data from the San Francisco Department of Health suggest that, among persons with the most common risk factors for HIV-1 in Northern California (e.g., male homosexuality and intravenous drug use), there is an ~1%–2% annual incidence of new HIV-1 infection [23, 24]. Prospective studies in cohorts of individuals for whom risk behavior is documented have the potential to better define the incidence of superinfection. However, such studies are difficult to perform. Although we do not know the risk profile of our cohort, it is likely that ~10–20 new cases of HIV-1 would have been expected during 1072 person-years of follow-up had the individuals in the cohort not already been infected.
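The estimate of ~10–20 expected cases follows from multiplying the annual incidence by the follow-up time. The working below is a back-of-the-envelope check under the simplifying assumption that the cohort's exposure matched that of the surveillance populations, which the study could not verify.

```latex
\[
\text{expected new infections} \approx \text{annual incidence} \times \text{person-years}
\]
\[
0.01 \times 1072 \approx 10.7, \qquad 0.02 \times 1072 \approx 21.4
\]
```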
Superinfection may be prevented as a result of partial immunity, the effect of antiretroviral drugs on superinfection with a drug-susceptible strain, or viral interference from the original virus strain. However, we cannot quantify the risk of superinfection because we do not know the extent of HIV-1 exposure within our cohort and because we cannot completely exclude the possibility that some cases of superinfection escaped detection. Therefore, infected individuals, even those receiving antiretroviral therapy, should continue to avoid activities that could transmit HIV-1 to others or increase their risk of a second infection.
Financial support: National Institutes of Health (NIH; grant AI-46148 to M.J.G. and R.W.S.); Centers for Disease Control and Prevention (grant U64/CCU917889-01) and NIH (grant AI-447320) to E.D. and R.T.