HIV-1 packages two copies of RNA into one particle, and the dimerization initiation signal (DIS) in the viral RNA plays an important role in selecting the copackaged RNA partner. We analyzed the DIS sequences of the circulating HIV-1 isolates in the GenBank database and observed that, in addition to the prevalent GCGCGC, GTGCAC, and GTGCGC sequences, there are many other minor variants. To better understand the requirements for the DIS to carry out its function, we generated a plasmid library containing a subtype B HIV-1 genome with a randomized DIS, infected cells with viruses derived from the library, and monitored the emergence of variants at different time points until 100 days postinfection. We observed rapid loss of viral diversity and found that the selected variants contained palindromes in the DIS. The “wild-type” GCGCGC-containing virus was a major variant, whereas GTGCAC- and GTGCGC-containing viruses were present at low frequencies. Additionally, other 6-nucleotide (nt) palindromic sequences were selected; a major category of the selected variants contained two GC dyads in the center of the palindrome, flanked by a non-GC dyad. Surprisingly, variants with GC-rich 4-nt palindromes were sustained throughout the selection period at significant frequencies (~12 to 38%); of these, variants containing the CGCGC sequence were observed frequently, suggesting that this sequence has a selection advantage. These results revealed that multiple sequences can fulfill the function of the HIV-1 DIS. A common feature of the selected DIS sequence is a 4- or 6-nt GC-rich palindrome, although not all sequences with these characteristics were selected, suggesting the presence of other unidentified interactions.
Infectious particles of human immunodeficiency virus type 1 (HIV-1) package two copies of RNA, each containing sufficient genetic information for viral replication in susceptible host cells (9, 11). These two copies of RNA do not exist as monomers in the virion but are entwined together as a dimer through noncovalent bonds. Denaturing procedures, such as heat treatment, can convert the RNAs from dimeric to monomeric forms (10). Electron microscopy studies of partially denatured HIV-1 virion RNA revealed that, like in other orthoretroviruses, the two RNAs are linked near the 5′ end, indicating that these sequences contain the strongest dimerized regions (15, 18). Although the precise positions of the dimerized regions have not been directly demonstrated, the dimer initiation signal (DIS) is thought to be a major point of RNA linkage (15).
The DIS was first identified in analyses of short in vitro-generated RNA transcripts as a cis-acting element that promotes HIV-1 RNA dimerization (20, 22, 32, 33, 36, 43). The DIS is located at the loop of a stem-loop structure termed SL1, which is between the primer binding site and the splice donor site in the 5′ untranslated region. The DIS is a short 6-nucleotide (nt) palindromic sequence; it is thought that the palindromic nature of this sequence allows base pairing of two HIV-1 RNAs to initiate RNA dimerization, hence the name dimer initiation signal. Since the first identification of the DIS and its role in RNA dimerization in vitro, many studies have been performed to study its role in HIV-1 RNA dimerization in vivo (2, 7, 12, 26, 31, 35, 41). Three DIS sequences have been reported in multiple HIV-1 isolates: GCGCGC, GTGCAC, and GTGCGC. Mutational analyses indicated that changing the DIS sequence could affect viral RNA packaging or infectivity (1, 5, 23, 24); furthermore, modifying the palindromic length from 6 nt to 2 nt or 10 nt also made the viruses less infectious (1). Replacing the DIS with several other 6-nt palindromes was reported to have varied effects on virus replication (21). Intriguingly, although the DIS appears to play an important role in the dimerization of the wild-type HIV-1 RNA genome, it is not absolutely essential for RNA dimerization or virus replication. Virion RNA from variants containing deletions or mutations that destroy the palindromic nature of the loop sequence are still found as dimers (1). Additionally, these viruses are able to undergo at least limited rounds of virus replication (14, 17, 24, 27-29, 34). The detailed mechanisms of how these DIS mutants can undergo RNA dimerization and virus replication remain unclear.
The DIS may affect more than RNA dimerization during HIV-1 replication. Variants containing GCGCGC in their DIS sequence recombined less frequently with variants containing GTGCAC, indicating that the dimerization mediated by the DIS also affects copackaging of RNA from different proviruses (3, 4). Variants containing an insertion mutation in the SL1 loop were observed in an HIV-1-infected patient with low viral load; this observation led to the suggestion that mutations at the DIS could influence viral pathogenesis (16). The DIS has also been suggested as a target for antiviral therapy (7, 8). Therefore, this short 6-nt sequence may influence more than one aspect of viral replication in cell culture model systems and in the circulating HIV-1 population.
The presence of three naturally occurring DIS sequences indicates that more than one sequence can constitute a functional DIS. We sought to determine the diversity of the DIS sequences in clinical isolates and to elucidate the characteristics of sequences that can serve as a functional DIS. In this report, we first examined the sequences of clinical isolates in the GenBank database to determine the diversity of circulating DIS sequences. We then constructed a library of replication-competent subtype B viruses containing a randomized 6-nt sequence at the canonical DIS. Using viruses generated from this library, we performed three independent selection experiments, each for 100 days, and monitored the DIS sequence in the viral population. These results and their implications are described in this report.
To examine the composition of the DIS in the circulating HIV-1 strains, we extracted 2,308 HIV-1 sequences from GenBank into a database through the Los Alamos National Laboratory web interface (30). To examine genetic variations in the DIS region, sequences were aligned with consideration of the flanking sequences, insertions, deletions, and hairpin stem bases by using ClustalX v.2.0.10 (19) and Bioedit v.7.0.9 (13). Although we collected sequences from all three groups of HIV-1 (M, N, and O), most of the samples (n = 2,279) were from group M because there were far fewer group N or O sequences in the database. The group M sequences that were extracted included those from 8 subtypes (A to D, F to H, and J), 27 circulating recombinant forms (CRFs) (01 to 16, 18 to 20, 23, 24, 27 to 29, 31, 33, 42), and 40 different unique recombinant forms (URFs) (25). Analysis of these 2,308 sequences revealed the presence of 36 different DIS sequences (Table 1). The majority of the examined sequences contained one of the sequences GCGCGC and GTGCAC, which are considered the “wild-type” sequences for subtype B/D and other non-B/D subtypes, respectively, although a minor portion (0.8%) of the subtype B/D isolates had GTGCAC and a few non-B/D isolates (0.8%) had GCGCGC. Additionally, a frequently observed and previously reported sequence, GTGCGC, was also evident as a distant third prevalent sequence. The remaining 3% of the circulating strains had DIS sequences that are not as common; 33 different sequences were identified from 64 virus isolates. Some of the sequences resembled the “wild-type” sequences and could be derived by a base substitution or a deletion. Twelve of 33 contained GCGC and were similar to the “wild-type” B/D sequence; for example, GCGC, GGGCGC, and GCGCAC (listed as ID 5, 7, and 8, respectively, in Table 1). Similarly, 11 of the 33 sequences resembled the other “wild-type” DIS sequences, such as GTGCAT (ID 10) and GTGCTC (ID 14).
However, there were DIS sequences that did not resemble any of the “wild-type” sequences; some of these sequences were palindromes, such as GTAGCTAC (ID 12) and ACGCGT (ID 22), whereas others did not have apparent palindromes, such as CCCCCA (ID 18) and GGGCGA (ID 36).
These results revealed that although the “wild-type” DIS sequences dominate the clinical isolates, there are strains with other sequences. Therefore, it is likely that viruses with alternative DIS sequences can also survive and replicate in the human population. To study the requirement of DIS sequences and how such sequences evolve, we performed the selection experiments described below.
Using primers containing randomized sequences, overlapping PCR, and cloning of the PCR DNA fragments, we generated a plasmid library that was based on NL4-3 but was heterogeneous at the DIS. The number of possible DIS sequences is 4^6, or 4,096. To cover this variation, we constructed a library containing approximately 20,000 colonies generated from 20 independent ligation and transformation reactions; plasmid DNA was isolated from this pool by using standard protocols.
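The library-size arithmetic can be checked with a short sketch. The coverage estimate assumes that every colony is an independent, uniform draw from all possible 6-nt sequences; this is an idealization that the text does not make, and the function name is purely illustrative.

```python
# Idealized estimate of how well a pool of colonies samples a randomized 6-nt DIS.
# Assumption (not from the study): each colony is an independent, uniform draw
# from all possible sequences, so the real library may be covered less evenly.

def expected_coverage(n_colonies: int, n_positions: int = 6, alphabet: int = 4) -> float:
    """Expected fraction of all possible sequences that appear at least once."""
    n_sequences = alphabet ** n_positions
    # Probability that one particular sequence is missed by every colony.
    p_missed = (1.0 - 1.0 / n_sequences) ** n_colonies
    return 1.0 - p_missed


n_sequences = 4 ** 6                  # possible 6-nt sequences
fold = 20_000 / n_sequences           # colonies per possible sequence
coverage = expected_coverage(20_000)  # idealized fraction of sequences represented
print(n_sequences, round(fold, 1), round(coverage, 3))
```

Under this simplification, roughly fivefold oversampling would leave only a small fraction of sequences unrepresented; any bias in primer synthesis would lower the true coverage.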
To characterize the library, we sequenced the library DNA, which showed randomization in the DIS region but not the surrounding sequences (Fig. 1A). We also isolated DNA from randomly selected bacterial colonies and sequenced the plasmids to identify the DIS sequences. Of the 160 clones sequenced, 131 different sequences were identified in the DIS regions, including 5 clones that had the GCGCGC sequence; however, we did not observe clones that contained GTGCAC or GTGCGC in these 160 clones. We have also analyzed the number of clones that have DISs similar to these three sequences, which is defined by containing at least four of the same bases in the DIS; of the 160 analyzed clones, the DIS of 4, 3, and 1 clone(s) are similar to GCGCGC, GTGCAC, and GTGCGC, respectively. Data obtained from these 160 clones were used to compile the base composition of the 6 nt positions in the DIS. The averaged distributions for A, C, G, and T for each position are as follows: for the most 5′ nucleotide (position 1), 21.8%, 34.3%, 15.6%, and 28.1%, respectively; for position 2, 24.3%, 39.3%, 16.8%, and 19.3%, respectively; for position 3, 16.2%, 40.6%, 21.8%, and 21.8%, respectively; for position 4, 14.3%, 34.3%, 28.1%, and 23.1%, respectively; for position 5, 16.8%, 30.6%, 18.1%, and 34.4%, respectively; and for position 6, 23.7%, 33.7%, 16.2%, and 26.2%, respectively. The C nucleotide is slightly overrepresented in all six positions; this is probably due to the enhanced incorporation of the C base during the synthesis of the randomized primer.
To perform the selection experiment, we transfected the plasmid library DNA into human 293T cells, harvested the virus produced from transfection, and infected Hut/CCR5, a human T cell line (Fig. 1B). Maintenance of cells, transfection, and infection were performed as previously described (40). Viruses were harvested 5 days later and used to infect fresh target cells at low multiplicities of infection, generally between 0.1 and 0.2. To avoid potential bottlenecks, the experiment was performed such that in each passage, the initial infection included at least 200,000 independent events. The level of initial infection was monitored by detection of Gag/CA expression in the target cells; intracellular p24 staining and detection were performed using a phycoerythrin-conjugated anti-HIV-1-CA antibody (Beckman Coulter) and flow cytometry analyses 24 h postinfection. At the time of each virus harvest, a portion of the virus stock was used to isolate virion RNA, which was reverse transcribed and amplified, and the PCR product was sequenced. Compared with the library DNA (Fig. 1A), the viral population showed selection in the DIS sequences (representative results from days 5 and 15 in culture are shown in Fig. 1C). At day 5, the C nucleotide emerged as the major sequence at positions 2, 4, and 6; and at day 15, GCGCGC was clearly the major sequence. These results indicate that although the variant with the GCGCGC sequence constituted only a minor portion of the viral population (~3%), this virus outcompeted most of the other viruses within 15 days and became the dominant variant.
We then modified the library to reduce the frequency of the “wild-type” GCGCGC sequence by digesting the library DNA with the BssHII restriction enzyme, which recognizes the GCGCGC sequence. The GCGCGC sequence in the DIS is unique in the NL4-3 plasmid, and the BssHII restriction enzyme digestion selectively linearized any plasmids containing the “wild-type” subtype B DIS. After a large-scale DNA digestion was performed, the library was reanalyzed by sequencing 300 randomly selected clones; we did not observe any variants with GCGCGC sequences and the library still maintained its complexity, as the four bases were present at all of the six positions (data not shown).
The modified library plasmid DNA was transfected into 293T cells, and viruses were harvested, clarified, and used to infect Hut/CCR5 cells by the same protocols described in Fig. 1B. Infection of each passage was monitored by flow cytometry analyses, detecting intracellular expression of Gag/CA to ensure that at least 200,000 independent infection events occurred. Analyses of the DNA products generated from RT-PCR of the virion RNA after 15 days in culture revealed that, in contrast to those from Fig. 1C, we did not observe the emergence of a dominant sequence (Fig. 1D). Sequencing of the total PCR product (“bulk” sequencing) can reveal only the presence of a dominant variant; to better analyze the viral population after various passages, a region of the viral genome including the DIS sequence was amplified by reverse transcription-PCR (RT-PCR); the resulting products were inserted into the pBluescript plasmid. Approximately 100 colonies (between 68 and 115) were randomly selected and sequenced to identify the DIS sequence present in the viral population at each analyzed time point; these results were then compiled and analyzed.
We performed three independent selection experiments, each for 100 days. Analyses of the viral population in the culture revealed a decrease in the diversity of the DIS sequences during the selection experiments; the results are summarized in Fig. 2A. These results are shown as a percentage of diversity, which is calculated by dividing the number of different sequences observed by the total number of clones analyzed. For example, if 100 clones were analyzed and 100 different DIS sequences were identified, then the diversity at this time point is 100%. As shown in Fig. 2A, the viral populations on day 5 all contained very high diversity (88 to 99%). During the passage of the viral population, the viral diversity decreased; this decrease appeared to be more rapid at the beginning of the selection and to slow down at later points. For example, one can compare the changes in diversity in three 30-day periods. From day 10 to day 40, diversity decreased from 80 to 90% to 38 to 42%; from day 40 to day 70, diversity decreased from 38 to 42% to 18 to 26%; and from day 70 to day 100, diversity barely decreased from 18 to 26% to 18 to 21%. The decrease in diversity in the viral population indicates that not all sequences can function equally as a DIS; certain variants were amplified, while others were eliminated during the viral replication process. Additionally, the diversity of the viral population did not change drastically between day 70 and day 100 but hovered around 20%; this finding suggests that multiple variants containing different DIS sequences can sustain long-term growth in the viral population.
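The diversity measure defined above (number of different sequences divided by the number of clones analyzed) can be written as a minimal sketch. The function name and the example clone list are hypothetical and are not data from the study.

```python
# Percentage of diversity as defined in the text:
# (number of different DIS sequences observed) / (number of clones analyzed) x 100.

def diversity_percent(clones):
    """Return the diversity, in percent, of a list of DIS sequences (one per clone)."""
    if not clones:
        raise ValueError("at least one clone is required")
    return 100.0 * len(set(clones)) / len(clones)


# Hypothetical example: four clones, three distinct sequences.
example = ["GCGCGC", "GTGCAC", "GCGCGC", "ACGCGT"]
print(diversity_percent(example))
```

With 100 clones that all carry different sequences, this function returns 100, matching the worked example in the text.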
It is thought that DIS sequences mediate the initial RNA dimerization event; this proposed function suggests that the palindromic nature of the sequence is important. To address whether palindromic sequences are selected at the DIS, we examined the proportions of variants in the viral population containing DIS sequences with palindromes longer than 4 nt. The two most-predominant HIV-1 DIS sequences in the circulating viruses were 6-nt palindromes. The third most-prominent DIS sequence, GTGCGC, could form 6-nt palindromes in two different ways, either by GU pairing between the nucleotides at positions 2 and 4 or by pairing with the 3′-flanking nucleotide, which is often an A (Table 1, ID 3). Therefore, when we examined the selected sequences for palindromes, the entire loop was analyzed (AA-NNNNNN-A [shown in Fig. 1A]), and the GU pairing was also taken into consideration. Results from three independent selection experiments (designated A, B, and C) are summarized in Fig. 2B. In all three experiments, the majority of the variants (58 to 76%) had nonpalindromic DIS sequences at day 5. However, these variants decreased quickly; at day 30, very few variants (3 to 8% of the viral population) had nonpalindromic DIS sequences. In contrast, the proportion of variants with palindromic DIS sequences increased sharply during selection and became the dominating population after 30 days in culture (Fig. 2B).
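One way to picture the palindrome search over the whole loop is the sketch below. It is an illustration only: the scanning order, the 4-nt minimum length, and the function names are assumptions, and the criteria actually used to classify clones may have differed. Sequences are written with T in place of U, as in the text.

```python
# Search a 9-nt SL1 loop (AA-NNNNNN-A) for its longest self-complementary stretch.
# Assumptions (not from the study): windows are scanned left to right, longest
# first, and only even-length windows of at least 4 nt are considered.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}  # G-U pairing, written with T as in the DNA sequences


def is_palindrome(seq, allow_gu=False):
    """True if two copies of seq can pair with each other along their full length."""
    allowed = WATSON_CRICK | WOBBLE if allow_gu else WATSON_CRICK
    n = len(seq)
    if n == 0 or n % 2:
        return False
    return all((seq[i], seq[n - 1 - i]) in allowed for i in range(n // 2))


def longest_palindrome(loop, allow_gu=False, min_len=4):
    """Return the first longest palindromic window in loop, or '' if none exists."""
    n = len(loop)
    start_len = n if n % 2 == 0 else n - 1
    for length in range(start_len, min_len - 1, -2):
        for start in range(n - length + 1):
            window = loop[start:start + length]
            if is_palindrome(window, allow_gu):
                return window
    return ""


print(longest_palindrome("AAGCGCGCA"))                  # loop of NL4-3
print(longest_palindrome("AAGTGCGCA"))                  # uses the 3'-flanking A
print(longest_palindrome("AAGTGCGCA", allow_gu=True))   # uses a G-U pair instead
```

For the GTGCGC loop, the strict search finds the palindrome that includes the 3′-flanking A, whereas allowing GU pairing finds one within the six randomized positions, which mirrors the two ways described in the text.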
To examine the selected DIS sequences in more detail, the proportions of variants containing 4-, 6-, and 8-nt palindromes are shown in Fig. 2C. Although we randomized only 6-nt sequences, we did observe several 8-nt palindromes during the selection. One type of 8-nt palindrome maintained the 6-nt length in the randomized region but had TT as the last 2 nt, which together with the AA located 5′ to the DIS formed an 8-nt palindrome, such as AA-CCGGTT-A (underlining indicates palindromic sequences). Another type of 8-nt palindrome appeared to contain insertions in the randomized region, such as AA-GCGCGCGC-A. Although we observed variants with 8-nt palindromes in all three experiments and they were detected at multiple time points in two experiments, these variants occupied only a small percentage of the population (1 to 5%) and eventually decreased to an undetectable level (Fig. 2C).
Variants with 6-nt palindromes were present at a lower level in the viral population at day 5 (7 to 18%) but increased rapidly to a plateau between day 30 and day 60 and were sustained at high proportions until the end of the selection period (76 to 87% at day 100). Unexpectedly, variants with 4-nt palindromes were also sustained throughout the selection period, albeit at lower levels than those with 6-nt palindromes. These variants with 4-nt palindromes constituted 17 to 22% of the viral population at day 5 and 13 to 24% of the population at day 100 (Fig. 2C).
We then examined whether the selected 6-nt palindromes contained multiple sequences; the diversity of the variants containing 4-nt or 6-nt palindromes is shown in Fig. 2D. The diversity of the 6-nt-palindrome-containing variants was very high at day 5 (83 to 100%), indicating that most of the variants contained different palindromic sequences. This diversity decreased with time and then appeared to stabilize around day 70. At day 100, the diversity was 15 to 19%; in each experiment, we identified 67 to 79 variants that contained 6-nt palindromes, and there were 12 to 13 different palindromic sequences. Similarly, variants with 4-nt palindromes had a higher diversity at day 5 (87 to 100%) than at day 100 (19 to 55%) (Fig. 2D). One of the reasons for the variation in the 4-nt palindrome diversity being larger than that of the 6-nt palindrome diversity is the relatively low number of variants with 4-nt palindromes; at day 100, 11 to 21 variants had 4-nt palindromes, and four to six different palindromic sequences were identified. Together, these results indicated that multiple variants emerged at the end of the 100-day selection period in populations of viruses with 4-nt or 6-nt palindromic DIS sequences.
To examine the specific sequences that emerged during the selection, we first determined whether the three most common DIS sequences observed in the HIV-1 database were selected during the experiments. These sequences, GCGCGC, GTGCAC, and GTGCGC, were observed in all three experiments, albeit at different frequencies (Fig. 3A). Variants containing GTGCAC were detected between day 5 and day 40, remained at low percentages (~1 to 8%) from day 40 to day 80, and were evident in two of the three experiments at the end of the selection period as a minor variant (~4%). Variants containing GTGCGC showed very similar patterns; they were detected between day 10 and day 30, remained at low percentages (1 to 7%) for multiple days, and were detectable at 2 to 4% in two of the three experiments at day 100. In contrast, viruses with GCGCGC appeared to be more robust; they were observed between days 5 and 10 and remained throughout the selection time as one of the major variants, although their proportions in the viral population varied in the three experiments (9 to 64% at day 100). These results indicated that variants with the three prevalent DIS sequences were selected in our experiments, although the variant with the GCGCGC sequence appeared to have a clear advantage over the variants with the GTGCAC or GTGCGC sequence.
These three “wild-type” sequences all have more GC base pairings than AT base pairings. We then examined whether the GC content of the 6-nt palindromes is important. Variants containing 6-nt palindromes with no GC dyad (such as AAATTT), one GC dyad (such as AAGCTT), two GC dyads (such as ACGCGT), or three GC dyads (such as GCGCGC) are shown in Fig. 3B. Variants containing 6-nt palindromes without any GC dyad were observed at earlier time points in all three experiments; however, their frequency diminished quickly, and they were not observed at the end of the selection. Variants with one GC dyad were a major component of the 6-nt palindrome-containing variants at earlier time points in all three experiments; however, their proportion also decreased with time, and they were either not detected or observed at low frequencies toward the end of the selection period. In contrast, variants containing two GC dyads or three GC dyads were amplified and sustained throughout the selection period. At day 100 of all three experiments, all variants with 6-nt palindromic sequences in their DIS had either two or three GC dyads. The dynamics of 6-nt variants with no or one GC dyad are in sharp contrast with those of variants with two or three GC dyads; these results indicated that the high GC content of DIS is critical to its function.
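The GC-dyad classification can be illustrated with a minimal sketch that pairs position 1 with position 6, position 2 with position 5, and position 3 with position 4. The function name is illustrative, and only Watson-Crick palindromes are accepted here, which is a simplifying assumption.

```python
# Count GC dyads in a palindrome: position i pairs with position n + 1 - i.
# Assumption: the input is a strict Watson-Crick palindrome (no G-U pairs).

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
GC_PAIRS = {("G", "C"), ("C", "G")}


def count_gc_dyads(palindrome):
    """Return the number of G-C dyads in an even-length palindromic sequence."""
    n = len(palindrome)
    dyads = [(palindrome[i], palindrome[n - 1 - i]) for i in range(n // 2)]
    if n % 2 or not all(d in PAIRS for d in dyads):
        raise ValueError(f"{palindrome!r} is not a Watson-Crick palindrome")
    return sum(1 for d in dyads if d in GC_PAIRS)


# The four examples named in the text, from no GC dyad to three GC dyads.
for seq in ("AAATTT", "AAGCTT", "ACGCGT", "GCGCGC"):
    print(seq, count_gc_dyads(seq))
```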
When 6-nt palindrome-containing variants already contain a high GC content, having an extra GC pair does not appear to generate a strong advantage; variants with two GC dyads and those with three GC dyads both appear to replicate well. In experiments A and B, there were more 6-nt-palindrome-containing variants with two GC dyads than those with three GC dyads toward the end of selection. However, in experiment C, we observed more 6-nt-palindrome-containing variants with three GC dyads than those with two GC dyads.
The importance of the GC pairing led us to question whether the position of the non-GC pairing is important among the two-GC-pair palindromes. The non-GC base pairing can occur at the first position (such as in TGGCCA), the second position (such as in GTGCAC), or the third position (such as in GGTACC); the distributions of variants with these three positions are shown in Fig. 3C. Variants with non-GC pairing in the second or third position do not seem to thrive during the selection, whereas variants with non-GC pairing in the first position seem to dominate. This result was very surprising, as one of the “wild-type” sequences, GTGCAC, has an AT pairing in position 2.
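A short sketch shows how the position of the non-GC dyad can be read from a two-GC-dyad palindrome. Positions are counted from the outside in, so position 1 is the outermost pair; the function name is illustrative and only Watson-Crick pairing is considered.

```python
# Report which dyad positions (1 = outermost pair) of a palindrome are A-T rather than G-C.
# Assumption: the input is a strict Watson-Crick palindrome.

PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
AT_PAIRS = {("A", "T"), ("T", "A")}


def non_gc_positions(palindrome):
    """Return the 1-based dyad positions that are A-T pairs."""
    n = len(palindrome)
    dyads = [(palindrome[i], palindrome[n - 1 - i]) for i in range(n // 2)]
    if n % 2 or not all(d in PAIRS for d in dyads):
        raise ValueError(f"{palindrome!r} is not a Watson-Crick palindrome")
    return [i + 1 for i, d in enumerate(dyads) if d in AT_PAIRS]


# The three examples named in the text.
for seq in ("TGGCCA", "GTGCAC", "GGTACC"):
    print(seq, non_gc_positions(seq))
```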
The most dominating 2-GC motifs have A/T at position 1 and G/C at positions 2 and 3 (A/T[C/G]4A/T). There are eight possible combinations: ACCGGT, ACGCGT, AGCGCT, AGGCCT, TCCGGA, TCGCGA, TGCGCA, and TGGCCA. Except for TGGCCA, we detected the other seven sequences as the DIS at multiple time points in each selection experiment. Variants with TGGCCA were detected only at two different time points in experiment B. Interestingly, the distribution of a particular variant varied between experiments. For example, TCGCGA was the most dominant variant in experiment A; this variant was detected in the other two experiments but only in small proportions in both experiment B and experiment C. Similarly, ACCGGT was the most dominant variant in experiment B; this variant was detected in both experiment A and experiment C but only in small proportions. Therefore, although it is clear that the A/T(G/C)4A/T motif was selected during the timeframe of these experiments, different variants can emerge in separate experiments.
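The eight members of this motif can be enumerated mechanically: choose A or T for position 1 and C or G for positions 2 and 3, then append the reverse complement so that the sequence is a palindrome. The sketch below uses illustrative names and does nothing beyond this enumeration.

```python
# Enumerate all 6-nt palindromes with A/T at position 1 and C/G at positions 2 and 3.
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def enumerate_motif():
    """Build each palindrome from its 5' half plus the half's reverse complement."""
    motifs = []
    for first, second, third in product("AT", "CG", "CG"):
        half = first + second + third
        motifs.append(half + reverse_complement(half))
    return sorted(motifs)


motifs = enumerate_motif()
print(len(motifs), motifs)
```

The enumeration yields the same eight sequences listed in the text.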
Most of the HIV-1 isolates have 9 nt in the loop of SL1; the sequences are AA-GCGCGC-A in NL4-3. Therefore, the A/T(C/G)4A/T sequence can be located at three different positions in the loop; this motif (underlined sequence) can be in the canonical position (or 0 position), such as AA-ACGCGT-A, at the −1 position, such as AA-CGCGTC-A, or at the +1 position, such as AA-CTCGCG-A. To examine whether the position of the A/T(C/G)4A/T sequence in the loop affects its function, we analyzed the dynamics of variants containing the motif at the −1, 0, and +1 positions during the selection. As shown in Fig. 3D, variants containing motifs at all three positions were observed in all three experiments. Variants with the motif at the canonical position were observed most frequently in experiment A but not in experiment B or C. These results failed to show a strong association between a particular position and preference during HIV-1 replication. Therefore, having the palindrome at its canonical position is not an absolute prerequisite for the function of the DIS.
As shown in Fig. 2C and D, there is a sustained presence of the 4-nt-palindrome-containing variants in the viral population throughout the 100-day selection period. Our analyses of 6-nt palindromes demonstrated that low GC content is not favored during selection. We therefore examined the GC content in the variants containing 4-nt palindromes and found that most of the 4-nt palindromes that survived toward the latter half of the selection contained two GC dyads (Fig. 4A).
We then examined individual palindromes; although most of the variants retained the 6-nt length in the randomized region, a variant that contained a deletion of the DIS sequence, AA-GCGC-A, was evident in all three selection experiments, each with two to four time points and at 1 to 4% of the population. However, this variant was not detected toward the end of the selection period in all experiments; it is possible that the GCGC variant was generated from recombination or deletion events involving the GCGCGC sequence.
One frequently observed variant contained the CGCGCA sequence in the randomized region; variants with other related sequences, such as CGCGCT, TCGCGC, and ACGCGC, were also observed. These variants can form base pairing in two different ways, using CGCG from positions 1 to 4 or GCGC from positions 2 to 5. To address whether the ability to form two different types of base pairing is a selective advantage, we compared the frequencies of variants containing 4-nt palindromes but with an additional C at the end ([A/T/C]CGCGC or CGCGC[A/C/T]) with variants containing 4-nt palindromes that include CGCG but cannot form two different base pairings, such as TCGCGT. The results are summarized in Fig. 4B; as shown, there are more variants containing CGCGC than those containing CGCG. We also observed a similar trend with the GCGCG sequence and the GCGC sequence; however, the differences were far less distinct, as there were fewer GCGCG-containing variants (data not shown). These results suggest that the ability to form two different 4-nt palindromes can facilitate DIS function.
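The two alternative pairings in CGCGC-containing sequences can be made explicit by listing every 4-nt window that equals its own reverse complement. This is a sketch with illustrative names; it considers only the six randomized positions and only Watson-Crick pairing, which are simplifying assumptions.

```python
# List every 4-nt Watson-Crick palindrome within a DIS sequence, with its 1-based start.
# Assumption: flanking loop nucleotides and G-U pairing are ignored here.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def four_nt_palindromes(seq):
    """Return (start_position, window) for each 4-nt palindromic window in seq."""
    hits = []
    for start in range(len(seq) - 3):
        window = seq[start:start + 4]
        if window == reverse_complement(window):
            hits.append((start + 1, window))
    return hits


print(four_nt_palindromes("CGCGCA"))  # two overlapping palindromes
print(four_nt_palindromes("TCGCGT"))  # a single palindrome
```

CGCGCA contains overlapping palindromes at positions 1 to 4 and 2 to 5, whereas TCGCGT contains only one, which is the distinction drawn in the comparison above.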
Taken together, the results from these three selection experiments indicate that GC-rich palindromic sequences, either 6-nt or 4-nt palindromes, are selected for DIS function. GC pairing is more stable than AU or GU pairing. These features further support our current view that the DIS performs an important intermolecular base-pairing function during viral replication. Furthermore, multiple variants can fulfill this function during the 100-day selection period.
To directly examine the replication kinetics of the variants containing different DIS sequences, nine mutants containing different DIS sequences were constructed based on NL4-3 by site-directed mutagenesis using overlapping PCR and cloning. The general structures of these mutants were characterized by restriction enzyme mapping, and the regions amplified by PCR were confirmed by DNA sequencing to avoid inadvertent mutations. These plasmids were separately transfected into 293T cells, viruses were harvested from transfected cells and clarified by passing through a 0.45-μm filter, and the amounts of viruses were quantified by the amount of capsid protein (p24). To measure the growth curve of the DIS variants, viral stocks containing equal amounts of p24 were used to infect Hut/CCR5 cells. Viruses were removed 6 h postinfection, fresh medium was added to each culture, and cells were returned to an incubator at 37°C with 5% CO2; this is defined as day 1. At days 3, 5, 7, 9, 11, and 13, two-thirds of each culture was removed and replaced with fresh medium. HIV-1 production was determined by measuring the amount of capsid (p24) in the cell-free medium.
The growth curves of five variants, each containing a 6-nt palindromic sequence in its DIS, are shown in Fig. 5A. Similar growth curves were observed in all five variants, including those that contained one of the three major DIS sequences, GCGCGC (NL4-3), GTGCAC, and GTGCGC, and two variants that had ACCGGT or TCGCGA, both of which emerged in all three selection experiments. In all five variants, HIV-1 particle production peaked at day 5 at similar levels, decreased at days 7 and 9, and became negligible at day 11. We also examined the growth curves of three variants containing 4-nt palindromes in their DIS sequences (Fig. 5B), two with GC-rich sequences and one with an AT-rich sequence. Variants with CGCGCA and ACCGGA, which were observed in all three and two of the three selection experiments, respectively, have growth curves similar to that of NL4-3. Compared with NL4-3, the growth curve of the variant with AAAATT was delayed; this variant was identified during the characterization of the library but was not observed in our selection experiments. We have also tested the growth curves of two variants that contained nonpalindromic DIS sequences, both of which were observed during the characterization of the library, and found that both variants have delayed growth curves. Events occurring in these growth curve experiments are close to those in one passage of the selection experiments; therefore, these experiments can distinguish variants with larger replication differences but do not have sufficient sensitivity to detect subtle differences (38, 39). These results are in agreement with those of our selection experiments; mutants that emerged during the selection experiments have better replication kinetics than mutants that did not emerge. Variants with GC-rich palindromic sequences replicate better than do variants with nonpalindromic or AT-rich 4-nt palindromic sequences.
HIV-1 group M variants, which include most of the circulating strains, are thought to have originated from a single zoonotic transmission event that introduced simian immunodeficiency virus SIVcpz into the human population (42). Recent studies suggest that HIV-1 group O variants and a newly identified group P variant are likely to have originated from SIVgor, a virus closely related to SIVcpz (37, 44, 45). The identified SIVcpz and SIVgor viruses have GTGCAC in their DIS. Analyses from the currently identified HIV-1 variants indicated that most of the non-B/D subtype viruses, including the subtype C virus, have GTGCAC in their DIS, whereas most of the subtype B or subtype D viruses have GCGCGC in their DIS (Table 1). Currently identified group O, group N, and the proposed group P HIV-1 also contain GTGCAC. These results suggest that the GCGCGC sequence emerged mainly during or after the divergence of the subtype B variant. We used NL4-3, a subtype B molecular clone, as the backbone to construct our randomized library. It is conceivable that coevolution and adaptation have occurred in the viral sequence such that the GCGCGC sequence is the most functional DIS for this subtype B backbone. This possibility can explain the surprising result that variants with GTGCAC, a “wild-type” DIS sequence, were observed in all three experiments but did not increase in frequency with selection passages.
Although we observed that variants with many 6-nt palindromic sequences can sustain replication in our selection experiments, our database search revealed that most of the subtype B viruses have GCGCGC in their DIS. These seemingly contradictory results can be explained by founder effects. The power of founder effects can also be seen in our selection experiments; we performed selection experiments with viruses derived from a library containing ~3% GCGCGC variants and from a library containing a reduced amount of GCGCGC (estimated to be less than 0.4%). We observed that the GCGCGC variant quickly took over the culture, even when it started at only ~3% of the library. In contrast, when we used the viruses from the library with the reduced amount of GCGCGC variants, we observed that many other variants also emerged during the selection. It is of interest to note that we did observe two variants in the human population that resembled some of our selected non-wild-type 6-nt palindromes: a variant with ACGCGT and a variant with G-TGCGCA (Table 1, ID 22 and ID 13, respectively).
To our surprise, in addition to variants containing 6-nt palindromes in the DIS, we found that variants with 4-nt palindromes were also sustained during the selection. As shown in Fig. 5B, we found that variants containing CGCGC were amplified and sustained in all three experiments. Variants with ACGCGC were also observed in several clinical isolates (Table 1, ID 4). The difference between ACGCGC and GCGCGC is a G-to-A substitution at position 1 of the DIS sequence. The flanking sequence of this base does not generate a preference site for human APOBEC3G or APOBEC3F (GG or GA, respectively), suggesting that the presence of multiple ACGCGC variants is unlikely to be attributable to G-to-A hypermutation via human APOBEC function. Together, these results support the observation that variants with CGCGC can sustain HIV-1 replication.
In a Rous sarcoma virus (RSV)-based system, the sequence of the “kissing loop,” which is equivalent to an HIV-1 DIS, was randomized and selected (6). After five rounds of passage, all 23 recovered sequences contained 6-nt GC-rich palindromes. The GCGCGC sequence that HIV-1 favored was observed in round 2 but not round 5 of RSV selection, whereas the GGGCCC sequence favored by RSV was not observed in our study. Therefore, although the GC content and the palindromic nature of the DIS sequence are important for the functions of both HIV-1 and RSV DISs, there are other currently unknown selection pressures that dictate the precise sequence.
We thank Anne Arthur for her expert editorial help; Vinay K. Pathak for discussions and intellectual contributions throughout this work; and Vinay K. Pathak and Alan Rein for critical reading of the manuscript.
This research was supported by the Intramural Research Program of the National Institutes of Health, National Cancer Institute, Center for Cancer Research.
Published ahead of print on 21 April 2010.
§The authors have paid a fee to allow immediate free access to this article.