|Home | About | Journals | Submit | Contact Us | Français|
HIV-1 sequences in intravenous drug user (IDU) networks are highly homogenous even after several years, while this is not observed in most sexual epidemics. To address this disparity, we examined the human immunodeficiency virus type 1 (HIV-1) evolutionary rate on the population level for IDU and heterosexual transmissions. All available HIV-1 env V3 sequences from IDU outbreaks and heterosexual epidemics with known sampling dates were collected from the Los Alamos HIV sequence database. Evolutionary rates were calculated using phylogenetic trees with a t test root optimization of dated samples. The evolutionary rate of HIV-1 subtype A1 was found to be 8.4 times lower in fast spread among IDUs in the former Soviet Union (FSU) than in slow spread among heterosexual individuals in Africa. Mixed epidemics (IDU and heterosexual) showed intermediate evolutionary rates, indicating a combination of fast- and slow-spread patterns. Hence, if transmissions occur repeatedly during the initial stage of host infection, before selective pressures of the immune system have much impact, the rate of HIV-1 evolution on the population level will decrease. Conversely, in slow spread, where HIV-1 evolves under the pressure of the immune system before a donor infects a recipient, the virus evolution at the population level will increase. Epidemiological modeling confirmed that the evolutionary rate of HIV-1 depends on the rate of spread and predicted that the HIV-1 evolutionary rate in a fast-spreading epidemic, e.g., for IDUs in the FSU, will increase as the population becomes saturated with infections and the virus starts to spread to other risk groups.
The evolutionary rate of human immunodeficiency virus type 1 (HIV-1) has been shown to differ in different genes as well as in different regions of the same genes. Using molecular-clock analyses, the rate of evolution of the env V3 region was estimated to be from 2.3 × 10−3 to 6.7 × 10−3 substitutions site−1 year−1, while the rate for the whole env was estimated to be between 1.0 × 10−3 and 1.7 × 10−3 substitutions site−1 year−1 (25, 26, 33, 54, 63). Stronger diversifying pressure by the immune system on the V3 region is thought to be responsible for the higher rate, while genes, such as gag and pol, that are under less diversifying and more purifying selection pressure show lower rates (25, 33, 54).
It has been shown that the selective pressure of the human immune system is one of the driving forces of HIV-1 intrahost evolution (42). In addition to this, the intrahost evolutionary rate has been suggested to be influenced by the initial fitness of the HIV-1 population, depending on the route of infection (40). Strikingly, HIV-1 intravenous drug user (IDU) sequences, such as those collected from the region of the former Soviet Union (FSU), show very high genetic similarity to each other. This high similarity was not observed in HIV-1 sequences spreading through the sexual route. Previous studies have suggested that the intrahost evolutionary rate of HIV-1 either does (18) or does not (28, 31) differ between patients belonging to the IDU and sexual risk groups.
Currently, Eastern Europe and central Asia are the regions of the world with the fastest-growing HIV epidemics. The fast spread of the virus in the FSU region is closely linked with a rise in injecting drug use after the collapse of the Soviet Union in the 1990s, connected to the increase and diversification of drug trafficking routes through Central Asia and eastern Europe from the world's largest opium producer, Afghanistan (58). Until 1995, countries of the FSU experienced only a few hundred cases of HIV-1 infection, most of which were associated with sexual transmissions. The first outbreaks of HIV-1 among IDUs in the FSU were reported in 1995 in southern Ukraine (48), followed by rapid spread of the virus in this risk group in the Russian Federation (3, 5) and its neighboring countries Belarus and the Republic of Moldova in 1996 (41, 50). In 1997, the virus had spread to the Baltic region, inducing a fast-growing epidemic among IDUs in Latvia and later Estonia (2, 15, 64). Other regions of the FSU also started to experience the rapid growth of IDU HIV-1 infections, such as Kazakhstan, with more than 90% of HIV-1 cases belonging to this risk group (6), and Uzbekistan, with an exponential rise in HIV-1 prevalence among IDUs between 2001 and 2002 (29).
In South and Southeast Asia, a sudden rise in HIV-1 prevalence to an epidemic proportion was first observed in Thai IDUs in 1988 (61), mainly involving subtypes B and B′. Thailand, like Laos, Myanmar, and China, is a part of the Golden Triangle, a main heroin trafficking route in Southeast Asia. Consequently, only a year later, high HIV prevalence rates of the same subtype were detected among IDUs in Myanmar and the Yunnan province of China (19, 30, 65). Then, an increase of HIV-1 CRF01_AE infections was detected among female sex workers in Thailand, and soon thereafter, CRF01_AE spread to IDUs, suggesting an early and complicated interconnection between epidemics among drug users and commercial sex workers (49). By 1995, CRF01_AE was widespread among Thai IDUs (20, 56), and IDU populations in other countries, such as Vietnam and Malaysia, had experienced rapid increases in CRF01_AE infections (21, 43, 46). In 1996 and 1997, an outbreak of CRF01_AE, genetically highly similar to the Vietnamese strain, was detected among IDUs in China (22). Today, the HIV-1 epidemic in South and Southeastern Asia is complicated not only by the interplay of injecting drug use and commercial sex work but also by the circulation of numerous HIV-1 subtypes in these risk groups, resulting in the emergence of novel intersubtype and unique recombinant strains (57, 62).
Slower heterosexual spread of HIV-1 can be found in Western Europe and North America, consisting mainly of subtype B. Other mainly heterosexual epidemics are those of the predominant HIV-1 subtypes in Africa, A and C. Subtype A is mostly found in central Africa, while subtype C is predominant in the south and southeast. Although it is hard to estimate how and when these epidemics started, there have been some indications that the spread of subtype C in Africa has been greater than the spread of other subtypes (1, 17, 24, 59).
The high homogeneity of HIV-1 sequences derived from IDU outbreaks throughout the world suggests, thus, that the evolutionary rate of the virus spreading on the population level is lower in the IDU risk group. An introduction of HIV-1 into a social standing network, such as an IDU population, results in a very fast spread of the virus. Moreover, the virus is transmitted shortly after initial infection, possibly before the host immune system exerts a selective pressure on the viral population. This may result in the transmission of viral variants that have not experienced the pressure to evolve and, consequently, are genetically very similar to each other. On the other hand, slow HIV-1 spread in a heterosexually driven epidemic, where the virus spreads only as new social contacts are formed, may involve some amount of intrahost evolution before being transmitted to a new person. Here, we hypothesize that it is not the route of infection, but rather the amount of time between the initial infection and the further transmission of the virus, that determines the evolutionary rate and similarity of sequences in an epidemic.
HIV-1 sequences spanning 260 to 300 bp of the V3 env region were downloaded from the Los Alamos HIV sequence database (www.hiv.lanl.gov). Sequences belonging to subtypes A1 and C from Africa and subtype B from Western Europe and North America were collected from heterosexual populations. Sequences belonging to IDU populations were collected from the FSU subtype A1 IDU epidemic and from two different mixed (IDU and sexual) epidemics from Southeast Asia, those of subtypes B′ and CRF01_AE. Only one sequence from each individual was included in the study. The material is summarized in Table Table11.
All sequences were manually aligned using Se-Al (Se-Al sequence alignment editor; A. Rambaut [http://evolve.zoo.ox.ac.uk/]). Separate neighbor-joining (NJ) trees (PAUP* [Phylogenetic Analysis Using Parsimony and Other Methods], version 4; D. L. Swofford, Sinauer Associates, Sunderland, MA) were constructed using sequences from each epidemic, together with subtype reference sequences downloaded from the Los Alamos HIV sequence database (34), in order to verify the designated subtypes of the sequences. A simple check for recombination between subtypes was performed by dividing each sequence alignment into halves and constructing NJ trees for each half, looking out for sequences that jumped between subtypes, indicating recombination. In order to verify that the collected sequences belonged to the respective epidemics, NJ trees were constructed and it was made sure that there was no geographic or sampling time clustering.
All the trees were constructed on a parallel computing cluster using a maximum likelihood code with a general-time-reversible model with variable rates among sites (T. Bhattacharya and M. Daniels, unpublished) (26). For each epidemic, phylogenetic trees were constructed using sequences sampled at two separate time points, corresponding to a certain time interval (Δt). Trees were made for all possible combinations of sampling years resulting in Δt values from 1 to 17 years. For example, phylogenetic trees representing a Δt of 5 years would be made with sequences sampled in 1992 and 1997, in 1993 and 1998, in 1994 and 1999, and so on. In cases where there were few sequences sampled in one year, these were pooled together with sequences sampled in the following year and analyzed as if they were sampled together, using a weighted sampling time point for that pool. The weighting expression used was (YiNi + YjNj)/(Ni + Nj), where Yi is sampling year i, Yj is sampling year j, Ni is the number of sequences sampled in year i, and Nj is the number of sequences sampled in year j.
We developed a tool for analysis of genetic distances (Δds) between two groups of sequences in a phylogenetic tree. The input file is a phylogenetic tree in Newick format. The taxa in the tree can be divided into two groups, between which Δd is calculated. The Δd between the two groups is calculated as the average genetic distance of group 2 to the root (d2) minus the average genetic distance of group 1 to the root (d1) (Fig. (Fig.1).1). This approach removes all errors in branch lengths prior to time point 1, since only the time and distance between the two defined groups are used. Since the Δd between the groups can differ depending on how the tree is rooted, Δd values for all possible rooting points of the given tree are calculated. The tree rooted in a way that gives a minimum P value, and thus the best separation of group 1 and 2 sequences, is scored as the best tree. This is calculated with Welch's t test, which is designed to provide a valid t test in the presence of unequal population variances. The output file gives Δd, P values, and variances for not only the best-scored root of the tree but all possible rooting points in the given tree. An additional group in the tool is the discard group, where sequences not desired in the rate calculation can be put. In this way, taxa in one phylogenetic tree can be rearranged between groups and reanalyzed in several different ways. The root-and-rate-optimization tool is fast and easy to use. It will be available on the Los Alamos HIV-1 sequence database homepage (www.hiv.lanl.gov).
Sequence evolution follows a Poisson process (68). Thus, the average evolutionary rate () in each of the six different epidemics studied here was estimated using Poisson error correction such that
where R is the evolutionary rate, R = Δd/Δt, calculated for an individual maximum likelihood tree, and var(R) is the expected Poisson error in that evolutionary rate. Δd is the delta distance from the root and rate optimization, and Δt is as described above. The error is given by the following equation:
where v1 is the variance of group 1 distances and v2 is the variance of group 2 distances, as calculated by the root-and-rate-optimization method. This approach corrects for the greater error found in rates from samples collected closer in time to each other. The confidence interval (95%) for was calculated from 100 replicates derived by resampling R with replacement at the same sample size as that for the original data in each epidemic.
To test our root-and-rate-optimization method, we simulated trees with different amounts of information to cover the situations that we encountered in the real data. Random trees with 40 taxa, 20 at sampling time 1 (t1) and 20 at sampling time 2 (t2), were generated using MacClade (version 4 [analysis of phylogeny and character evolution]; W. P. Maddison and D. R. Maddison, Sinauer Associated, Sunderland, MA). Branch lengths were scaled to reflect an expected Δd (Δdexp) between t1 and t2 between 0.001 and 0.1 substitutions site−1, and the fraction of the tree with this distance was 0.2, 0.5, or 0.8. A Poisson error process was used to get realistic branch lengths, assuming a sequence length of 1,000 characters. For each expected rate and tree fraction, 100 simulations were performed (3,300 trees in total).
In order to further validate the results generated by root and rate optimization, we also estimated the evolutionary rates of HIV-1 subtype A1 from the FSU and Africa by using Bayesian Evolutionary Analysis Sampling Trees (BEAST), a program for Bayesian Markov Chain Monte Carlo analysis of molecular data (BEAST v1.4; A. Drummond and A. Rambaut [http://evolve.zoo.ox.ac.uk/beast/]). The substitution rates were generated using a general-time-reversible substitution model with gamma distribution and invariable rates among sites, with Markov Chain Monte Carlo runs of 50,000,000 steps sampled every 1,000 steps and analyzed with Tracer (A. Rambaut and A. J. Drummond [http://evolve.zoo.ox.ac.uk/software.html?id=tracer]) with a discarded burn-in of 10%. Since the number of sequences in each epidemic was too large to analyze simultaneously, 50 sequences from each epidemic were randomly sampled 10 times and the average evolutionary rate was estimated for each epidemic. In addition, both datasets were tested using three different clock models available in BEAST 1.4: the strict clock, the relaxed uncorrected exponential clock, and the relaxed uncorrected lognormal molecular clock (13). Furthermore, two different a priori assumptions for population growth were used, coalescent constant size and coalescent exponential growth.
We developed a structured epidemiological model to investigate differences in the evolutionary rates in fast epidemics and in slow epidemics. The model is an abstraction of the complex processes involved in transmission of HIV, using a standard susceptible-infected-population model to track the infection in different population groups (7). At each time point, we are interested in calculating the average divergence since the start of the epidemic across all infected individuals. Thus, we simulated a structured population where infected individuals are stratified by the divergence (since the beginning of the epidemic) of the infecting virus and the time since the individual's infection. For instance, at any given time, we know the number of individuals (Ixy) who were infected by a virus that had diverged for y (say, 3) years before infection and who themselves have been infected for x (say, 2) years. For this group, the virus has diverged for 5 (x + y) years since the beginning of the epidemic. We then follow the transitions occurring among the infected groups (different x and y values) as time goes by and new infections arise. For each year of the epidemic, the equations that describe the epidemic are as follows:
Here, π is the influx of new susceptibles (S), which was chosen as μ × S0, where μ is the natural mortality (0.001 year−1) and S0 is the susceptible population at the start of the epidemic. δx is the infected-individual death rate, which depends on the time since infection (x) and is set as in reference 11. And λx is the infection rate, which during the initial 6 months of the infection is 10 times higher for drug users than for heterosexual transmission (10−4 year−1 per person early on for drug users and 10−5 year−1 per person for heterosexual transmission throughout and also for drug users after the first 6 months). In addition, we tested a situation with a higher infection rate during the first 6 months in the heterosexual population (10−4 year−1 per person early on for drug users, 10−5 year−1 per person early on for heterosexual transmission, and 10−6 year−1 per person for both of the epidemics after the initial 6 months). We start the epidemics with 100 infected people, we let y range between 0 and 15, and the total number of infected individuals is given by .
The main assumptions of the model are as follows: (i) initially, over the first 6 months, when the immune pressure is low, the viral evolution is negligible; (ii) after that, the virus evolves and linearly diverges within an infected individual from the initial virus at 1% per year; (iii) the initial infection rate for drug users, over the first 6 months, is 10 times higher than that for heterosexual transmission; and (iv) after that time, the infection rates are the same for both populations. Also, we note that the within-individual divergence rate chosen (1% per year) is the same for both drug users and sexual transmission and is somewhat arbitrary, although it was chosen for consistency with the empirical data. The model was implemented with programming language C, and solution of the equations was carried out by a fourth-order Runge-Kutta method.
We calculated the average divergence of the virus in the population by summing the viral divergence (x + y) for all groups (Ixy), weighted by the fraction of all infected individuals (I) that Ixy represents. Thus, the divergence d at any time point is given by the following equation:
In order to ensure that all the sequences used in our study were of the designated subtype and belonged to the designated epidemic, NJ trees were constructed for each of the epidemics (data not shown). Sequences that did not group with their designated subtypes or that showed indications of recombination were omitted from the analyses. All sequences used in this study were shown to belong to their respective epidemics (see Appendix 1 for accession numbers), and there was no subclustering within an investigated epidemic that correlated with time or geography. As shown in Fig. Fig.1,1, sequences at t1 and t2 were not separate clusters, i.e., t2 leaves were mixed with t1 leaves, and thus, we are confident that all samples belonged to the same epidemic. The resulting selection of sequences (n = 1,336), investigated Δts, and trees (n = 332) are indicated in Table Table11.
Tree simulations showed that the root-and-rate-optimization method accurately estimated the Δd between two groups of sequences in a tree. When the Δdexp in the simulated trees was 0.01 or above, the method estimated a correct Δd (Fig. (Fig.2).2). When Δdexp was below 0.01, the method tended to somewhat overestimate Δd, with the overestimation being greater at a lower Δdexp. However, there was less overestimation for greater fractions of the tree containing Δd. In trees simulated with a perfect molecular clock (infinite long sequences), our method always found the correct root and rate.
When plotting Δd and evolutionary rates generated from each maximum likelihood calculation against Δt, we observed stochastic error-like noise for the evolutionary rate in each epidemic. These fluctuations for subtype A1 from the FSU and Africa can be seen in Fig. Fig.3.3. The evolutionary rate fluctuated around the true value, with the variation being greater the closer in time the samples were to each other. Eventually, the rate stabilized to its true value. Interestingly, fluctuation appeared to correlate with evolutionary rate in that higher evolutionary rates had larger variations, supporting a Poisson-like process. Thus, a Poisson-corrected average evolutionary rate was calculated for each epidemic, with correction for the greater error present in rates for trees with low Δt values (Table (Table2).2). Generally, the fluctuations of evolutionary rate stabilized after an approximate Δt of 3 to 4 years, suggesting that a period of at least 3 to 4 years between collection of samples is needed for accurate estimation of an evolutionary rate on the population level. An exception was subtype B, with an unusually long stabilization time of approximately 10 years (data not shown).
The evolutionary rate of the V3 region of HIV-1 subtype A1, spreading rapidly in the IDU population of the FSU, was calculated from samples collected between 1996 and 2002. Maximum likelihood trees were calculated, and the roots were optimized using our t test method (Table (Table1).1). Over increasing Δts (1 to 6 years), a slow increase in Δd was observed (Fig. (Fig.3a),3a), resulting in an average evolutionary rate of 2.02 × 10−3 substitutions site−1 year−1 (Table (Table2).2). There was no sign of this rate changing over the studied time period (Fig. (Fig.3b3b).
The evolutionary rate of the V3 region of subtype A1 virus, spreading in a slow heterosexual manner in Africa, was calculated for the period between 1987 and 2004 based on maximum likelihood trees and root optimizations (Table (Table1).1). As in the fast-epidemic case, Δd increased with increased Δt (Fig. (Fig.3a).3a). The average rate was estimated to be 16.9 × 10−3 substitutions site−1 year−1, that is, 8.4 times higher than the rate of the fast-spreading virus of the same subtype in the FSU (Table (Table2).2). The rate of this virus was the highest one observed among the epidemics studied here. The evolutionary rate over Δt for this virus is shown in Fig. Fig.3b3b.
The epidemics of subtypes B′ and CRF01_AE in South and Southeastern Asia were classified as mixed epidemics, since these subtypes spread among IDUs, commercial sex workers, and their customers, using both standing social networks and forming of new contacts. The average V3 evolutionary rate of B′ spreading in this geographical region between 1991 and 1996 was found to be 10.7 × 10−3 substitutions site−1 year−1. The evolutionary rate of CRF01_AE for the period between 1994 and 2002 was found to be 8.3 × 10−3 substitutions site−1 year−1 (Table (Table2).2). Interestingly, the evolutionary rates for these mixed epidemics were between the evolutionary rates for the more clearly defined epidemics for IDUs in the FSU and heterosexual transmission in Africa.
Although HIV-1 subtype C in Africa is mainly spreading heterosexually, it has been shown that the rate of spread for this virus was somewhat higher than that for other subtypes in Africa (17, 24, 59). The average V3 evolutionary rate of the virus collected from samples from this region between 1987 and 2004, as estimated by 130 maximum likelihood trees, was 9.7 × 10−3 substitutions site−1 year−1 (Table (Table2),2), i.e., an evolutionary rate somewhat lower than what was observed in the African A1 epidemic, agreeing with a somewhat faster spread. The average evolutionary rate for the hetero- and homosexual epidemic of subtype B from Europe and North America in the period between 1981 and 2002 was estimated to be lower than those for the other subtype epidemics (4.3 × 10−3 substitutions site−1 year−1). We also attempted to estimate the early (1978 to 1984) subtype B epidemic separately, but a shortage of samples unfortunately limited reliable estimates of that rate (1 to 2% per year). Subtype B from the Western world differs from all the other subtypes by the fact that it is subjected, in addition to immune pressure, to selection pressure from powerful antiretroviral treatment. However, if the evolutionary rate was calculated after an unusually long stabilization time (Δt = 10 years), it was estimated to be 9.4 × 10−3 substitutions site−1 year−1, i.e., similar to the other intermediate evolutionary rates. The other studied subtype epidemics did not have a discrepancy between average evolutionary rate and rate after stabilization (data not shown).
BEAST was used to calculate the evolutionary rates of subtype A1 in fast and slow epidemics in the FSU and Africa, respectively. The rates were calculated for two different population growth scenarios, constant and exponential, combined with three different molecular-clock models: strict, relaxed uncorrected exponential, and relaxed uncorrected lognormal. While the 95% highest-posterior-density (HPD) values for several comparisons overlap, and the evolutionary rates of A1 in Africa differed greatly between different models used, significant differences were observed in all models between the evolutionary rates of A1 in the FSU and A1 in Africa (Table (Table3).3). Thus, the rate of the fast-spreading IDU virus was found to be consistently lower than the rate of the virus spreading in a slow heterosexual way. For A1 in Africa, however, the mean posterior values were lowest for the relaxed-exponential-clock model, suggesting the best fit. Here, the evolutionary rates were the highest ones, on average estimated to be 16.6 × 10−3 substitutions site−1 year−1 for the constant population growth and 10.6 × 10−3 for the exponential population growth (Table (Table3),3), thus suggesting a 3.3- to 7.8-times-higher evolutionary rate among virus that spread slower.
Our epidemiological model indicated that the evolutionary rate of a virus spreading rapidly, with most transmissions soon after infection (such as for IDUs), is lower than the evolutionary rate of a virus spreading in a slow heterosexual epidemic. This observation is independent of the specific values used in the model, that is, it occurs whether we consider that most infections occur over the first 3 or 6 months or whether we assume that the transmission rate among drug users in this initial period is 5, 10, or 20 times higher than that for heterosexual transmission. However, as the IDU epidemic aged, its evolutionary rate increased and became similar to the one for heterosexual transmissions, indicating population saturation. This essentially corresponds to moving from a phase dominated by many individuals recently infected to a phase where most individuals have been infected for several years. The same phenomenon was observed when we compared heterosexual epidemics in small and large populations. Thus, a small difference in evolutionary rates was observed between large and small heterosexual populations in the beginnings of these epidemics (Fig. (Fig.4).4). This difference became more obvious when the infection rate for heterosexual transmission was higher during the first 6 months of infection than later on (data not shown).
We have shown that the rate at which HIV-1 spreads in a host population affects the evolutionary rate of HIV-1 in that population. Thus, the resulting rate will be affected depending on how much immune selection the virus has experienced before retransmission. There are three stages in the pathogenesis of HIV-1 within an infected individual (32), during which the pressure of the immune system varies. The initial stage lasts from the time of infection until development of the full immune response. This is a stage of rapid viral replication with low viral diversity and high viral load (10, 66). The second stage starts at the point where the immune response is fully activated and efficiently kills the vast majority of virus variants, lowering the viral load. HIV-1 is now forced to evolve into escape variants, and hence, viral diversity increases (Fig. (Fig.5).5). The second stage is a mostly asymptomatic period that can last for many years. Finally, the third stage starts when the immune system collapses and the patient progresses to AIDS.
In a heterosexual epidemic, where HIV-1 is transmitted from person to person as new contacts are formed (slow spread) and thus mostly in the second stage of infection, we observed an evolutionary rate on the population level that corresponded to the intrahost evolutionary rate or, due to some convergent bottleneck effects as discussed below, a rate slightly lower than that. However, in an IDU epidemic, HIV-1 spreads from person to person very soon after infection, i.e., in the first stage. This would result in a fast spread of highly similar viruses across that population, as observed in sequences collected from IDUs in the FSU. In other words, with a virus spread before immune selection in each infected individual, we see not only a rapid outbreak but also an evolutionary rate on the population level that basically corresponds to a virus that has not been subjected to immune selection, i.e., a rate much lower than that of a virus that has been subjected to selection before it could spread to a new host (Fig. (Fig.5).5). In addition, it has recently been shown that HIV-1 evolution in newly infected patients includes a large proportion of reversions toward subtype consensus (16, 36, 37). Such convergent mutations will also contribute to a lower rate in rapid spread. The low evolutionary rate is observed as long as there are new individuals to infect in the epidemic, i.e., before the population is saturated with HIV. When the infected population gets closer to saturation, the evolutionary rate is expected to increase for two reasons: (i) a higher proportion of newly infected individuals become infected by someone in the second stage of infection, thus giving the population an overall higher rate of evolution (Fig. (Fig.44 and and5),5), and (ii) resampling of individuals already infected for a while becomes more frequent, and thus, the sampled viruses are from the second stage of infection, with the higher intrahost evolutionary rate. Mixed epidemics, such as those found in Southeast Asia, are composed of viruses spreading during both the first and the second stages of infection. In such epidemics, the evolutionary rates are expected and were observed to be intermediate. Furthermore, our epidemiological modeling suggested that the evolutionary rate in the beginning of an epidemic also depends on the susceptible-population size. This is essentially because the exponential spread is proportional to the population size. Thus, the rate of spread is somewhat higher in the beginning of an epidemic in a large population than in a small one, which is reflected in the lower evolutionary rate of the virus spreading in the large population.
It has been suggested that bypassing the mucosal barriers by injecting HIV-1 may remove selective mechanisms (18, 60). Such selective mechanisms may constitute parallel and convergent evolution that would lower the evolutionary rate on the population level or diversify the evolution, increasing the evolutionary rate (9, 39). Since we do see a lower rate for IDUs than for heterosexual transmissions, however, convergent evolution in sexual transmission could not have had a great impact. Also, it was recently shown that coreceptor adaptation was independent of the route of infection (8); thus, such adaptation would affect the rates in both risk groups. Similarly, a mucosal barrier selection or other bottleneck effects occurring in sexual transmission but not in intravenous transmission, resulting in host-specific virus adaptations, would most likely affect only certain amino acid positions. Thus, it is unlikely that any of these effects would overshadow the general diversifying and neutral evolution seen on the population level.
In this study, we used a root-and-rate-optimization method to calculate the evolutionary rates of HIV-1 spreading in different epidemics throughout the world. The molecular clock was originally suggested by Zuckerkandl and Pauling (67), and many later methods have been based on Kimura's neutral evolution theory (23). Our method was inspired by earlier work by Li et al. (38), which also has inspired related methods, such as TipDate (52). The advantages of our method include that it is fast and thus can be used on large data sets, it can use a tree calculated with or without the explicit use of a molecular clock (using any model or method that the user finds adequate), and it gives the possibility of finding and comparing genetic distances and node heights for all possible rooting points in a given phylogenetic tree. Furthermore, the branch lengths prior to the first group of sequences are subtracted in the calculation so that errors in this part of the tree are essentially ignored. For each epidemic investigated here, we divided the samples into as many Δts as was possible, while retaining enough data in each sampling year (usually above 20 sequences per year). This means that some data points (Δts) partly reuse some data, and thus, there is covariance among some data points. The fact that we use large-scale meta-data, however, mostly overcomes any bias that a certain sampling year might have added to the overall results. Similarly, tree distances (node heights) usually contain shared branches, and thus, also in this case there will be a lack of independence among distances based on phylogenetic trees. Note, however, that such branches prior to the first time point are removed in the analysis, so the distances may actually be independent segments in the part that is used for the rate estimate. Strictly speaking, and generally for tree methods, however, since the whole tree is optimized, not even the terminal branches of a tree are fully independent estimates. Importantly, our tree simulations showed that our method was accurately estimating Δds between two groups of sequences in a tree (Fig. (Fig.2).2). However, when the Δd between the two groups was very small (0.01), the method tended to somewhat overestimate the rate. When a molecular clock, whether strict or sloppy, had operated on the studied sequence set, we found that our method was able to find a root that reasonably estimated the rate. Recombination and different types of selection make the accuracy and the value of a molecular clock debatable. Therefore, we compared the results from our method with calculations of an evolutionary rate using BEAST, a software program with which we could test relaxed-molecular-clock models (13). In agreement with the estimates from our method, the relaxed-clock models suggested similar differences in the evolutionary rates of viruses spreading in a slow or rapid manner in IDU or heterosexual transmissions, respectively.
Our results showed that HIV-1 subtype A1 spreading heterosexually in Africa had an evolutionary rate 8.4 times higher than the same subtype spreading rapidly among IDUs in the FSU (Table (Table2).2). The IDU HIV-1 epidemic in the FSU started with the introduction of two HIV-1 strains into networks of IDUs, subtype A in IDUs in Odessa and subtype B in Nikolaev (45). From here, the virus rapidly spread to other regions of the FSU, causing IDU HIV-1 outbreaks (2, 3, 6, 15, 29, 41, 50). Molecular analyses of these outbreaks show, however, that this is mainly a single interconnected IDU epidemic caused by the rapid spread of subtype A virus (4). In contrast, the heterosexual A1 epidemic in central and eastern Africa is characterized by a stabilizing or even decreasing prevalence of HIV infections (1). Although injecting drug use remains the driving force for the rapid spread of HIV-1 in the FSU, the patterns of the epidemic are now starting to change, with an increase in heterosexual transmissions mainly due to the growth of the sex industry. At this point, we predict that the evolutionary rate will increase due to a change in spread patterns to an epidemic that is more similar to the mixed-risk-group epidemics.
In South and Southeast Asia, rapid growth of HIV-1 infections was first observed in Thai IDUs in 1988 (61), mainly involving subtypes B and B′. Shortly after, an increase of CRF01_AE was observed among female sex workers in this region, spreading it further to the IDU population (49). Since many sex workers also belong to the IDU risk group, a complicated interconnection between commercial sex work and injecting drug use in South and Southeast Asia arose, where the epidemics of subtype B′ and CRF01_AE spread simultaneously in both IDU and sexual risk groups, making them mixed epidemics. Consistent with this, the evolutionary rates of B′ and CRF01_AE were estimated to be 10.7 × 10−3 and 8.3 × 10−3 substitutions site−1 year−1, respectively, putting them between the evolutionary rates of subtype A1 in fast spread in the FSU and slow spread in Africa. These intermediate evolutionary rates mirror the mixed nature of these epidemics, where the virus is spreading fast among IDUs and slow by sexual transmissions.
The evolutionary rate of subtype C spreading heterosexually in south and southeast Africa was estimated to be 1.7 times lower than the rate of subtype A1 in central and east Africa (Table (Table2).2). This intermediate evolutionary rate of subtype C is not, however, a result of the virus spreading by different risk groups, as seen in the mixed epidemics of subtypes B′ and CRF01_AE. Instead, it has been shown that the rate of spread for subtype C has been higher than that for other subtypes in Africa (1, 14, 17, 24, 59); thus, a higher proportion of these infections must have occurred during the first stage of infection. In addition, it has been shown that the V3 loop of this virus is relatively well conserved compared to those of other subtypes and that it is the region downstream of the loop that is highly variable (14, 27, 51). Hence, both the conservation of the V3 loop and the more rapid spread of this subtype across the African continent may have influenced the evolutionary rate of this virus.
The evolutionary rate of subtype B from Europe and North America was much lower than expected in a slow epidemic. However, subtype B differs greatly from all the other subtypes by the fact that it is subjected to the selective pressure of antiretroviral treatment, common in these parts of the world. If antiretroviral therapy is successful, the viral replication within a host will be diminished, and there will be no measurable accumulation of substitutions in env. It has previously been shown that effective antiretroviral treatment can slow down and even totally abolish the evolution of HIV-1 in the envelope region (12, 47, 53). It is possible that this effect is reflected in the low evolutionary rate found in our subtype B analyses. On the other hand, it has also been suggested that suboptimal therapy can lead to increased evolutionary rates (35). Furthermore, the initial spread of subtype B showed a rapid increase within specific risk groups, such as the homosexual risk group, until it slowed down during the 1990s (44, 55). Thus, the overall evolutionary rate for subtype B may have been affected also by the combination of initial rapid and late slow spread.
In conclusion, there are large differences in the evolutionary rates in different epidemics worldwide, and there may be different rates during different phases of an individual epidemic, all depending on the specific epidemic growth rate. On the pandemic level, this is yet another component in the evolutionary dynamics of HIV and it is interesting to note that such rate differences may affect estimates of the times of origin of HIV-1 in different parts of the world. Thus, it is possible that more-accurate dates could be inferred if the epidemic rate were taken into account.
We thank Jan Albert, Eric Hunter, Cynthia Derdeyn, Bette Korber, Jean Carr, and Andrew Rambaut for useful advice and discussions.
This study was supported by an NIH-DOE interagency agreement (Y1-AI-1500-07).
The GenBank accession numbers for FSU subtype A1 are as follows: AY829203, AY829205, AY829206, AY829208 to AY829212, AB097179 to AB097181, AB097184 to AB097193, AF061584 to AF061603, AY290866, AY290868 to AY290909, AY290911, AY290912, AY290915 to AY290918, AY290920 to AY290937, AY290939 to AY290955, AY290959, AY290961, AY290962, AF165126 to AF165132, AY623822, AY623824 to AY623830, AF284039 to AF284061, AF170647 to AF170651, AF170655 to AF170662, AF051492, AF051498, AF051517, U93611, AF100934, AF100935, AF100939, AF025588 to AF025590, AF025593 to AF025596, and AY755125 to AY755139.
The GenBank accession numbers for Southeast Asia subtype B′ are as follows: L07449 to L07456, L07460, L19238, AF209202 to AF209204, U15576 to U15587, U65538 to U65548, U22543 to U22547, U22549, U22551, U22552, U22554 to U22556, U22558 to U22560, U22562 to U22566, U22568 to U22574, U22576 to U22603, U22605 to U22608, U22610, U22613 to U22616, U22618 to U22623, U22626, AF082536 to AF082555, AF151767 to AF151773, AF101121, AF101122, AF385992, and AF417200.
The GenBank accession numbers for Southeast Asia CRF01_AE are as follows: U22542, U22548, U22553, U22557, U22561, U22567, U22575, U22604, U22609, U22611, U22612, U22617, U22624, U22625, U48719, U48720, U90068 to U90073, U90077, U90079, U90082, U90085 to U90088, AF081710 to AF081780, AF151737 to AF151766, AF151774, AF151775, L07457, L07462, AF160732 to AF160736, AF080193 to AF080199, AF080201 to AF080203, AF160737 to AF160740, AF160746 to AF160750, AB034839 to AB034844, AB034846 to AB034848, AB034852, AB034853, AB034855 to AB034860, AY358036 to AY358039, AY358041 to AY358049, AY358051 to AY358054, AY358056 to AY358068, AY635620 to AY635626, AY635628 to AY635639, AF326123 to AF326129, AF326131, AF326133 to AF326135, AF326139, AY283413 to AY283416, and AY283418 to AY283426.
The GenBank accession numbers for Africa subtype A1 are as follows: AY905583, AY905586 to AY905590, AY905596, AY905601 to AY905605, AY669700 to AY669705, AY669769, AY713407, U51190, U88823, AY288084, AY713406, AF004885, AF028319, AF028324, AF028328, AF119189, AF119200, AF119201, AB052867, AF361871, AF361873, AF361876, AF361879, AF372392, AJ404009, AJ404013, AJ404015, AJ404016, AJ404020, AJ404022, AJ404052, AJ404085 to AJ404087, AJ404090, AJ404100, AJ404101, AJ404106, AJ404108, AJ404110, AJ404114, AJ404116, AJ404118, AJ404140, AJ404141, AJ404143, AJ404150, AJ404152, AJ404153, AY140594 to AY140596, AY140598, AY140607, AY322190, AY322193, AF484478, AF484479, AF484491 to AF484493, AF484496, AF484503, AF484507 to AF484509, AF484520, AF457052, AF 457055, AF457056, AF457059, AF457062 to AF457070, AF457073, AF457075, AF457078, AF457079, AF457081, AF457083, AF457084, AF457086 to AF457089, AJ490697, AJ490700, AJ490704, AJ490705, AJ490708, AJ490711, AJ490714, AJ490721 to AJ490724, AJ490726, AJ490729, AJ490731, AJ490734, AJ490740 to AJ490743, AJ490745, AJ490746, AJ490748, AJ490752, AJ490753, AJ490755, AJ490757 to AJ490759, AJ490761, AJ490763 to AJ490766, AJ490768, AJ490770, AJ490777, AJ490781, AJ490782, AJ490789, AJ490790, AJ490792, AJ490793, AJ490795, AJ490798, AJ490799, AJ490803, CQ891777, CQ891778, AY253305, AY253306, AY253314 to AY253316, AY253318, AY243319, AY371143, AY371163, AY371164, AY456285, AY456286, AY456291 to AY456294, AY456300, AY456301, AY456303, AY456306, AY676582 to AY676586, AY174897, AY174925, AY174981, AY175006, AY175045, AY175079, AY175104, AY175152, AY175183, AY175234, AY175291, AY175340, DQ155223, and DQ155257.
The GenBank accession numbers for Europe and North America subtype B are as follows: Z29219 to Z29222, Z29224 to Z29258, L08312, L08334, AF065562, AF065590, U27434, AF041125, AF112539, U23112, U23126, U23128, U23130, U68502, U68503, U68508, U58777, U28668, U28671, U28675, U28676, U28680, U28681, AF041132, AF041134, AF191349, AF191365, AF191372, AF191375, AF191377, AF191404, AF191407, AF191410, AF191415, AF191423, AF191380, AF191381, AF191384, AF191429, AF016548, AF016550, AF016553, AF016554, AF016556, AF016558, AF016561, AF016563, AF016564, AF016568 to AF016570, AF016572, AF016575, AF016577, AF016579, AF181291 to AF181293, AF181295, AF181296, AF181298, AF181326, AF181328, AF181329, AF181359 to AF181362, AF181364, AF181365, AF181377, AF181378, AF181380, AF181382, AY308760, AY314044, AY322103, AY331291, AY331282, AF223984, AF223985, AF223987, AF223988, AF223994, AF223997, AF224001, AF224003, AF125300, AF325930, AF325933, AF325948, AF325950, AF325959, AY535455, AY835447, AY835451, AF466232, AY290956, AY290957, AY835449, and AY524180.
The GenBank accession numbers for Africa subtype C are as follows: AY146085, AY146087 to AY146090, AY146091 to AY146104, AY146106, AY146108 to AY146140, AY146142 to AY146151, AY146153 to AY146180, AY146182 to AY146201, AY146203 to AY146214, AF085304, AF085315, AF085320, AF254773, U06716 to U06719, U07237, Z76039, Z76306, L48067, L48068, U88730 to U88732, U88745, U88746, U88766, U88778 to U88780, U88787 to U88790, U88794 to U88799, U88802 to U88812, U33762, AF101462, AY669739, AY669749, U08453, U08455, AF095826, AF101458, AF101460, AF101461, AF101463, AF101465, AF110959, AF110962, AF110969, AF110972, AF110973, AF110976, AF110979, AF245525 to AF245529, AF245531 to AF245534, AF245536, AF245545 to AF245548, AF245556, AF245566, AF245568 to AF245571, AF245579, AF245582, AF245588, AF245590 to AF245592, AF245596, AF245602 to AF245608, AF245611 to AF245613, AF290027, AF095831, AF158871, AF159869, AF159872, AF160813 to AF160828, AF361874, AF361875, AJ272645, AJ272653, AJ272663, AY140600, AF254766 to AF254772, AF254774 to AF254777, AF254782, AF254783, AF391240, AY158534, AY158535, AY423929, AY423965, AY424151, AY424167, AY424176, AY424195, AY529659, AY529661 to AY529663, AY585272, AY135970, AY136014, AY136041, AY424034, AY424061, AY424078, AY424099, AY529668 to AY529673, AY529675, AY529676, AY529678, AY135965, AY463217 to AY463222, AY463225 to AY463234, AY463236, AY529667, AY585266, AY838565 to AY838568, AY137011 to AY137026, AY137028 to AY137031, AY137042 to AY137051, AY137053 to AY137062, AY137064, AY137065, AY137069 to AY137073, AY529671, AY529672, AY196497, AY736823, AY703908, AY772690 to AY772692, AY772694 to AY772696, AY772700, AY878056, AY878057, AY878060, AY878061, AY878063 to AY878065, AY878068, AY878070, AY901966 to AY901972, AY901975, AY901981, DQ011165, AY703909 to AY703911, AY772697 to AY772699, AY821310, AY878054, AY878055, AY878058, AY878059, AY878062, AY878071, AY878072, AY901973, AY901974, and AY901976 to AY901980.
Published ahead of print on 18 July 2007.