|Home | About | Journals | Submit | Contact Us | Français|
To investigate the origins and evolutionary history of subtype C HIV-1 in Zimbabwe in a context of regional conflict and migration.
HIV-1C pol sequence datasets were generated from four sequential cohorts of antenatal women in Harare, Zimbabwe sampled over 15 years (1991–2006).
One hundred and seventy-seven HIV-1C pol sequences were obtained from four successive cohorts in Zimbabwe. Maximum-likelihood methods were used to explore phylogenetic relationships between Zimbabwean HIV-1C sequences and subtype C strains from other regions. A Bayesian coalescent-based framework was used to estimate evolutionary parameters for HIV-1C in Zimbabwe, including origin and demographic growth patterns.
Zimbabwe HIV-1C pol demonstrated increasing sequence divergence over the 15-year period. Nearly all Zimbabwe sequences clustered phylogenetically with subtype C strains from neighboring countries. Bayesian evolutionary analysis indicated a most recent common ancestor date of 1973 with three epidemic growth phases: an initial slow phase (1970s) followed by exponential growth (1980s), and a linearly expanding epidemic to the present. Bayesian trees provided evidence for multiple HIV-1C introductions into Zimbabwe during 1979–1981, corresponding with Zimbabwean national independence following a period of socio-political instability.
The Zimbabwean HIV-1C epidemic likely originated from multiple introductions in the late 1970s and grew exponentially during the 1980s, corresponding to changing political boundaries and rapid population influx from neighboring countries. The timing and phylogenetic clustering of the Zimbabwean sequences is consistent with an origin in southern Africa and subsequent expansion. HIV-1 sequence data contain important epidemiological information, which can help focus treatment and prevention strategies in light of more recent political volatility in Zimbabwe.
HIV-1 subtype C now accounts for approximately 50% of the estimated 33 million people living with HIV/AIDS and half of the 2–3 million new infections annually . Although the majority of subtype C infections are in southern Africa, this subtype also dominates the epidemics in India, Ethiopia, and southern China, and has entered east Africa, Brazil, and many European countries. Recombinant viruses including genes derived from subtype C have been increasingly recognized in China, Thailand, and Taiwan, where phylogenetic studies indicate that complex BC subtype recombinants, such as circulating recombinant forms CRF_07 and CRF_08, comprise much of the current epidemics [2–4].
The predominance of a single clade of HIV-1 in the most severely affected countries in sub-Saharan Africa has been ascribed to a founder effect . High levels of continued subtype C transmission are thought to be sustained by sexual-social factors, including low rates of male circumcision, the frequency of concurrent partnerships, increased viral load, and shedding of virus in a context of other sexually transmitted infections [6,7]. Comparisons with subtype B isolates, which predominate infections in the Americas and western Europe, have identified characteristics of subtype C viruses that may explain differences in infectivity, including enhanced tropism for macrophages and dendritic cells, elevated viral replication rates through transcriptional regulation , and significantly higher rates of mutation and emergence of drug resistance in women receiving single-dose nevirapine [9,10]. Studies of in-vitro replication rates of subtypes A, B, C, and D suggest lower pathogenic fitness but equivalent transmission efficiency of HIV-1 subtype C, suggesting a higher rate of transmission [11,12]. This may provide a partial, virologic explanation for the disproportionately high rates of HIV-1 infection in southern Africa, where a longer period of persistent infection and transmission preceding symptomatic disease could increase both the population prevalence and the reproductive rate of the epidemic.
The routine, population-based genotyping of circulating viruses, as a surveillance tool for drug resistance, has been exploited for evolutionary and phylogenetic mapping and for exploring the origins, molecular epidemiology, and genetic diversity of HIV-1 [13,14]. Analysis of spatiotemporally sampled sequence data enables the reconstruction of epidemic histories and estimation of demographic parameters, including measures of circulating viral diversity and population-level prevalence over time . A number of studies have demonstrated the suitability of the HIV-1 pol gene for such phylogenetic and evolutionary analyses [16–18]. We analyzed subtype C pol sequences over a 15-year period between 1991 and 2006 obtained from successive cohorts of women screening for HIV infection in antenatal clinics in Harare, Zimbabwe. Using a combination of molecular clock analysis, to estimate the timescale of the epidemic, and a Bayesian coalescent-based approach, to infer demographic parameters of virus transmission, we present new information on the origins, timing, and epidemic growth patterns of subtype C HIV-1 during a period of in-migration and political change in Zimbabwe.
Plasma samples were obtained from four studies of HIV infection and pregnancy in which HIV-positive women were enrolled at antenatal clinics in Harare, Zimbabwe (1991) and the neighboring suburb of Chitungwiza (1998, 2001, and 2006). Each cohort consisted of HIV-infected women enrolled in prevention of mother-to-child-transmission (pMTCT) studies approved by the Medical Research Council of Zimbabwe and Stanford University. Samples were collected at 28–36 weeks of pregnancy before antiretroviral drugs were initiated for pMTCT (Table 1).
We obtained 177pol sequences from four sequential cohorts of young treatment-naive women presenting to antenatal clinics in Zimbabwe from 1991 to 2006. One dataset of HIV-1 subtype C pol used in this analysis was previously published (HPTN023 2001 ). Three additional datasets [WHO 1991 , Swedish International Development Cooperation Agency (SIDA) 1998 , National Institutes of Health (NIH) 2006 ] were created through bidirectional dideoxynucleotide sequencing of pol codons 1–213. For all datasets, RNA was isolated from plasmausing th eNuclisens Extraction Kit (Biomerieux, Durham, North Carolina, USA) and reverse transcribed into cDNA using random hexamer primers and Superscript III reverse transcriptase (Invitrogen, Carlsbad, California, USA). We used a nested PCR amplification strategy employing first round PCR primers RT21(5′-CTGTATTTCTGCTAT TAA GTC TTT TGATGG G-3′) and MAW26 (5′-TTG GAA ATG TGG AAA GGA AGG AC-3′) and second round PCR primers RT20 (5′-CTG CCA GTT CTA GCT CTG CTT C-3′) and PRO-1 (5′-CAG AGC CAA CAG CCC CAC CA-3′). This was followed by direct sequencing of PCR products. Sequencing for all datasets was performed in the same laboratory using the ABI 377 DNA Sequencer (Applied Biosystems Inc., Foster City, California, USA). All Zimbabwe HIV-1C pol sequences have been deposited in GenBank under accession numbers GQ463284-GQ463434. Accession numbers for the previously published HPTN023 (2001) sequences are available in the reference publication .
Sequences were aligned with ClustalW  and manually edited using BioEdit . A large dataset of reference HIV-1 subtype C pol sequences (n = 981) was downloaded from the BioAfrica website (http://www.bioafrica.net/subtype/subC/) and used to characterize the relationship between the Zimbabwean sequences and other subtype C sequences worldwide. All alignments are available from the authors upon request.
HIV-1 subtype was characterized for the Zimbabwean sequence datasets using the REGA Subtyping Tool v2.0 . Evidence for intersubtype recombination was assessed with bootscanning analysis implemented in Simplot v3.5 .
A best-fitting nucleotide substitution model for the Zimbabwean HIV-1 subtype C pol sequences was estimated using hierarchical likelihood ratio tests (hLRTs) implemented in the program Modeltest v3.7  and manual examination in PAUP v4.0 . Maximum likelihood phylogenetic trees were constructed using the inferred model, GTR + I + G, with the program PhyML v2.4. This method employs a neighbor-joining tree as a starting tree and implements the tree bisection-reconnection (TBR) branch-swapping algorithm to identify the final maximum likelihood tree. Support for internal nodes in the trees was obtained via parametric bootstrapping with 1000 replicates.
A coestimate of nucleotide substitution model parameters, phylogeny, and time to the most recent common ancestor (tMRCA) was obtained using the Bayesian Markov chain Monte Carlo (MCMC) method implemented in BEAST v1.4.8 . The approximate marginal likelihoods were calculated for six coalescent demographic models. These included both parametric (constant population size, exponential growth) and nonparametric models (Bayesian skyline plot) with both a strict and relaxed (uncorrelated LogNormal prior) molecular clock. All analyses were performed using the best fitting model of nucleotide substitution as determined by Modeltest v3.7 .
For each demographic model, two independent runs of length 1.0 × 108 steps in the Markov chain were performed using BEAST and checked for convergence using Tracer v1.4 . Samples of trees and parameter estimates were collected every 10 000 steps to build a posterior distribution of parameters. The estimated sample sizes (ESSs) for each run were more than 200, indicating sufficient mixing of the Markov chain and parameter sampling. When similar results were produced from the independent runs of the Markov chain, the log files were combined with the program LogCombiner v1.4.7 available in the BEAST package . A final maximum clade credibility tree, the tree in the posterior sample with the maximum sum of posterior clade probabilities, was determined for each demographic model using TreeAnnotator v1.4.7. (A more detailed description of the parameters used in these analyses is available upon request as supplementary information.)
The program TreeStat v1.1  was used to calculate the proportion of lineages that existed over 5-year intervals of 1980–1985, 1985–1990, and 1990–1995. This methodology utilized the posterior distribution of trees obtained previously in the Bayesian molecular clock analyses (as described by de Oliveira et al.) .
Model comparison in a Bayesian framework was achieved by calculating a measure known as the Bayes factor, which is the ratio of the marginal likelihoods of the two models being compared [33,34]. This flexible method enables the comparison of non-nested models [such as the non-parametric Bayesian skyline plot (BSP) vs. parametric constant or exponential demographic models] that cannot validly be compared using mean log posterior probabilities.
All 177 Zimbabwean HIV-1 pol sequences generated from samples collected between 1991 and 2006 were identified as subtype C with no evidence for intersubtype recombination, reflecting the predominance of this subtype in the HIV-1 epidemic in Harare. A maximum likelihood phylogenetic tree constructed from the Zimbabwe sequences demonstrated a clear relationship between sampling year and phylogenetic distance (branch length). For example, the earliest 1991 sequences were closest to and the recent 2006 sequences were most divergent from the MRCA node (Fig. 1a).
To place the Zimbabwean sequences in the broader context of the subtype C pandemic and identify cross-epidemic relationships that could suggest common geographic origins, another maximum likelihood tree was constructed with the Zimbabwean sequences and an additional 981 subtype C pol reference sequences isolated from several other HIV-endemic countries (Fig. 1b). Most Zimbabwean sequences (158/177) were highly intermingled with sequences from neighboring African nations of South Africa, Botswana, Zambia, Malawi, and Mozambique; very few Zimbabwean sequences (19/177) clustered with subtype C sequences isolated from individuals sampled in nonadjacent African countries of Tanzania and Somalia, or from Sweden, Denmark, Yemen, and India.
To identify the tMRCA and test hypotheses concerning the initial entry of HIV-1C into the Zimbabwean population, time-resolved phylogenetic trees were constructed under a Bayesian coalescent framework with BEAST (Fig. 2a). BEAST maximum clade credibility trees showed evidence for multiple independent introductions of subtype C HIV-1 into Zimbabwe between 1979 and 1981. Notably, as the Zimbabwean sequences were scattered among other sequences from neighboring countries in Fig. 1b, BEAST trees taken in this context support a scenario of multiple cross-border transmissions of HIV-1C with subsequent spread. However, a relatively modest statistical support for the major branches in these trees, which is expected given the relative genetic similarity of the Zimbabwe pol sequences, means that these findings should be taken with caution and considered alongside other evidence.
The estimated nucleotide substitution rates and tMRCA dates for the Zimbabwe pol sequences obtained under the six different evolutionary models of population growth are presented in Table 2. All parameter estimates were highly consistent across the different evolutionary models, and replicate runs of the same model produced almost-identical results. The mean rate of 2.33 × 10−3 nucleotide substitutions/site per year produced an average estimate of the date of origin of the HIV-1C pol sequences in the year 1972 (highest posterior density, HPD: 1969–1974). Given the wide dispersion of the Zimbabwean sequences among regional strains in Fig. 1b, this tMRCA date approximates the date of origin of regionally circulating subtype C viruses, including those that gave rise to the Zimbabwean epidemic. The median estimates of the coefficient of variation parameter were 0.43, 0.24, and 0.25 for the constant, exponential, and Bayesian skyline relaxed clock analyses respectively, indicating relatively little variation in evolutionary rates among branches in the tree irrespective of the evolutionary model employed. Estimates of the Bayes factor (see Methods – Model comparison) for the HIV-1C pol dataset supported models enforcing a relaxed clock over a strict clock and population growth models over a constant population size model. In turn, a relaxed clock exponential growth parametric model was statistically supported over the nonparametric BSP model (Table 3). (More detailed estimates of evolutionary parameters are available from the authors upon request).
Specification of a BSP coalescent tree prior enables the estimation of effective population size (Ne) through time directly from sequence data. Our reconstruction of the demographic history of HIV-1C in Zimbabwe through the BSP analysis identified three epidemic growth phases: an initial, slow growth phase in 1974–1976, followed by an exponential growth phase in 1979–1984, and an asymptotic phase approaching the present time (Fig. 2b). The most rapid increase in the curve occurred during 1979–1981, reflecting a logarithmic expansion in Ne, or effective number of infections, over this short-time period. Estimates indicated an initial median value of eight effective infections (95% HPD 3–18) around 1972, whereas final estimates in the year 2006 were approximately 20 115 effective infections (95% HPD 8334–61 200). Sequence diversity was estimated within each cohort by calculating average pairwise nucleotide distances using the HKY + G model of nucleotide substitution in Phylip v3.6 implemented in BioEdit. Consistent with the results of the BSP model, mean sequence diversity significantly increased from 1991 to 1998 (P < 0.0001), from 2001 to 2006 (P < 0.0001), and over the entire 15-year period (P < 0.0001), calculated using one-way analysis of variance (ANOVA; Supplementary Figure S1).
In addition to estimating changes in Ne through time using a BSP, we also calculated the median proportion of current lineages that existed during 1980–1985, 1985–1990, and 1990–1995 (see Methods – Evolutionary Rate Estimation and Analysis). The results showed that for four of the six evolutionary models (expo-strict clock, expo-relaxed clock, BSP-strict clock, and BSP-relaxed clock), approximately 80% of the lineages were already present in Zimbabwe by 1985 (ranging from 55.4 to 98.3%; Table 2).
The persistence and rapid increase of HIV-1C infection in much of southern Africa has been attributed largely to heterosexual transmission. This is certainly true of Zimbabwe, where continued heterosexual transmission underlies a generalized HIV-1 epidemic with relatively high population prevalence, particularly among young women. In the present investigation, phylogenetic analysis of pol sequences from cohorts of young women sampled from 1991 to 2006 confirms the predominance of subtype C infection in the Zimbabwean HIV-1 epidemic. Our demographic and evolutionary reconstruction of the Zimbabwean epidemic suggests that multiple closely-related subtype C viruses with a common ancestor originating in the early 1970s entered the country in the early 1980s, followed by an explosive growth in effective number of infections over the next decade.
Indeed, similar scenarios of multiple HIV-1C introductions have been previously suggested in a number of southern African nations [35–37]. However, a rapid and disproportionate dissemination of HIV in Zimbabwe in the 1980s relative to neighboring countries is reflected in UNAIDS estimates from serosurveillance programs. In 1990, national HIV prevalence rates among adults aged 15–45 in southern African nations varied from 14.2% in Zimbabwe compared with 8.9% in Zambia, 4.7% in Botswana, 2.1% in Malawi, less than 2% in Namibia and Mozambique, and less than 1% in South Africa and Swaziland .
The origin and timing of the rapid epidemic expansion in Zimbabwe may be partly explained by the political and military history of the region. From 1953 to 1963, migration of populations in southern Africa was facilitated by a pre-existing colonial infrastructure, in which Zimbabwe (Southern Rhodesia), Zambia (Northern Rhodesia), and Malawi (Nyasaland) were politically and economically merged as the Central African Federation. In 1965, in response to the end of colonial rule and the independence of Northern Rhodesia (Zambia), the Southern Rhodesian government issued a unilateral declaration of independence to maintain minority rule and denial of majority rights. This led to a prolonged civil conflict throughout the 1970s between the self-declared Rhodesian government and Black Nationalist liberation groups, during which movement in and out of Rhodesia was severely constrained by international sanctions and government restrictions. The conflict in Zimbabwe ended with the Lancaster House Agreement in December 1979, followed by the return of exiled liberation forces. Thus, in 1980, tens of thousands of expatriates and liberation fighters returned from adjoining southern African countries to a newly independent Zimbabwe.
Our phylogenetic and evolutionary analyses of HIV-1C in the region reflect these historical events. The clustering of nearly 90% of Zimbabwean sequences with sequences from adjacent southern African countries (Fig. 1b) suggests a regional origin and localized expansion of the subtype C epidemic in Zimbabwe. The subsequent rapid increase of HIV-1C following Zimbabwean independence in the early 1980s, reflected in the high seroprevalence of infection in 1990 outpacing infection rates in neighboring countries , is consistent with our retrospective reconstruction and Bayesian estimates of HIV-1 growth, with 98% of the current lineages present in Zimbabwe by 1990 (Fig. 2a). This exponential growth of the Zimbabwean epidemic over a period of less than a decade in the 1980s provides an example of in-migration of a small number of ancestors (founders) and subsequent amplification during a period of heightened political and demographic change.
Rapid epidemic expansion in Zimbabwe in the 1980s, as we have estimated by calculating effective number of infections in a coalescent framework, is supported by three independent sources estimating historical HIV prevalence within the country: blood donor screening, antenatal surveillance, and back-calculation of incidence from mortality statistics. The first clinical case of AIDS in Zimbabwe was documented in 1985, the same year that diagnostic screening for donated blood units was initiated by the Zimbabwe National Blood Transfusion Service (NBTS). Our estimated increase in HIV-1C prevalence corresponds well with NBTS records, which documented a near doubling in seroprevalence among blood donors each year from 1986 to 1990 . Our estimates also mirror WHO sentinel surveillance data, which document a rapidly increasing prevalence among antenatal women and the general population through the early 1990s . Notably, our demographic estimates indicate a peak in the number of infections between 1989 and 1991. Epidemiologic modeling studies back-calculating HIV incidence from mortality statistics estimate a likely peak in incidence in Harare during the same period between 1988 and 1990 . These independent approaches paint a remarkably similar picture, each identifying a period of rapid, peaking expansion of HIV-1C in Zimbabwe in the 1980s. Potential explanations for the observed slowing in the rate of the epidemic after 1990 include a decrease in the number of susceptible individuals, reducing the epidemic reproductive rate, as well as a 6-year nationwide drought, which heightened food insecurity and decreased overall economic output in Zimbabwe [41,42].
Analysis of population demographics using Bayesian coalescent methods has a number of limitations. The presence of recent (slightly) deleterious mutations, which have not yet been eliminated at the population level by purifying selection, would result in an overestimation of the time to the most recent ancestor in the tree . Moreover, all phylogenetic trees contain inherent uncertainty, including variable substitution rates between viral lineages and possible differences in the demographic history of included viruses. Although Bayesian MCMC methods robustly incorporate this uncertainty, inferences based on such trees should be taken with caution and considered alongside other supporting evidence. The present analyses may also be limited by an exclusive focus on HIV isolated from cohorts of pregnant women, in whom infection was identified through antenatal screening. In sub-Saharan Africa, young women comprise a demographic group with a high risk of infection in association with patterns of sequential overlapping partnerships, intergenerational sex, and low condom usage . Recent studies and surveillance data from Zimbabwe have provided encouraging evidence that gradual behavioral changes have reduced the prevalence among 18–24-year-old women from more than 25% to about 20%, suggesting that incidence in this vulnerable group has declined since 1998 [38,45]. This is consistent with our Bayesian estimates of a steady-state prevalence as the number of effective infections reaches an asymptotic phase approaching the present time (Fig. 2b). However, it has been noted that all BSPs show some signal of steady-state dynamics approaching the present that may be partly attributed to within-host evolution . Moreover, like recent surveillance studies, our findings cannot rule out the possibility that the observed decrease in HIV-1 prevalence may be due, in part, to selective AIDS-induced mortality rates, especially given the disruptions in health infrastructure introduced by political volatility in Zimbabwe.
We have focused our analyses exclusively on pol as these are the most routinely available sequences in sub-Saharan Africa and provide the largest possible reference dataset of regional subtype C sequences. The ongoing WHO focus on global HIV drug resistance surveillance (HIVResNet) based on pol sequence datasets will provide an expanding resource to track the evolution of the subtype C epidemic. We note, however, that estimates of evolutionary parameters obtained with any single molecular marker should be taken cautiously and should be validated with alternative viral genes and sample collections. We believe our dataset of Zimbabwe sequences to be the most comprehensive available and through comparative analysis, may provide a basis for understanding the role of migration and evolution as drivers of the epidemic.
Large-scale migration in response to political and economic instability is just as evident in Zimbabwe today. The recent cholera epidemic, spiraling inflation, and political turmoil are limiting disease prevention services and treatment and escalating out-migration to neighboring countries. The political and economic displacement of millions of individuals from Zimbabwe poses further challenges to regional programs operating in the context of a generalized HIV-1 epidemic in southern Africa. These programs urgently need to expand testing, prevention, and treatment services for HIV and other infectious diseases to reach migratory and displaced populations in a rapidly changing political and economic environment.
We are grateful to all study participants in Harare and Chitungwiza, Zimbabwe. The authors wish to thank Keyan Salari and Lee Riley for critical reading of the manuscript. This study was supported in part by the National Institutes of Health. S.D., S.K., and D.K. were supported by the NIH R01 Program (award R01 AI060399-04). J.M. was supported by the NIH Fogarty International Center through the International Clinical, Operational and Health Services and Training Award (ICOHRTA award U2R TW006878-4). T.d.O. was supported by the South African Medical Research Council and the Wellcome Trust (082384/Z/07/Z). G.H. was supported by the National Research Foundation of South Africa and the Atlantic Philanthropies Grant (number 62302). S.D. was supported by the Howard Hughes Medical Institute (HHMI) Research Fellowship and the Paul and Daisy Soros Fellowship for New Americans.
The authors state that they have no conflicts of interest.
S.D. conceived and designed the study and wrote the manuscript. T.d.O. and G.H. provided critical revisions for the manuscript. S.D., T.d.O., and G.H. implemented the phylogenetic and coalescent analyses. S.D., S.K., J.L., and J.M. performed genotypic sequencing. E.J. provided technical assistance for laboratory methods and genotyping. D.K. provided historical information for the region and critical revisions for the manuscript.