HIV-1 CRF02_AG accounts for >50% of infected individuals in Cameroon. CRF02_AG prevalence has been increasing both in Africa and Europe, particularly in Italy because of migrations from the sub-Saharan region. This study investigated the molecular epidemiology of CRF02_AG in Cameroon by employing Bayesian phylodynamics and analyzed the relationship between HIV-1 CRF02_AG isolates circulating in Italy and those prevalent in Africa to understand the link between the two epidemics. Among 291 Cameroonian reverse transcriptase sequences analyzed, about 70% clustered within three distinct clades, two of which shared a most recent common ancestor, all related to sequences from Western Africa. The major Cameroonian clades emerged during the mid-1970s and slowly spread during the next 30 years. Little or no geographic structure was detected within these clades. One of the major driving forces of the epidemic was likely the high accessibility between locations in Southern Cameroon contributing to the mobility of the population. The remaining Cameroonian sequences and the new strains isolated from Italian patients were interspersed mainly within West and Central African sequences in the tree, indicating a continuous exchange of CRF02_AG viral strains between Cameroon and other African countries, as well as multiple independent introductions in the Italian population. The evaluation of the spread of CRF02_AG may provide significant insight about the future dynamics of the Italian and European epidemic.
The human immunodeficiency virus type 1 (HIV-1) is characterized by an extensive and ever-increasing genetic variability. Four major Groups (M, N, O, and P), at least nine different subtypes (A to K) within the major M Group, and 48 circulating recombinant forms (CRFs) have been described so far.1 Recent estimates indicate that 33.4 to 35.8 million people worldwide are infected by HIV-1,2 with sub-Saharan Africa as the most heavily affected region, accounting for 67% of all infections. The sub-Saharan African epidemic, however, varies significantly from country to country in both scale and scope. Adult national HIV-1 prevalence is below 2% in several countries of West and Central Africa, as well as in the Horn of Africa, but it exceeds 5% in most Central and East African countries including Cameroon, the Central African Republic, Gabon, Malawi, Mozambique, Uganda, and the United Republic of Tanzania.2
Cameroon hosts one of the broadest genetic arrays of HIV variants, suggesting that the country may be one of the epicenters of the African epidemic. In addition to each of the Group M subtypes and several CRFs, HIV-1 Groups N, O, and P as well as HIV-2 strains have been identified in the country.3–17 The first AIDS case in Cameroon was diagnosed in 1985.18 Since then, about 540,000 cases have been officially reported. Cameroon is currently facing a generalized epidemic, with adult (aged 15 to 49 years) prevalence in the range of 3.9–6.2%.2 Recent studies show a consistent increase in prevalence of several CRFs including CRF02_AG, CRF01_AE, CRF06_cpx, CRF09_cpx, CRF11_cpx, CRF13_cpx, CRF18_cpx, CRF22_cpx, CRF25_cpx, and CRF37_cpx.9,11,12,15 In 2002, the phylogenetic characterization of isolates obtained from subjects living in the cities of Yaoundé, the capital, and Douala showed that 60% of samples were CRF02_AG.13 Recent data from Yaoundé also indicate that the CRF02_AG strain represents up to 50% of all infections.15 CRF02_AG prevalence has been increasing not only in West and West-Central Africa,8,13–15,19–23 but also in different countries of Europe,24–27 such as Italy,28–32 because of migrations from the sub-Saharan region. This viral variant is one of the most prevalent recombinant forms of HIV-1 in the world, responsible for at least 5% of infections.33
In the present study we sought to investigate the origin and demographic history of CRF02_AG in Cameroon by employing phylogenetic and population genetic (phylodynamic) analysis in conjunction with viral gene-flow estimates from genetic data (phylogeography). Moreover, because Italy's position in the Mediterranean Sea makes it a strategic migration route into Europe, we also analyzed the relationship between the HIV-1 CRF02_AG lineages circulating in Italy, sampled from both African and Italian patients, and those prevalent in different African geographic regions in order to understand the link between the two epidemics.
The analysis was performed on reverse transcriptase (RT, amino acid positions analyzed: 36–213) from HIV-1 CRF02_AG pol sequences. All available African sequences were downloaded from the Los Alamos HIV databases [http://www.hiv.lanl.gov/] or generated for clinical routine testing in Italy at the Monitoring Unit of Antiretroviral Therapies, INMI, Lazzaro Spallanzani, Rome, or in the Division of Infectious Disease center, Bergamo. More than 2000 sequences were retrieved from the HIV databases. However, since the main focus of the present work was to analyze CRF02_AG molecular epidemiology, only viral sequences satisfying specific inclusion criteria were included in the final data set:34 (1) sequences had already been published in peer-review journals (except for the new sequences described below), (2) there was no uncertainty about the subtype assignment of each sequence, (3) sequences were not epidemiologically linked by direct donor–recipient transmission, (4) only one sequence per individual could be randomly selected, and (5) city/state of origin and sampling date were known and clearly established in the original publication. The final data set included 824 sequences, including 53 new sequences and 771 reference sequences (Table 1). Among the new sequences, 11 were from patients followed in Cameroon and 42 were from patients living in a small geographic area of Northern Italy (36 of African origin and 6 of Italian origin). The full list of reference sequences analyzed with accession numbers is given in Supplementary Table S1 (Supplementary Data are available online at www.liebertonline.com/aid).
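The five inclusion criteria above can be read as a record filter followed by a random pick of one sequence per individual. The following is a minimal sketch under stated assumptions: the `SequenceEntry` type, its field names, and the `select_dataset` helper are hypothetical and introduced only for illustration; they do not describe the authors' actual curation pipeline or any database schema.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SequenceEntry:
    """Hypothetical metadata record for one RT sequence."""
    accession: str
    patient_id: str
    is_new: bool                  # generated for this study (exempt from criterion 1)
    published: bool               # criterion 1: published in a peer-reviewed journal
    subtype_certain: bool         # criterion 2: unambiguous subtype assignment
    transmission_linked: bool     # criterion 3: direct donor-recipient link
    location: Optional[str]       # criterion 5: city/state of origin
    sampling_year: Optional[int]  # criterion 5: sampling date


def passes_criteria(rec: SequenceEntry) -> bool:
    """Apply criteria 1, 2, 3, and 5 to a single record."""
    return (
        (rec.published or rec.is_new)
        and rec.subtype_certain
        and not rec.transmission_linked
        and rec.location is not None
        and rec.sampling_year is not None
    )


def select_dataset(records: List[SequenceEntry], seed: int = 0) -> List[SequenceEntry]:
    """Keep eligible records, then pick one at random per individual (criterion 4)."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_patient = {}
    for rec in records:
        if passes_criteria(rec):
            by_patient.setdefault(rec.patient_id, []).append(rec)
    return [rng.choice(group) for group in by_patient.values()]
```

Fixing the random seed is a design choice of this sketch so that the one-per-individual draw can be repeated exactly.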
The 53 new sequences analyzed were generated by HIV genotype analysis on 1 ml plasma samples by means of a commercially available kit (ViroSeq HIV-1 genotyping system; Abbott Laboratories). Briefly, RNA was extracted using a commercially available kit (QIAamp Viral RNA mini-kit, Qiagen Inc., USA), retrotranscribed by murine leukemia virus RT, and amplified with Amplitaq-Gold polymerase enzyme by using two different sequence-specific primers for 40 cycles. Each RT-PCR run included a positive and a negative PCR control. Pol-amplified products (containing the entire protease and the first 335 amino acids of the RT open reading frame, 1302 nucleotides) were full-length sequenced in sense and antisense orientations by an automated sequencer (ABI 3130) by using seven different overlapping sequence-specific primers.35 The sequences were analyzed using SeqScape-v.2.5 software. Sequence quality for each individual was ensured by requiring coverage of the protease and RT sequence by at least two sequence segments. Sequences having a mixture of wild-type and mutant residues at single positions were considered to have the mutant(s) at that position. HIV-1 subtypes were determined by phylogenetic analysis of pol region sequences, as previously described.36
Multiple sequence alignments were obtained with the Clustal algorithm37 and manually edited for optimization. Maximum likelihood (ML) phylogenetic trees were inferred with the PhyML program [http://www.atgc-montpellier.fr/phyml/],38 using the GTR+G+I nucleotide substitution model, which was selected with the hierarchical likelihood ratio test described by Swofford and Sullivan.39 Neighbor-joining (NJ) trees were also obtained using the same nucleotide substitution model with the program PAUP* version 4.0 written by David L. Swofford. The reliability of specific clades in the inferred trees was evaluated by using the SH-like approximate likelihood ratio test (aLRT), which compares the likelihoods of the best and the second best alternative arrangements around the branch of interest. According to type I error rate (test significant|branch is not correct) analysis, the aLRT of an interior branch is almost exact for a cut-off value ≥0.9 and is considered well supported for a cut-off value >0.75.40
Bayesian genealogies were also inferred with the BEAST v.1.5.3 software package [http://evolve.zoo.ox.ac.uk/beast/]41 using the HKY substitution model, a relaxed molecular clock (see next section), and a constant population size coalescent prior. A Markov Chain Monte Carlo (MCMC) was run for 100,000,000 generations with sampling every 10,000th generation. The results were visualized with Tracer v1.4.1 [http://beast.bio.ed.ac.uk/Tracer]. The effective sample size (ESS) value for each parameter was >500 indicating sufficient mixing of the Markov chain. The maximum clade credibility (MCC) tree was then selected from the posterior tree distribution using TreeAnnotator v.1.4.8 available within the BEAST software package. Final trees were visualized and annotated with FigTree v.1.2.2 [http://tree.bio.ed.ac.uk/software/figtree/].
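With 100,000,000 generations sampled every 10,000th generation, each run yields 10,000 samples per parameter, and the ESS indicates how many of those are effectively independent. The exact estimator used by Tracer is not described here, so the following is only a minimal sketch of one common autocorrelation-based estimate, ESS = N / (1 + 2 Σ ρ_k), with the sum truncated at the first non-positive autocorrelation. The function name and the truncation rule are assumptions made for illustration.

```python
from typing import Sequence


def effective_sample_size(trace: Sequence[float]) -> float:
    """Autocorrelation-based ESS estimate for a single MCMC parameter trace.

    Assumption: autocorrelations are summed until the first non-positive
    value, a simple truncation rule that may differ from Tracer's.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # a constant trace carries no autocorrelation signal

    rho_sum = 0.0
    for lag in range(1, n):
        # Biased autocovariance estimate at this lag
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)
```

A strongly autocorrelated chain gives an ESS far below the number of stored samples, which is why a threshold on ESS rather than on chain length is used to judge mixing.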
To obtain a Bayesian estimate of the origin of the major CRF02_AG subepidemics in Cameroon, sequences belonging to each highly supported clade were constrained to be monophyletic. The evolutionary rate (nucleotide substitutions per site per year) and the time of the most recent common ancestor (TMRCA, years) were inferred using sequences sampled at different time points by the MCMC approach implemented in BEAST. The analyses were performed with the same nucleotide substitution model and coalescent prior described in the previous section, assuming either a strict or a relaxed molecular clock.42 Separate analyses were performed using either the root height of the tree or a uniform root height prior, with the lower and upper bounds set to 1908 and 1933, respectively, the interval assumed to be the 95% confidence interval for the origin of HIV-1 group M.43 An MCMC was run for 100,000,000 generations with sampling every 10,000th generation. The results were visualized with Tracer. The ESS value for each parameter was >500, indicating sufficient mixing of the Markov chain.
For each well-supported Cameroon clade in the CRF02_AG genealogy, demographic curves of effective viral population size change over time were estimated according to both parametric (constant and exponential) and nonparametric (Bayesian Skyline Plot, BSP) models. For the BSP calculation, a Bayesian skyline coalescent tree prior was used under a constant skyline model with 10 groups. Parametric and nonparametric curves, and the parameters of each model (including upper and lower bounds of the 95% highest posterior density [HPD] intervals), were estimated by an MCMC run for 100,000,000 generations with sampling every 10,000th generation. The results were visualized with Tracer v.1.3. Convergence of the Markov chain was assessed by calculating the ESS for each parameter. All ESS values were >500, indicating sufficient sampling.
Different molecular clock and demographic models were compared by calculating the Bayes Factor (BF), which is the ratio of the marginal likelihoods (marginal with respect to the prior) of the two models being compared.44 We calculated approximate marginal likelihoods for each coalescent model via importance sampling (1000 bootstraps) using the harmonic mean of the sampled likelihoods (with the posterior as the importance distribution). The difference (in log_e space) of marginal likelihood between two models is the log_e of the Bayes Factor, log_e(BF). Evidence against the null model (i.e., the one with lower marginal likelihood) is indicated by 2·log_e(BF) > 2, with values > 6 considered strong and > 10 very strong. BF calculations were performed with Tracer v1.4.1.
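To make the model comparison concrete, the sketch below computes a harmonic mean estimate of the log marginal likelihood from sampled log-likelihoods and converts the difference between two models into 2·log_e(BF). It is a simplified illustration, not Tracer's implementation: it omits the bootstrap step, the function names are hypothetical, and the label "positive" for values between 2 and 6 follows the commonly used Kass and Raftery scale, which is an assumption here.

```python
import math
from typing import Sequence


def log_marginal_likelihood_hme(log_likelihoods: Sequence[float]) -> float:
    """Harmonic mean estimate of the log marginal likelihood.

    log m = log N - logsumexp(-logL_i), evaluated in log space to avoid
    overflow when likelihoods are extremely small.
    """
    negated = [-ll for ll in log_likelihoods]
    peak = max(negated)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in negated))
    return math.log(len(negated)) - log_sum


def two_ln_bayes_factor(loglik_a: Sequence[float], loglik_b: Sequence[float]) -> float:
    """Return 2*ln(BF) in favour of model A over model B."""
    return 2.0 * (log_marginal_likelihood_hme(loglik_a)
                  - log_marginal_likelihood_hme(loglik_b))


def interpret(two_ln_bf: float) -> str:
    """Map 2*ln(BF) to an evidence label (thresholds 2, 6, and 10)."""
    if two_ln_bf > 10:
        return "very strong"
    if two_ln_bf > 6:
        return "strong"
    if two_ln_bf > 2:
        return "positive"  # assumed label, see lead-in
    return "not decisive"
```

For example, `interpret(two_ln_bayes_factor(relaxed_trace, strict_trace))` would label the support for a relaxed over a strict clock, given the posterior log-likelihood samples of each run (both variable names are placeholders).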
For the phylogeography analysis, sequences from each one of the Cameroon clades were analyzed separately. The hypothesis of metapopulation structure, i.e., the existence within each clade of different subpopulations linked to different Cameroon geographic regions, was tested with a modified version of the Slatkin and Maddison test45,46 using the MCC trees. A one-character matrix was obtained from the original dataset by assigning to each taxon in the tree a one-letter code indicating its geographic region of origin. The putative origin of each ancestral sequence in the tree was then inferred by finding the most parsimonious reconstruction (MPR) of the ancestral character using either the ACCTRAN or DELTRAN option. The final tree length, i.e., the number of observed migrations in the genealogy, was computed and compared to the tree length distribution of 10,000 trees obtained by random joining-splitting. Observed genealogies significantly shorter than random trees (p<0.01) indicate the presence of subdivided populations with restricted gene flow. Calculations were carried out with MacClade v.4.06.47 The viral gene-flow (migrations) among different regions was traced using the State changes and stasis tool (MacClade software), which counts the number of changes in a tree for each pairwise character state. Viral gene-flow counts were traced for each of the four datasets and then averaged.
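The logic of the test can be sketched as follows: count the minimum number of location changes (migrations) needed on the tree by Fitch parsimony, then compare that count with a null distribution. Two simplifications are assumed and should be read as such: the tree is a strictly binary nested tuple of tip names, and the null distribution is built by shuffling tip locations rather than by the random joining-splitting of trees used in MacClade. All function names are hypothetical.

```python
import random
from typing import Dict, Set, Tuple, Union

Tree = Union[str, tuple]  # a tip name, or a pair (left_subtree, right_subtree)


def fitch_length(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum number of state changes on a binary tree (Fitch parsimony)."""
    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {states[node]}, 0
        if len(node) != 2:
            raise ValueError("this sketch handles strictly binary trees only")
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No shared state: one change is required at this node
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def permutation_p_value(tree: Tree, states: Dict[str, str],
                        n_perm: int = 1000, seed: int = 0) -> Tuple[int, float]:
    """Observed tree length and the fraction of shuffles at least as short."""
    rng = random.Random(seed)
    observed = fitch_length(tree, states)
    tips = sorted(states)
    labels = [states[t] for t in tips]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any tip-to-location association
        if fitch_length(tree, dict(zip(tips, labels))) <= observed:
            hits += 1
    return observed, hits / n_perm
```

A small p-value means the observed genealogy needs fewer migrations than expected by chance, which is the signature of subdivided populations with restricted gene flow.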
Accessibility maps were drawn with ArcGIS software with data obtained from the Africover Initiative (FAO-UN). An accessibility map shows the travel time to the nearest city of population >100,000 people, using road/track-based travel. This accessibility is computed using a cost-distance algorithm, which computes the "cost" of travelling between two locations on a regular raster grid48 based principally on road network data extracted from the Vector Map Level 0 (VMap0) released by the National Imagery and Mapping Agency (NIMA). The cost landscape was derived from road and railway network data, navigable rivers and major water bodies, shipping lanes, national borders, land cover, urban areas, elevation, and slope. The full methodology is described here at http://gem.jrc.ec.europa.eu/gam/sources.htm. Demographic data on the number of African immigrants living in Italy between 2002 and 2008 were obtained from the Italian National Institute of Statistics (http://demo.istat.it/).
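The cost-distance idea behind such a map can be illustrated with a multi-source shortest-path search on a raster grid. This is a minimal sketch, not the ArcGIS or JRC implementation: the function name is hypothetical, movement is restricted to the four neighboring cells, and the time to cross between two adjacent cells is assumed to be the mean of their per-cell costs.

```python
import heapq
from typing import List, Sequence, Tuple


def travel_time_to_nearest_city(cost: Sequence[Sequence[float]],
                                cities: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Accumulated travel time from every cell to its nearest city cell.

    cost[r][c] is the time needed to cross cell (r, c); cities lists the
    (row, col) positions of the target cells.
    """
    rows, cols = len(cost), len(cost[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in cities:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    # Dijkstra's algorithm started simultaneously from all city cells
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = (cost[r][c] + cost[nr][nc]) / 2.0
                if d + step < dist[nr][nc]:
                    dist[nr][nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist
```

In this toy model, cells crossed by roads would carry low costs and cells with steep slopes or water bodies high costs, so the resulting surface plays the role of the travel-time layer shown in an accessibility map.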
Among the 291 sequences from Cameroon analyzed, about 30% appeared to be intermixed with other African sequences in the ML tree (Fig. 1), indicating a continuous exchange of CRF02_AG viral strains between Cameroon and other African countries. The remaining Cameroon sequences clustered within three well-supported (aLRT, p>0.75) monophyletic clades, henceforth referred to as clades 1, 2, and 3. The presence of three well-supported major Cameroonian clades was confirmed in the NJ tree (data not shown). Clades 1 and 2 shared a common ancestor and appeared to be related to sequences from West Africa. Clade 3 belonged to a distinct lineage related to strains from Gabon, Ivory Coast, Mali, and Senegal (the tree with fully labeled tips is given in Supplementary Fig. S1). HIV-1 strains in clade 1 were isolated mostly from Eastern Cameroonian cities between 1996 and 2007, but no city appeared to be significantly more represented within a specific clade (Supplementary Fig. S2).
On the other hand, clades 2 and 3 included strains isolated mostly from Yaoundé, as well as other cities in central Cameroon, between 1996 and 2007. Overall, the result suggested the presence of at least three separate subepidemics, two of which (clades 1 and 2) possibly originated from a common introduction from Western Africa and the other (clade 3) from Northwestern Africa. Five of the 11 new sequences from patients infected and residing in Cameroon appeared to be intermixed with other African strains (Supplementary Fig. S1). The remaining ones were distributed within the three monophyletic clades: sequence CM39x07 clustered within clade 1, CM98FT07 and CM88FK07 clustered within clade 2, while sequences CM95F06, CM85B06, and CM91FT07 clustered within clade 3. The new sequences from African patients residing in Italy were all intermixed in the tree and did not cluster within any well-supported clade. Most of the new sequences from Italian patients appeared to be only distantly related to each other. Two highly supported clades, comprising two Italian strains each, clustered with one strain from Mali (aLRT, p=0.67) and strains from Cameroon and Mali (aLRT, p=0.81), respectively. The remaining two sequences were significantly related to strains from Ivory Coast (aLRT, p=1.0) and Mali (aLRT, p=0.78). Overall, the results strongly suggest at least four independent events leading to infection of Italian subjects with African CRF02_AG.
The evolutionary rate and the TMRCA of each of the three Cameroon monophyletic clades were estimated by molecular clock analysis. Separate Bayesian genealogies were obtained for strains belonging to each clade and the molecular clock was calibrated by employing the known sampling time of each strain. As expected, the relaxed molecular clock fitted the data significantly better than the strict molecular clock for each clade (Supplementary Table S2).
The median estimates of the evolutionary rate were 1.3×10⁻³ (95% HPD=0.7×10⁻³ to 2.3×10⁻³), 1.4×10⁻³ (95% HPD=0.8×10⁻³ to 2.6×10⁻³), and 1.6×10⁻³ (95% HPD=0.7×10⁻³ to 3.7×10⁻³) nucleotide substitutions per site per year for clades 1, 2, and 3, respectively. The marginal densities of the rates obtained from the Bayesian analysis, which represent the variance of the molecular clock, were also largely overlapping, and all three clades appeared to have emerged at about the same time during the mid to late 1970s (Fig. 2).
To investigate further the population dynamic patterns of each CRF02_AG Cameroon clade, we compared different demographic models of effective population size (Ne) change over time. Surprisingly, the constant size model could not be rejected when compared to the exponential growth model or the nonparametric Bayesian Skyline Plot (Supplementary Table S3).
Bayesian estimates of median Ne for the constant model were about 1.5 times larger for clades 2 and 3 than for clade 1, but since the 95% HPD appeared to be completely overlapping, the hypothesis that Ne was not significantly different for different clades could not be rejected (Table 2). Overall, the data indicate that clades 1, 2, and 3 emerged simultaneously and have been spreading at a relatively low but similar rate within three distinct Cameroon epidemiological networks.
The next step of the analysis was to ascertain whether specific phylogeographic trends existed in different clades. Clades 1 and 3 did not show any significant metapopulation structure (p>0.05; Fig. 3). A weak metapopulation structure was observed for clade 2 (p=0.0001; Fig. 3) where two distinct subpopulations, one including strains sampled from Western cities and the other sequences from Central Cameroon, were evident in the Bayesian genealogy (Supplementary Fig. S3).
To better characterize the geographic distribution and spread of HIV-1 CRF02_AG strains within the country, the location of all Cameroonian strains was superimposed on the country map and compared with accessibility data (Fig. 4). The sampling sites were most densely distributed in the Southwestern part of Cameroon (Fig. 4A). The accessibility map (Fig. 4B) suggested a potential correlation between the strong accessibility network in the south and the dissemination of discrete subepidemics. The map displayed the estimated time to travel from any location to the nearest major urban center, defined as an urban area with >100,000 inhabitants. Intercity connectivity was very strong in the Littoral and Central regions, around Douala, the most populated city in Cameroon, and Yaoundé, but became progressively weaker toward the Northern region. Indeed, Northern Cameroon appeared to be largely disconnected from the south and more connected with N'Djamena, the capital of Chad.
The present study characterized new CRF02_AG strains sampled from Cameroonian subjects, as well as strains from both Italian and African individuals residing in Italy. First, the molecular epidemiology of this subtype in West-Central African countries was investigated, with particular focus on Cameroon, an epicenter of the African epidemic. The phylogenetic analysis showed a continuous exchange of viral strains between Cameroon and other African countries, as well as the presence of three different monophyletic clades within Cameroon, all of which originated around the mid-1970s. All clades were related to strains from different West African countries, none of which, however, is geographically adjacent to Cameroon. A potential explanation is that the French occupation of Burkina Faso, Ivory Coast, and Cameroon until the 1960s led to a founder effect in Cameroon, arising from connections among countries within the French sphere of influence.
The lack of metapopulation structure within the Cameroonian epidemic, in which none of the three major clades was significantly associated with a specific geographic area, is consistent with GIS data. Accessibility maps indicated that Southern Cameroon is characterized by developed road networks and harbor areas that may have significantly fostered HIV-1 spread after initially limited introduction from other African countries. This is in agreement with the hypothesis, recently suggested by Gray et al.,49 that accessibility plays a major role in the emergence and spread of viral regional epidemics. This hypothesis is also supported by data showing that the most vulnerable groups in Cameroon include truck drivers, mobile populations, and military personnel.50
The phylogenetic analysis also showed that CRF02_AG strains from Italian individuals, as well as from non-Cameroonian African immigrants residing in a small locale of northern Italy, were intermixed throughout the tree. In particular, most of the Italian sequences were only distantly related in the phylogeny, which was indicative of at least four independent events leading to infection of Italian subjects. Additional studies including sequences from multiple regions in Italy are needed to assess the frequency and extent of CRF02_AG spillover into the country. However, given that the HIV strains analyzed were sampled from a relatively small geographic area, it is remarkable that several independent introductions were already observed.
Data from the Italian National Institute of Statistics showed that in recent years African immigrants have constantly been increasing in the country (Fig. 5). In particular, from 2002 to 2008 the number of Cameroonians residing in Italy has almost tripled, from 2926 (8% of immigrants living in Italy) in 2002 to 7994 (21% of immigrants living in Italy) in 2008. A similar trend could be observed for immigrants from other African countries with a significant AG epidemic both bordering and not bordering Cameroon. Taken together these findings suggest that conditions may be present for the development of a generalized epidemic of this recombinant form in Italy that might significantly impact HIV-1 molecular epidemiology thus far predominantly characterized by subtype B infections.
In recent years, migration from Africa to Western Europe has been changing the face of the AIDS epidemic in terms of subtype distribution and prevalence. Italy's position in the Mediterranean Sea makes it a strategic migration route. Therefore, understanding the spread of the CRF02_AG epidemic from Africa to Italy may also play a fundamental role in assessing the potential spread of this viral strain within Europe and North America, especially given the enormous exchange of persons and goods between the two continents.
Understanding HIV molecular epidemiology and the potential future spread of different non-B subtypes also has clinical relevance. It is already known that differences among HIV-1 genetic forms may impact clinical management and surveillance of drug resistance, particularly as treatment is expanded to HIV-1 non-B strains.51–55 Moreover, HIV-1 subtypes are relevant for vaccine design. Although cross-clade immune reactivity has been detected among individuals and vaccine recipients, it is reasonable to expect that a vaccine with an antigenic composition including CRFs may induce a more effective response.56
This work was financially supported by grants from the Italian National Institute of Health, the Ministry of University and Scientific Research, Current and Finalized Research of the Italian Ministry of Health, by the European Commission Framework 7 Programme (CHAIN, the Collaborative HIV and Anti-HIV Drug Resistance Network, Integrated Project no. 223131), PHS R01 AI065265, PHS T32 CA09126, the Center for Research for Pediatric Immune Deficiency, and the Laura McClamma Fellowship and Stephany W. Holloway University Chair for AIDS Research. We thank the Organizers of the XV Workshop in Virus Evolution and Molecular Epidemiology for the training and support that made this study possible. Nazle Mendonca Collaço Véras and Maria Mercedes Santoro contributed equally to this work.
No competing financial interests exist.