|Home | About | Journals | Submit | Contact Us | Français|
Human immunodeficiency virus type 1 (HIV-1) variants show considerable geographical separation across the world, but there is limited information from Central America. We provide the first detailed investigation of the genetic diversity and molecular epidemiology of HIV-1 in six Central American countries. Phylogenetic analysis was performed on 625 HIV-1 pol gene sequences collected between 2002 and 2010 in Honduras, El Salvador, Nicaragua, Costa Rica, Panama, and Belize. Published sequences from neighboring countries (n = 57) and the rest of the world (n = 740) were included as controls. Maximum likelihood methods were used to explore phylogenetic relationships. Bayesian coalescence-based methods were used to time HIV-1 introductions. Nearly all (98.9%) Central American sequences were of subtype B. Phylogenetic analysis revealed that 437 (70%) sequences clustered within five significantly supported monophyletic clades formed essentially by Central American sequences. One clade contained 386 (62%) sequences from all six countries; the other four clades were smaller and more country specific, suggesting discrete subepidemics. The existence of one large well-supported Central American clade provides evidence that a single introduction of HIV-1 subtype B in Central America accounts for most current cases. An introduction during the early phase of the HIV-1 pandemic may explain its epidemiological success. Moreover, the smaller clades suggest a subsequent regional spread related to specific transmission networks within each country.
Central America has a significant human immunodeficiency virus type 1 (HIV-1) epidemic, with an estimated adult prevalence that ranges from 0.2% to 2.3% (1). Belize has the highest prevalence (2.3%) and Nicaragua and Costa Rica the lowest (0.2% and 0.3%, respectively) (2). WHO estimates that approximately 200,000 people in Central America live with HIV, with Guatemala, Honduras, and El Salvador contributing the largest number of cases (3–5) (see Table S1 in the supplemental material). The first cases of HIV-1 in Central America were reported among men who have sex with men (MSM) in the middle 1980s, but since then the virus has primarily spread heterosexually in the general population (6). Today, heterosexual transmission accounts for 70% of the cases, even though HIV-1 prevalence is still higher among MSM, female sex workers (FSWs), prisoners, and some ethnic groups, such as the Kuna population in Panama and the Garifuna population in Honduras (3, 7–12).
The distributions of HIV-1 group M subtypes and circulating recombinant forms (CRFs) differ considerably across the world. HIV-1 group M subtype B dominates the epidemics in North America and Western and Central Europe, whereas subtype B and BF recombinants predominate in South America (13). The information on the molecular epidemiology of HIV-1 in Central America is still limited, but available data suggest that HIV-1 group M subtype B predominates (14–17).
As a consequence of rapid evolution, HIV-1 gene genealogies (phylogenetic trees) contain important information concerning the patterns, processes, and dynamics of viral spread at the population level (18–20). In the present study, we employed phylogenetic methods to obtain, for the first time, detailed information about HIV-1 origin and epidemiological history in Central America. We found five monophyletic HIV-1 group M subtype B clades, suggesting multiple independent introductions into Central America, possibly dating back to the 1960s to 1970s and followed by early population subdivision in several country-specific subepidemics.
The study included plasma, serum, or dried-blot-spot samples from 632 individuals who were diagnosed as HIV-1 positive in Belize, Honduras, El Salvador, Nicaragua, Costa Rica, and Panama during 2002 to 2010. For the purpose of this study, we generated 302 new sequences and included 330 published sequences recently obtained from Honduras (16, 21), Belize (22), and Panama (23). The study included samples from the general population in Honduras (n = 305), Costa Rica (n = 38), Nicaragua (n = 52), Panama (n = 37), and Belize (n = 21) and from MSM and FSWs from El Salvador (n = 78 and n = 38, respectively) and Honduras (n = 36 and 27, respectively).
Ethical approvals were obtained from the Regional Medical Ethics Board in each Central American country, as well as in Sweden, and by the Associate Director for Science at Division of Global HIV/AIDS, CDC. All study participants signed an informed consent form and completed a study questionnaire.
Sequence analysis of the protease (PR) and the first part of the reverse transcriptase (RT) of the HIV-1 pol gene (nucleotides 2268 to 3257 of reference strain HXB2, PR nucleotides 16 to 297, and RT nucleotides 1 to 708) was performed on the plasma or serum samples by using a published protocol (21, 22, 24). Dried-blood-spot samples were sequenced using another published genotyping assay developed by the CDC (25, 26).
Subtype was determined by uploading sequences individually into the REGA HIV-1 automated subtyping tool v2.0 (27). Subtyping was confirmed with phylogenetic analysis using Mega v5 and the neighbor-joining method with GTR+G (i.e., generalized time-reversible model plus gamma distribution) estimated distances and 1,000 bootstrap replicates (28). Drug resistance mutations were defined on the basis of the standardized list of mutations for treated (29) and untreated (30) patients.
Detailed phylogenetic analyses were performed on 625 study subjects infected with HIV-1 group M subtype B. Non-subtype B sequences were too few for meaningful analyses. Subtype B control sequences were obtained in the following way. (i) Related sequences to each study sequence were retrieved by BLAST searches in the Los Alamos HIV (LANL) database (www.hiv.lanl.gov), which generated 137 sequences after removal of duplicates. (ii) All subtype B sequences spanning our PR-plus-RT sequence fragment with known sampling year and country were downloaded from the LANL and GenBank (www.ncbi.nlm.nih.gov) (n = 15,084). Sequences that had already been retained from the BLAST searches were omitted, and from the remaining sequences, we first selected most of the available strains from other Central American neighboring countries (Bahamas, Cuba, Dominican Republic, Haiti, Jamaica, Mexico, and Trinidad and Tobago) (n = 57). Then we made three random selections of 740 sequences from the rest of the world in order to have approximately a 1:1 ratio with our Central America data set. Consequently, we analyzed three independent data sets in which our Central American sequences, sequences from the BLAST search, and sequences from neighboring countries were kept constant, but the random sequences from the rest of the world varied. We also analyzed a fourth data set in which all available pol sequences from Haiti (n = 21) were included. Finally, in order to avoid the ascertainment bias due to the overrepresentation of North American sequences in the HIV databases and to better investigate the relationship between the Central American epidemic and the epidemic in Haiti, which has been reported as the oldest HIV-1 epidemic in the Americas (31), we created five additional data sets, each one including all available pol sequences from Haiti (n = 21) and a random subsample of sequences from North America (42 strains), Central America (21 strains), South America (42 strains), and Europe/Oceania (42 strains). For each data set, two different outgroups were tested: HIV-1 group M subtype D sequences (K03554 and EF633445) or subtype C (U46016) and H (AF005496) sequences.
Multiple alignments were generated using MUSCLE (32), either using all positions or excluding codons associated with drug resistance, according to the 2010 International AIDS Society (IAS) list of nucleoside reverse transcriptase inhibitor (NRTI), nonnucleoside reverse transcriptase inhibitor (NNRTI), and protease inhibitor (PI) mutations, with the exception of minor PI mutations (29).
Phylogenetic signal was tested by means of Xia's test (33) and likelihood mapping analysis (34). Model selection was executed by using PhyML implemented in TOPALi (35). The TVM+I+G model (i.e., transversion model plus proportion of invariable sites plus gamma distribution) was the model with the lowest Bayesian information criterion (BIC) (73322), followed by the GTR+I+G model (i.e., generalized time-reversible model plus proportion of invariable sites plus gamma distribution) (73344), while all other models had BICs of >20 points compared to GTR+I+G. Maximum likelihood (ML) phylogenetic analysis was performed on the alignment. The parallel version of the FastTree software (36) with the GTR+I+G model (a posteriori 20-parameter gamma optimization) was used (since the TVM+I+G model is not implemented in the FastTree software). The tree topology search was carried out using a mix of nearest-neighbor interchanges and subtree/prune/regraft moves. The reliability of each tree split was calculated by a Shimodaira-Hasegawa (S-H) test (37). For the smaller data sets, we also inferred Bayesian trees with MrBayes, using the TVM+I+G model. For each data set, eight Markov chain Monte Carlo (MCMC) runs were carried out in parallel for 107 generations with sampling every 10,000 generations. Convergence among independent runs was assessed by checking that the standard deviation of split frequencies was below 0.01 at the end of the runs (which was the case for each run). Maximum clade credibility trees were obtained from posterior distribution of trees with TreeAnnotator v.1.6.1 included in the BEAST software package (38).
Epidemiological/transmission clusters were inferred using the phyloPart program (39), with a patristic distance threshold that would maximize the number of clusters and would not be higher than the 25th percentile of the overall patristic distance distribution (see Table S2 in the supplemental material). Only subtrees of the phylogenetic tree with a P value of ≤0.1 as obtained by the Shimodaira-Hasegawa test were considered eligible for phyloPart clustering. Furthermore, a cluster was defined as a monophyletic clade only if it included at least 10 sequences.
To obtain a Bayesian estimate of HIV-1 group M subtype B origin in Central America, a data set was assembled that included all sequences from the five Central American monophyletic clades, as well as several HIV-1 non-B, group M strains used as controls. The evolutionary rate (number of nucleotide substitutions per site per year) and the time of the most recent common ancestor (TMRCA) were inferred by the Markov chain Monte Carlo (MCMC) approach implemented in the BEAST software package version 1.6.1 (38). The analysis was performed assuming either a strict or a relaxed molecular clock, with different coalescent priors (constant population size, and Bayesian skyline plot) and a uniform prior for the root height with mean and standard error chosen according to previously published estimates for group M origin (the year 1921 ± 13 years) (40). An MCMC chain was run for 100,000,000 generations with sampling every 10,000th generation. Proper mixing of the MCMC was assessed by calculating the effective sampling size (ESS) for each parameter (38). All ESS values were >200, indicating sufficient mixing of the Markov chain. The results were visualized in Tracer v.1.5.
Different clock models were compared by calculating the Bayes factor (BF), which is the ratio of the marginal likelihoods of the two models (41). We calculated approximate marginal likelihoods for each model via importance sampling (1,000 bootstraps) using the harmonic mean of the sampled likelihoods (with the posterior as the importance distribution). The difference (in loge space) of marginal likelihood between any two models is the loge of the Bayes factor, loge(BF). Evidence against the null model (i.e., the one with lower marginal likelihood) is indicated by 6 > [2 · loge(BF)] > 2 (positive), 10 > [2 · ·loge(BF)] >6 (strong), and [2 · loge(BF)] > 10 (very strong). BF calculations were performed with Tracer v.1.5.
Almost all sequences (625 of 632 [98.9%]) were classified as subtype B (Table 1). Five (0.8%) sequences were classified as subtype C, and two (0.3%) sequences had previously been identified as unique recombinant forms (URFs), URF_AD and URF_AK (21). Table 1 lists the baseline characteristics of the 632 patients. The median age of the participants was 29 years (range, 0 month to 71 years); the proportions of female and male were almost equal, 49% and 51%, respectively. The predominant transmission route was heterosexual (n = 443 [70.6%]): 346 of these patients were classified as belonging to the general population, whereas 65 patients were FSWs. The remaining patients reported the following routes of transmission: homosexual (18.1%), from mother to child (10.6%), and by blood products (0.6%). Two-thirds of the study participants (66%) reported that they never had been exposed to antiretroviral therapy (ART). Interestingly, 32 patients were Garifunas, a population descending from Caribe, Arawak, and West African people living in relatively isolated communities, about which few HIV epidemiological data are available to date.
Mutations associated with antiretroviral drug resistance were detected in 161 (26%) of the sequences: 130 (62%) among 211 individuals with previous ART and 31 (7.5%) among 417 treatment-naive individuals. In the ART group, the prevalence of resistance mutations to any NRTIs was 53%, that to any NNRTIs was 49%, that to any PIs was 23%, that to the combination of at least one NRTI and at least one NNRTI was 42%, and that to at least one drug from each of the three drug classes (NRTIs plus NNRTIs plus PIs) was 14%. The most commonly observed mutations were M184V (46%), T215Y (26%), M41L (23%), K103N (23%), D67N (18%), and K70R (16%).
We found that 31 of 417 (7.5%; 95% confidence interval [CI], 5.1 to 10.4%) of the sequences from the treatment-naive patients had at least one mutation associated with transmitted drug resistance (TDR) (30). The prevalence of resistance mutations to any NRTI was 5.1%, that to any NNRTI was 3.6%, and that to any PI was 0.2%. The most commonly observed mutations were M41L (3.4%), K103N (2.4%), M184V (1.4%), P255H (1%), and T215Y (0.7%).
Detailed phylogenetic analyses were carried out on the 625 subtype B sequences. There was no evidence of substitution saturation in the full or the reduced (i.e., after exclusion of positions associated with drug resistance) alignment, as demonstrated by the Xia's saturation test (P < 0.0001). Likelihood mapping analysis showed that the exclusion of drug resistance mutations improved the phylogenetic signal (52.2% versus 64.7% of the quartets localized in the center of the triangle). Therefore, all subsequent phylogenetic analyses were done on the reduced alignment.
To place the sequences from the six Central American countries in the broader context of the HIV-1 group M subtype B pandemic, we inferred eight different ML phylogenetic trees. Each tree was rooted either with HIV-1 group M subtype D sequences or with subtype C and H sequences and included the 625 subtype B sequences from the six study countries and different subsets of subtype B control sequences obtained by a combination of BLAST searches and selection of sequences from neighboring countries, as well as random sampling of worldwide sequences (see Materials and Methods). As expected, each tree showed in general a star-like topology with little resolution, but all trees resulted in identical clustering for the Central American sequences (not shown). The ML tree, including all 21 available pol sequences from Haiti and rooted with HIV-1 group M subtype D sequences, is shown in Fig. 1. By using each of our 625 Central American sequences in a BLAST search, 100 of the 137 (73%) sequences found were from the United States, while very few sequences were found from other American countries. (The origins of all control sequences are given in Table S3 in the supplemental material.) However, it is difficult to draw firm (or any) conclusions from this finding because of the ascertainment bias due to the overrepresentation of U.S. sequences in the databases.
The phyloPart analyses of the ML tree identified five statistically supported monophyletic clades formed almost exclusively by Central American sequences (Fig. 1; see Fig. S1 in the supplemental material). These five clades were robust to the selection of worldwide control sequences or the sequences used as the outgroup, as they were present also in the other ML trees (not shown). Clade I was very large, contained the majority of our Central American subtype B sequences (n = 389 [62%]) (Fig. 1), and was represented in all six Central American study countries. Despite our efforts to find closely related database sequences using BLAST searches, clade I was almost exclusively Central American and included only six non-Central American sequences (four from the United States and two from Canada). There were four additional smaller sequence clusters that also fulfilled our criteria for statistically supported clades (clades II to V). Clade II included 10 strains (5 from Belize and 5 from Honduras), clade III included 17 Central American sequences (15 from Costa Rica, 1 from Nicaragua, and 1 from El Salvador), clade IV included 14 sequences (12 from El Salvador and 2 from Honduras), and clade V included 11 Nicaraguan sequences (Fig. 1). In the ML tree rooted with subtype D sequences (Fig. 1), clade III shared a most recent common ancestor with a sequence from Japan (see Fig. S1 in the supplemental material) and, in turn, with clades II and I. However, the monophyletic origin of these clades, indicating that clades II and III are just subclades of the major Central American clade I, should be taken with caution since support for this clustering pattern was extremely low (P < 0.5, S-H test) and the clustering was not present in the tree rooted with subtype C and H sequences (data not shown).
Multivariate testing showed that there were significantly higher odds (P < 0.05) for strains from MSM and mother-to-child transmission cases to be part of a cluster compared to strains from heterosexual subjects. However, the distribution of risk behavior within the five Central American clades showed no significant differences compared to the complete data set. The remaining Central American sequences fell outside the five monophyletic clades and were either intermixed in the trees or belonged to additional clades that did not meet our criteria for significant support (see Materials and Methods).
The fact that a single well-supported clade included the 62% of the Central American sequences suggests that a single introduction was responsible for a majority of the current HIV-1 cases in the region. Moreover, the existence of other four smaller clades where strains belonged to either a single country (clade V), two countries (clades II and IV), or three countries (clade III) may represent later and/or independent introductions of subtype B into Central America. There was no well-supported pattern clustering the Central American clades with control sequences from a specific region or country. In order to examine the positioning of our Central American sequences relative to Haitian sequences and the root of the tree, we needed to take into account the strong ascertainment bias in the pol alignment that included 650 Central American sequences but only 21 (available) Haitian sequences, even though both epidemics are roughly the same size (200,000 individuals). To provide a proper comparison, phylogenies were inferred from five additional data sets, each one including the Haitian sequences and a random subsample of sequences from North America (42 strains), Central America (21 strains), South America (42 strains), and Europe/Oceania (42 strains). In each tree rooted with HIV-1 subtype D sequences, the Haitian strains were basal with high posterior probability in the Bayesian analysis (Fig. 2; see Fig. S2 in the supplemental material). This is in agreement with a scenario by Gilbert et al. of a Haitian origin of HIV/AIDS in the Americas (31). Interestingly, the result could not be reproduced when more divergent sequences (subtypes C and H) were used as outgroups (data not shown). The large Central American clade I usually clustered in proximity to U.S. and Haitian strains (see Fig. S2), although the support was low (P < 0.5) in each tree. Taken together, our data indicate that the five HIV-1B Central American clades belong to the U.S.-pandemic subtype B clade.
Different molecular clock models (strict and relaxed) were tested to infer the timing of the introductions of the five HIV-1 subtype B clades in Central America. The Bayes factor (BF) using either a constant population size or Bayesian skyline plot (BSP) coalescent prior favored the relaxed over the strict clock model (see Table S4 in the supplemental material). Moreover, under the relaxed molecular clock, the coefficient of variation 95% highest posterior density (HPD) intervals did not include the value 0 using either a constant (95% HPD, 0.2818 to 0.4628) or BSP (95% HPD, 0.2133 to 0.3339) coalescent prior, in agreement with a better fit of the relaxed over the strict clock model. The BSP coalescent prior also performed significantly better than the constant population size prior in the case of either a strict (BF = 13.3) or relaxed (BF = 18.1) molecular clock. Therefore, the relaxed clock model with a BSP prior was used to investigate the evolutionary rate and the TMRCA of the five Central American HIV1-B clades. Several studies, on the basis of different methods, estimated HIV-1 group M TMRCA in 1921 to 1931 using env sequences (40, 42) and in 1902 to 1939 (median, 1920) using pol sequences (43), while the origin of subtype B has been placed around the early 1960s (44). These estimates were used as internal controls to verify the robustness of the Bayesian inference. The median evolutionary rate estimated for the five clades ranged from 8.0 × 10−4 to 1.1 × 10−3 nucleotide substitutions/site/year (Table 2). The TMRCA median estimate for Central American clade I was 1966, while estimates for clades II to IV ranged from 1971 to 1976 (Table 2 and Fig. 1). However, the 95% highest posterior density (HPD) intervals were rather large, possibly due to the strong star-like signal in the pol alignments (see Table S5 in the supplemental material), and placed the origin of the five Central American clades between the mid-1950s and the mid-1980s (Table 2).
The present study is the first in-depth description of HIV-1 molecular epidemiology in Central America based on a large number of samples collected in six of the seven Central American countries. The findings show that subtype B is predominant in the region, which is congruent with reports from neighboring countries such as Mexico, Colombia, and Venezuela (45–47), as well as smaller reports from Central America (14–17). Interestingly, however, while HIV-1 group M subtype B strains worldwide usually intermix in a star-like phylogeny, indicative of the panmictic structure of the epidemic, strains from Central America appeared to be highly compartmentalized. The phylogenetic analysis showed that the majority of the current subtype B cases in Central America appeared to originate from a single early introduction (95% HPD 1955 to 1977). Furthermore, four additional statistically supported monophyletic clades consisting of at least 10 Central American sequences appeared to have evolved independently, suggesting the existence of discrete subepidemics within different countries, which originated from separate introductions. The fact that the largest clade included 62% of the strains may suggest that of the possible multiple subtype B introductions in the region, one was very successful and gave rise to a regional epidemic, particularly affecting Honduras and El Salvador. The origin of the major Central American clade occurred relatively soon (median estimate, 1966) after the emergence of the most recent common ancestor of HIV-1 group M subtype B (median estimate, 1957), which may explain in part the early compartmentalization of the epidemic. It is important to note, however, that our median estimate of HIV-1 group M subtype B TMRCA (1957), based on pol sequences, is 1 decade earlier than current estimates (1966) based on env (31). This may be due, at least in part, to the higher star-like signal (phylogenetic noise) in the pol gene, resulting in an unrealistically deep divergence date, although the 95% HPD intervals (1944 to 1968) of our estimates are still overlapping with the ones reported in the literature. The existence of independent monophyletic clades also implies the presence of separate transmission networks for subtype B spread within the Central American region. Therefore, our data may provide valuable information for targeted intervention and prevention.
The five Central American clades appeared to belong to the U.S./pandemic clade of HIV-1 group M subtype B (31), but there was no clear pattern clustering subtype B strains from Central America with specific sequences from the rest of the world. Interestingly, our analyses did show that some strains from Haiti were closely related to strains belonging to the major Central American clade, although statistical support was not significant, suggesting a possible common ancestor between these two epidemics. In addition, the Bayesian trees showed Haitian sequences basal to the HIV-1 group M subtype B phylogeny, in agreement with the results of Gilbert et al. (31). It is also important to note that, in spite of the observed phylogenetic noise in the pol data sets analyzed in the present work, most of the Central American sequences still clustered within a highly supported monophyletic clade, which strengthens the main finding that a single introduction is responsible for the majority of the current cases in the region.
The observation that Central American clades and strains may have originated in the U.S. or Haitian strains is expected and in agreement with known early epidemiological data, as well as other phylogenetic analyses (31), geographical proximity, known tourism, and legal and illegal immigration patterns. However, given the low resolution in the phylogenetic trees and the unbalanced sampling from other American countries, no firm conclusions can be drawn and the route of the earliest HIV-1 transmissions in Central America remains to be investigated.
We found evidence of TDR in 7.5% of the 417 treatment-naive patients from the six Central American countries, which is a moderate level according to WHO criteria (48). This is the first comprehensive report of TDR in Central America. The results have to be interpreted in light of the history of antiretroviral therapy in the countries, which initially often involved uncontrolled monotherapy and dual therapy with antiretroviral drugs that were sold illegally or sent from patients' relatives in the United States, whereas in more recent years, almost all therapy is provided as combination therapy through national treatment programs.
Some limitations of our study should be mentioned. First, we cannot exclude the possibility that the sampled population is not fully representative of all HIV-1 cases in the six Central American countries included in the study because the samples represent only a fraction of all HIV infections in the region. Furthermore, sampling was unbalanced between countries. It is possible that a more balanced sampling would show additional clades and/or that clades II to V are more widely spread across Central America than suggested by our analyses. Consequently, while our study is the most comprehensive investigation of HIV-1 spread in Central America to date, the full picture will require additional sampling from several other countries, including Guatemala, which was not sampled at all. It is also possible that the coalescence times for clades II to V would be more similar to those of clade I (i.e., earlier TMRCAs) had they been more thoroughly sampled. Another limitation was the use of partial, rather than full, genome sequences, which could underestimate the proportion of recombinant strains and possibly may limit the possibilities to resolve completely the deepest portions of the trees. However, our preliminary results suggest that the subtypes and clustering of Honduran strains are similar in analyses based on env V3 sequences.
In conclusion, this is the first comprehensive study of how and when HIV-1 has entered and spread in Central America, which is a region with a substantial HIV/AIDS burden. HIV-1 group M subtype B predominates, although sporadic non-B strains were identified. Phylogenetic and molecular clock analysis showed one major well-supported monophyletic cluster compatible with a single early subtype B introduction accounting for most current cases, as well as a subsequent expansion into regional subepidemics, which deserves further investigation in order to understand the ecological factors driving subtype B's successful emergence and dissemination in Central America.
We thank in particular all study participants. Special thanks goes to Michael Worobey for providing the data sets analyzed in the paper by Gilbert et al. and two anonymous reviewers for helpful comments.
This study was funded by the Swedish International Development Cooperation Agency (Sida) on behalf of a bilateral collaboration with the National Autonomous University of Honduras, the Network for Research and Training in Tropical Diseases in Central America (NeTropica) under project no. 06-R-2010, and the Swedish Medical Research Council. The research leading to these results also received funding from the European Union's Seventh Framework Programme (FP7/2007-2013) under Chain grant agreement no. 223131 and EuroCoord grant agreement no. 260694.
Published ahead of print 24 April 2013
Supplemental material for this article may be found at http://dx.doi.org/10.1128/JVI.01602-12.