We inferred Bayesian genealogies for HIV-1 A and D p24 and gp41 genes using publicly available sequence data sampled from east, west, and central Africa (selected according to specific criteria discussed in the Supplementary methods) from the Los Alamos HIV databases (http://www.hiv.lanl.gov), as well as new sequence data from the Rakai district of Uganda (Table S1). We also included reference sequences from subtypes C, D, G, H, J, and K as outgroups. All sequences were subtyped using the REGA subtyping tool [24] and manually checked by obtaining neighbor-joining phylogenies including additional reference sequences. Bayesian genealogies were generated according to constant, exponential, logistic, and Bayesian skyline plot (BSP) demographic models under either a strict or relaxed molecular clock. The relaxed molecular clock always performed significantly better than the strict clock (Bayes factors >20), and the models allowing population growth were consistently better supported than the constant size model (data not shown).
The HIV-1A genealogies for both genes and all demographic models showed, as expected, two well supported monophyletic clades (P = 1.0) within subtype A, clustering A1 and A2 sequences, respectively. For both genes, all of the known sequences from the DRC were part of the A2 clade. In the p24 relaxed clock tree with BSP prior (fully annotated tree in Fig. S1A), east African sequences were also monophyletic, although the support was only moderate (P = 0.62) and the clade was not present under all demographic models (data not shown). A few sequences from west Africa appeared to be intermixed in the east Africa clade in the gp41 relaxed clock tree with BSP prior (Fig. S1B). In the genealogies for subtype D (Fig. S1C and D), all of the sequences from east Africa clustered together, as did the sequences from west Africa. Although these clades were weakly supported (P < 0.5), they were present under most demographic models. Overall, the genealogies were consistent with a model of limited introduction of each subtype into east Africa, followed by subsequent expansion.
The putative geographic origin for the internal nodes of each tree was inferred by a weighted maximum parsimony reconstruction of ancestral states. In both p24 and gp41 trees, the earliest internal nodes in the A1 clade were assigned to Uganda (branches in red) and arose at least one decade prior to the Kenya strains. Although a sampling bias cannot be excluded, a similar number of strains were sampled from Uganda and Kenya for both genes, strengthening our confidence in the result. In all genealogies, more than 90% of the lineages leading to the east African sequences were already present by 1980.
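To illustrate the kind of calculation behind a weighted parsimony reconstruction of ancestral locations, the following is a minimal sketch of the Sankoff dynamic-programming pass on a toy tree. The tree shape, tip names, locations, and step costs are all hypothetical and are not taken from the study, which does not specify the software or weights used.

```python
# Minimal sketch of weighted (Sankoff) parsimony for discrete locations.
# The tree, tip locations, and step costs below are hypothetical examples.

STATES = ["Uganda", "Kenya", "DRC"]

# Hypothetical symmetric step costs between distinct locations.
STEP_COST = {
    frozenset(("Uganda", "Kenya")): 1,
    frozenset(("Uganda", "DRC")): 2,
    frozenset(("Kenya", "DRC")): 2,
}

def cost(a, b):
    """Cost of changing location a into location b along one branch."""
    return 0 if a == b else STEP_COST[frozenset((a, b))]

def sankoff(node, tip_locations):
    """Return {state: minimum cost of the subtree if `node` has that state}.

    A node is either a tip name (str) or a tuple of child nodes.
    """
    if isinstance(node, str):
        observed = tip_locations[node]
        return {s: (0 if s == observed else float("inf")) for s in STATES}
    child_tables = [sankoff(child, tip_locations) for child in node]
    table = {}
    for s in STATES:
        total = 0
        for child in child_tables:
            # Cheapest way to explain this child given state s at the parent.
            total += min(cost(s, t) + child[t] for t in STATES)
        table[s] = total
    return table

tree = (("ug1", "ug2"), ("ug3", ("ke1", "ke2")))
tip_locations = {
    "ug1": "Uganda", "ug2": "Uganda", "ug3": "Uganda",
    "ke1": "Kenya", "ke2": "Kenya",
}
root_costs = sankoff(tree, tip_locations)
best_root_state = min(root_costs, key=root_costs.get)
print(root_costs, best_root_state)
```

In this toy example the root is assigned to Uganda because that assignment requires the lowest total cost of location changes; a full reconstruction would add a second, top-down pass to assign states to every internal node.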
Genealogy and phylogeography of HIV-1A in Africa
The median estimate of the time to the most recent common ancestor (TMRCA) for the root of each tree, obtained using the relaxed clock assumption and the BSP prior, was between 1915 and 1922 (Table S2), which is consistent with the origin of the group M HIV-1 epidemic as previously reported [5]. The origin of the HIV-1 A1 clade was estimated as 1948 for both genes. The TMRCA for the east Africa subtype D clade, which contained more than 90% of the east African sequences, was estimated between 1958 (gp41) and 1967 (p24). These estimates have wide confidence intervals; however, the median dates were similar among the different demographic models tested (data not shown).
To investigate the demographic history of the subtypes, we compared a constant population size model with the exponential growth model and a nonparametric BSP model for either the A1 clade (HIV-1A) or the east African subtype D clade (HIV-1D). In all cases, the Bayesian model allowing population growth was a better fit to the data than the constant model (Table S3). For all genes, the highest posterior density (HPD) intervals for the exponential growth rate were greater than one, providing further evidence that a growth model best fits the data. We then considered the estimates of the effective number of infectious individuals over time obtained from the BSP model. For each subtype, the BSPs from p24 and gp41 appeared similar: all datasets were characterized by a low number of effective infections during the initial epidemic, consistent with the limited introduction model discussed in the previous sections, and an exponential growth phase during the 1970s, followed by a leveling off period during the mid-1980s/early 1990s. This trend and its timing were consistent among several different root priors and for both the full A1 clade and the east African subtype A sequences alone (data not shown).
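For readers unfamiliar with HPD intervals, the sketch below shows one common way to compute one from posterior samples: take the shortest interval that contains a given share of the sorted draws. The sample values are hypothetical and the function name is ours; the study does not describe how its intervals were computed.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.ceil(mass * n)  # number of draws the interval must cover
    best = None
    # Slide a window of k consecutive sorted draws and keep the narrowest.
    for i in range(n - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best

# Hypothetical posterior draws of a growth rate (not data from the study).
growth_rate_samples = [
    0.05, 0.11, 0.12, 0.13, 0.14, 0.15, 0.15, 0.16, 0.17, 0.18,
    0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.27, 0.30, 0.60,
]
lower, upper = hpd_interval(growth_rate_samples)
print(lower, upper)
```

An interval whose lower bound lies above the value expected under no growth is then read as support for a growth model.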
HIV-1 subtype A and D epidemic history and net migration trends for central and east Africa
Net migration rates for east and central African countries between 1955 and 2000 were plotted (raw data are given in Table S4). In general, we did not observe any specific trend that correlated with the uneven spread of HIV-1 subtypes and the wide range of viral prevalence rates in central-east Africa. No country showed a net migration influx higher than five per 1000 people during the time of the initial spread and expansion of the epidemic (1960–1985). Kenya experienced net migration around zero, whereas the DRC, Sudan, and Tanzania did not exceed ±5 migrants/1000 until 1990. During the 1970s, a modest net outflow (−5/1000) was observed in Uganda, coinciding with the coup by Idi Amin and the subsequent civil war. Rwanda exhibited a sharp net outflow (−10/1000) during the 1960s, coinciding with ethnic conflict and the war of independence, and a far larger net outflow during the first half of the 1990s (exceeding −50/1000), coinciding with the genocide that occurred in 1994, followed by an increase in net migration in the second half of the decade.
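As a worked example of how a rate such as −5/1000 arises, the sketch below assumes the conventional demographic definition: average annual net migrants divided by the mid-period population, times 1000. The source does not state the exact definition used, and the input figures are hypothetical.

```python
def net_migration_rate(immigrants, emigrants, midperiod_population, years=5):
    """Average annual net migration per 1000 population over an interval.

    Assumes the conventional definition; all inputs are totals for the
    interval except the mid-period population.
    """
    net_per_year = (immigrants - emigrants) / years
    return 1000 * net_per_year / midperiod_population

# Hypothetical 5-year interval: 50,000 arrivals, 300,000 departures,
# mid-period population of 10 million.
rate = net_migration_rate(50_000, 300_000, 10_000_000)
print(rate)
```

A negative value indicates that more people left than arrived during the interval.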
Net migration rates over 5-year intervals for east Africa
Changes in population density in the region were examined using a census-based spatially referenced population database [25]. Difference maps were created to investigate how population distribution changed between decades. Although these maps are derived from different resolutions of input census data for each country and thus should be interpreted with caution, the overall trends in population distribution changes are clear. The difference maps highlight strong population growth in regions across northern Tanzania, Uganda, and the most densely populated areas of Kenya in the 1960s. At the same time, northern DRC and the more rural areas of Kenya exhibited overall declines in population. The 1970s were characterized by a mixed pattern, with large areas of Uganda and Tanzania actually experiencing declines in population numbers, whereas other countries showed strong growth. Kenya consistently showed high population growth in the most densely populated areas.
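A difference map of this kind can be thought of as a cell-by-cell subtraction of two population grids covering the same area. The sketch below assumes two grids of identical shape and resolution; the grid values and the function name are hypothetical.

```python
def difference_map(earlier, later):
    """Cell-by-cell population change between two equally sized grids."""
    same_shape = len(earlier) == len(later) and all(
        len(a) == len(b) for a, b in zip(earlier, later)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    return [
        [b - a for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(earlier, later)
    ]

# Hypothetical population counts per grid cell for two census decades.
pop_1960 = [[120, 80], [40, 10]]
pop_1970 = [[180, 95], [35, 10]]
change = difference_map(pop_1960, pop_1970)
print(change)
```

Positive cells indicate growth between the two decades and negative cells indicate decline.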
Population distribution changes in east Africa
Accessibility maps estimate the travel time to the nearest major city (defined as those with a population greater than 500 000 in 2000), using road/track-based travel [26]. Although the maps are based principally on 1990s data due to the sparseness of earlier comparable datasets, the east African road network architecture has changed little since colonial times [27]. Accessibility was computed using a cost–distance algorithm that calculates the ‘cost’ of traveling between two locations on a regular raster grid [28]. The DRC is characterized by several isolated networks of relatively high accessibility, centered on the three major urban areas of Kinshasa in the west, Kisangani in the north, and Kananga, Likasi, and Lubumbashi in the south. Distinct routes of travel are evident in each network, and accessibility rapidly declines with distance from the hub cities. Separating these major networks are hundreds of square miles with poor access to major population centers. The western city of Kinshasa has connectivity with Luanda; however, the southern end of this network is also completely isolated. The northern city of Kisangani shows very minor connections with east Africa, where the major population hubs are Kampala, Nairobi, Mombasa, and Dar Es Salaam. In contrast to the pattern in the DRC, transport network isolation in east Africa is not so evident, with the four principal cities and the majority of Uganda and Kenya all relatively well connected and accessible. Moreover, this well connected east Africa network is relatively isolated from both the DRC and the principal population centers of Sudan, Ethiopia, and Somalia to the north and east. The southern DRC city of Lubumbashi has a small connection with a separate network that extends further south. Major roadways are superimposed on the map and reinforce the connectivity patterns.
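The idea of a cost–distance calculation on a raster grid can be sketched as a shortest-path search outward from the city cells. The version below uses Dijkstra's algorithm with four-neighbor moves and charges each move the mean friction of the two cells involved; these choices, the friction values, and the function name are assumptions for illustration and are not necessarily those of the cited implementation [28].

```python
import heapq

def cost_distance(friction, sources):
    """Least accumulated travel cost from each cell to its nearest source.

    friction[r][c] is a hypothetical time needed to cross cell (r, c).
    Moving between two orthogonally adjacent cells costs the mean of
    their friction values.
    """
    rows, cols = len(friction), len(friction[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (friction[r][c] + friction[nr][nc]) / 2
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Hypothetical grid: low values are roads, high values are roadless terrain.
friction = [
    [1, 1, 1],
    [9, 9, 1],
    [1, 1, 1],
]
travel_time = cost_distance(friction, sources=[(0, 0)])
for row in travel_time:
    print(row)
```

In the toy grid, the bottom-left cell is adjacent to the high-friction barrier and is reached most cheaply by the longer route along the low-friction cells, which is how isolated networks separated by poorly accessible terrain emerge in an accessibility map.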
Accessibility and subtype distribution in central and east Africa
We then plotted the distribution of HIV-1 subtypes A, C, D, and other subtypes as reported by numerous sources [29]. Three major cities in the DRC have very different distributions of subtypes. Kinshasa has a very high proportion of ‘other’ subtypes, similar to the situation in Luanda. Subtype A is present at nearly 40% in Kisangani and accounts for well over 50% in Kigali, Kampala, and Nairobi. Subtype D has varying proportions in the east African cities, but is much higher there than in western or southern Africa. In Dar Es Salaam, part of the east Africa network, the proportion of subtype A decreases to nearly 25%, whereas in Moshi, which is not connected in this network, subtype C is the most frequent. Subtype C is at a proportion of greater than 50% in Lubumbashi in the southern DRC, and is nearly 100% in Lusaka. Thus, the distribution of HIV-1 subtypes appears to closely mirror the accessibility networks as well.