Despite prevention efforts, new HIV diagnoses continue in the Southern US, where the epidemic is characterized by significant racial/ethnic disparities. We integrated phylogenetic analyses with clinical data to reveal trends in local HIV transmission.
Cross-sectional analysis of 1671 HIV-infected individuals each with one B-subtype pol sequence obtained during chronic (82%; UNC Center for AIDS Research Clinical Cohort) or acute/recent (18%; Duke/UNC Acute HIV Consortium) infection.
Phylogenies were inferred using neighbor joining to select related sequences then confirmed with Bayesian methods. We characterized transmission clusters (clades n≥3 sequences supported by posterior probabilities=1) by factors including race/ethnicity and transmission risk. Factors associated with cluster membership were evaluated for newly diagnosed patients.
Overall, 72% were male, 59% black and 39% MSM. A total of 557 (33%) sequences grouped in either 108 pairs (n=216) or 67 clusters (n=341). Clusters ranged from 3–36 (median 4) members. Composition was delineated primarily by race, with 28% exclusively black, and to a lesser extent by risk group. Both MSM and heterosexuals formed discrete clusters though substantial mixing was observed. In multivariable analysis, patients with age ≤30 years (P=0.009), acute infection (P=0.02), local residence (P=0.002), and transmitted drug resistance (P=0.02) were more likely to be cluster members while Latinos were less likely (P<0.001).
Integration of molecular, clinical and demographic data offers a unique view into the structure of local transmission networks. Clustering by black race, youth and TDR and inability to identify Latino clusters will inform prevention, testing and linkage to care strategies.
New HIV diagnoses continue in the Southern United States¹, the region with the largest number of AIDS cases in the nation and an above-average rate of new diagnoses, despite the presence of comprehensive prevention efforts. As in other Southern states, the HIV epidemic in North Carolina (NC) is characterized by significant racial and ethnic disparities. In 2009, blacks accounted for 66% of new infections, with an infection rate over 9 times higher than that of whites. Similarly, Latinos represent a growing proportion of the NC epidemic, with a significant rise from 1% to 8% of new cases between 1995 and 2005. Although HIV testing efforts have increased following calls for routine screening in 2006, this has only modestly increased new diagnoses. Additionally, presentation late in the course of infection remains common, particularly among Latinos.
The failure to fully reach all individuals at risk for HIV infection, particularly disenfranchised minorities with limited social mobility, highlights the need for continued innovative approaches to better understand HIV transmission on a population level. Prevention programs must target factors contributing to onward HIV transmission locally in order to successfully diminish HIV incidence. Notably, HIV epidemics are heterogeneous, composed of a series of overlapping local sub- or micro-epidemics defined by risk groups, temporal variation and localized geographic areas [7–9]. Uncovering these sub-epidemics early is challenging because HIV is often diagnosed years after transmission, making identification of the actual source or mode of exposure difficult. HIV sequence data can be used to characterize the structure of epidemics [10, 11] and, when paired with epidemiologic data, may provide unique insights into groups responsible for ongoing transmission. This approach to HIV epidemiology is now possible through advances in genomic sequencing, statistical methods, and computational speed. Utilizing HIV pol sequences derived from genotypes sent during clinical care results in increased sampling density to uncover trends in transmission at the population level [11, 14].
The objective of this study was to use phylogenetic analyses of HIV pol sequences in conjunction with clinical and demographic data to identify which groups in NC contribute to ongoing HIV transmission – a complementary approach to understanding HIV propagation which may otherwise be invisible by patient history alone. We characterized the composition of phylogenetically reconstructed “clusters”, or groups of people where multiple transmissions likely occurred, and assessed factors associated with membership in these clusters among patients diagnosed from 2000–2009.
We performed a cross-sectional evaluation of patients who had an HIV-1 genotype available for analysis and who participated in one of two cohorts based on the duration of HIV infection at diagnosis: (1) “unknown/chronic” or (2) “acute/recent”. The unknown/chronic cohort included patients attending the University of North Carolina (UNC) Infectious Diseases Clinic who enrolled in the UNC Center for AIDS Research HIV Clinical Cohort (UCHCC). An estimated 98% of clinic patients participate in the UCHCC, providing an accurate representation of the HIV clinic population. To enroll in the UCHCC, participants must be ≥18 years old and provide written informed consent in English or Spanish.
The acute/recent cohort included patients in the Duke-UNC Acute HIV Consortium or NC Screening and Tracing Active Transmission (STAT), statewide programs tracking acute infections through publicly funded clinics with >99% participation; both have been previously described in detail [15–17]. Acute infection was defined as either a non-reactive ELISA or an indeterminate western blot (WB) paired with a positive HIV RNA or p24 antigen test, or a negative ELISA and WB within 45 days of a positive ELISA or WB. A recent infection was defined as either (1) a documented negative ELISA or WB within 45–180 days of a documented positive test, or (2) a positive HIV RNA result, no detectable evidence of antiretroviral therapy (ART) on the basis of HPLC with ultraviolet detection (Clinical Pharmacology and Analytical Chemistry Laboratory, UNC CFAR, Chapel Hill, NC; lower limit of detection 10–25 ng/ml), and results consistent with a duration of infection <180 days on both a less sensitive ELISA (Vironostika, bioMérieux, Marcy-l’Étoile, France; standardized optical density cutoff <1.0) and an avidity-modified third-generation immunoassay (Bio-Rad Laboratories, Hercules, CA; avidity index cutoff <40).
Full-length protease (PR) and partial reverse transcriptase (RT) sequences were extracted from commercially performed genotypes obtained for clinical care. For UCHCC patients, 95% of assays were HIV GenoSure® or GenoSure® Plus (Laboratory Corporation of America®, Research Triangle Park, NC), which spans PR and RT codons 1–400. Sequences from the acute/recent cohort were derived from GenoSure® or the TRUGENE® HIV-1 assay (Siemens Healthcare Diagnostics, Tarrytown, NY), which spans PR and RT codons 1–250. For patients with multiple sequences, only the oldest sequence was used. Sequences were aligned and edited with Se-Al v2.0. Subtypes were identified using the Subtype Classification Using Evolutionary Algorithms (SCUEAL) tool. Only B-subtype sequences, which represent >98% of subtypes in the area, were included in final analyses. We deposited the sequences from the UCHCC patients that were used in these analyses into GenBank under accession numbers JX160108-JX161480. Codon positions associated with major drug resistance mutations (DRM) according to the IAS-USA 2009 list were initially removed to avoid potential treatment-induced convergent evolution.
Using a neighbor-joining (NJ) phylogenetic tree, reconstructed under the HKY85 model of nucleotide substitution in PAUP* v4.0, we selected sequences that differed by <4.5% pairwise genetic distance from at least one other sequence to scale the dataset to a manageable size. Transmission clusters were confirmed by Bayesian MCMC inference in MrBayes using the general time reversible model of nucleotide substitution with a proportion of invariant sites (ι) and gamma distribution of rates (Γ), a sampling frequency of every 1000th generation, and a 10% burn-in. Convergence of the estimates was evaluated with generation vs. log probability plots in Tracer v.1.5, requiring an Effective Sample Size >200. Due to size constraints, the NJ tree was divided into quarters and each quarter was run separately in MrBayes. Maximum clade credibility trees were generated with a 10% burn-in using TreeAnnotator. We further reconstructed an undivided maximum-likelihood (ML) tree under the same model conditions in RAxML v.7.0.4 to ensure the quarter trees were robust. Finally, Bayesian analyses were repeated using complete sequences, evaluating the third codon position only, for additional confirmation of the phylogenetic clusters. These confirmatory analyses yielded trees of similar topology to the initial Bayesian trees (data not shown). To further support identification of “local” clusters, an ML tree was reconstructed in FastTree with all study sequences and an additional 595 controls from the HIV GenBank database, using ViroBLAST to identify the top ten related sequences for each study sequence. Clusters split by control sequences or with topology inconsistent with our Bayesian or RAxML trees were not considered robust.
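The distance-based pre-selection step can be illustrated with a minimal Python sketch. It assumes an already aligned set of sequences and, for simplicity, uses the uncorrected proportion of differing sites (p-distance) in place of the HKY85-corrected distances computed in PAUP*. The function names `p_distance` and `select_related` and the toy sequences are hypothetical and are not part of the study pipeline.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap or ambiguity code are skipped.
    This is a simplification: the study used HKY85 distances, not p-distance.
    """
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def select_related(seqs: dict[str, str], cutoff: float = 0.045) -> set[str]:
    """Return IDs of sequences within `cutoff` of at least one other sequence."""
    keep: set[str] = set()
    for (id_a, a), (id_b, b) in combinations(seqs.items(), 2):
        # A NaN distance compares False here, so such pairs are never kept.
        if p_distance(a, b) < cutoff:
            keep.update((id_a, id_b))
    return keep


# Toy alignment (hypothetical): p1 and p2 differ at 2 of 100 sites,
# while p3 is distant from both.
ref = "ACGT" * 25
toy = {"p1": ref, "p2": ref[:-2] + "AA", "p3": "T" * 40 + ref[40:]}
related = select_related(toy)
```

The all-pairs loop is quadratic in the number of sequences, which is acceptable for a dataset of this size but would need a faster strategy for much larger collections.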
Demographic, risk behavior, and clinical data were abstracted from medical records. The UCHCC has standardized data extraction methods from medical charts and institutional databases. We evaluated age at diagnosis, sex, race/ethnicity, year of diagnosis, risk group, ART exposure, DRMs, and geographic residence. Geographic residence was defined as living within the primary catchment area for the UNC clinic, a contiguous 16-county area (out of 100 NC counties) where ~75% of UCHCC patients reside. For ART-naïve individuals, mutations associated with transmitted drug resistance (TDR) were identified using the 2009 standardized surveillance list from the World Health Organization. The CD4+ T lymphocyte (CD4) count and HIV RNA viral load closest to the date of diagnosis were recorded. Transmission risk was categorized as men who have sex with men (MSM), heterosexual, intravenous drug use (IDU) or other/unknown. Patients reporting IDU in addition to another risk were classified as IDU.
We defined transmission “clusters” and “pairs” as phylogenetic clades with n≥3 and n=2 sequences, respectively, supported by Bayesian posterior probabilities=1. Because we utilized a relaxed genetic distance definition, we also conducted sensitivity analyses on clusters that satisfied both Bayesian posterior probabilities=1 and a mean intra-cluster pairwise genetic distance of ≤1.5%. Cluster composition was evaluated by comparing characteristics among members, with “predominant” clusters defined as those in which ≥2/3 (66%) of members shared the same characteristic. “Homogeneous” clusters were those in which all members shared a characteristic. Finally, we assessed factors associated with cluster and pair membership for patients diagnosed from 2000–2009. Although pairs represent observed transmission, we distinguished between pairs and clusters because many pairs are identified by partners who enter care together or through partner contact tracing, and may not necessarily indicate further onward transmission.
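The size and composition definitions above can be expressed as a short Python sketch. The function names and the toy member lists are hypothetical. The sketch returns the most specific label, so a homogeneous cluster, which also meets the ≥2/3 threshold, is reported as "homogeneous". Labeling everything below the threshold as "mixed" is an assumption made for illustration.

```python
from collections import Counter


def clade_type(n_members: int) -> str:
    """Classify a fully supported clade by size: n>=3 cluster, n=2 pair."""
    if n_members >= 3:
        return "cluster"
    if n_members == 2:
        return "pair"
    return "unclustered"


def composition(values: list[str]) -> tuple[str, str]:
    """Label a cluster by how many members share one characteristic.

    Returns (label, most common value). Integer arithmetic is used for the
    2/3 threshold to avoid floating-point rounding at the boundary.
    """
    top_value, top_count = Counter(values).most_common(1)[0]
    n = len(values)
    if top_count == n:
        return "homogeneous", top_value
    if 3 * top_count >= 2 * n:
        return "predominant", top_value
    return "mixed", top_value  # "mixed" label is an assumption of this sketch


# Hypothetical clusters described by self-reported transmission risk.
example_a = composition(["MSM", "MSM", "MSM"])
example_b = composition(["MSM", "MSM", "heterosexual"])
example_c = composition(["MSM", "heterosexual", "IDU", "MSM"])
```

Here `example_b` sits exactly at the 2/3 threshold and is therefore labeled predominant, while `example_c` (2 of 4 members) falls below it.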
For newly diagnosed patients, factors associated with membership in pairs and clusters were evaluated. Differences in categorical variables were tested with the Pearson’s χ2 test and continuous variables with the Kruskal-Wallis test. Multivariable analyses were fit using logistic regression to identify independent predictors of cluster membership. We first fit a full model based on results of our bivariable analyses and then used backwards elimination to arrive at a final model that included only factors predictive of cluster membership based on a two-sided alpha of 0.05. All data were analyzed using STATA version 11.0 (StataCorp, College Station, TX).
A total of 1694 sequences were available, acquired from 1997–2009, with each sequence from an individual patient. Of these, 23 (1.4%) sequences were non-B subtypes and eliminated. The final dataset (n=1671) consisted of 1373 (82%) sequences from the chronic cohort and 298 (18%) from the acute/recent cohort. In the UCHCC, these patients represent 57% of all participants, which consists of patients diagnosed in the past 20 years with >50% entering care after 2000. Although genotypes from ART-naive individuals were not common until 2007, 312 of these patients had genotypes analyzed from specimens archived prior to ART-exposure. The UCHCC patients with genotypes were similar to those without genotypes by sex (71% vs. 69% male; P=0.39), race (both 59% black), and risk (37% vs. 40% MSM, P=0.10). While our dataset includes a minority (estimated <20%) of sequences from all cases reported to the state in our primary catchment area from 2000–2009, it is highly representative of these cases by ethnicity (66% black; 7% Latino), sex (72% male), and risk (41% MSM). Additionally, among our study samples, the race/ethnic distribution per year (Supplemental Figure 1) was roughly similar to the clinic demographics.
The overall population was 59% black and 72% male, with over half receiving an HIV diagnosis after 2000 (Table 1). Over half (54%) were ART-naïve at the time of sequence acquisition, and median time from diagnosis to sampling was 658 days (IQR 18–2959). The newly diagnosed patients consisted of slightly more Latinos (10% vs. 7%), and fewer patients reporting IDU (11% vs. 5%) but were otherwise similar to the entire population. Among the newly diagnosed patients who had a genotype before ART (n=775; 87%), TDR prevalence was 12%, and TDR was more common among patients diagnosed with acute or recent infection than those with an unknown duration (17.5% vs. 8.6%; P<0.001).
A total of 557 (33%) sequences grouped in either 108 pairs (n=216; 13%) or 67 clusters (n=341; 20%). The phylogenetic tree is shown in Supplemental Figure 2. Clusters ranged from 3–36 sequences (median 4; IQR 3–5). The median intra-cluster pairwise genetic distance was 0.020 (IQR 0.012–0.029) nucleotide substitutions/site. Although the majority of clusters were small, two large clusters (13 and 18 members) with relatively short mean genetic distances were identified (Figure 1). While most clusters were composed of patients in the chronic cohort, 42 (63%) clusters included at least one member with acute or recent infection. Notably, 74% of these clusters included at least one chronic patient diagnosed prior to the acute/recent member(s), suggesting that local transmission may not be dominated by acute-to-acute transmission. The sensitivity analysis of the more strictly defined 1.5% cutoff yielded 103 pairs (n=206; 12%) and 33 clusters (n=122; 7%) with median 3 members (range 3–13). We examined cluster composition by various characteristics including race/ethnicity, risk, duration of infection, age at diagnosis, and presence of TDR. Overall trends were very similar between the two cluster definitions (Table 2).
Overall, clusters were defined by both racial composition and transmission risk. Most clusters (n=45; 67%) were predominantly black and 19 (28%) were 100% black. A smaller proportion of clusters were predominantly white and none were predominantly Latino. Most Latino sequences (57%) were found in predominantly black clusters.
By transmission risk, we found a substantial degree of mixing but also discrete transmission in both MSM and heterosexual groups. Of predominant risk clusters, 28 (42%) were MSM and 23 (34%) heterosexual; 12 and 10 of these clusters, respectively, were homogeneous (Table 2). Cluster sizes among both groups were similar (mean 3.7 MSM vs. 4.7 heterosexual members; P=0.60). No clusters were predominantly IDU and nearly 25% were mixed, mostly between heterosexuals and MSM. Among sequences that clustered, 102 (71%) from MSM grouped in predominant MSM clusters while only 5% fell in majority heterosexual clusters. Of heterosexuals, 91 (60%) grouped in predominant heterosexual clusters and 9% grouped in predominant MSM clusters. Black and white women were equally likely to be members of heterosexual clusters (55% vs. 62%), but among men, blacks were more likely to be in heterosexual clusters than whites (21% vs. 9%; P=0.02). Among heterosexual men, 72% of whites and 57% of blacks were in predominant mixed or MSM clusters (P=0.28).
Of all clusters, 17 (25%) included at least one member with TDR. Of these, six had ≥60% of members with TDR, and in 5 of these, all members shared the same TDR mutation(s) (Table 3) and were among the newly diagnosed subset. These homogeneous TDR clusters were smaller (mean 3.0 vs. 4.2 members; P=0.16) and had significantly smaller mean genetic distance (0.009 vs. 0.020 substitutions/site; P=0.01) compared to the clusters that had no members with TDR.
Of 889 patients who were newly diagnosed, 154 (17%) and 231 (26%) were members of pairs and clusters, respectively (Table 4). Generally, no statistically significant differences were found between patients in pairs and those who were not clustered, though a higher proportion of Latinos were in pairs. Cluster members were more likely to be younger (median age 30 [IQR 23–41] vs. 34 [IQR 26–43] years; P<0.001) than non-members. Latinos were the least likely of all racial/ethnic groups to be cluster members (9% vs. 30% blacks vs. 24% whites; P<0.001). Those with acute infections were also more likely to be in clusters (35% vs. 25% chronic; P=0.02) as were patients residing in UNC’s primary catchment area (27% vs. 22%; P=0.02). Among the 775 ART-naïve patients, 92 (12%) had TDR. Of these, 33 (36%) were cluster members, a significantly higher proportion than the 174 (25%) cluster members among patients without TDR (P=0.03). All these variables remained associated with cluster membership in the multivariable regression model (Table 4). Analysis of the more tightly defined 1.5% clusters showed similar overall trends, though with reduced levels of statistical significance for some variables due to lower sample size (Supplemental Table).
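As a worked illustration of the Pearson χ2 comparison for TDR and cluster membership, the following Python sketch rebuilds the 2×2 table from the counts reported above. The cell counts for patients without TDR are derived by subtraction (775 − 92 = 683 without TDR, of whom 683 − 174 = 509 were not cluster members). The function name is hypothetical, and the sketch applies no continuity correction, which is an assumption because the text does not state whether one was used.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-squared test (1 df, no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (chi2 statistic, two-sided p-value).
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: TDR present / TDR absent; columns: cluster member / not a member.
chi2, p_value = pearson_chi2_2x2(33, 92 - 33, 174, 683 - 174)
```

Under these assumptions the resulting p-value falls below 0.05, in line with the reported significance of the association.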
Our study represents the largest phylogenetic analysis of HIV pol sequences from the Southeastern US, a region plagued by a substantial burden of HIV/AIDS cases. We found that in our HIV cohort, representing both chronic and acute infections, 20% of pol sequences derived from genotypes formed transmission clusters involving three or more individuals and were delineated primarily by race and, to a lesser extent, risk groups. Although we observed substantial mixing between MSM and heterosexuals, both groups formed discrete clusters and were equally likely to contribute to onward transmission among newly diagnosed patients. In contrast to other studies [32, 33], we found similar cluster sizes among risk groups as well as several large heterosexual clusters, suggesting that the local transmission structure differs from that of regions where MSM transmission predominates. Further, among new patients, non-Latino ethnicity, younger age, acute infection, presence of TDR, and local residence were all independently associated with cluster membership. Our results not only offer a glimpse into the structure of local HIV transmission but, more importantly, lay the foundation for future work exploiting the growing repositories of pol sequences in the US.
Within transmission clusters, we observed stronger trends with respect to race, especially black race, than with respect to risk groups. While 28% of clusters were exclusively black, fewer than 18% were exclusively MSM or heterosexual. Few phylogenetic studies to date have explored racial/ethnic differences in transmission clusters [34, 35], though these methods have been used to evaluate migration and domestic transmission in Europe and to assess risk groups [11, 14, 36]. The racial homogeneity of our clusters parallels alternative analyses showing high rates of assortative sexual mixing, or selecting partners of the same race/ethnicity, among US blacks and black MSM [38, 39]. The weaker trends seen among risk groups may be due to several factors. Since risk groups are self-reported, they may reflect sexual identity rather than actual routes of infection. Additionally, men who identify as MSM may still have sex with women. The substantial mixing between risk groups in our study suggests possible underreporting of MSM or bisexual behavior, which has been noted in other phylogenetic studies. Among heterosexually identified men who were cluster members, the majority of both whites and blacks grouped in predominant MSM or mixed clusters. We did not find racial differences in the degree of risk group mixing, further countering the hypothesis that black MSM largely contribute to the heterosexual epidemic through undisclosed MSM activity.
Latinos were much less likely to be members of transmission clusters compared to both blacks and whites. This finding was surprising, as the sampling density was representative of our clinic demographics, and because Latinos in NC have nearly three times the HIV incidence rate of whites. Members of potential clusters involving Latinos may have been missed in our analyses due to undersampling. Alternatively, many infections among Latinos may have been acquired elsewhere and thus do not cluster in our local cohort. Latinos in NC are more likely to be foreign-born compared to other regions in the US, and some are seasonal migrants who may have partners residing outside the state. Latinos did cluster in pairs similarly to other groups, possibly representing partners seeking care together following infection acquired elsewhere. Future studies tracking migration in conjunction with phylogenetic analyses may help delineate the structure of transmission among this hard-to-reach group.
Among newly diagnosed patients, we found that the presence of TDR, in addition to younger age and acute infection, was significantly associated with transmission cluster membership. Notably, the association between TDR and clustering remained significant even after controlling for infection duration in the multivariable model. These associations with clustering may simply be markers for very high-risk behavior and rapid ongoing transmission, as no data to date suggest that TDR mutations make the virus more transmissible. Additionally, we found several clusters with nearly all individuals harboring TDR and very few ART-experienced members, suggesting either that experienced individuals were missed by incomplete sampling or further supporting the role of drug-naïve individuals in the onward spread of TDR [44–46]. Importantly, these clusters may reflect sexual networks that are reservoirs of drug resistance beyond ART-experienced individuals.
Notably, the reconstruction of transmission clusters on the population level represents an estimate of the local epidemic. Through incomplete sampling, potential cluster members will be unidentified, either because they are undiagnosed, disengaged from care, or never had a genotype. Furthermore, the parameters of our analysis were not intended to identify only linked transmission between partners, as there may be unrecognized third parties involved in the transmission chain and the directionality of transmission cannot be discerned. Although our use of pol sequences acquired during routine care represents an obvious and unavoidable selection bias, we had sufficient sampling density to uncover clusters and demonstrate both expected and unanticipated trends. We used robust statistical support to define our clusters, which could potentially underestimate the number of actual transmission clusters. Although our less strict genetic distance cutoff helps identify clusters where transmission events are spread out over several years, this method may reveal many historic clusters. Additionally, we are unable to determine when onward transmission events took place, whether before or after diagnosis. While we observed clusters predominantly composed of acute/recent patients, a substantial proportion included one or more chronic patients with diagnoses prior to the acute infection. Importantly, this indicates a failure of secondary prevention and suggests that acute-acute transmission may not dominate spread in NC; a significant amount of transmission may occur during chronic infection.
Our study demonstrates a unique view into the structure of local transmission in NC through the integration of molecular, clinical, and demographic data. These complementary methods have the potential to provide important insight into relationships that cannot be uncovered through traditional epidemiological methods alone. These methods may help identify gaps in case finding and transmission trends among high-risk groups including hard-to-reach populations, such as Latinos in the Southeast. Ultimately, the integration of widespread genotypic sampling with epidemiologic data, time-scaling, and sophisticated statistical methods could lead to the development of novel models predicting incident cases and onward transmission, which are ideal targets for prevention campaigns.
We thank patients and staff of the UNC Center for AIDS Research HIV Clinical Cohort, led by Oksana Zakharova; patients and staff of the Duke-UNC Acute HIV Consortium and the NC Screening and Tracing of Acute Transmission Program including Julie Nelson, JoAnn Kuruc, and William Miller; and Myron Cohen for his review of the manuscript.
This work was supported by the University of North Carolina at Chapel Hill Center for AIDS Research (P30 AI50410) and the KL2 Multidisciplinary Clinical Research Development Award (5KL2RR025746-04) from the University of North Carolina at Chapel Hill.
¹ Southern region includes the following states: Alabama, Arkansas, Delaware, District of Columbia, Florida, Georgia, Kentucky, Louisiana, Maryland, Mississippi, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, and West Virginia.
Conflicts of Interest and Source of Funding: AD has received grant support from the Bristol-Myers Squibb Virology Fellows research training program. SN has received grant support from Pfizer, Bristol-Myers Squibb, and Merck. JS is an employee of Laboratory Corporation of America. JJE has received consulting fees from Tibotec, Bristol-Myers Squibb, Merck, GlaxoSmithKline and ViiV, lecture fees from Roche and Bristol-Myers Squibb, and grant support from GlaxoSmithKline, Merck, ViiV and BMS.
Parts of this work were presented at the 6th International AIDS Society Conference on HIV Pathogenesis, Treatment and Prevention in Rome, Italy (Abstract #MOAC0205).
Author contributions: Patient data (SN, CBH, JJE), HIV pol nucleotide sequence data (CBH, JS), planning of analyses (AD, SH, DP, JJE), phylogenetic analyses (AD, SH), statistical analyses (AD, SN), preparation of tables and figures (AD), drafting manuscript (AD, JJE), manuscript review and edit (all).