In North America and Europe, HIV-1 subtype B strains were responsible for the initial spread of HIV-1. The primary modes of exposure were sex between men and intravenous drug use. Consequently, the HIV-1 pandemic in these regions was dominated by subtype B infections. Moreover, since the first HIV-1 strains to be fully characterized were from these regions, they formed the basis for the development of diagnostic, screening, and viral-load assays. With the recognition of the global nature of the HIV-1 pandemic and characterization of strains collected from other parts of the world, it became evident that HIV-1 exhibits a high degree of genetic diversity and that subtype B infections represent a relatively small proportion of infections worldwide (4; HIV Sequence Database). The first case of non-subtype B HIV-1 reported in the United States (Alabama) was in 1994 and involved a student from Zaire infected with a subtype D strain (41). Soon thereafter, the first documented cases of native U.S. residents becoming infected with non-B subtypes (A, D, and E [CRF01_AE]) were reported (42). These cases involved American servicemen who had acquired HIV-1 during overseas deployment. In a subsequent study involving active surveillance of military personnel, 7.4% of recent HIV-1 infections were due to subtype E (CRF01_AE) strains acquired abroad (25). Nine cases of non-subtype B infections among military health care beneficiaries were subsequently identified by astute physicians who suspected non-B infections based on low or undetectable viral load measurements and either declining CD4 cell counts or history of foreign travel (43). The first well-documented case of HIV-1 non-B (CRF01_AE) transmission to a native U.S. resident with no history of travel abroad was published in 1996 (44). By 2001, there was already evidence of indigenous transmission of non-subtype B strains in rural Georgia (45). Several subsequent studies documented the presence of non-subtype B infections among immigrant residents in various U.S. settings, with a prevalence ranging from 6 to 95% (27). Since the majority of available data are from specific populations, accurate estimation of the overall prevalence and diversity of non-B infections in the United States is challenging.
In the present study, the diversity of HIV-1 in the United States was evaluated using 24,386 pol sequences generated by resistance genotyping of patients in 46 states. In addition, the availability of sequences from specimens collected during the time period from 2004 through September 2011 provided an opportunity to explore temporal trends. However, since the samples available from 2004 to mid-2007 were from New York State exclusively, the trends from 2008 to 2011 are the most informative. This analysis suggests that, although subtype B infections still predominate in the United States, the prevalence of non-subtype B strains is on the rise. The overall prevalence was 3.27%, and in 2011, non-B strains represented 4.12% of this study population. Moreover, the diversity of the non-B variants is high, with an ever-expanding geographic distribution. Non-B strains were detected in 37 of the 46 states from which samples were collected. The scope of genetic diversity observed is unprecedented for U.S. studies, with representatives of most HIV-1 group M subtypes (all sub-subtypes), 23 different CRFs, and 39 URFs. Of the 798 non-B strains identified, the majority were subtype C (34.2%), followed by subtype A (18.5%), CRF02_AG (17.9%), and CRF01_AE (6.5%). These four forms represented more than 77% of all non-B strains, which is consistent with several other U.S. studies (32). The documentation of 23 different CRFs in this study population is indicative of continued introductions of HIV-1 from many geographic sources. While several of the observed CRFs are consistent with links to West Central Africa, CRF14_BG was initially identified in injecting drug users in Spain and Portugal, and the subtype B/F-derived CRFs were first identified in Brazil (4; HIV Sequence Database). Of interest, although 70.1% of the total sequences were obtained from males, the majority (55.5%) of non-subtype B infections were in females. The bias toward females harboring non-B infections presumably reflects predominantly heterosexual transmission of these variants and is consistent with results from another recent U.S. study (34). However, due to the design of our study, information on the route of transmission and patients' country of origin is not available.
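The headline proportions quoted above can be re-derived from the counts given in the text. The following is a minimal arithmetic sketch and not part of the study's analysis; the variable and function names are illustrative assumptions, and only the numbers come from the text.

```python
# Counts and percentages as quoted in the text; all names are illustrative.
TOTAL_SEQUENCES = 24_386   # pol sequences analyzed
NON_B_SEQUENCES = 798      # non-subtype B strains identified

# Share of the non-B strains reported for the four most common forms (%).
NON_B_SHARES = {"C": 34.2, "A": 18.5, "CRF02_AG": 17.9, "CRF01_AE": 6.5}


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


overall_prevalence = percent(NON_B_SEQUENCES, TOTAL_SEQUENCES)
top_four_share = sum(NON_B_SHARES.values())

print(f"Overall non-B prevalence: {overall_prevalence:.2f}%")
print(f"Four most common forms: {top_four_share:.1f}% of non-B strains")
```

Rounded to two decimal places, the first value matches the 3.27% overall prevalence, and the summed shares match the statement that the four forms account for more than 77% of non-B strains.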
To our knowledge, this is the largest and most comprehensive survey of HIV-1 non-subtype B strains in the United States to date. However, this study has several limitations and likely underestimates their prevalence and overall diversity in the United States. First, although >24,000 sequences were analyzed, this represents only ~2% of the estimated total HIV-1 infections in the United States. Second, sampling was uneven across the country and across the states, as it was dependent on the ARUP client base. Seventeen states each had fewer than 100 sequences represented (385 sequences, 1.58% of the total), and nine of these states had no non-B sequences. The potential for large regional differences in the proportion of non-subtype B even within a state has been demonstrated (34). Third, the determination of subtype is based on analysis of resistance genotyping sequence information, which is ~15% of the complete HIV-1 genome. To ensure the identification of all recombinants and to confirm the subtype/recombinant classification, complete-genome sequencing would be required. Fourth, the sequences were generated by utilizing a resistance genotyping assay validated for subtype B; any divergent HIV strains not successfully genotyped are excluded from our analysis. Fifth, some bias is likely introduced due to capturing data only from patients accessing treatment. Immigrant or lower-socioeconomic-status groups may be underrepresented. Due to the study design, the treatment status (naive versus experienced) of the patients is unknown. Finally, the sampling early in the study was from New York State exclusively, and so diversity in the 2004 to 2007 time frame is likely underestimated. However, the vast majority (>90%) of sequences analyzed were collected from a client base in 46 states.
Since the sequence data were deidentified, it is theoretically possible that some resampling occurred. Analysis of sequence similarity was performed to reduce this possibility. However, such a filter does not discern temporal changes in sequence, such as viral sequence drift, evolution of resistance-associated substitutions, or variability associated with the genotyping assay. Too-stringent criteria would result in the inadvertent elimination of transmission cluster sequences. Thus, the determination of an ideal percentage for identity cutoff is challenging. More-rigorous phylogenetic analysis of the non-subtype B subset of sequences resulted in the elimination of ~10% of candidate sequences due to resampling. This is in line with previous resampling estimates based on an analysis of a similar database with patient identifiers (40). A similar level of resampling would be expected for the subtype B sequences. Thus, although the non-B prevalence is slightly underestimated, resampling likely has a minimal impact on the overall study conclusions.
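The trade-off in choosing an identity cutoff can be illustrated with a small sketch. This is not the filter used in the study: the function names, the toy sequences, and both cutoff values are hypothetical, and the sequences are assumed to be already aligned to a common length.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def flag_possible_resampling(seqs, cutoff):
    """Return ID pairs whose identity is at or above the (hypothetical) cutoff."""
    return [
        (id_a, id_b)
        for (id_a, seq_a), (id_b, seq_b) in combinations(seqs.items(), 2)
        if percent_identity(seq_a, seq_b) >= cutoff
    ]


# Toy aligned sequences (hypothetical): s1 and s2 are identical, s3 differs
# from them at one position, and s4 is more divergent.
seqs = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAC",
    "s3": "ACGTTCGTAC",
    "s4": "TTGTACGAAC",
}

strict = flag_possible_resampling(seqs, cutoff=95.0)
loose = flag_possible_resampling(seqs, cutoff=90.0)
```

In this toy set the higher cutoff flags only the identical pair, whereas the lower cutoff also flags the pairs involving `s3`. That near-identical sequence could equally be a repeat sample from one patient after some drift or a distinct member of a transmission cluster, which is why a single ideal cutoff is hard to define.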
Although the HIV-1 pol gene sequence is relatively conserved and sequences can be influenced by antiretroviral drug pressure, it provides ample resolution for distinguishing between subtype B and non-B strains (26). The utility of resistance genotyping sequences for HIV-1 characterization has been established by numerous studies (23). Current U.S. guidelines recommend resistance genotype testing at the initiation of antiretroviral therapy and for subsequent patient monitoring (50). The availability of HIV-1 nucleotide sequence data from resistance testing provides an opportunity to determine the genomic form of HIV-1 responsible for infection; unfortunately, this is an opportunity not generally capitalized on. Online software such as the REGA Subtyping Tool simplifies mining of subtype information from resistance genotyping sequences. Although the performance of the REGA Subtyping Tool is not perfect (51), it successfully categorized >90% of our sequences. Of the untypeable sequences, ~90% were subtype B based on subsequent PHYLIP analysis. Most non-B sequences were categorized correctly, although more-rigorous phylogenetic analysis resulted in the reclassification of 22. Thus, this automated and widely accessible online tool provides a useful means for subtype/recombinant form assignment.
An important issue to be explored is the extent of indigenous transmission of non-subtype B strains in the United States. Although studies in the 1990s provided evidence that this was already occurring (44), the degree to which it contributes to ongoing transmission in U.S.-born patients is unknown. Unfortunately, because of the study design, demographic data related to country of birth, ethnicity, risk factors, and travel history are not available, so this issue cannot be addressed here. Increasing movement of non-B viruses from predominantly immigrant populations to native-born individuals has been documented in France, the United Kingdom, and Canada (20). A recent study examining HIV-1 in Maryland revealed that non-B viruses were responsible for 6.5% of infections in U.S.-born individuals (34). Interestingly, a recent study encompassing 15 states and one county that focused on persons whose birthplace was known revealed that, of the non-subtype B variants identified, 28% were in U.S.-born residents (49). This crossover trend warrants future monitoring. Since the commercial immunoassays used to diagnose HIV-1 infection are not designed to discriminate between subtypes and recombinant forms, the spread of these viruses could go largely undetected.
Among the practical considerations associated with increasing HIV-1 diversity is the potential impact on diagnostic, screening, and patient-monitoring assays. Natural genetic polymorphisms have the potential to modify or ablate epitopes utilized for the detection of p24 antigen and HIV-1-specific antibodies, thereby compromising assay performance characteristics (54). It should be recognized that the performance of fourth-generation HIV-1 antigen-antibody combination assays used widely throughout the world varies substantially with respect to the sensitivity of detection of antibodies and antigens of non-subtype B and recombinant strains (7). HIV-1 genetic heterogeneity occurring within primer and/or probe binding sites can also influence the performance of commercial patient-monitoring assays (5). Reflecting the degree of challenge that HIV-1 diversity presents, even simultaneous targeting of two genomic regions for viral load determination by one commercial manufacturer does not completely prevent underquantitation in some cases (16). Given the ongoing diversification and continual redistribution of HIV-1, it has become increasingly important to utilize assays whose performance is transparent to group/subtype and recombinant form diversity.
This study shows that the level of HIV-1 strain diversity in the United States is high, with multiple non-B subtypes and many of the recognized CRFs, as well as many URFs. Moreover, the geographic distribution of these HIV variants is widespread. For the years where the coverage was most comprehensive (2008 to 2011), there was a general trend of increasing prevalence of non-B strains in our study population. Factors such as increases in global travel, shifting immigration policies, and indigenous transmission are likely to contribute to an increasing prevalence of non-B variants in the United States. Given the potential implications for diagnostics, patient monitoring, transmissibility, disease pathogenesis, response to treatment, and vaccine development (3), continual and more-comprehensive surveillance of the prevalence and distribution of HIV-1 variants in the United States would be prudent.