Over 1.2 million people are currently living with AIDS in North America (Centers for Disease Control and Prevention 2007; UNAIDS 2008). Among these, 42% are African Americans, 40% non-Hispanic Whites, 17% Hispanic, and 1% Asian and other races. The main prevention strategy in America is to introduce widespread testing to identify HIV-positive people. This strategy has been successful in some areas, such as the prevention of mother-to-child transmission. In other areas, prevention efforts have been less effective: although combination antiretroviral treatment has helped to dramatically reduce the number of people developing and dying of AIDS, around 40,000 new AIDS cases are still diagnosed every year. Indeed, in the last few years, the rate of HIV-1 infection appears to be increasing again (Centers for Disease Control and Prevention 2008).
Despite these statistics, few comprehensive surveys have characterized the viruses responsible for new (incident) infections in North America or monitored their population dynamics (Flynn et al. 2005; Keele et al. 2008). Population genetic studies will help us understand the evolutionary history, origin, epidemiology, and population dynamics of pathogens and, ultimately, develop improved public health control strategies. Indeed, the emerging field of molecular epidemiology allows researchers to define the basic units of transmissible diseases and provides keen insights into the history and future directions of infectious diseases (Tibayrenc 2005). A comprehensive survey of the genetic diversity of HIV-1 across North America has never before been accomplished, yet such data are useful for selecting representative antigens to include in candidate vaccines and for understanding the population dynamics of HIV-1 in this area.
In 2003, a phase III placebo-controlled trial (VAX004) of a candidate HIV-1 vaccine (AIDSVAX B/B) was completed in individuals at high risk for HIV-1 infection (Flynn et al. 2005). The study enrolled 5,403 volunteers from North America and the Netherlands, of whom 368 became infected with HIV-1 despite intensive risk-reduction counseling. Envelope glycoprotein sequences were generated for 349 HIV-1 subtype B–infected individuals using the plasma sample obtained closest to the time of diagnosis. Three full-length gp120 clones with open reading frames were obtained per patient, resulting in a final data set of 1,047 sequences.
Previously, Pérez-Losada et al. (2009) analyzed variation in selective pressure across races in these data and found significant differences. In this paper, we provide a much broader analysis of the VAX004 North American sequences, in combination with other sequences available in public databases, to document HIV-1 envelope glycoprotein sequence variation as a function of treatment status (vaccine or placebo), geography, race, risk group, and viral load. We study potential differences in genetic diversity due to mutation and recombination and extend our previous analyses of selection across races to the four other factors. Finally, we attempt to infer the demographic history of HIV-1 in North America and to date the origin of the virus. The data analyzed in this paper represent the largest molecular epidemiologic survey of viruses responsible for new HIV-1 infections in North America and provide a unique opportunity to study HIV-1 evolution in an epidemiological context.