The present study reports for the first time the molecular epidemiology of the Indian HIV-1 epidemic using sequences of three structural genes (gag
) derived from multiple clinical cohorts across the country. Based on this high-quality data set, we found a high prevalence of recombinant strains in India. Our analysis also suggests that the most recent common ancestor of the HIV-1C epidemic in India dates to between 1967 and 1974, most likely 1971, approximately five years earlier than previous estimates.
All previous studies of the molecular epidemiology of the Indian HIV-1 epidemic have used a few geographically localized samples and a single viral gene. One such recent study of global trends in the molecular epidemiology of HIV-1 identified 1.06% recombinant strains (CRFs and URFs) in India between 2000 and 2007. Our own observations were comparable for the period between 2007 and 2011 when a single viral gene was used in the analysis. However, when two or three viral genes were included, a higher proportion of HIV-1 recombinant strains was found than has been reported earlier. The mosaic structure of our recombinants indicates that the URFs might have formed within the local epidemic following the migration of strains. Some of these URFs are likely to mature into CRFs as they continue to circulate in the population.
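The effect of adding genes to the subtyping analysis can be illustrated with a minimal sketch. It assumes a deliberately simplified rule in which a strain is called recombinant whenever its per-gene subtype assignments disagree. This is not the subtyping pipeline used in the study; the strain names, the subtype calls and the gene labels other than gag (`gene_2`, `gene_3`) are hypothetical placeholders, and recombination within a single gene is ignored.

```python
from typing import Dict, Sequence

# Hypothetical per-gene subtype calls; not data from the study.
# "gene_2" and "gene_3" are placeholder labels for the other genes analysed.
strains: Dict[str, Dict[str, str]] = {
    "strain_1": {"gag": "C", "gene_2": "C", "gene_3": "C"},
    "strain_2": {"gag": "C", "gene_2": "A1", "gene_3": "C"},
    "strain_3": {"gag": "C", "gene_2": "C", "gene_3": "B"},
}


def classify(gene_subtypes: Dict[str, str], genes: Sequence[str]) -> str:
    """Return the shared subtype if all selected genes agree, else 'recombinant'."""
    observed = {gene_subtypes[g] for g in genes}
    return observed.pop() if len(observed) == 1 else "recombinant"


def recombinant_fraction(data: Dict[str, Dict[str, str]], genes: Sequence[str]) -> float:
    """Fraction of strains called recombinant when only `genes` are examined."""
    calls = [classify(subtypes, genes) for subtypes in data.values()]
    return calls.count("recombinant") / len(calls)


if __name__ == "__main__":
    # With one gene every strain looks like pure subtype C;
    # with all three genes, two of the three strains show discordant calls.
    print(f"gag only:    {recombinant_fraction(strains, ['gag']):.2f}")
    print(f"three genes: {recombinant_fraction(strains, ['gag', 'gene_2', 'gene_3']):.2f}")
```

In this toy data set a single-gene analysis detects no recombinants, whereas the three-gene analysis flags two of the three strains, which mirrors the direction of the difference described above.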
In our study, URFs were more prevalent in the northern and north-eastern regions of India. This outcome, however, must be interpreted with caution given that the numbers of strains analysed from these two regions were relatively small. Previous reports from the northern and north-eastern regions of India demonstrated relatedness to subtype B segments from China and Southeast Asian countries, mainly Thailand. In addition, it has been reported that the Indian subtype C was one of the parental strains of CRF07_BC and CRF08_BC. Thus, the data from our analysis and the previous studies collectively suggest an expansion of HIV-1 URFs in the northern and north-eastern regions of India. Cross-border networking among intravenous drug users may be the driving force behind the spread of recombinants in these regions. Furthermore, our data indicate that the subtype A1 segment of the A1C recombinants in India was most likely transmitted from eastern Africa. Migration between Africa, especially its eastern and southern parts, and India for business and migrant labour has been going on for centuries, and this well-established trade route may explain the spread.
The tMRCA of HIV-1C in Africa has been dated to the 1950s. Our estimate that the tMRCA of Indian HIV-1C lies between 1967 and 1974 points to the presence of HIV-1C in India nearly two decades before its detection in 1986. The time of introduction of HIV-1C into India is somewhat later than that into Ethiopia (1965) and Zimbabwe (early 1970s), but precedes that into Brazil (early 1980s) and the United Kingdom (1980s). This would indicate that HIV-1C was introduced into India at an early stage of the global subtype C epidemic and that the Indian epidemic is likely to be the oldest HIV-1C epidemic outside Africa.
The population dynamics described in this report indicate that the effective infected population size grew in three phases. The data correspond well with the HIV-1 estimates for India, which indicate a stable or declining HIV prevalence between 2002 and 2009 (National AIDS Control Organization, Annual Report 2011, http://nacoonline.org). Our molecular data are also in line with the success of the National AIDS Control Program (NACP), the strategic plan for HIV prevention launched by the Government of India in the 1990s. Targeted interventions for high-risk groups through NACP I-III and the scale-up of this program coincide with the stabilization of the epidemic observed in our BSP.
Our study has some limitations. First, the depth of sampling is relatively low given the estimated number of HIV-1 infections in the country. To analyze the spread of HIV-1C within the subcontinent more accurately, more samples from the other locations would be needed. Second, the present analysis was restricted mainly to heterosexual transmission and included only a few samples from intravenous drug users and cases of perinatal transmission. Third, the Bayesian coalescent method has limitations for inferring population demography, as deleterious mutations may lead to overestimation of the tMRCA. Fourth, we have a large number of samples from southern India compared to the other regions. However, a technical strength of our study is the large data set with known clinical and demographic information, sampling dates and geographic origins. The multiple-gene analysis used for HIV-1 subtyping also minimizes the inclusion of subtype C segments of recombinant strains in the data set used to estimate the tMRCA.
In conclusion, our study identified a significant increase in the prevalence of recombinant strains (URFs) in India through the application of robust subtyping methods. The introduction of HIV-1C into India was dated to around 1971, and we found that the epidemic has been stable for over a decade. Our results indicate that HIV-1C is likely to have been introduced into India at an early stage of the global HIV-1 epidemic and that India harbors one of the oldest HIV-1C epidemics outside Africa. As the depth of sampling (the ratio of available sequences to infections) is still very low in India, the resulting diversified epidemiology may pose serious challenges to the development of an effective vaccine applicable in the country. Ongoing countrywide molecular surveillance of HIV-1 is likely to contribute to a better understanding of the epidemiology in this region.