We have retrospectively investigated the dynamics of the developing heterosexual HIV epidemic in the UK by applying Bayesian phylogenetic analysis to anonymised viral sequences obtained in the course of routine clinical treatment. The high level of representation in the UK HIV Drug Resistance Database (over 40% of the estimated size of the relevant risk group) has permitted a detailed analysis of the level of clustering, the distribution of cluster size and the distribution of the interval between transmissions for non-B subtype sequences. After screening out non-UK associations, we found that among probable UK-based infections, 14% of subtype A sequences were in clusters of ≥10 individuals, compared with 6% of subtype C and 1% of other subtypes; these percentages increase sharply (to a total of 48%) if the denominator is restricted to the 293 individuals within UK-based clusters of 3 or more. This suggests that individuals within a UK-based cluster of any size are very likely to be in a large one. The conclusion is striking because all likely confounding factors (such as immigration of concordant families) would be expected to increase the numbers of pairs, and perhaps of clusters of 3 individuals, but not of clusters of 10 or more, and would therefore decrease the proportion in large clusters. Despite the different geographical origins of HIV-1 subtypes, large clusters were observed in both subtype C (33 members), whose primary origin would be southern Africa, and subtype A (24 members), which is primarily associated with East Africa, suggesting no major distinction in the structure of the epidemic among communities from different countries.
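The effect of the choice of denominator on the proportion in large clusters can be made concrete with a minimal sketch. It computes the fraction of individuals in clusters of at least 10 members, first relative to all individuals and then relative only to those in clusters of 3 or more. The function name and the cluster sizes are hypothetical toy values, not data from this study; singletons are represented as clusters of size 1.

```python
from typing import Sequence


def proportion_in_large_clusters(
    cluster_sizes: Sequence[int], large: int = 10, min_size: int = 1
) -> float:
    """Fraction of individuals in clusters of size >= `large`, counting in
    the denominator only individuals in clusters of size >= `min_size`."""
    # Each cluster of size s contributes s individuals.
    eligible = [s for s in cluster_sizes if s >= min_size]
    denominator = sum(eligible)
    if denominator == 0:
        return 0.0
    numerator = sum(s for s in eligible if s >= large)
    return numerator / denominator


# Hypothetical toy data: 60 singletons, 10 pairs, 4 triples, one cluster of 12.
toy_sizes = [1] * 60 + [2] * 10 + [3] * 4 + [12]
print(round(proportion_in_large_clusters(toy_sizes), 3))              # prints 0.115
print(round(proportion_in_large_clusters(toy_sizes, min_size=3), 3))  # prints 0.5
```

In this toy example the same single cluster of 12 accounts for about 12% of all individuals but half of those in clusters of 3 or more, which is the kind of shift described above.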
We explored the epidemic in these groups in greater detail by using time-resolved phylogenies, with a relaxed molecular clock, to analyse the dynamics of transmission within clusters. As each sequence is obtained from a different infected individual, we take the internode interval as a maximum estimate of the time between transmissions: missing data, in the form of individuals within the transmission network who were not sampled, can only make the true interval shorter than this estimate. Using this approach, the median estimated time between transmissions was 27 months overall for non-B subtypes (32, 25 and 22 months for subtypes A, C and other, respectively). This approach also allowed us to estimate the proportion of transmissions within defined intervals after infection: overall, just 2% of transmissions in this population were estimated to occur within 6 months of infection (0%, 2% and 5% for subtypes A, C and other, respectively).
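The two summary statistics used here, the median inter-transmission interval and the proportion of intervals falling within a window after infection, can be sketched as follows. The sketch assumes that internode intervals (in months) have already been extracted from a time-resolved tree; the function names and interval values are hypothetical, and a full Bayesian analysis would summarise these statistics over the posterior sample of trees rather than a single set of intervals.

```python
from statistics import median
from typing import Sequence


def median_interval(intervals_months: Sequence[float]) -> float:
    """Median internode interval in months. Each interval is an upper bound
    on the time between transmissions: unsampled intermediaries would only
    make the true interval shorter."""
    return median(intervals_months)


def proportion_within(
    intervals_months: Sequence[float], lower: float, upper: float
) -> float:
    """Fraction of intervals t with lower < t <= upper (months).

    Half-open windows let adjacent bins such as (0, 6] and (6, 36]
    be used together without double counting an interval of exactly 6.
    """
    if not intervals_months:
        return 0.0
    hits = sum(1 for t in intervals_months if lower < t <= upper)
    return hits / len(intervals_months)


# Hypothetical internode intervals (months) taken from a single tree.
intervals = [4.0, 11.5, 19.0, 27.0, 30.5, 41.0, 58.0]
print(median_interval(intervals))           # median interval
print(proportion_within(intervals, 0, 6))   # transmissions within 6 months
print(proportion_within(intervals, 6, 36))  # transmissions at 6-36 months
```

The same `proportion_within` call with a (6, 36] window gives the kind of 6–36 month proportion compared between the heterosexual and MSM datasets below.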
In an earlier study of the phylodynamics of HIV in an MSM population attending a large clinic in London, we observed a much higher frequency of linkage between individuals, with 25% of those linked to at least one other individual being found in large clusters. Among these MSM, the median transmission interval within clusters, estimated in the same way, was almost half that for the heterosexual population studied here, at 14 months, and 25% of transmissions within clusters occurred within 6 months of infection. Nevertheless, the shape of the distribution of cluster size was similar between the two groups. The overall proportion of transmission intervals falling between 6 and 36 months after infection for the heterosexual clusters, 62%, is very similar to that estimated for the MSM dataset (63%). The extended right-hand tail of the transmission interval distribution for non-subtype B UK transmission clusters is likely to be due in part to the inclusion of a residue of distantly linked, non-UK-based sequences that were not identified by the global diversity screen. In a recent study of patients identified during primary infection in Quebec, Brenner et al. reported that while 28% of MSM diagnosed early in infection were part of transmission clusters involving 5 or more individuals, only 13% of non-B subtype infections (mostly heterosexual) were in clusters.
The observed differences between MSM and heterosexuals in inter-transmission intervals could reflect real differences in the dynamics of the epidemics in different risk groups. Alternatively, they could arise from a systematic difference between the two studies, for example in sampling, if many more individuals were missing from the heterosexual clusters. At the most basic level, however, any sampling difference would be expected to act in the opposite direction, as the earlier MSM study was restricted to individuals attending a single clinic in London, whereas the analysis presented here is derived from population surveillance of all HIV-infected individuals receiving treatment in the United Kingdom. As indicated earlier (see Introduction), these results reflect approximately 40% of the HIV-infected Black African population. In contrast, the earlier study analysed 2126 individuals sampled from approximately 11,000 MSM receiving care in London (www.hpa.org.uk), i.e. ~20% of those receiving care and perhaps 10–15% of all HIV-infected MSM in London. We therefore have approximately 3–4 fold greater coverage of African-derived HIV in the UK in this study than we had of MSM in London previously.
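The coverage comparison reduces to simple ratios of the figures quoted above, taking the 10–15% range as the assumed coverage of all HIV-infected MSM in London:

```latex
\frac{2126}{11\,000} \approx 0.19 \;\;(\approx 20\%\ \text{of MSM in care}),
\qquad
\frac{0.40}{0.15} \approx 2.7,
\qquad
\frac{0.40}{0.10} = 4.0
```

Rounded, this gives the approximately 3–4 fold difference in coverage between the two studies.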
Another possible source of bias could lie in the frequency of testing. Greater awareness of, and/or access to, HIV-related care among MSM than among the predominantly immigrant HIV-infected heterosexual group could in principle have led to a shorter time between infection and diagnosis. If this also led to a shorter time between infection and initiation of antiretroviral therapy, the period of opportunity for transmission would be reduced. Because the time of infection is unknown for most of the patients studied, we investigated this possibility by using CD4 counts at the time of diagnosis as a proxy for the average time since infection (Text S1, Table S1). In agreement with Stöhr et al., we conclude that there is little difference between the heterosexual and MSM groups in the UK (Figure S9): the 10% difference we observe in CD4 count at treatment between subtypes C and B cannot explain the observed 50% difference in the median inter-transmission interval.
Following the observation by Liljeros et al. that human sexual networks based on contacts within the last year have the properties of scale-free networks, we examined the distribution of the size of transmission clusters among heterosexuals in the UK and found an excellent fit to a power law, consistent with a scale-free network. Inference from viral sequence data is indirect and, as discussed in detail earlier, it is important to recognise that the viral transmission network and the sexual network are not the same in a chronic infection such as HIV: a series of transmissions could derive from a single individual rather than arising as onward transmissions from their sexual contacts. The transmission network is a subgraph of the sexual network, but both clearly incorporate a time dimension; the network that fits a power law was described in terms of sexual contacts in the last year and is smaller than the lifetime network. Here we tested several time depths and found that the best fit was obtained with a limit of 5 years; the shape parameter α was estimated at 2.1 (see also S5), close to the estimates obtained by Liljeros et al.
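The shape parameter α of a discrete power law, P(s) ∝ s^(−α) for cluster sizes s ≥ s_min, can be estimated by maximum likelihood. The sketch below uses the closed-form approximation given by Clauset, Shalizi and Newman, α̂ ≈ 1 + n [Σ ln(s_i / (s_min − ½))]⁻¹, with standard error (α̂ − 1)/√n. This is not necessarily the fitting procedure used in this study, the approximation is known to be less accurate when s_min is small, and the function name and cluster sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_discrete_power_law(
    sizes: Sequence[int], s_min: int = 2
) -> Tuple[float, float]:
    """Approximate MLE of alpha for P(s) ~ s**-alpha with s >= s_min.

    Uses the continuous approximation with the discreteness correction
    s_min - 1/2; returns (alpha_hat, standard_error).
    """
    if s_min < 1:
        raise ValueError("s_min must be at least 1")
    tail = [s for s in sizes if s >= s_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no cluster sizes at or above s_min")
    # Every term is positive because s >= s_min > s_min - 0.5.
    log_sum = sum(math.log(s / (s_min - 0.5)) for s in tail)
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)
    return alpha_hat, std_err


# Hypothetical cluster sizes; singletons fall below s_min and are ignored.
example_sizes = [1, 1, 1, 2, 2, 2, 3, 4, 10]
alpha, se = fit_discrete_power_law(example_sizes, s_min=2)
print(f"alpha = {alpha:.2f} +/- {se:.2f}")  # prints alpha = 2.35 +/- 0.55
```

In a real analysis s_min and the adequacy of the power-law form would themselves need to be assessed, for example with a goodness-of-fit test, rather than assumed.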
The greater time depth reflects the substantial delay that is usual between infection, diagnosis and the onset of antiretroviral therapy, which would have been the indication for the HIV genotype test from which our sequences are derived. While nodes in a sexual network and nodes in a transmission network cannot be directly equated, the distribution in time of the latter is clearly bounded by the former. On the other hand, the relationship of the sexual network to the transmission network is determined by the probability of transmission per contact, which varies greatly and is difficult to estimate. A quantitative description of the transmission network for a population can therefore provide critical information for modelling the epidemiology of HIV transmission.
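One simple way to see how the per-contact probability links the two networks is to treat each contact within a partnership as an independent Bernoulli trial, so that a partnership with k contacts and per-contact probability p transmits with probability 1 − (1 − p)^k. Thinning the edges of a sexual network by this probability keeps only the partnerships along which transmission would occur if one partner were infected. The independence model is a simplifying assumption (it ignores, among other things, variation in infectiousness over the course of infection and the timing of infection), and the function names, partnerships and probability values below are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def per_partnership_probability(p_contact: float, n_contacts: int) -> float:
    """P(at least one transmission) over n independent contacts."""
    return 1.0 - (1.0 - p_contact) ** n_contacts


def thin_sexual_network(
    contacts: Dict[Edge, int], p_contact: float, seed: int = 0
) -> List[Edge]:
    """Keep each partnership edge with its per-partnership transmission
    probability; the kept edges form one random subgraph of the network."""
    rng = random.Random(seed)
    return [
        edge
        for edge, n in contacts.items()
        if rng.random() < per_partnership_probability(p_contact, n)
    ]


# Hypothetical partnerships mapped to their numbers of contacts.
partnerships = {("a", "b"): 50, ("b", "c"): 5, ("c", "d"): 200}
print(per_partnership_probability(0.001, 200))
print(thin_sexual_network(partnerships, p_contact=0.001, seed=1))
```

Because the retained edges depend strongly on p, which is poorly known, the same sexual network can yield very different transmission subgraphs.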
The degree of clustering deduced for the heterosexual population differs from that found previously for MSM, and there is a substantial difference in the dynamics. While it is generally recognised that concurrent partnerships are the greatest potentiating factor for HIV and other STIs, the difference between these risk groups suggests either a longer interval between partner changes or a lower per-contact risk of transmission in heterosexuals. With very few inter-transmission intervals below 6 months, it is unlikely that the elevated viral load associated with acute infection plays a significant role in the UK heterosexual epidemic. The slower dynamics of the heterosexual epidemic thus offer more opportunity for successful intervention, but it is essential that diagnosis is achieved as early as possible.