The phylogenetic analysis of env
revealed clear evolutionary differences between the viral populations present in each twin (Figure ), with twin A showing much longer branch lengths than twin B. The viral populations from each twin formed reciprocally monophyletic groups with a shared most recent common ancestor relative to HIV-1 control sequences (lab reference sequences and the closest sequences identified in BLAST analyses), as one would expect given the same source population of HIV. Twin A also had much higher levels of genetic diversity than twin B (Figure ; Table ). The phylogenetic network reconstruction of rt
(Figure ) also indicates distinct viral populations for each twin, with a relatively higher number of internal sampled genotypes in twin A in comparison with twin B. Twin B has a higher number of tip haplotypes (although not statistically significant with a P
= 0.10; Fisher's exact test), suggesting that selection is acting more strongly on twin B than on twin A. Both phylogenetic and network reconstruction analyses showed similar results for all genes analyzed (env
) (data not shown). Growth rates also differed between the HIV populations infecting the two twins, with twin A showing a rate at least two times higher than twin B (Table ). Interestingly, recombination rates were relatively similar except for rt
where the recombination rate (C
) was three times higher in twin A. In both twins the substitution rate was higher than the recombination rate, as indicated by the low estimates of r
(c/μ). The ratio of the per-site rate of recombination to the per-site rate of mutation (c/μ) in all genes was less than one, indicating that mutation plays a more significant role in producing novel genetic combinations than recombination and is, therefore, the major force driving the evolution of these viral populations [see [18
]]. Moreover, the BEAST analyses showed that twin A had a higher relative genetic diversity than twin B and showed different population dynamics through time (Figure ). In env
, after a similar starting point in both twins, the relative genetic diversity remained constant until recently, when both twins showed a sudden decrease in diversity followed by an exponential increase, with twin A reaching much higher levels of diversity than twin B. In pro
, both twins showed a gradual increase in the relative genetic diversity until just after drug therapy intervention. Twin B then began to increase in diversity before twin A, perhaps because twin B started drug therapy (AZT) two years before twin A (ddI) (see Methods). Twin A subsequently showed a steeper increase in diversity than twin B before the two converged at higher levels. In rt
, after a short initial starting point in both twins, twin A rapidly developed higher diversity than twin B, with both patients showing increases in diversity over the last two years but twin A maintaining a relatively higher level of diversity than twin B.
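The tip versus internal haplotype comparison reported above rests on a 2x2 contingency table tested with Fisher's exact test. The following is a minimal sketch of that kind of test, written with the Python standard library only. The haplotype counts are hypothetical placeholders chosen for illustration; they are not the counts from this study, and the function name is an assumption of this sketch.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of observing k in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in (prob(k) for k in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts (NOT the study's data): rows are twins,
# columns are tip haplotypes and internal haplotypes.
twin_a_tip, twin_a_internal = 10, 8
twin_b_tip, twin_b_internal = 15, 4

p_value = fisher_exact_two_sided(twin_a_tip, twin_a_internal,
                                 twin_b_tip, twin_b_internal)
print(f"two-sided P = {p_value:.3f}")
```

With the real haplotype counts from the network in place of the placeholders, the same function would return the P value for the difference in tip haplotype frequency between the twins.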
Figure 1 Midpoint-rooted phylogenetic tree of env sequences for the viral populations collected from both twins. Numbers above and below branches indicate Maximum Likelihood (ML) bootstrap proportions and Bayesian posterior probabilities (as percentages), respectively.
Estimates of genetic diversity (θ), recombination (r and C), and growth (g) for pro, rt and env in each twin.
Figure 2 Statistical parsimony network of rt in each twin. The network is constructed so that the colored squares and circles represent actual cloned sequences. The size of the colored squares and circles is proportional to the number of sequences displaying the ...
Bayesian skyline plots of the past population dynamics of HIV-1 in twins A (black lines) and B (red lines) for env, pro and rt. Solid lines show the median estimate and dashed lines the 95% highest posterior density limits.
We also investigated the extent to which natural selection has impacted the viral populations for each twin. Significant evidence of adaptive selection was detected in rt
from twin B (presumably associated with drug resistance) and in env
from twin A (presumably associated with immune avoidance) using PAML (Table ). The Bayesian approach identified 11 positively selected sites (pP
> 0.95) under model M2 and 13 positively selected sites under model M8 in env
from twin A. All of the 11 sites detected under model M2 were also found by model M8. Models M2 and M8 also detected one site under selection in rt
from twin B (S162D). This site is not documented as a drug resistance mutation site in rt
. Even though they are not detected as positively selected sites in rt
, we found the M184V mutation, which is associated with resistance to 3TC, in both twins. We also detected the T215F and K219Q mutations in rt
in twin B, which are associated with resistance to AZT. There are 26 positions known to be associated with resistance to protease inhibitors [19
]. Some amino acid changes were seen in pro
in both twins, but none of these changes are known to confer drug resistance. A recent study by Nozawa et al
] pointed out the low sensitivity of PAML for detecting positively selected sites; however, this claim has been rejected by Yang et al
], who provide strong support for the sensitivity of this statistical method for inferring positive selection in DNA sequences and for comparative analysis of genomic data [22
]. Recombination can confound the inference of selection. We therefore tested for recombination using GARD (Genetic Algorithm for Recombination Detection), which detected a single recombination breakpoint in env
for both twins and in rt
for twin B. The REL (Random Effects Likelihood) selection analyses, which took into account the recombination inferred by GARD, clearly indicated that in env
, positive selection was stronger for twin A, while no difference was detected in the other genes (Table ).
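Evidence for positive selection under PAML site models is conventionally assessed with a likelihood ratio test (LRT) between nested models. The sketch below assumes the standard pairings (M1 as the null for M2, and M7 as the null for M8), each compared against a chi-square distribution with 2 degrees of freedom; the text above names only M2 and M8, so the null models and the degrees of freedom are assumptions here. The log-likelihood values are hypothetical placeholders, not values from the table.

```python
from math import exp


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 parameters.

    Returns the test statistic 2 * (lnL_alt - lnL_null) and its P value.
    For 2 degrees of freedom the chi-square survival function has the
    closed form exp(-x / 2), so no statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, exp(-stat / 2.0)


# Hypothetical log-likelihoods (NOT the study's values).
comparisons = {
    "M1 vs M2": (-2310.4, -2301.9),
    "M7 vs M8": (-2311.0, -2302.2),
}

for name, (lnl_null, lnl_alt) in comparisons.items():
    stat, p = lrt_df2(lnl_null, lnl_alt)
    print(f"{name}: 2*dlnL = {stat:.2f}, P = {p:.4g}")
```

A significant LRT indicates only that a class of sites with ω > 1 improves the fit; the individual sites are then identified by their posterior probabilities, as with the pP > 0.95 threshold used above.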
Log-likelihood values and parameter estimates (ω, p, and n) for pro, rt and env in each twin.
Selection analyses for pro, rt and env in each twin.
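The resistance mutations discussed above (M184V, T215F and K219Q in rt) follow the usual wild-type residue, position, mutant residue notation. As an illustration of how translated clone sequences could be screened for such substitutions, here is a minimal sketch. The function names and the toy sequence are hypothetical, and the sketch assumes the amino acid sequence is already aligned so that string index 0 corresponds to position 1 of the protein.

```python
import re


def parse_mutation(label):
    """Split a label such as 'M184V' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label}")
    return match.group(1), int(match.group(2)), match.group(3)


def find_mutations(protein, labels):
    """Return the labels whose mutant residue is present in the protein."""
    found = []
    for label in labels:
        _, position, mutant = parse_mutation(label)
        # Positions are 1-based in mutation labels, 0-based in Python strings.
        if position <= len(protein) and protein[position - 1] == mutant:
            found.append(label)
    return found


# Toy sequence (hypothetical, not a real RT sequence): 'X' everywhere
# except three positions of interest.
residues = ["X"] * 230
residues[183] = "V"  # position 184 carries the mutant residue V
residues[214] = "F"  # position 215 carries the mutant residue F
residues[218] = "K"  # position 219 carries the wild-type residue K
toy_rt = "".join(residues)

print(find_mutations(toy_rt, ["M184V", "T215F", "K219Q"]))
```

A screen of this kind reports the presence of a substitution only; whether a site is under positive selection is a separate question, which is why M184V can be present in both twins without being flagged by the selection analyses.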
Our goal was to investigate HIV-1 evolution in identical twins infected simultaneously at birth through the same blood transfusion. We found compelling distinctions between the viral populations from each twin with respect to their population dynamics, phylogenetic structure, growth rates, recombination rates, genetic diversity, and selection pressures. These results were unexpected given the identical starting points with respect to both the infecting viral population and the host genetic background. These shared starting points together with the seemingly limited pathways of evolution for both immune evasion and drug resistance [14
] would lead one to predict similar patterns of genetic diversity and dynamics in viral populations resulting in similar clinical outcomes. Instead, we found higher growth rates, higher genetic diversity, and higher recombination in rt
in the healthier twin A compared to twin B. We also found sites under diversifying selection in env
in twin A whereas twin B had only one site under selection in rt
(in PAML analysis). Thus, the higher genetic diversity and higher number of selected sites in env
appear to be associated with slower disease progression, a result concordant with that found in a broad study of disease progression in infants [24
]. Similarly, the twins differed in their population dynamics and these differed by gene region. The rt
regions showed the viral population in the healthier twin A with higher levels of genetic diversity throughout the history of infection even when there were significant shifts in overall levels of diversity. On the other hand, pro
showed a gradual increase in diversity in the viral population of twin B after drug therapy, and a more rapid increase in twin A that was delayed by the same period as the delay in starting the RT inhibitor (2 years). This result suggests that the shape of the response to drug therapy in terms of HIV population diversity might be diagnostic of future disease progression, but further studies with larger sample sizes are needed to test whether this response predicts disease progression. Nevertheless, these twins clearly show very different responses to infection.
This difference in viral population dynamics is concordant with the differences observed in the clinical course of each twin. Twin A maintains CD4 T cells at an almost normal level, whereas the immune system of twin B is depressed; hence, no strong selective pressure is driving its virus population to evolve rapidly [24
]. All these results combined provide strong evidence that, at least in this case, the replicate evolutionary experiment did not result in an identical outcome, demonstrating the importance of selective response to random epigenetic factors impacting disease progression [25].
Indeed, some studies in monozygotic twins revealed increasing epigenetic differences with age [25
]. Additionally, there is a clear potential for founder effects upon infection [27
], even in the context of a blood transfusion, as the viral population in a shared blood donation is certainly reduced in genetic diversity and number compared to infectious virus from an infected individual. The combination of a reduced effective population size and strong selective pressure is a key ingredient for founder effects [28
], resulting in populations with very different characteristics, as is evident here in both the population dynamics and immunology. Clearly, the early impact of founder effects and epigenetic factors on viral population dynamics has a diversifying impact over time as the viral populations undergo independent evolution, even in the face of similar genetic selection pressures, identical genetic starting points, and identical host genetic backgrounds (Figure ). This epigenetic drift during development can be either stochastic (especially when impacted by genetic drift) or determined by environmental factors [30
]. Host-virus interactions in the early stages of HIV infection are presumed to have a large impact on the disease course and viral evolution [31
], yet they are exceptionally difficult to study because researchers are typically not able to design experiments to investigate viral dynamics at infection. Our study capitalizes on the infection of monozygotic twins through a common contaminated blood transfusion to demonstrate that even more complicated epigenetic factors need to be taken into account in developing hypotheses associated with genetic diversity, population dynamics, selection pressure and their association with disease progression.