Stat Commun Infect Dis. Author manuscript; available in PMC 2012 August 15.
Published in final edited form as:
Stat Commun Infect Dis. 2010 December 21; 2(1): 1018.
doi:  10.2202/1948-4690.1018
PMCID: PMC3419597

An imputation method for interval censored time-to-event with auxiliary information: analysis of the timing of mother-to-child transmission of HIV


The timing of mother-to-child transmission (MTCT) of HIV is critical in understanding the dynamics of MTCT. It has important implications for developing effective treatment or prevention strategies for such transmissions. In this paper, we develop an imputation method to analyze censored MTCT timing in the presence of auxiliary information. Specifically, we first propose a statistical model based on the hazard functions of the MTCT timing to reflect three MTCT modes: in utero, during delivery and via breastfeeding, with different shapes of the baseline hazard that vary between infants. This model also allows that the majority of infants may be immune to MTCT of HIV. Then, the model is fitted by MCMC to explore marginal inferences via multiple imputation. Moreover, we propose a simple and straightforward approach to take into account the imperfect sensitivity of the diagnostic assay in the imputation step, and study appropriate censoring techniques to account for weaning. Our method is assessed by simulations, and applied to a large trial designed to assess the use of antibiotics in preventing MTCT of HIV.

Keywords: HIV/AIDS, mixture models, mother to child transmission of HIV, multiple imputation

1 Introduction

Understanding the dynamics of mother-to-child transmission (MTCT) of HIV is critical to developing effective treatment or preventive strategies for the deadly disease. This includes estimating the distribution of the timing of MTCT overall or within one of the exposure periods, i.e., in utero, intrapartum and postnatal (while breastfeeding), and relating distributions of the timing of MTCT to baseline covariates (e.g., The Breastfeeding and HIV International Transmission Study Group, 2004; Iliff et al., 2005; Kourtis et al., 2006; Luzuriaga et al., 2006; Piwoz et al., 2006; Saba et al., 2002).

Unfortunately, infants are often lost to follow-up, e.g., due to death of the infant or mother, moving away from the study site or unwillingness of the primary care-giver to allow the infant to continue in the study. Hence these infants would have unknown infection status at the end of the study. Also, when HIV infection is only assessed at discrete routine clinic visits, the time of infection is only known within an interval. Therefore, the timing of MTCT of HIV is often right- or interval-censored. In addition, missed visits may further complicate analyses, and the HIV detection assay for infants may have low sensitivity soon after infection, which can lead to false negative test results (Cunningham et al., 1999; Dunn et al., 2000; Simonds et al., 1998; Young et al., 2000). Without carefully taking into account these complications in data analysis, a naive application of current methods for censored time-to-event data in the statistical literature may lead to biased estimates and incorrect inferences.

All these complications have indeed occurred in the HIV Prevention Trials Network (HPTN) 024 trial. The HPTN 024 trial was a multi-site, placebo-controlled, double blinded randomized trial of antibiotics to prevent perinatal MTCT of HIV (Taha et al., 2006). In this trial, the primary endpoint was the cumulative transmission rate at a point in time shortly after birth as measured by the imperfect diagnostic assay. During the trial follow-up, the infants were scheduled to be tested within 48 hours after birth to assess in utero transmission. A second visit was scheduled to be between 4 and 8 weeks after birth to assess intrapartum transmission. Subsequent visits were also scheduled to evaluate late postnatal MTCT of HIV via breast milk (transmission first detected after 6 weeks) and mortality. Specifically in HPTN 024, these visits were scheduled at 3, 6, 9 and 12 months.

The usual approach to estimating the distribution of the timing of MTCT of HIV via breast milk is to subset the data to breastfed infants known to be negative at the 4–8 week visit because they tested negative at that or a later visit. Infants who missed the 4–8 week visit and subsequently tested positive would not be included in the analysis because their 4–8 week HIV infection status is unknown. The infant’s time to event is taken to be the time of the first positive test or the midpoint between the last negative test and the first positive test. If the infant has no positive test, his time to event is censored at the last negative test, weaning or death.
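The midpoint rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `naive_event_time` and the representation of a test history as `(age_in_days, is_positive)` pairs are assumptions made here, and censoring at weaning or death is left out for brevity.

```python
from typing import List, Optional, Tuple


def naive_event_time(
    tests: List[Tuple[float, bool]],
    use_midpoint: bool = True,
) -> Tuple[Optional[float], bool]:
    """Return (time, event_observed) for one infant under the usual approach.

    tests: (age_in_days, is_positive) pairs; hypothetical input format.
    If a positive test exists, the event time is the first positive test or,
    with use_midpoint=True, the midpoint between the last earlier negative
    test and the first positive test. Otherwise the infant is censored at
    the last negative test.
    """
    negatives = [t for t, pos in tests if not pos]
    positives = [t for t, pos in tests if pos]
    if positives:
        first_pos = min(positives)
        prior_negs = [t for t in negatives if t < first_pos]
        if use_midpoint and prior_negs:
            return (max(prior_negs) + first_pos) / 2.0, True
        return first_pos, True
    if negatives:
        return max(negatives), False
    return None, False  # no test results at all


# Negative at 6 weeks (day 42), next result positive at 12 months (day 365).
time, event = naive_event_time([(42.0, False), (365.0, True)])
```

With a long gap between tests, the midpoint can land months away from either test, which is one reason the observed-data analysis may misplace events in time.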

The Kaplan-Meier approach is then used to estimate cumulative event rates, and proportional hazards models are used to estimate the association between timing of MTCT and covariates. If interest is in estimation of the late postnatal timing distribution, then the origin is set to 8 weeks. Other general interval-censoring techniques may also be used to estimate the survival distribution (Turnbull, 1976; Sen and Banerjee, 2007) or hazard ratios (Finkelstein, 1986; Satten, 1996; Goggins et al., 1998). However, none of these approaches accounts for uncertainty about inclusion of some infants in subset analyses, nor do they reflect the unique features of the distribution of the timing of MTCT of HIV, which we will detail later.

Other authors have proposed approaches to estimate the true timing of MTCT of HIV accounting for the imperfect diagnostic assay (Balasubramanian and Lagakos, 2003; Zhang and Lagakos, 2007; Gupte et al., 2007); however, these approaches are not easily implemented and still do not address the uncertainty about inclusion of some infants in subset analyses due to missing data.

Multiple imputation (MI) (Rubin, 1996) has previously been proposed to aid in the analysis of interval-censored data. For example, Pan (2000) proposed an imputation scheme based on the Poor Man’s Data Augmentation algorithm (Wei and Tanner, 1991), and Bebchuk and Betensky (2000) used local likelihood methods for imputing interval censored observations. However, neither of these approaches made use of auxiliary information nor did they impute right-censored observations. Glynn and Rosner (2004) proposed a multiple imputation scheme for interval censored paired data based on a parametric frailty model. Most recently, Hsu et al. (2007) proposed a non-parametric imputation approach that uses auxiliary variables and imputes both right and interval censored observations.

In the rest of this paper, we present a flexible imputation method to aid in analysis of the timing of MTCT of HIV. Because there are many variables available that are known to be highly associated with the timing of MTCT of HIV, which can be used to improve the model but may not be desired for inference, our method includes these auxiliary variables for multiple imputation. The imputed values can then be used in a wide range of analysis models.

Specifically, we first present a mixture model that allows that a proportion of infants are born with detectable HIV infection from in utero transmission and that another significant proportion will never experience MTCT of HIV. Additionally, we model the continuous infection time after birth using a mixture of Weibull distributions to allow for a flexible estimate of the baseline survival distribution for postnatal transmission while still allowing for straight-forward computation using available software. We next extend the imputation step to allow for imperfect sensitivity of the diagnostic assay, so that the procedure can estimate both time to transmission and time to detectable infection. Additionally, we demonstrate how to use this model to impute transmission times both for right and interval censored observations. Finally, we use multiple imputation methods to calculate the final estimates and their standard errors. An application of this approach to the HPTN 024 data set demonstrates its value for estimating the distribution of timing of late postnatal transmission. An extensive simulation explores the properties of the MI procedure.

2 Methods

2.1 Overview

Let s denote the time that an infant would first test positive for HIV (referred to as timing of MTCT). Note that this is not the same as the time at which MTCT of HIV occurs, due to the low sensitivity of HIV PCR assays in the period immediately after transmission occurs (Cunningham et al., 1999; Dunn et al., 2000; Simonds et al., 1998; Young et al., 2000). We do not observe s precisely. Instead, we observe the pair of times (L, R), where L < s ≤ R. We define L as the time of the last negative test and R as the time of the first positive test. If the infant never has a negative test, without loss of generality, we set L = −∞. If instead the infant never has a positive test, we set R = ∞ and treat the observation as right-censored.

Given the observed pair (L, R), we can estimate the distribution of s, f(s), or its associated survival distribution, S(s), using interval-censoring estimation techniques, but often we are interested in estimating this distribution over a specific time interval. For example, when examining the late postnatal transmission distribution, we might be interested in estimating the effect of a set of covariates, X, on the timing of transmission after a certain age, t1, g1(s|X) = f(s|s > t1, X), where t1 is often taken to be 6 weeks. Many infants’ observations of (L, R) will contain t1 and therefore it is unclear if they belong to the analysis subset of interest thereby complicating estimation of g1(s|X). Alternatively, we might be interested in examining the effect of covariates on the timing of MTCT in that group of subjects who are infected late postnatally, g2(s|X) = f(s|t1 < s < t2, X), where t2 is usually taken to be the end of follow-up. Again, for many infants, (L, R) will contain either t1 or t2, and it will be unclear if they should be members of the analysis subset of interest.
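The ambiguity about subset membership can be made concrete with a small sketch. It assumes the detection time lies in the interval (L, R], takes t1 = 42 days (6 weeks), and uses a hypothetical helper name, `late_postnatal_membership`, that does not come from the paper.

```python
import math


def late_postnatal_membership(L: float, R: float, t1: float = 42.0) -> str:
    """Classify whether s, known only to lie in (L, R], exceeds t1.

    Returns "in" if s > t1 is certain, "out" if s <= t1 is certain,
    and "unknown" if the interval straddles t1.
    """
    if L >= t1:
        return "in"       # last negative test at or after t1, so s > t1
    if R <= t1:
        return "out"      # first positive test by t1, so s <= t1
    return "unknown"      # (L, R] contains t1; membership is not determined


examples = {
    "negative at day 56, never positive": late_postnatal_membership(56.0, math.inf),
    "positive at day 30": late_postnatal_membership(0.0, 30.0),
    "negative at day 2, positive at day 365": late_postnatal_membership(2.0, 365.0),
}
```

Infants classified as "unknown" are the ones a subset analysis of the observed data must either drop or assign arbitrarily, whereas an imputed s resolves membership within each augmented data set.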

To allow straight-forward estimation of both g1 and g2, we propose a multiple imputation technique for the actual random variable of interest, s. First, we specify a likelihood-based model for the complete data. Next, we set prior distributions for the parameters in the model. The imputations can then be generated using Markov Chain Monte Carlo (MCMC) methods, where, after a significant burn-in period, the missing data is imputed by taking draws from the posterior predictive distribution conditional on the current draws of the parameters from the posterior distribution of the parameters. Each data set created by this imputation technique is referred to as an augmented data set.

Our proposed imputation model reflects features seen in but not unique to MTCT studies. First, many infants are infected in utero and their infection can be detected immediately after birth. Second, because the at-risk time for MTCT of HIV is limited by exposure to breast milk, not all subjects will experience MTCT of HIV. This is in contrast to a usual time to event analysis where we assume that if we could follow a subject indefinitely and there were no competing risks, s/he would eventually experience the event. Third, s is not observed past some end of follow-up time, t2. To accommodate this, we will assume that all infants are censored at t2 if they have not yet experienced the event and account for this accordingly in the multiple imputation. This assumption will still allow for estimation of both g1 and g2.

Finally, we will also impute the timing of infection, defined as s*, as opposed to the timing of detection, s, through specification of a distribution on the time from infection to detection. We will explore two distributions that have been used in the literature. Balasubramanian and Lagakos (2001) used a uniform distribution with fixed parameters when estimating the timing of in utero transmission. Gupte et al. (2007) assumed an exponential distribution for the time from infection to detection when estimating timing of MTCT including postnatal transmissions and estimated the parameter of the exponential.

2.2 Imputation model

Let si, i = 1, …, N denote the age at which the ith infant born to an HIV-infected mother first has detectable HIV infection. We now present a new model for imputing individual times to detectable HIV infection. This step requires specifying a likelihood for the complete data. As stated in the Introduction, however, there are three distinctive features of the distribution of the timing of MTCT of HIV that we will consider in our model building and likelihood construction:

  1. Most transmissions that occur in utero can be detected immediately after birth. One way to approach this is to treat the time to detectable infection for these infants as left censored at zero; however, we are not really interested in estimating the timing of detectable infection before birth. Instead, without loss of generality, we will assume the time of first positive test for these infants is 0.
  2. All infants will be weaned at some point and will no longer be at risk; therefore, if they have not experienced MTCT before weaning, we do not expect that they will experience it at all.
  3. The third feature is closely related to the second. Most studies, including HPTN 024, do not follow infants until the last infant is weaned, but instead follow them for 12–18 months. Because, in general, we do not observe events past the period of follow-up, we cannot expect to accurately impute event times past this time. Additionally, any analysis we might perform, even had we completely observed the outcome of interest according to the study design, would be limited to the period of follow-up, and it would be impossible to distinguish those infants who will never be infected from the few who are infected after the end of follow-up.

Based on our consideration of these features, we propose an imputation model that is a mixture of three distributions: a point mass at zero that reflects the proportion of infants with detectable infection at birth, a continuous distribution for those infections that are first detectable after birth and before the end of study time, t2, and a point mass at a time greater than t2 representing the proportion who experience the event after t2 or never experience the event. The third distribution results in an overall distribution similar to a cure rate mixture model (Berkson and Gage (1952); Farewell (1982); Kuk and Chen (1992) and others) without the medical concept of cure.
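To make the three-part mixture concrete, the following sketch draws detection times from it. Everything numeric here is an illustrative assumption (the mixing proportions, the Weibull shape and scale, and the use of `math.inf` to stand in for the point mass beyond t2); the function name `draw_detection_time` is hypothetical, and covariate effects are omitted.

```python
import math
import random


def draw_detection_time(p1, p2, shape, scale, rng):
    """Draw (d, s) from a three-part mixture.

    d = 1: detectable at birth, s = 0 (probability p1)
    d = 2: first detectable after birth, s ~ Weibull (probability p2)
    d = 3: never detectable within follow-up, s = inf (probability 1 - p1 - p2)
    math.inf is a stand-in for "a time greater than t2" (assumption).
    """
    u = rng.random()
    if u < p1:
        return 1, 0.0
    if u < p1 + p2:
        # random.weibullvariate takes (scale, shape), in that order.
        return 2, rng.weibullvariate(scale, shape)
    return 3, math.inf


rng = random.Random(0)
# Illustrative values only; they are not estimates from HPTN 024.
draws = [draw_detection_time(0.07, 0.15, 0.9, 200.0, rng) for _ in range(10000)]
share_by_component = {
    d: sum(1 for dd, _ in draws if dd == d) / len(draws) for d in (1, 2, 3)
}
```

The continuous component is not truncated in this sketch, so a draw with d = 2 can exceed the end of follow-up; such draws would simply be censored at t2 in any later analysis.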

Specifically, we express the distribution of the ith infant’s time to detectable infection as


where δx(si) denotes a point mass at si = x, p1 and p2 are mixing proportions, Θ is the set of parameters that define f2 and Zi is a vector of covariates (including an intercept term) of length q that includes any covariates of interest for the final analysis. To facilitate estimation, we introduce a partially latent (auxiliary) variable, di, where


Note that di = 3 for those infants who would have their first positive test after t2. Now, we can rewrite (2.1) conditional on di as


where I(x) is an indicator function that takes on the value 1 if x is true, 0 otherwise. Here, di is a partially observed latent variable. Although we will not truncate f2 at t2, if di = 2 is sampled for an infant during the MCMC procedure, his/her Ri will be set to t2 while di = 2.

In order to completely specify the likelihood for imputation, we must specify a distribution for di. We take di to be a multinomial random variable and specify its mean vector as a function of the set of covariates, Zi, such that




where expit is the inverse-logit function and α and ω are vectors of coefficients linking Zi to di. The probability mass function for di is then


Next, we specify f2. For ease of computation, we restrict our options to parametric distributions. The Weibull distribution allows for a wide range of shapes for the hazard function given by


where a is the shape parameter of the Weibull distribution and β is a vector of parameters linking the ith infant’s covariate vector, Zi, to the hazard and exp(−β′ Zi) is the ith infant’s scale parameter. The hazard shown in (2.6) assumes a proportional hazards model.
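As an illustration of the proportional hazards property, the sketch below assumes the Weibull hazard is written as a · s^(a−1) · exp(β′Z). This is one common parameterization and the sign convention for β is an assumption made here; the function name `weibull_ph_hazard` and all numeric values are hypothetical.

```python
import math


def weibull_ph_hazard(s, a, beta, z):
    """Weibull proportional hazards: a * s**(a - 1) * exp(beta'z), for s > 0.

    a: shape parameter; beta, z: coefficient and covariate vectors
    (z includes a leading 1 for the intercept). Parameterization assumed.
    """
    linear_predictor = sum(b * x for b, x in zip(beta, z))
    return a * s ** (a - 1.0) * math.exp(linear_predictor)


a = 0.9
beta = [-5.0, 0.5]          # illustrative intercept and one covariate effect
z_exposed = [1.0, 1.0]
z_reference = [1.0, 0.0]

# The hazard ratio between two covariate patterns does not depend on time.
ratio_day_10 = weibull_ph_hazard(10.0, a, beta, z_exposed) / weibull_ph_hazard(10.0, a, beta, z_reference)
ratio_day_100 = weibull_ph_hazard(100.0, a, beta, z_exposed) / weibull_ph_hazard(100.0, a, beta, z_reference)
```

A shape a below 1 gives a hazard that decreases with age, and a shape above 1 gives one that increases, which is the flexibility the text refers to.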

A frailty model may define a common scale parameter within groups, thereby recognizing that some groups may inherently be at higher risk than other groups throughout follow-up. Instead, we explore the possibility that different groups may follow hazards with different shapes without specifying membership in the groups a priori. This approach is motivated by the hypothesis that infants’ underlying risks may follow different trajectories based on unobserved information. For example, mixed feeding (breastfeeding plus formula or other foods) may put infants at a higher risk of transmission (Coutsoudis et al., 2001), resulting in an underlying hazard that may remain constant or increase rapidly after birth; whereas, exclusively breastfed infants may be expected to have a hazard that decreases soon after birth then becomes roughly constant (Magoni et al., 2005). In HPTN 024, information about mixed feeding is not collected. Therefore, we allow the shape parameter to vary across infants to accommodate this potential variation in shape of the underlying hazards. We define γ to be a vector of length m so that f2 is a Weibull distribution, where


Defining pij = Pr(di = j | Zi, β, ω), the marginal hazard can then be expressed as


and can take on a variety of shapes depending on the values of γ1, …, γm and π1, …, πm.
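For any finite mixture, the marginal hazard is the mixture density divided by the mixture survival function. The sketch below applies that general identity to Weibull components that share a common rate and differ only in shape. The common rate of 1, the time unit of years, and the function name `weibull_mixture_hazard` are assumptions; the shapes (7, 0.9) and the weight 0.26 are the values used later in the simulation section.

```python
import math


def weibull_mixture_hazard(s, shapes, weights, rate=1.0):
    """Marginal hazard of a Weibull mixture at time s > 0.

    Component j has survival exp(-rate * s**shape_j) and is weighted by
    weights[j]. The marginal hazard is sum_j w_j f_j(s) / sum_j w_j S_j(s).
    """
    numerator = 0.0
    denominator = 0.0
    for shape, weight in zip(shapes, weights):
        survival = math.exp(-rate * s ** shape)
        density = rate * shape * s ** (shape - 1.0) * survival
        numerator += weight * density
        denominator += weight * survival
    return numerator / denominator


shapes = (7.0, 0.9)
weights = (0.26, 0.74)
# Marginal hazard on a grid of ages (in years, an assumed unit).
hazard_curve = [(s, weibull_mixture_hazard(s, shapes, weights)) for s in (0.1, 0.25, 0.5, 0.75, 1.0)]
```

Because the marginal hazard is a survival-weighted average of the component hazards, it can rise and then fall even though each Weibull component is monotone.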

Thus far, we have described the distribution of the true but unobserved time until an infant would first test positive for HIV, si. We do not actually observe si but instead observe (Li, Ri). The ith subject’s contribution to the likelihood is then


2.3 Estimation procedure

Before producing estimates of the parameters in Equation 2.9, we must first specify prior distributions. We specified the prior distributions of the intercepts for modeling di, ω1 and α1, to be non-informative but also to rule out values that would be extreme (as in very close to 0 or 1) on the probability scale. We selected the prior distributions to be proper and set the parameters as follows:

  • (πγ1, …, πγm) ~ Dirichlet(1m), where 1m is a vector of ones of length m,
  • βj ~ N (0, 1000), j = 1, …, q,
  • γj ~ N (0, 10)I(γj > 0), j = 1, …, m,
  • ω1 ~ N (0, 2), ωj ~ N (0, 1000), j = 2, …,q,
  • α1 ~ N (0, 2); and αj ~ N (0, 1000), j = 2, …, q.
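The effect of the tighter prior on the intercepts can be checked by simulation. The sketch below reads the second argument of N(·, ·) as a variance, which is an assumption (BUGS itself parameterizes the normal distribution by its precision), and the helper names and the 0.01/0.99 cut-offs for "extreme" are chosen here purely for illustration.

```python
import math
import random


def expit(x):
    """Numerically stable inverse-logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fraction_non_extreme(variance, n_draws=20000, lo=0.01, hi=0.99, seed=1):
    """Share of prior draws whose implied probability lies in (lo, hi).

    Draws an intercept from N(0, variance) and maps it through expit.
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    inside = sum(lo < expit(rng.gauss(0.0, sd)) < hi for _ in range(n_draws))
    return inside / n_draws


frac_intercept_prior = fraction_non_extreme(2.0)     # N(0, 2) intercept prior
frac_diffuse_prior = fraction_non_extreme(1000.0)    # N(0, 1000) for comparison
```

Under these assumptions, nearly all of the N(0, 2) prior mass stays away from probabilities near 0 or 1, whereas a N(0, 1000) prior on an intercept would place most of its mass on such extreme values.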

We implement the multiple imputation scheme in BUGS (Spiegelhalter et al., 2005), using OpenBUGS (Thomas et al., 2006) and the BRugs package in R (R Development Core Team, 2006). After a burn-in of 10,000 iterations, we keep samples from the posterior from every 100th iteration for a total of 100 samples. We then fit the model of interest to each of these 100 data sets. We define g = 1, …, G to index the G retained samples, θ to be the parameter or statistic of interest, which is estimated by θ̂(g) at the gth iteration and by θ̂ overall, v(g) to be the estimated variance of θ̂(g), and v to be the estimated variance of θ̂. We then obtain final estimates by following Rubin’s rules for combining the estimates from multiple imputations to obtain point and standard error estimates as follows. The final estimates, θ̂ and v, are given by

\[ \hat{\theta} = \frac{1}{G} \sum_{g=1}^{G} \hat{\theta}^{(g)} \tag{2.10} \]

\[ v = \frac{1}{G} \sum_{g=1}^{G} v^{(g)} + \left(1 + \frac{1}{G}\right) \frac{1}{G-1} \sum_{g=1}^{G} \left(\hat{\theta}^{(g)} - \hat{\theta}\right)^{2} \tag{2.11} \]

Inference proceeds in the usual manner for the model under consideration, usually implementing a t-test or making asymptotic normal approximations.
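Rubin's combining rules for a scalar estimate can be sketched as follows. The function name `rubin_combine` and the example numbers are hypothetical; the computation is the standard one, with the total variance equal to the average within-imputation variance plus (1 + 1/G) times the between-imputation variance.

```python
import math
from statistics import mean, variance


def rubin_combine(estimates, variances):
    """Combine G per-imputation estimates and their variances.

    Returns (theta_hat, v): the pooled point estimate and its total variance.
    """
    G = len(estimates)
    if G < 2 or G != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    theta_hat = mean(estimates)
    within = mean(variances)            # average of the v^(g)
    between = variance(estimates)       # sample variance, divisor G - 1
    total = within + (1.0 + 1.0 / G) * between
    return theta_hat, total


# Illustrative numbers only, e.g. log hazard ratios from three augmented data sets.
theta_hat, v = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
standard_error = math.sqrt(v)
```

The between-imputation term is what carries the uncertainty due to censoring and missing visits into the final standard error.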

2.4 Allowing for imperfect sensitivity

We propose an approach to estimating the distribution of the timing of MTCT as opposed to the timing of detection through modification of the imputation step.

The time from infection to detection is a random variable, u = s − s*, where s* is the unobserved time of infection. The distribution function of u, e(u), also describes the time-varying sensitivity of the diagnostic assay, which we expect to be 0 at the time of transmission and to increase monotonically to 1 by some time relatively soon after infection. To obtain G augmented data sets of s*, we repeat the following steps for g = 1, …, G for each infant: if di(g) < 3, first sample u from e(u), then set si*(g) = si(g) − u. For each augmented data set, we can compute the statistic of interest, then use the formulas given by Equations (2.10) and (2.11) to combine the estimates across the imputations for inference.

Gupte et al. (2007) estimated the mean of the exponential distribution for time between infection and detection to be approximately 9 days with a constraint that the sensitivity equal 0 at birth. For breastfeeding transmission, they estimated the mean to be 13 days; however, the standard error was large (277 days). Because we do not believe that the sensitivity equals 0 at birth, we instead choose a mean of 14 days. However, the simplicity of this approach allows that once the imputations of timing of first positive test are obtained, assessing the effects of different sensitivity profiles is quick and straight-forward. To estimate the distribution of and associations of covariates with the timing of breastfeeding transmission, we will subset the data to s* > 0 instead of s > 6 weeks.
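The sensitivity adjustment amounts to subtracting a random delay from each imputed detection time. A minimal sketch, assuming an exponential delay with a mean of 14 days and a hypothetical function name `impute_infection_time`:

```python
import math
import random


def impute_infection_time(s, d, mean_delay_days, rng):
    """Return an imputed infection time s* = s - u for one infant.

    s: imputed time of first detectable infection (days)
    d: latent component indicator; only d < 3 (infected within follow-up)
       receives a delay draw, otherwise s is returned unchanged
    u: exponential delay from infection to detection (assumed distribution)
    """
    if d < 3:
        u = rng.expovariate(1.0 / mean_delay_days)  # rate = 1 / mean
        return s - u
    return s


rng = random.Random(42)
# Delays implied for an infant first detectable at day 200.
delays = [200.0 - impute_infection_time(200.0, 2, 14.0, rng) for _ in range(20000)]
never_detected = impute_infection_time(math.inf, 3, 14.0, rng)
```

An imputed s* can be negative for infants detectable at or shortly after birth, which corresponds to in utero or delivery transmission; subsetting to s* > 0 then isolates breastfeeding transmission.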

3 Data Analysis

In this section, we apply the multiple imputation model for timing of MTCT of HIV to data collected in HIV Prevention Trials Network (HPTN) 024 (Taha et al., 2006). Although HIV testing was initially scheduled to occur at birth, 4–6 weeks and 3, 6, 9 and 12 months, the majority of 4–6 week visits occurred between 6 and 8 weeks, and the three month visit was dropped early in the study. Samples collected at 3, 6 and 9 months were only tested if the 12 month sample was positive or missing.

Infants born to HIV-infected mothers are only at risk for MTCT of HIV while breastfeeding. At one site, mothers were counseled to stop breastfeeding by the time their infants reached 6 months of age, and, by 6 months of age, over 90% of the infants at this site had been weaned. In contrast, over 90% of the infants at the 3 remaining sites were still breastfeeding at six months. This difference in the underlying hazard between the sites will be accounted for by performing a stratified proportional hazards analysis.

We performed the multiple imputation as described previously. The values for Li and Ri were discussed in general in the Methods section. Here, we discuss how they were set more specifically for HPTN 024. If the ith infant never had a negative test, we set Li = 0 and Ri equal to the time of the first positive test. Because the earliest detection time is birth, Li = 0 is as general as L = −∞ in implementation. If the first positive test occurred on the day of birth, we set Li = Ri = 0. For these infants, we know that di = 1 and si = 0. If the infant had both a negative and positive test before weaning, we set Li equal to the time of the last negative test and Ri equal to the time of the first positive test. If weaning occurred before the first positive test, we set Ri equal to the time of weaning plus 30 days (due to the sensitivity issue discussed previously). For subjects who have both a positive and negative test, di is known to be 2. For subjects with only negative tests and no positive tests, we set Li equal to the time of the last negative test unless weaning occurred more than 30 days before the negative test. In that case, we set Li equal to the time of weaning plus 30 days. Additionally, because follow-up was limited to approximately one year and therefore there was no information past this point in terms of observed events, Ri was set to 400 days. This would not impact the final analysis where the imputed data was censored at one year.
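The rules above for setting Li and Ri can be sketched as a single function. This is one reading of the text rather than the authors' code: the function name `hptn_interval` and its arguments are hypothetical, capping Ri at the first positive test when weaning plus 30 days would exceed it is an assumption added here, and so is the choice of Li = 0 for an infant with no test results.

```python
from typing import Optional, Sequence, Tuple


def hptn_interval(
    neg_times: Sequence[float],
    pos_times: Sequence[float],
    weaning: Optional[float] = None,
    lag: float = 30.0,
    r_max: float = 400.0,
) -> Tuple[float, float]:
    """Return (L, R) in days for one infant from test times and weaning age."""
    if pos_times:
        first_pos = min(pos_times)
        prior_negs = [t for t in neg_times if t < first_pos]
        L = max(prior_negs) if prior_negs else 0.0   # never negative: L = 0
        R = first_pos
        if weaning is not None and weaning < first_pos:
            # Weaned before the first positive test: R = weaning + lag,
            # capped at the first positive test (assumption).
            R = min(first_pos, weaning + lag)
        return L, R
    # Only negative tests (or no tests, an assumed case): right-censored.
    L = max(neg_times) if neg_times else 0.0
    if weaning is not None and L - weaning > lag:
        L = weaning + lag   # no longer at risk once weaned, allowing for the lag
    return L, r_max


intervals = [
    hptn_interval([], [45.0]),                          # never negative
    hptn_interval([], [0.0]),                           # positive at birth
    hptn_interval([42.0], [180.0]),                     # negative then positive
    hptn_interval([42.0, 365.0], [], weaning=200.0),    # only negatives, weaned
]
```

Setting R to 400 days for infants with only negative tests marks them as right-censored beyond the one-year analysis window.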

The following auxiliary variables were used in the imputation procedure: maternal CD4 count, hemoglobin, viral load, weight and age at 32 weeks gestation; enrollment site; whether the mother took nevirapine; an indicator of whether the infant was delivered at the study clinic; whether the infant took nevirapine; the duration of ruptured membranes; and the infant’s birth weight and sex.

In each augmented data set, every infant has an imputed value for si. This si reflects the true time of detectable infection if other events, such as death or weaning, did not intervene. Also, because there was little information past one year in the original data set, we censor the infants’ times to event at one year in the final analyses. Because si is now on a continuous scale in the augmented data set, we can analyze time to breastfeeding transmission by subsetting to those subjects whose si is greater than 6 weeks. In contrast, the observed analysis must define the subset of interest as those infants with a negative test after 4 weeks and no positive test before 8 weeks. This misclassifies infants who tested positive at 8 weeks but may have tested negative at 6 weeks, as well as infants who tested negative at 4 weeks but may have tested positive by 6 weeks. Therefore, we expect some bias in the baseline number at risk. Additionally, when performing the observed data analysis, we assumed only right censoring and set the time to event for any infant with a positive test to be the midpoint between the last negative test and the first positive test. For the sensitivity adjusted analysis, we subsetted the data to those infants with an imputed time of infection, si*, greater than 0. In the proportional hazards model, we studied the association of maternal CD4 count and viral load with the timing of transmission, stratified by site.

Censoring can be complex in these studies due to the different causes: death, weaning and loss to follow-up; therefore, we propose examining different censoring rules in the analyses and in later simulations to determine which censoring approach produces the least biased and most efficient estimate of the survival distribution or association parameter of interest. If an infant dies, is lost to follow-up or reaches the end of the study without having a positive HIV test, his/her time to event is censored at the time of the last negative test. If an infant is weaned, there are three censoring options:

  • C1
    An infant’s event time is censored at his last negative test. This is a common approach that does not require information on weaning.
  • C2
    An infant’s event time is censored at the end of follow-up if there is a negative test after weaning in the observed data. This censoring approach reflects that these infants are no longer at risk after weaning and should produce an estimate of distribution of time to first positive test in the population under study.
  • C3
    In the observed data analysis, if an infant has a negative test after weaning, his event time is censored at the time of weaning. Otherwise, it is censored at the time of his last negative test. In the imputed data, an infant is censored at the time of weaning if he has not already experienced the event. This approach estimates the late postnatal time to first positive distribution as if no weaning occurred.

Scenarios C1 and C2 result in the same censoring scenario for the MI analysis, no censoring except at the end of the follow-up time. However, MI results under these scenarios will be presented as coming from Scenario C2. Under a frequent testing schedule, there should be little difference between Scenarios C1 and C3.
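For the imputed data, the censoring scenarios reduce to a choice of censoring limit. The following sketch uses a hypothetical function name, `censor_imputed`, and assumes a one-year end of follow-up in days; it covers only the MI analysis, not the observed-data rules.

```python
import math
from typing import Optional, Tuple


def censor_imputed(
    s: float,
    weaning: Optional[float],
    rule: str,
    t_end: float = 365.0,
) -> Tuple[float, bool]:
    """Return (time, event_observed) for an imputed detection time s.

    "C2": censor only at the end of follow-up (C1 coincides with C2 here).
    "C3": additionally censor at weaning if the event has not yet occurred.
    """
    if rule == "C2":
        limit = t_end
    elif rule == "C3":
        limit = t_end if weaning is None else min(t_end, weaning)
    else:
        raise ValueError("rule must be 'C2' or 'C3'")
    if s <= limit:
        return s, True
    return limit, False


# An infant weaned at day 200 whose imputed detection time is day 250.
under_c2 = censor_imputed(250.0, 200.0, "C2")
under_c3 = censor_imputed(250.0, 200.0, "C3")
# An infant who never has detectable infection (s = inf stands in for s > t2).
never = censor_imputed(math.inf, 200.0, "C2")
```

The same imputed value can therefore count as an event under C2 and as a censored observation under C3, which is why the two scenarios answer different questions.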

Overall for the observed analysis, of the 1977 potential infants, 1317 tested negative at or after the 4–8 week visit and were still breastfeeding at 8 weeks. Infants were excluded because they were known to be positive by 8 weeks (N=298), were weaned before 8 weeks and therefore not at risk (N=70), had unknown infection status at 4–8 weeks due to missing 4–8 week test and later positive test (N=22), or had no test results after the 4–8 week visit (N=270). Analyses on the observed data were carried out under all three censoring scenarios (C1–C3). Analyses on the augmented data sets were carried out under censoring scenarios C2 and C3.

Figure 1 plots estimates of the cumulative rates of MTCT of HIV based on the Kaplan-Meier analysis of the observed data, the MI (m=2) analysis of time to detection and the MI (m=2) analysis of timing of infection. The observed data analysis starts at 0 at birth and does not reach the MI estimate of cumulative detection rate at birth until approximately 1 week. The Kaplan-Meier analysis also estimates a higher cumulative infection rate at one year than either of the MI approaches. The sensitivity adjusted analysis estimates a slightly higher in utero/delivery transmission rate than the unadjusted MI analysis; however, they both converge to approximately the same estimate by about 6 months. The MI curves reflect a hazard that begins to increase soon after birth and then levels off again around 2 months. This has also been observed in clinical trials (Taha et al., 2007).

Figure 1
Cumulative rates of MTCT of HIV (solid black) and detection of MTCT of HIV. The dashed lines represent the time to detection based on MI with m=2. The step curve represents time to detection based on the observed data with cross hairs indicating censoring ...

Table 1 shows results from the Kaplan-Meier (KM) and proportional hazards (PH) analyses on both the observed and imputed data. For the observed data, C1 and C2 produce similar results. If all infants were tested at the end of the study, we expect these results to be identical because all weaned infants who did not experience MTCT of HIV would be censored after the end of follow-up. For the observed data, censoring scenario C3 resulted in higher KM estimates of the cumulative infection rates than C1 or C2. Because C3 treats weaned infants as if they were still at risk at the time of the weaning and therefore assumes that some would experience the event, we would expect the proportion to be higher. The same differences between C2 and C3 are seen in the multiple imputation analysis. Because censoring under C2 (not at risk after weaning) is usually of interest, we will focus the comparison between the observed and MI analyses under censoring scenario C2. Many infants who tested negative at 4–8 weeks did not have another test result available until 12 months. If that test result was positive, the time of the first positive test was imputed to be approximately 7 months in the observed analysis; therefore, we expect the observed data analyses to underestimate the transmission rate at earlier times. The results indicate that the MI may be correcting this, producing higher estimates at 3 and 6 months than the observed analysis. MI produces lower estimates of transmission rates at 9 and 12 months, though. The simulations summarized in the next section show that we expect the observed analysis to overestimate the transmission rate at 12 months. Also, the MI analyses include the 292 infants whose HIV infection status is indeterminate at 4–8 weeks due to missing tests. Potentially, these infants were less likely to have experienced MTCT, thus increasing the number at risk disproportionately to the number of events. 
The MI results adjusted for imperfect sensitivity are higher at all time points, reflecting that the 6 week cut point for breastfeeding transmission misclassifies approximately 1/4–1/3 of the breastfeeding transmissions as in utero/delivery. The MI results do not vary substantially over m.

Table 1
Results from breastfeeding transmission analyses (C1=censored at last negative; C2=censored after end of follow-up if weaned before last negative; C3=censored at time of weaning).

To better understand the variability between imputations and how the estimates from the augmented data sets compare to the observed analysis, we plotted the KM estimates of the cumulative detection rate curves for each augmented data set (m = 2) and for the observed data (Figure 2). There is variability between the estimates from the augmented data sets. At most points of interest the estimates are all contained within an interval of width approximately equal to 0.02. Before five months, the observed data analysis estimates of the survival curve are higher than all the estimates from the augmented data sets. From 5 to 8 months, the observed curve crosses all the augmented data set estimates. After 8 months, the observed curve is at the lower end of the augmented data estimates.

Figure 2
Curves of the cumulative proportion of late postnatal infections for the observed data analysis (black) and each of the augmented data sets with time to first positive test (red) and timing of infection adjusted for imperfect sensitivity (grey). The vertical ...

Table 1 also lists the results from the proportional hazards regression models fit to the observed and MI data. The estimate of association was higher in the observed analyses than in the MI analyses. Additionally, the standard errors were lower for viral load and higher for CD4 count in the MI analyses. The MI results varied little over m or the censoring scenario. However, the observed analyses results varied more over censoring scenarios (C3 vs. C2 or C1), suggesting some interplay between timing of weaning and CD4 count and viral load.
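For readers unfamiliar with how estimates from several augmented data sets are usually combined, the following is a minimal sketch of the standard combining rules for multiple imputation (Rubin, 1996). It is an assumption that the pooled estimates in Table 1 were obtained this way. The name `pool_rubin` and the symbol `M` (number of augmented data sets) are introduced here for illustration; `M` is unrelated to the m used elsewhere in the paper.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules.

    estimates : point estimate from each augmented data set
    variances : squared standard error from each augmented data set
    Returns (pooled estimate, pooled standard error). Requires M >= 2.
    """
    M = len(estimates)
    q_bar = mean(estimates)             # pooled point estimate
    within = mean(variances)            # average within-imputation variance
    between = variance(estimates)       # between-imputation variance (divisor M - 1)
    total = within + (1.0 + 1.0 / M) * between
    return q_bar, math.sqrt(total)


if __name__ == "__main__":
    # Toy log hazard ratios from three augmented data sets (illustrative only).
    est, se = pool_rubin([0.10, 0.12, 0.14], [0.0004, 0.0004, 0.0004])
    print(round(est, 4), round(se, 4))
```

The between-imputation term is what allows the pooled standard error to reflect uncertainty about the imputed transmission times.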

4 Simulations

In this section, we describe simulations designed to assess the multiple imputation procedure and compare it to traditional analyses on the observed data. Additionally, we explored the three censoring approaches for weaning (C1–C3). We simulated di and si subject to the effects of 4 covariates, 2 binary (X1 ~ Bernoulli(0.5), X2 ~ Bernoulli(0.25)) and 2 continuous (X3 ~ Uniform(−1, 1), X4 ~ N(0, 1)), with m = 2, γ = (7, 0.9), πγ1 = 0.26, α = (−2.7, −1.0, 1.0, 0.5, −1.0), ω = (−0.5, −2.0, 2.0, 0.5, −1.0) and β = (−1.6, −1.0, 1.0, −0.5, 0.5).
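The covariate distributions stated above can be drawn as in the sketch below. It covers only the four covariates; generating di and si from the model parameters is not attempted here. The function name `simulate_covariates` and the seed are hypothetical choices.

```python
import random


def simulate_covariates(n, seed=0):
    """Draw n covariate vectors (X1, X2, X3, X4) as described in the text.

    X1 ~ Bernoulli(0.5), X2 ~ Bernoulli(0.25),
    X3 ~ Uniform(-1, 1), X4 ~ N(0, 1).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = 1 if rng.random() < 0.25 else 0
        x3 = rng.uniform(-1.0, 1.0)
        x4 = rng.gauss(0.0, 1.0)
        rows.append((x1, x2, x3, x4))
    return rows


if __name__ == "__main__":
    # One simulated data set of the size used in the simulations (1000 infants).
    data = simulate_covariates(1000)
    print(data[:3])
```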

We then simulated visits according to visit schedule in HPTN 024 (birth, 4–8 weeks and 3, 6, 9 and 12 months) and according to the proportions seen in HPTN 024. Details are given in the Appendix.

Next, we simulated times of death and weaning according to the distributions seen in HPTN 024 under the following four scenarios:

  • S1
    Non-informative loss to follow-up and death, and no weaning.
  • S2
    Non-informative loss to follow-up, death related to one of the covariates (X3) in the same manner as CD4 count is related to infant death in HPTN 024 and no weaning.
  • S3
    Non-informative loss to follow-up, death and weaning.
  • S4
    Non-informative loss to follow-up and death. Time to weaning is related to X2 similarly to the relationship between the HPTN 024 site with early weaning and time to weaning. Therefore, X2 = 1 is associated with early weaning and increased risk of transmission.

Under each scenario, we simulated 100 data sets with 1000 observations each and fit the complete data analysis (no censoring except for death) which was used as the gold standard for comparing the observed data analysis and the MI analysis with m = 1, 2, 3 under S1 and S2. For S3 and S4, we compared the results to two gold standards designed to represent the best estimates possible if we had observed timing of detectable infection perfectly. The first gold standard (G1) censors infants at death. The second gold standard (G2) censors infants at death and weaning and estimates transmission rates and associations assuming there was no weaning. C2 is designed to estimate the first gold standard, and C3 is designed to estimate the second gold standard. C1 mimics what is usually done in practice.

We compared the results in terms of their bias compared to the gold standard, the variance ratio (variance of the estimate of the analysis divided by the variance of the gold standard analysis) and the coverage rates (frequency that the confidence interval contained the gold standard estimate).
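The three comparison criteria can be computed as in the following sketch. Two points are assumptions rather than details given in the text: the variances are taken as empirical variances of the estimates across simulated data sets, and the confidence intervals are Wald intervals with a normal critical value of 1.96. The function name `compare_to_gold` is hypothetical.

```python
from statistics import mean, variance


def compare_to_gold(est, se, gold, z=1.96):
    """Summarize an analysis against the gold standard over simulated data sets.

    est  : estimate from the analysis, one per simulated data set
    se   : standard error of each estimate
    gold : gold standard estimate from the same simulated data set
    Returns (bias, variance ratio, coverage rate).
    """
    bias = mean([e - g for e, g in zip(est, gold)])
    # Assumption: empirical variance across data sets for both analyses.
    var_ratio = variance(est) / variance(gold)
    # Coverage: fraction of intervals est +/- z * se containing the gold estimate.
    covered = [abs(e - g) <= z * s for e, s, g in zip(est, se, gold)]
    coverage = sum(covered) / len(covered)
    return bias, var_ratio, coverage


if __name__ == "__main__":
    # Toy values for three simulated data sets (illustrative only).
    print(compare_to_gold([0.10, 0.12, 0.14], [0.01, 0.01, 0.01], [0.11, 0.12, 0.13]))
```

A variance ratio below 1 would indicate that the analysis is less variable than the gold standard analysis, and a ratio above 1 the opposite.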

The results for the simulations under S1 and S2 are shown in Table 2. The MI analyses performed the same or better in terms of bias under both scenarios for both estimators. The MI analyses had lower variance estimates than the observed analyses for the estimate of cumulative transmission, but the opposite was true for the estimate of the hazard ratio. The coverage rates for the MI analyses were the same or better than the observed analyses under both scenarios.

Table 2
Simulation results when there is assumed to be no weaning during follow-up.

The results for the simulations under S3 are shown in Table 3. First, we examine estimates for the late postnatal transmission rate at 12 months defined as the cumulative infection rate among those uninfected at 6 weeks. The bias for the observed analyses was relatively high compared to the truth (0.1182) and twice that of most of the MI analyses. Additionally, the MI analyses were more efficient under all scenarios. The coverage rates for the MI analyses were better for G1; however, the coverage rates for the observed analyses were better for G2. Turning our attention to the PH analyses, the observed and MI analyses performed similarly for bias under G1 and G2. The lowest bias was the observed analysis under C3 for G1. In all cases, the observed analyses were more efficient than the MI analyses. Both the observed and MI analyses had similar coverage rates.

Table 3
Simulation results when weaning is observed during follow-up.

The results for the simulations under S4 are shown in Table 4. Under G1, MI performed better in terms of bias for both the KM and PH estimates. Under G2, the observed analysis performed better. Additionally, the observed analysis produced less biased estimates of G2 and G1, even under censoring scenarios designed to estimate G1. The MI KM analyses were more efficient than the observed analyses for G1 and G2. The opposite was true for the PH estimates. Recalling that C3 is designed to estimate G2, the low coverage rates for MI under G2 using C2 are not alarming; however, the coverage rates are higher than desirable under C3. The estimates of the association between CD4 and VL and time of transmission are not as strong as their association with time of infection, although they are very similar.

Table 4
Simulation results when weaning related to a covariate is observed during follow-up.

5 Discussion

We present an approach to imputing the timing of MTCT of HIV. Given the augmented data sets produced by the multiple imputation procedure, we can now perform many analyses that would not be possible otherwise without excluding large portions of the data set. Here, we showed an example estimating the cumulative late postnatal transmission rate at 12 months and the effect of covariates on the hazard of late postnatal transmission. Additional analyses are now also attainable. For example, investigators are also interested in estimating the distribution of timing of MTCT among those infants who experience MTCT for use in planning HIV testing schedules. Potential analyses are not limited to late postnatal transmission. For example, we may want to assess how baseline covariates predict transmission during the three exposure periods. The MI approach allows us to include those infants whose timing could not previously be precisely categorized. This approach is flexible and can easily be implemented with OpenBUGS and R.

The MI approach was validated in simulations and shown to be less biased in most situations than the traditional estimator. In the presence of weaning, the traditional estimator proved to be a better estimator of the distribution of MTCT ignoring weaning, which is seldom of interest.

Our goal here was to find a flexible MI model that could easily be implemented in available software. Although the MI model we propose is flexible and mirrors the modes of MTCT of HIV, it could be improved in several ways. Here, we do not directly account for HIV-infected infants being at higher risk of death. In reality, it is likely that an infant who had a negative test long before death actually acquired HIV. Additionally, a mother’s decision to wean may also be related to the health status of her infant. To reflect these issues, we could consider modeling the relationship between the risk of death, time to weaning and the risk of MTCT more directly using competing risk models. We chose mixtures of Weibull models for flexibility in the distribution of time to detectable infection after birth. Instead, we could explore other flexible baseline hazards; however, these models would likely require customized software and would not be easily fit.

From a public health perspective, understanding the timing of detectable infection is sometimes more important than knowing the true timing of transmission, e.g., when determining an HIV testing schedule for infants. However, knowing the timing of transmission can assist in designing interventions, which must occur before transmission and not merely before the time of first positive test. To address this issue, we presented a method to adjust for the imperfect sensitivity in the imputation step. Because this method is straightforward and quick to implement once the imputations of time to first positive test are generated, hypotheses can be examined under a variety of time-varying sensitivity profiles. Although we did not implement it here, the distribution of the time from infection to detection could depend on covariates.

In summary, we present a comprehensive approach to the analysis of data collected in studies of MTCT of HIV that can be easily implemented in practice. We also illustrate the usefulness of OpenBUGS and R for multiple imputation based on flexible imputation models.


Appendix

Simulating visit attendance

We sampled attendance at the first and second visits with probabilities equal to 0.94 and 0.85, respectively. If the infant had a negative test at either birth or 4–8 weeks, then he had a test result at 12 months with probability equal to 0.80. If the infant missed both the birth and 4–8 week visits, he had a test result from the 12 month visit with probability equal to 0.20. If the infant missed the 12 month visit, he had a test result from the 9 month visit with probability equal to 0.50. If the infant tested positive at the 12 month visit, he had a sample from the 9 month visit with probability equal to 0.80. If the infant missed the 9 and 12 month visits, he had a test result from the 6 month visit with probability equal to 0.60. If the infant was positive at the 9 month visit, he had a test result from the 6 month visit with probability equal to 0.80. If the infant missed the 6, 9 or 12 month visit, he had a test result from the 3 month visit with probability equal to 0.20. If the infant had a positive test result from the 6 month visit, he had a test result from the 3 month visit with probability equal to 0.40. At each visit, his visit time was simulated according to the observed distribution of visits in HPTN 024.
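The attendance rules above can be sketched as follows. This is one possible reading and not the authors' code. Several points are assumptions: visits occur at nominal times (0, 1.5, 3, 6, 9 and 12 months) rather than times drawn from the HPTN 024 visit distribution, an infant tests positive at any visit on or after a given time `t_pos`, testing stops if every attended early test is positive, cases the text does not cover receive no test, and "missed the 6, 9 or 12 month visit" is read literally as missing any one of them. The names `simulate_tests` and `t_pos` are hypothetical.

```python
import math
import random

# Nominal visit times in months (assumption; the paper samples actual visit times).
BIRTH, WK6, M3, M6, M9, M12 = 0.0, 1.5, 3.0, 6.0, 9.0, 12.0


def simulate_tests(t_pos, rng):
    """Return {visit month: test result} for one infant.

    t_pos : month from which the infant tests positive (math.inf if never)
    rng   : random.Random instance
    """
    tests = {}

    def maybe_test(month, prob):
        # A test result is available with probability prob.
        if rng.random() < prob:
            tests[month] = month >= t_pos

    maybe_test(BIRTH, 0.94)
    maybe_test(WK6, 0.85)

    early = [tests[m] for m in (BIRTH, WK6) if m in tests]
    if early and all(early):
        return tests  # assumption: no later testing if all early tests are positive
    # 0.80 after an early negative test, 0.20 if both early visits were missed.
    maybe_test(M12, 0.80 if early else 0.20)

    if M12 not in tests:
        maybe_test(M9, 0.50)
    elif tests[M12]:
        maybe_test(M9, 0.80)

    if tests.get(M9):
        maybe_test(M6, 0.80)
    elif M9 not in tests and M12 not in tests:
        maybe_test(M6, 0.60)

    if tests.get(M6):
        maybe_test(M3, 0.40)
    elif any(m not in tests for m in (M6, M9, M12)):
        maybe_test(M3, 0.20)  # literal reading of "6, 9 or 12 month visit"
    return tests


if __name__ == "__main__":
    rng = random.Random(1)
    print(simulate_tests(7.0, rng))       # infant first detectable at 7 months
    print(simulate_tests(math.inf, rng))  # infant never infected
```

Working backward from the 12 month visit, as the rules do, mimics testing stored samples from earlier visits only when a later result makes them informative.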

Contributor Information

Elizabeth R. Brown, Department of Biostatistics, University of Washington, Seattle, WA 98195, USA.

Ying Qing Chen, Vaccine and Infectious Diseases Institute, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.


References

  • Balasubramanian R, Lagakos SW. Estimation of the timing of perinatal transmission of HIV. Biometrics. 2001;57:1048–1058. [PubMed]
  • Balasubramanian R, Lagakos SW. Estimation of a failure time distribution based on imperfect diagnostic tests. Biometrika. 2003;90:171–182.
  • Bebchuk J, Betensky R. Multiple imputation for simple estimation of the hazard function based on interval censored data. Statistics in Medicine. 2000;19:405–419. [PubMed]
  • Berkson J, Gage RP. Survival curve for cancer patients following treatment. Journal of the American Statistical Association. 1952;47:501–515.
  • Coutsoudis A, Pillay K, Kuhn L, Spooner E, Tsai WY, Coovadia HM. Method of feeding and transmission of HIV-1 from mothers to children by 15 months of age: prospective cohort study from Durban, South Africa. AIDS. 2001;15(3):379–387. [PubMed]
  • Cunningham C, Charbonneau T, Song K, et al. Comparison of human immunodeficiency virus 1 DNA polymerase chain reaction and qualitative and quantitative RNA polymerase chain reaction in human immunodeficiency virus 1-exposed infants. Pediatric Infectious Disease Journal. 1999;18:30–35. [PubMed]
  • Dunn D, Simonds R, Bulterys M, et al. Interventions to prevent vertical transmission of HIV-1: effect on viral detection rate in early infant samples. AIDS. 2000;14:1421–1428. [PubMed]
  • Farewell VT. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics. 1982;38:1041–1046. [PubMed]
  • Finkelstein DM. A proportional hazards model for interval-censored failure time data. Biometrics. 1986;42(4):845–854. [PubMed]
  • Glynn RJ, Rosner B. Multiple imputation to estimate the association between eyes in disease progression with interval-censored data. Statistics in Medicine. 2004;23(21):3307–3318. [PubMed]
  • Goggins WB, Finkelstein DM, Schoenfeld DA, Zaslavsky AM. A Markov chain Monte Carlo EM algorithm for analyzing interval-censored data under the Cox proportional hazards model. Biometrics. 1998;54(4):1498–1507. [PubMed]
  • Gupte N, Brookmeyer R, Bollinger R, Gray G. Modeling maternal-infant HIV transmission in the presence of breastfeeding with an imperfect test. Biometrics. 2007;63:1189–1197. [PubMed]
  • Hsu CH, Taylor JMG, Murray S, Commenges D. Multiple imputation for interval censored data with auxiliary variables. Statistics in Medicine. 2007;26(4):769–781. [PubMed]
  • Iliff PJ, Piwoz EG, Tavengwa NV, Zunguza CD, Marinda ET, Nathoo KJ, Moulton LH, Ward BJ, Humphrey JH. Early exclusive breastfeeding reduces the risk of postnatal HIV-1 transmission and increases HIV-free survival. AIDS. 2005;19(7):699–708. [PubMed]
  • Kourtis AP, Lee FK, Abrams EJ, Jamieson DJ, Bulterys M. Mother-to-child transmission of HIV-1: timing and implications for prevention. Lancet Infectious Diseases. 2006;6(11):726–732. [PubMed]
  • Kuk AYC, Chen CH. A mixture model combining logistic-regression with proportional hazards regression. Biometrika. 1992;79(3):531–541.
  • Luzuriaga K, Newell ML, Dabis F, Excler JL, Sullivan JL. Vaccines to prevent transmission of HIV-1 via breastmilk: scientific and logistical priorities. Lancet. 2006;368(9534):511–521. [PubMed]
  • Magoni M, Bassani L, Okong P, Kituuka P, Germinario EP, Giuliano M, Vella S. Mode of infant feeding and HIV infection in children in a program for prevention of mother-to-child transmission in Uganda. AIDS. 2005;19(4):433–437. [PubMed]
  • Pan W. A multiple imputation approach to Cox regression with interval-censored data. Biometrics. 2000;56:199–203. [PubMed]
  • Piwoz EG, Humphrey JH, Marinda ET, Mutasa K, Moulton LH, Iliff PJ. Effects of infant sex on mother-to-child transmission of HIV-1 according to timing of infection in Zimbabwe. AIDS. 2006;20(15):1981–1984. [PubMed]
  • R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; Vienna, Austria: 2006.
  • Rubin D. Multiple imputation after 18+ years. Journal of the American Statistical Association. 1996;91:473–489.
  • Saba J, Haverkamp G, Gray G, McIntyre J, Mmiro F, Ndugwa C, Coovadia HM, Moodley J, Kilewo C, Massawe A, Kituuka P, Okong P, von Briesen H, Goudsmit J, Biberfeld G, Grulich A, Weverling GJ, Lange JMA. Efficacy of three short-course regimens of zidovudine and lamivudine in preventing early and late transmission of HIV-1 from mother to child in Tanzania, South Africa, and Uganda (Petra study): a randomised, double-blind, placebo-controlled trial. Lancet. 2002;359(9313):1178–1186. [PubMed]
  • Satten GA. Rank-based inference in the proportional hazards model for interval censored data. Biometrika. 1996;83(2):355–370.
  • Sen B, Banerjee M. A pseudolikelihood method for analyzing interval censored data. Biometrika. 2007;94(1):71–86.
  • Simonds R, Brown T, Thea D, et al. Sensitivity and specificity of a qualitative RNA detection assay to diagnose HIV infection in young infants. AIDS. 1998;12:1545–1549. [PubMed]
  • Spiegelhalter D, Thomas A, Best N, Lunn D. WinBUGS: User Manual, Version 2.10. Medical Research Council Biostatistics Unit; Cambridge: 2005.
  • Taha TE, Brown ER, Hoffman IF, Fawzi W, Read JS, Sinkala M, Martinson FEA, Kafulafula G, Msamanga G, Emel L, Adeniyi-Jones S, Goldenberg R. the HPTN024 Team. A phase III clinical trial of antibiotics to reduce chorioamnionitis-related perinatal HIV-1 transmission. AIDS. 2006;20:1313–1321. [PubMed]
  • Taha TE, Hoover DR, Kumwenda NI, Fiscus SA, Kafulafula G, Nkhoma C, Chen S, Piwowar E, Broadhead RL, Jackson JB, Miotti PG. Late postnatal transmission of HIV-1 and associated factors. Journal of Infectious Diseases. 2007;196(1):10–14. [PubMed]
  • The Breastfeeding and HIV International Transmission Study Group. Late postnatal transmission of HIV-1 in breast-fed children: An individual patient data meta-analysis. Journal of Infectious Diseases. 2004;189(12):2154–2166. [PubMed]
  • Thomas A, O'Hara B, Ligges U, Sturtz S. Making BUGS open. R News. 2006;6:12–17.
  • Turnbull BW. Empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society Series B-Methodological. 1976;38(3):290–295.
  • Wei G, Tanner M. Applications of multiple imputation to the analysis of censored regression data. Biometrics. 1991;47:1297–1309. [PubMed]
  • Young N, Shaffer N, Chaowanachan T, Chotpitayasunondh T, Vanparapar N, Mock P, Waranawat N, Chokephaibulkit K, Chuachoowong R, Wasinrapee P, Mastro T, Simonds R. Early diagnosis of HIV-1-infected infants in Thailand using RNA and DNA PCR assays sensitive to non-B subtypes. Journal of Acquired Immune Deficiency Syndromes. 2000;24:401–407. [PubMed]
  • Zhang P, Lagakos SW. Analysis of time to a silent event whose occurrence is monitored with error, with application to mother-to-child HIV transmission. Statistics in Medicine. 2007 doi: 10.1002/sim.3125. [PubMed] [Cross Ref]