Home | About | Journals | Submit | Contact Us | Français |

**|**Int J Biostat**|**PMC3404555

Formats

Article sections

- Abstract
- Introduction
- Bayesian inference on joint models with skew-normal distributions
- Analysis of AIDS clinical data
- Discussion
- References

Authors

Related links

Int J Biostat. 2011 January 1; 7(1): Article 8.

Published online 2011 January 6. doi: 10.2202/1557-4679.1292

PMCID: PMC3404555

Yangxin Huang, University of South Florida;

Copyright © 2011 The Berkeley Electronic Press. All rights reserved

In recent years, various mixed-effects models have been suggested for estimating viral decay rates in HIV dynamic models for complex longitudinal data. Among those models are linear mixed-effects (LME), nonlinear mixed-effects (NLME), and semiparametric nonlinear mixed-effects (SNLME) models. However, a critical question is whether these models produce coherent estimates of viral decay rates, and if not, which model is appropriate and should be used in practice. In addition, one often assumes that a model random error is normally distributed, but the normality assumption may be unrealistic, particularly if the data exhibit skewness. Moreover, some covariates such as CD4 cell count may be often measured with substantial errors. This paper addresses these issues simultaneously by jointly modeling the response variable with skewness and a covariate process with measurement errors using a Bayesian approach to investigate how estimated parameters are changed or different under these three models. A real data set from an AIDS clinical trial study was used to illustrate the proposed models and methods. It was found that there was a significant incongruity in the estimated decay rates in viral loads based on the three mixed-effects models, suggesting that the decay rates estimated by using Bayesian LME or NLME joint models should be interpreted differently from those estimated by using Bayesian SNLME joint models. The findings also suggest that the Bayesian SNLME joint model is preferred to other models because an arbitrary data truncation is not necessary; and it is also shown that the models with a skew-normal distribution and/or measurement errors in covariate may achieve reliable results when the data exhibit skewness.

Modeling of HIV dynamics in AIDS research has greatly improved our understanding of the pathogenesis of HIV-1 infection and guided for the treatment of AIDS patients and evaluation of antiretroviral (ARV) therapies (Perelson, 1997; Wu and Ding, 1999). Such models often incorporate repeated measures over a period of treatment to assess rates of changes in viral load (number of HIV RNA copies in plasma). Recent research indicates that the first-phase decay rate of viral response to treatment is a potentially useful biomarker for ARV potency (Ding and Wu, 1999). Even though various statistical modeling and analysis methods have been applied for estimating the parameters of HIV dynamics such as linear and nonlinear regression (Perelson, 1997), linear mixed-effects (LME) and nonlinear mixed-effects (NLME) modeling approach (Wu and Ding, 1999; Wu, et al., 2004; Wu, 2004), nonparametric NLME modeling approach (Liu and Wu, 2007; Liang et al., 2003; Wu and Zhang, 2002; Wu, et al., 2004), joint model approach via Monte Carlo EM algorithm (Liu and Wu, 2007; Wu, 2002; Wu, 2004) and Bayesian NLME modeling approach via Markov Chain Monte Carlo (MCMC) procedure (Huang, 2006; Huang and Dagne, 2010), it is not clear which model should be used to estimate the first-phase decay rate. More importantly, in all of these models at least one of the following three important issues standout.

Firstly, the common assumption of distributions for (within-subject) random error is normal in those statistical models. This assumption may lack the robustness against departures from normality and/or outliers as discussed by Verbeke and Lesaffre (1996), and may also lead to misleading results (Verbeke and Lesaffre, 1996; Ghosh et al., 2007). Very often in AIDS studies, the virologic responses exhibit skewness. For example, Figure 1 displays the histograms of repeated viral load (in ln scale) and CD4 cell count measurements for 44 subjects enrolled in the AIDS clinical trial study–A5055 (Acosta et al., 2004). It seems that for these data sets to be analyzed in this paper, the viral load responses are highly skewed even after ln-transformation. One way to incorporate skewness is to use a skew-normal (SN) distribution for (within-subject) random errors. Secondly, the mixed-effects models have been used in the previous studies to account for both between-subject and within-subject variations in viral load measurements which are associated with covariates including CD4 cell count. However, the covariates such as CD4 cell count which were considered in those studies are often measured with substantial errors and highly skewed as shown in Figure 1(bottom panel). Thirdly, a major challenge with these modeling approaches is the associated intensive computation burden in the inference, and in some cases it can even be computationally infeasible. Particularly, for nonlinear longitudinal models in the presence of measurement errors in covariates, the computational problem becomes much worse. In addition, there may exist the problem of non-convergence of these algorithms under the framework of likelihood estimation. To the best of our knowledge, there is little literature on simultaneously addressing measurement errors in covariates and a SN distribution for random errors to compare performance of the various mixed-effects models under the framework of Bayesian mixed-effects modeling approach. This article provides a unified approach to investigate SN Bayesian mixed-effects models with covariate measurement errors. It is noted that the random-effects could assume to have SN distribution. Huang and Dagne (2010) have studied NLME models where both model error and random-effects were assumed to have skew-normal distribution. It was found, in comparison of random-effects with normal and random-effects with SN, that the modeling results were very similar and not significantly different. Along with this finding, the random-effects are assumed to have a normal distribution in the mixed-effects models considered in this study.

The histograms of viral load (in ln scale) and standardized CD4 cell count measured from day 0 to day 35 [Set (a): data cut at truncation day 35], day 84 [Set (b): data cut at truncation day 84] and the end of study period [Set (c): complete data] for **...**

In this paper, the main focus is to provide a comprehensive comparison of three mixed-effects models (LME, NLME and semiparametric NLME) with a SN distribution and measurement errors in covariates for estimated viral decay rates in viral dynamic models. We consider a multivariate SN distribution introduced by Sahu et al.(2003) which is suitable for a Bayesian inference since it is built using conditional method and is defined in Appendix A. The rest of the paper is organized as follows. Section 2 presents a general modeling approach for SN Bayesian semiparametric nonlinear mixed-effects (SN-BSNLME) joint models which include three specific models as special cases to be discussed in Section 3. We describe data from an AIDS clinical study that motivated this research, discuss the specific models for HIV dynamics and reports the results obtained by using the three different methods or models. Finally, the paper concludes with some discussions in Section 4.

This section will briefly discuss measurement error joint models with a skew-normal distribution. Various covariate models were investigated in the literature (Carroll et al., 2006; Higgins et al., 1997; Liu and Wu, 2007; Wu, 2002). However, the fundamental assumption of distributions for measurement random errors is normal in these statistical covariate models and this assumption may lack the robustness against departures from normality and/or may violate the agreement with observed data. In this paper, we extend the covariate models to have a skew-normal distribution for measurement errors. For simplicity, we consider a single time-varying covariate. Let *z _{ik}* be the observed covariate value for individual

(1)

where *z _{i}* = (

In this section, we present the joint models and methods in general forms, illustrating that our methods may be applicable in other applications. Denote the number of subjects by *n* and the number of measurements on the *i*th subject by *n _{i}*. For the response process, we consider a general semiparametric NLME (SNLME) model incorporating possibly mismeasured time-varying covariates and model random errors with a skew-normal distribution.

(2)

where *y _{i}* = (

Semiparametric NLME model (2) reverts to a parametric NLME model when the nonparametric parts *w*(*t*) and *h _{i}*(

(3)

where *μ _{p}* and

(4)

where *β* = (*β*^{†}* ^{T}*,
)

(5)

where *s*_{4} = *s*_{3} + *q*, *g _{i}*(

Under Bayesian framework, we still need to specify prior distributions for unknown parameters in the models (1) and (5) as follows.

(6)

where *s*_{5} = *s*_{2} + *p*, the mutually independent Inverse Gamma (*IG*), Normal (*N*) and Inverse Wishart (*IW*) prior distributions are chosen to facilitate computations (Davidian and Giltinan, 1995). The super-parameter matrices ∧_{1}, ∧_{2}, Ω_{1}, Ω_{2}, Γ_{1} and Γ_{2} can be assumed to be diagonal for convenient implementation.

We assume that *e _{i}*,

(7)

where *I*(*w _{k}* > 0) is an indicator function and

Let *θ* = {*α*, *β*, *τ*^{2}, *σ*^{2}, Σ* _{a}*, Σ

(8)

In general, the integrals in (8) are of high dimension and do not have closed form. Analytic approximations to the integrals may not be sufficient accurate. Therefore, it is prohibitive to directly calculate the posterior distribution of *θ* based on the observed data. As an alternative, MCMC procedures can be used to sample based on (8) using the Gibbs sampler along with the Metropolis-Hasting (M-H) algorithm. The above representations based on the models are useful as it allows to implement easily using the WinBUGS codes (Lunn et al., 2000).

We illustrate our methods using a real AIDS clinical data. The study consists of 44 HIV-infected patients who were treated with a potent ARV regimen. Viral load, CD4 count and other variables were repeatedly measured over a period of 24 weeks. RNA viral load was measured in copies/mL at study days 0, 7, 14, 28, 56, 84, 112, 140 and 168 of follow-up. The nucleic acid sequence-based amplification assay (NASBA) was used to measure plasma HIV-1 RNA, with a lower limit of quantification, 50 copies/mL. the HIV-1 RNA measures below this limit are not considered reliable, therefore we simply imputed such values as 25 copies/mL (Acosta et al., 2004; Davidian and Giltinan, 1995). Covariates such as CD4 count including associated baseline data were also measured throughout the study on similar schemes. Figure 2 shows the measurements of viral load in natural ln scale and CD4 count for the three randomly selected patients. Both viral load and CD4 cell count trajectories exhibit distinctive and important patterns throughout the time course. We can see that the rate change in viral load appears to vary substantially across patients, reflecting both biological variation and systematic associations with subject-level covariates.

Profiles of viral load (response) in ln scale and CD4 cell count (covariate) for three randomly selected patients. The horizontal line is below the detectable level of viral load (3.91=ln(50)) and the two vertical lines represent truncation days 35 and **...**

As is evident from Figure 3(c), RNA levels varied widely during the early treatment stage. For some patients, the RNA level decreased rapidly with treatment (described as the first-phase decay rate); for others it decreased slowly. Studying the relation between baseline RNA and the first-phase decay rate will provide useful information for clinical decision-making and treatment individualization. The first-phase decay rate indicates the potency of ARV therapies (Ding and Wu, 1999). If we know the potency of a treatment at an earlier stage, we may be able to avoid the less potent regimens for a particular patient. This will help clinicians to select a treatment for their patients.

Profiles of viral load in ln scale from an AIDS clinical trial study. Change in HIV-1 load, measured from RNA levels in plasma, with time during treatment from day 0 to (a) day 35, (b) day 84 and (c) the end of study period.

Viral dynamic models can be formulated through a system of ordinary differential equations (ODE) (Huang et al., 2006; Wu et al., 1998; Wu and Ding, 1999). In practice, some investigators have used a LME model simplified from an ODE system to fit viral load data from the very early time period such as displayed in Figure 3(a). Their response model can be described by the following linear model

(9)

where *y*(*t*) is the natural logarithm of the number of HIV-1 RNA copies per mL of plasma, *e*(*t*) is the measurement error, *β*_{2} is the first-phase viral decay rate which may represent the minimum turnover rate of productively infected cells (Perelson et al., 1997) and *β*_{1} is macro-parameter with exp(*β*_{1}) being the baseline viral load at time *t* = 0.

Based on biological and clinical arguments, an effective model used to estimate viral dynamic parameters is the biphasic model approximated from a compartmental analysis-based an ODE system (Perelson et al., 1997; Wu and Ding, 1999). This model plays an important role in modeling HIV dynamics and has demonstrated promise in studying HIV response process. The model is described as follows.

(10)

where *β*_{4} is the second-phase viral decay rate which may represent the minimum turnover rate of latently or long-lived infected cells (Perelson et al., 1997). It is generally assumed that *β*_{2} > *β*_{4}, which assures that the model is identifiable and is appropriate for empirical studies. It is of particular interest to estimate the viral decay rates *β*_{2} and *β*_{4} because they quantify the antiviral effect and hence can be used to assess the efficacy of the antiviral treatments (Ding and Wu, 1999). In estimating these decay rates, only the early segment of the viral load trajectory data (for example, data shown in Figure 3(b)) can be used (Perelson et al., 1997; Wu and Ding, 1999), because the viral load trajectory may have a different shape in later stages (see Figure 3(c)).

Although the models (9) and (10) are widely used in HIV dynamic studies and have shown promise, there are some concerns. For example, when different models give different conclusions, how do we know which is correct and should be used? In our analysis of the clinical data shown above, the models (9) and (10) produce incongruous estimates of viral decay rates of *β*_{2} and provide conflicting results on their correlations with baseline viral load. In addition, both models are applied to the early segment of the viral load data. That means one has to cut the data at some arbitrary time point. It is not obvious what time point should be the cut-point or whether we should use different cut-points. To avoid these problems, a semiparametric biexponential model was proposed as follows (Wu and Zhang, 2002).

(11)

where the second-phase decay rate *β*_{4}(*t*) is a smooth unknown function. Intuitively, model (11) is more reasonable because it assumes that the viral decay rate in the second-phase can vary with time as a result of drug resistance, pharmacokinetics, medication adherence and other relevant clinical factors likely to affect changes in the viral load during the late stage of treatment. Therefore, all data obtained during ARV treatment can be used by fitting model (11). We also assume that *β*_{2} > *β*_{4}(*t*) for all time in order to guarantee that there is the first phase of curve decay. This is a semiparametric model because of the mechanistic structure (two-exponential) with constant parameters (*β*_{1}, *β*_{2}, *β*_{3}) and a time-varying parameter (*β*_{4}(*t*)) to capture the time-varying effects of the treatment and over a longer period. Actually, by including long-term viral load data, the estimate of *β*_{2} may be more accurate and reasonable compared with those obtained in previous studies (Perelson et al., 1997; Wu and Ding, 1999) after excluding long-term viral load data for modeling and analysis by some *ad hoc* rules (that is, the screening and inclusion of viral load data are quite arbitrary). In the mean time, this model enjoys the flexibility of a semiparametric function for the second-phase decay rate *β*_{4}(*t*). The estimate of *β*_{4}(*t*) provides not only an approximate turnover rate over time of long-lived/latently infected cells at the early stage of treatment as the standard parametric model does, but also more importantly describes how it may change over a long treatment period as driven by, presumably, drug resistance, non-compliance and other clinical determinants. Most importantly, the semiparametric model is capable of modeling long-term viral load data of which the trajectory may vary substantially among different patients (Wu and Zhang, 2002). It is noted that the three different models are applied to different segments of the viral dynamic data. Therefore, the standard goodness-of-fit or model selection methods cannot be used to identify the appropriate model.

To characterize skewness appeared often in viral loads and CD4 cell counts, we will develop SN Bayesian mixed-effects joint models in conjunction with the three dynamic response models and LME covariate model. To model the covariate CD4 process, we consider empirical polynomial LME models and choose the best model (1) with quadratic (*l* = 3) based on AIC/BIC model selection criteria as suggested by Liu and Wu (2007) and Wu (2002). We thus adapted the quadratic polynomial as the SN covariate model (1) with
for the CD4 trajectory as follows.

(12)

where
, *α* = (*α*_{1}, *α*_{2}, *α*_{3})* ^{T}* is population (fixed-effects) parameter vector, and

We employ the linear combinations of natural cubic splines with percentile-based knots to approximate the nonparametric functions *w*(*t*) and *h _{i}*(

(13)

where *μ*_{3} = (*μ*_{0}, *μ*_{1}, *μ*_{2})* ^{T}* and

**Model I:** SN Bayesian semiparametric nonlinear mixed-effects (SN-BSNLME) joint model (5) based on the semiparametric biexponential model (11) in conjunction with the SN covariate model (12) along with specified prior distributions (6) can be expressed as

(14)

where
, *b _{i}* = (

**Model II:** SN Bayesian Nonlinear Mixed-Effects (SN-BNLME) joint model based on the biexponential model (10) in conjunction with the SN covariate model (12) can be expressed as

(15)

where *β* = (*β*_{1}, *β*_{2}, *β*_{3}, *β*_{4}, *β*_{5})* ^{T}* and

**Model III:** SN Bayesian Linear Mixed-Effects (SN-BLME) model based on the linear model (9) can be written as

(16)

where *β* = (*β*_{1}, *β*_{2})* ^{T}* and

In this section, we report the results of our analysis of the three data sets (mentioned in Figure 3) using SN-BLME, SN-BNLME and SN-BSNLME joint models, respectively. A natural ln transformation for viral load data was used in the analysis in order to stabilize the variation of measurement error and speed up estimation algorithm. To avoid very small (large) estimates which may be unstable, we standardize the time-varying covariate CD4 cell counts (each CD4 value is subtracted by mean 375.46 and divided by standard deviation 228.57) and rescale the original time *t* (in days) so that the time scale is between 0 and 1. To fit the SN-BLME model and the SN-BNLME joint model, we included only the viral load data from day 0 to day 35 (Figure 3(a)) and day 0 to day 84 (Figure 3(b)), respectively, because the SN-BLME model could only be used to fit linear trajectories of viral load and the SN-BNLME assumptions might be violated after long-term treatment if there are rebounds of viral load (i.e., we excluded the significant rebound data from the analysis). In fitting the SN-BSNLME joint model, we use all viral load data shown in Figure 3(c).

To carry out Bayesian inference, we need to specify the values of the hyperparameters in the prior distributions. In Bayesian approach, we only need to specify the priors at the population level. We take weakly informative prior distributions for all the parameters. In particular, (i) fixed-effects were taken to be independent normal distribution *N*(0, 100) for each component of the population parameter vectors *α* and *β*. (ii) For the scale parameters *σ*^{2} and *τ*^{2} we assume a limiting non-informative inverse gamma prior distribution, *IG*(0.01, 0.01) so that the distribution has mean 1 and variance 100. (iii) The priors for the variance-covariance matrices of the random-effects Σ* _{a}* and Σ

The MCMC sampler was implemented using WinBUGS software (Lunn et al., 2000), and the program codes are available in Appendix B. In particular, the MCMC scheme for drawing samples from the posterior distributions of all parameters in the both response and covariate models is obtained by iterating between the following two steps: (i) Gibbs sampler is used to update *α*, *β*, *τ*^{2}, *σ*^{2}, Σ* _{a}*, Σ

We will investigate the following two scenarios. Firstly, As shown in Figure 1, the histograms of viral load and CD4 cell count clearly indicate their asymmetric nature and it seems adequate fitting a SN model to the data set. Since a normal distribution is a special case of a SN distribution when skewness parameter is zero, we will investigate how a SN distribution for random error contributes to modeling results and parameter estimation in comparison with a normal distribution for random error. Secondly, we also estimate themodel parameters by using the ‘naive’ method, which does not separate the measurement errors from the true CD4 values (i.e., completely ignores measurement errors in CD4 values in the modeling). That is, the ‘naive’ method only uses the observed CD4 values *z _{i j}* rather than true (unobservable) CD4 values
in the response Models I–II in which case the joint models revert to regular models without covariate models involved for inference. We use the ‘naive’ method as a comparison to the joint modeling method proposed in Section 2. This comparison attempts to investigate how the measurement errors in CD4 contribute to parameter estimation.

In this section, we investigate how a SN distribution contributes to modeling results in comparison with a normal distribution for random error. The population posterior mean (PM), the corresponding standard deviation (SD) and 95% credible interval (CI) for fixed-effect parameters based on the models with a SN or normal random error are presented in Table 1. The following findings are observed based on the estimated results. (i) Firstly, in the response models for the most interested parameters (*β*_{2}, *β*_{4}, *β*_{5}), *β*_{2} based on the three models with a normal random error are larger than the corresponding that based on the three models with a SN random error; all the estimates are statistically significant since the 95% CIs do not contain zero. Secondly, for (*β*_{4}, *β*_{5}), it can be seen that the estimates based on the SNBNLME and SN-BSNLME models are substantially different from those based on the N-BNLME and N-BSNLME models. Thirdly, for the variance parameter *σ*^{2}, its estimated values (0.04, 0.08, 0.09) based on the three models with a SN random error are much smaller than those (0.96, 0.54, 1.31) based on the three models with a normal random error. Finally, for the skewness parameter *δ _{e}*, we found that

A summary of the estimated posterior mean (PM) of interested population (fixed-effects) and precision parameters, the corresponding standard deviation (SD) and lower limit (*L*_{CI}) and upper limit (*U*_{CI}) of 95% equal-tail credible interval (CI) from the normal **...**

Figure 4 displays the three randomly selected individual estimates of viral load trajectories along with the associated 95% CIs on each fitted value obtained based on the BLME (left), BNLME (center) and BSNLME (right) models with a Normal (dotted line) or SN (solid line) random error, respectively. The following findings are observed from joint modeling results. (i) The estimated individual trajectories for the models where the random error is assumed to be SN fit the originally observed values much closer than those for the corresponding models where the random error is assumed to be normal. (ii) Overall, the 95% CI associated with each of fitted values from the normal models is wider than that from the corresponding SN models. (iii) All the 95% CIs from three SN models cover the true (observed) viral load values, while some of 95% CIs from three normal models do not. For example, for patient 39 whose observed value at day 115 is 10.57, the 95% CI from the SN-BSNLME model is (9.90,11.02) with the fitted value 10.51, while the corresponding 95% CI from the N-BSNLME model is (7.19, 9.53) with the fitted value 8.33 which does not cover the observed value 10.57.

The individual estimates of viral load trajectories for three randomly selected patients based on the BLME (left), BNLME (center) and BSNLME (right) models with a normal (dotted line) or SN (solid line) random error. The respective vertical dotted line **...**

We also investigate the model fitting results for each of the three mixed-effects models with normal and SN random errors, respectively. We have seen that, in general, all the models provided a reasonably good fit to the observed data for most patients in our study, although the fitting for a few patients was not completely satisfactory due to unusual viral load fluctuation patterns for these patients. To assess the goodness-of-fit of each of the three mixed-effects models with normal and SN random errors, the diagnosis plots of the observed values versus the fitted values are presented in Figure 5. It was seen from Figure 5 that the models where the random error is assumed to be SN provided better fit to observed data, compared with the models where the random error is assumed to be normal. This result can be also explained by examining the SN or normal Q-Q plots of the residuals (Figure 6) where all plots show the existence of outliers. From these plots, it can be seen that the SN models only have few negative outliers and thus fit observed data better than the corresponding normal models. This finding is further confirmed by their residual sum of squares (RSS). That is, for the BLME model the RSSs are 3.62 (SN random error) and 117.37 (normal random error); for the BNLME model the RSSs are 6.64 (SN random error) and 63.78 (normal random error); for the BSNLME model the RSSs are 7.49 (SN random error) and 312.58 (normal random error). As one of referees pointed out, we also checked the residual distributions by plotting densities of residuals for the three models with SN or normal distributions (the plots are not shown). Based on those plots, the residuals appear to be symmetrically distributed for the models with normal distribution, while the residuals follow a positive skew-distribution for the models with SN distribution, confirming the assumptions of the residual distributions.

The observed values versus fitted values of ln(RNA) based on the BLME (left), BNLME (center) and BSNLME (right) models with a normal or SN random error.

SN or normal Q-Q plots of residuals with lines based on the BLME (left), BNLME (center) and BSNLME (right) models with a normal or SN random error.

For selecting the best model that fits the data adequately, a Bayesian selection criterion is used. This criterion, known as deviance information criterion (DIC), was first suggested in a recent publication by Spiegelhalter et al.(2002). As with other model selection criteria, we caution that DIC is not intended for identification of the ‘correct’ model, but rather merely as a method of comparing a collection of alternative formulations. In each of the three models with the specification of different distributions for the random error, DIC can be used to find out how assumption of a SN distribution contributes to virologic responses and parameter estimation in comparison with that of a normal distribution. We found that the DIC values, 335.14 (SN-BLME), 607.24 (SN-NLME) and 1051.21 (SN-SNLME) for the three models with a SN random error are smaller than the corresponding those, 524.54 (normal-BLME), 701.17 (normal-NLME) and 1355.45 (normal-SNLME) for the three models with a normal random error, respectively.

As mentioned before, it is hard to tell which model is ‘correct’ but which one fits data better. Therefore, based on the DIC criterion, the results indicate that each of the three models with a SN random error fits the data better, supporting the contention of a departure from normality. These results are consistent with those from diagnosis of the goodness-of-fit displayed in Figure 5. In summary, our results may suggest that it is very important to assume a SN distribution for the response models in order to achieve reliable results, in particular if data exhibit skewness. Along with these observations, we further report our results in details only for the three models with a SN random error.

The population estimates of the viral dynamic parameters presented in Table 1 based on the three (SN-BLME, SN-BNLME, SN-BSNLME) models indicate that the estimates of *β*_{1} from the different models agree fairly well. However, the estimates (20.9 and 17.1) of the first decay rate *β*_{2} by SN-BNLME and SN-BSNLME modeling methods are significantly different from that (3.43) obtained by SN-BLME modeling method. Although the estimates of *β*_{2} by SN-BNLME and SN-BSNLME modeling methods are comparable, one problem is that we considered only 84-day data for SN-BNLME model fitting. This means that only 68% of the data from the 168-day period were included due to arbitrary truncation of data. Therefore, the SN-BNLME modeling may not be efficient. In this case, we prefer to use the SN-BSNLME model in which a smooth function of treatment time is incorporated into the second-phase decay rate to better catch rebound viral load data and all data measured during the treatment period can be used.

In the SN-BLME and SN-BNLME model fittings, individual curves (solid lines in Figure 4) for each subject follow a similar trend; that is, the trajectories of viral load decay in all 3 subjects decrease rapidly in the first-phase, then flatten in general. When the entire treatment period is considered, the viral loads of subject 39 rebound after the second-phase, whereas the viral loads of subjects 23 and 32 remain low until the end of the treatment period. Obviously, the SN-BLME and SN-BNLME model fittings are reasonable for data cutting at days 35 and 84, but they do not represent data measured over the whole treatment period.

It is also worth noting the differences among the estimated values of the first-phase decay rate (*β _{i}*

Correlations between baseline ln(RNA) levels and the subject-specific first phase viral decay rates, *β*_{i}_{2} estimated by using each of the three different methods. The solid lines are robust (MM-estimator) linear regression fit. The correlation coefficients **...**

The incongruity in the individual estimates of the first-phase viral decay rates and their correlations with baseline ln(RNA) levels, as determined by the SNBLME, SN-BNLME and SN-BSNLME modeling methods, is significantly different with the following two observed scenarios: (i) Although the individual estimates of *β _{i}*

To fit the SN-BLME model, we truncated the data at day 35 in this study. However, it is not clear where one should cut the data between the first- and second-phases of decay. Also, different subjects may have different change points between the two phases. For example, truncation at day 35 may cause data from the second-phase to be included with first-phase data. It is for this reason that SN-BLME models underestimate the first-phase decay rates (*β*_{2}). The SN-BSNLME modeling method is preferable to the SN-BLME and SN-BNLME modeling methods, especially for sparse individual data. We believe that estimates obtained from the SN-BSNLME modeling method and their correlations with baseline ln(RNA) levels are reliable since the complete data are used. Conversely, the both SN-BLME and SN-BNLME model fittings may result in misleading conclusions, as shown above, perhaps because it is impossible to find a truncation point to data for these two model fittings that would be suitable for all patients. The estimated values of *β*_{2} would be affected by the inclusion of second-phase data if truncation occurred too late, and by the loss of data if truncation was too early.

For comparison, as an example, we also employed the ‘naive’ method-based the SN-BSNLME model to estimate the model parameters presented in Table 1 using the observed CD4 values and ignoring the CD4 measurement errors. It can be seen from the estimated results that the estimates of the parameters from the naive method are significantly larger than those from the joint modeling method. It indicates that the naive method may produce overestimated results with substantial biases; in particular, the estimated covariate CD4 effect (*β*_{5}) from the naive method is 5 times greater than that from the joint modeling method. The joint modeling method appears to give larger SD in most cases, probably because it incorporates the variation from fitting the CD4 covariate process. Further, the estimate of the model skewness parameter *δ _{e}* for the naive method is slightly smaller than that for the joint modeling method; this result suggests that the naive method may underestimate the skewness parameter due to ignoring measurement errors in CD4 values. Thus the difference of the naive estimates and the joint modeling estimates, due to whether or not ignoring potential CD4 measurement errors in conjunction with the SN-BSNLME model, indicates that it is important to take the measurement errors into account in the analysis when covariates are measured with errors.

For viral dynamic models with skewness characteristics of viral load responses and CD4 measurement errors in covariate, we have investigated and compared the three Bayesian mixed-effects models with a SN distribution that may be preferred over those with a normal distribution in the sense that it produces less biased parameter estimates and provides better fit to observed data. The proposed method may have a significant impact on AIDS research because, in the presence of skewness in the data, appropriate statistical inference for HIV dynamics is important for making robust conclusions and reliable clinical decisions. Our proposed method is quite general and so can be used to other applications. This kind of SN modeling approach is important in many biostatistical application areas, allowing accurate inference of parameters while adjusting for the data with skewness. The SN distribution is shown to provide an alternative to normal (symmetric) distribution that is often assumed in statistical models. The results indicate that with SN distribution assumption, there is potential to gain efficiency and accuracy in estimating certain parameters when the normality assumption does not hold in the data. The models considered in this paper can be easily fitted using MCMC procedure. Moreover, the proposed modeling approach is fitted using the WinBUGS package that is available publicly. This makes our approach quite powerful and accessible to practicing statisticians in the fields.

To estimate the viral dynamic parameters and study the relation of the baseline level of HIV-1 RNA in plasma with the decay rate of the first-phase of response to treatment, we compared the results of SN-BLME, SN-BNLME and SNBSNLME model fittings, and found that the both SN-BLME and SN-BNLME model fittings in short-term dynamics may result in misleading conclusions due to data truncation. Of particular interest is that when long-term dynamics are considered, SN-BNLME may also become unreliable because of the complexity of the second-phase of decay. The foregoing results indicate that in a two-phase HIV dynamic model, the first decay rate may remain constant, while the second decay rate may change which depends on time-varying CD4 covariate during the period of study. The analysis results suggest that there may be a significantly positive correlation between the first-phase viral decay and the baseline HIV-1 RNA levels based on the SN-BSNLME modeling method. This finding is consistent with those reported in publications (Notermans et al.,1998; Wu et al., 2004). This positive correlation may be partially explained by the fact that the higher baseline viral load value, which is equivalent to the lower baseline CD4 value due to a negative relation between these two baseline factors, suggests a lower turnover rate of hymphocyte cells, which may lead to a positive correlation between the first-phase viral decay rate (*β _{i}*

We would point out that the problems we have addressed in this paper cannot be resolved by the standard goodness-of-fit or model selection methods, which are usually used when applying different models to the same data set. From Figure 4, we can see that all three models (BLME, BNLME and BSNLME) fitted the corresponding data very well. Goodness-of-fit or model selection methods are unable to identify the right model. We have found that the different models should not be applied to the same, entire data set, but applied simultaneously to appropriate segments of the data set such that the results will be biologically meaningful.

In conclusion, BLME fitting may be misleading and its use should be avoided; BNLME fitting may work well but is subject to data truncation; BSNLME fitting works in a similar way to BNLME fitting but has no data-screening problems associated with its use. Care is necessary in the implementation of BNLME and BSNLME fittings. With the introduction of SN distribution in the models, the estimated results suggest that the skewness parameters in viral load and CD4 cell count are estimated to be significantly positive for each of the three models. This confirms the positive skewness of the viral load and CD4 data presented in Figure 1. Thus, we may conclude that accounting for significant skewness is required when one models a data set which exhibits skewness.

This paper combines new technologies in mathematical modeling and statistical inference with advances in viral dynamics and ARV treatment to quantify complex HIV disease mechanisms. The complex nature of HIV/AIDS will naturally pose some challenges such as nonignorable missing data and data with detection limit problems. Furthermore, other more flexible distributions for model errors such as skew-*t* distributions (Sahu et al., 2003), normal/independent distributions (Lange and Sinsheimer, 1993) or nonparametric distributions such as an unspecified distribution that has a Dirichlet process prior (Ferguson, 1973) can be assumed. These complicated problems are beyond the purpose of this article, but a further study may be warranted. We are actively investigating these problems and associated simulation studies now. We hope that we could report these interesting results in the near future.

Recently, there has been an increasing interest in finding more flexible methods to represent features of the data as adequately as possible and to reduce unrealistic assumptions. One approach for data modeling consists in constructing flexible parametric classes of multivariate distributions that is different from the normal distribution. The skew-elliptical distribution is an attractive class of asymmetric thick-tailed parametric structure which includes the skew-normal (SN) distribution as a special case. Different versions of the multivariate SN distributions have been considered and used in the literature (Arellano-Valle et al., 2005; Azzalini and Capitanio, 1999; Arellano-Valle and Genton, 2005; Arellano-Valle and Azzalini, 2006; Azzalini and Dalla-Valle, 1996; Sahu et al., 2003 and among others). These studies demonstrated that the SN distribution has reasonable flexibility in real data fitting, while it maintains some convenient formal properties of the normal density. For more detailed discussions on properties and theories of SN distribution and its potential applications as well as differences among various versions of SN distributions, see References listed above.

In this paper, we consider a multivariate SN distribution introduced by Sahu et al.(2003) which is suitable for straightforward Bayesian analysis through hierarchical representations since it is built using conditional method. In particular, it is relatively easy to implement and provides an interesting alternative to other computationally challenging parametric or nonparametric models (Ferguson, 1973; Lange and Sinsheimer, 1993; M*ü*ller and Rosner, 1997). For completeness, this section is started by briefly summarizing the multivariate SN distribution that will be used in defining the SN joint models considered in this paper.

An *m*-dimensional random vector *Y* follows an *m* variate SN distribution with location vector *μ*, *m* × *m* positive (diagonal) dispersion matrix Σ and *m* × *m* skewness matrix Δ(*δ*) = *diag*(*δ*_{1}, *δ*_{2}, …, *δ _{m}*), if its probability density function (pdf) is given by

(17)

where *y*_{*} = *y* − *μ*, *δ* = (*δ*_{1}, *δ*_{2}, …, *δ _{m}*)

We enclose WinBUGS program code based on Model I (SN-BSNLME model) for reference. WinBUGS program codes of Models II and III can be easily modified from it or are available from authors upon request.

# *Model I: SN Bayesian Semiparametric Nonlinear Mixed-Effects (SN-BSNLME)*

# *Model in conjunction with the semiparametric biexponential model (11)*.

## **Variables in the datasets “rnacd48” and “Z-matrix” for spline:**

#y[,1] = serial number

#y[,2] =arm

#y[,3] = patid

#y[,4] =id

#y[,5] =time (day)

#y[,6] = logrna

#y[,7] = rna

#y[,8] = cd4

#y[,9]=cd8

# [,10]=logerna

#y[,11]= cd4 (standardized)

#y[,12]= cd8 (standardized)

#y[,13]= time(day) (rescaled between 0 and 1 by dividing by max)

#y[,14]= cd4.baseline (standardized)

#Z[,]= Values of spline basis functions

##**Begin of model**

model {

for (i in 1:44){

a2[i,1]< −0

a2[i,2]< −0

a2[i,3]< −0

a2[i,4]< −0

a3[i,1]< −0

a3[i,2]< −0

a3[i,3]< −0

b[i,1:4] ~dmnorm(a2[i,1:4],Sigma2[,]) # random effects for main model

a[i,1:3] ~dmnorm(a3[i,1:3],Sigma3[,]) # random effects for covariate model

}

for (j in 1 : 310) {

## **Modeling true CD4 via measurement errors model**

z.star[j]< −(alpha[1]+a[y[j,4],1])+(alpha[2]+a[y[j,4],2])*y[j,13]+(alpha[3]+

a[y[j,4],3])*y[j,13]*y[j,13]+delta2*w2[j] ## Skew-normal

w2[j] ~ dnorm(0, 1 ) I(0,)

# z.star[j]< −(alpha[1]+a[y[j,4],1])+(alpha[2]+a[y[j,4],2])*y[j,13]+

# (alpha[3]+a[y[j,4],3])*y[j,13]*y[j,13] ## Normal

y[j,11] ~ dnorm(z.star[j],tau2) ## y[j,11] =observed cd4

## **Viral load response model associated with true CD4 covariate**

betai1[j] < − beta[1]+ b[y[j,4],1] ## y[,4]=id

betai2[j] < − beta[2]+b[y[j,4],2]

betai3[j] < − beta[3]+ b[y[j,4],3]

betaij4[j]<−beta[4]+beta[5]*z.star[j]+mu.not[1]*Z[j,2]+mu.not[2]*Z[j,3]+b[y[j,4],4]

dm1[j] < − betai1[j]-step(betai2[j]-betaij4[j])*betai2[j]*y[j,13]

dm2[j] < −betai3[j]-step(betai2[j]-betaij4[j])*betaij4[j]*y[j,13]

dm3[j]< − exp(dm1[j])

dm4[j]< − exp(dm2[j])

dm5[j]< − dm3[j] +dm4[j]

mu[j] < − log(dm5[j]) + delta*w[j] ## Skew-normal

w[j] ~ dnorm(0, 1 ) I(0,)

# mu[j] < − log(dm5[j]) ## Normal

y[j,10] ~ dnorm(mu[j], tau) ## y[,10]=logerna

## **Fitted values and residues**

fit[j] < − mu[j]

res[j] < − y[j,10]-fit[j]

}

## **Prior distributions of the hyperparameters**

# (1) Coefficients

for (l in 1:5)

{ beta[l] ~ dnorm(0,1.0E-2) }

for (l in 1:2)

{mu.not[l] ~ dnorm(0,1.0E-2)}

for (k in 1:3){ alpha[k] ~ dnorm(0,1.0E-2) }

# (2) Precision parameters

tau ~ dgamma(0.01,0.01)

sigma.tau < − 1/tau

tau2 ~ dgamma(0.01,0.01)

sigma.tau2 < − 1/tau2

# (3) Variance-covariance matrices

Sigma2[1:4,1:4] ~ dwish(R2[,],5)

v2[1:4,1:4] < − inverse(Sigma2[,])

Sigma3[1:3,1:3]*sim* dwish(R3[,],5)

v3[1:3,1:3] < − inverse(Sigma3[,])

# (4) Skewness parameters

delta ~dnorm(0.0, 0.01)

delta2 ~ dnorm(0.0, 0.01)

} ## **End of model**

## **Part of data**

list(R2 =structure( .Data = c(1, 0,0,0,0,1,0,0,0,0,1,0,0,0,0,1), .Dim = c(4, 4)),

R3 = structure( .Data = c(1, 0,0,0,1,0,0,0,1), .Dim = c(3, 3)))

## **Initial values**

list(beta=c(5.4,3.4,4.0,1.5,4.2),alpha=c(0,0,0),mu.not=c(0,0),

delta=0.5,delta2=0.5,tau=1,tau2=1,

Sigma2= structure( .Data = c(1.229, .043,-.750,.170, .043,.090,.002,-.013,

-.750,-.002,1.059,-.214, .170,-.013,-.214, .120), .Dim = c(4, 4)),

Sigma3= structure( .Data = c(.1, 0,0, 0,.1,0, 0,0,.1), .Dim = c(3, 3)))

## **End of Program**

**Author Notes:** The authors gratefully acknowledge the Editor and two anonymous referees for their insightful comments and helpful suggestions that led to a marked improvement of the article. We thank A5055 study investigators for allowing us to use the data from their study. This research was partially supported by NIAID/NIH grant R03 AI080338 and MSP/NSA grant H98230-09-1-0053 to Y. Huang.

Yangxin Huang, University of South Florida.

Ren Chen, University of South Florida.

Getachew Dagne, University of South Florida.

- Acosta EP, Wu H, Hammer SM, Yu S, Kuritzkes DR, Walawander A, Eron JJ, Fichtenbaum CJ, Pettinelli C, Neath D, Ferguson E, Saan AJ, Gerber JG, for the Adult ACTG 5055 Protocol Team Comparison of two indinavir/ritonavir regimens in treatment of HIV-infected individuals. Journal of Acquired Immune Deficiency Syndromes. 2004;37:1358–1366. doi: 10.1097/00126334-200411010-00004. [PubMed] [Cross Ref]
- Arellano-Valle RB, Bolfarine H, Lachos VH. Skew-normal linear mixed models. Journal of Data Science. 2005;3:415–438.
- Azzalini A, Capitanio A. Statistical applications of the multivariate skew normal distribution. Journal of Royal Statistical Society, Series B. 1999;67:579–602. doi: 10.1111/1467-9868.00194. [Cross Ref]
- Arellano-Valle RB, Genton MG. On fundamental skew distributions. Journal of Multivariate Analysis. 2005;96:93–116. doi: 10.1016/j.jmva.2004.10.002. [Cross Ref]
- Arellano-Valle RB, Azzalini A. On the unification of families of skew-normal distributions. Scandinavian Journal of Statistics. 2006;33:561–574. doi: 10.1111/j.1467-9469.2006.00503.x. [Cross Ref]
- Azzalini A, Dalla-Valle A. The multivariate skew-normal distribution. Biometrika. 1996;83:715–726. doi: 10.1093/biomet/83.4.715. [Cross Ref]
- Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective 2nd edition. London: Chapman and Hall; 2006.
- Ding AA, Wu H. Relationships between antiviral treatment effects and biphasic viral decay rates in modeling HIV dynamics. Mathematical Biosciences. 1999;160:63–82. doi: 10.1016/S0025-5564(99)00021-8. [PubMed] [Cross Ref]
- Davidian M, Gilitinan DM. Nonlinear Models for Repeated Measurement Data. London: Chapman & Hall; 1995.
- Ferguson TS. A Bayesian analysis of some nonparametric problems. Annals of Statistics. 1973;1:209–230. doi: 10.1214/aos/1176342360. [Cross Ref]
- Ghosh P, Branco MD, Chakraborty H. Bivariate random effect model using skew normal distribution with application to HIV-RNA. Statistics in Medicine. 2007;26:1255–1267. doi: 10.1002/sim.2667. [PubMed] [Cross Ref]
- Higgins M, Davidian M, Gilitinan DM. A two-step approach to measurement error in time-dependent covariates in nonlinear mixed-effects models, with application to IGF-I pharmacokinetics. Journal of the American Statistical association. 1997;92:436–448. doi: 10.2307/2965691. [Cross Ref]
- Huang Y, Liu D, Wu H. Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics. 2006;62:413–423. doi: 10.1111/j.1541-0420.2005.00447.x. [PMC free article] [PubMed] [Cross Ref]
- Huang Y, Dagne G. Skew-normal Bayesian nonlinear mixed-effects models with application to AIDS studies. Statistics in Medicine. 2010;29:2384–2398. [PubMed]
- Lange KL, Sinsheimer JS. Normal/independent distributions and their applications in robust regression. J Comput Graph Stat. 1993;2:175–198. doi: 10.2307/1390698. [Cross Ref]
- Liang H, Wu H, Carroll R. The relationship between virologic and immunologic responses in aids clinical research using mixed-effect varyingcoefficient semiparametric models with measurement error. Biostatistics. 2003;4:297–312. doi: 10.1093/biostatistics/4.2.297. [PubMed] [Cross Ref]
- Liu W, Wu L. Simultaneous inference for semiparametric nonlinear mixed-effects models with covariate measurement errors and missing responses. Biometrics. 2007;63:342–350. doi: 10.1111/j.1541-0420.2006.00687.x. [PubMed] [Cross Ref]
- Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS – a Bayesian modelling framework: concepts, structure, an extensibility. Statistics and Computing. 2000;10:325–337. doi: 10.1023/A:1008929526011. [Cross Ref]
- Müller P, Rosner GL. A Bayesian populationmodel with hierarchical mixture priors applied to blood count data. Journal of the American Statistical Association. 1997;92:1279–1292. doi: 10.2307/2965398. [Cross Ref]
- Notermans DW, Goudsmit J, Danner SA, De Wolf F, Perelson AS, Mittler J. Rate of HIV-1 decline following antiretroviral therapy is related to viral load at baseline and drug regimen. AIDS. 1998;12:1483–1490. doi: 10.1097/00002030-199812000-00010. [PubMed] [Cross Ref]
- Perelson AS, Essunger P, Cao Y, Vesanen M, Hurley A, Saksela K, Markowitz M, Ho DD. Decay characteristics of HIV-1-infected compartments during combination therapy. Nature. 1997;387:188–191. doi: 10.1038/387188a0. [PubMed] [Cross Ref]
- Sahu SK, Dey DK, Branco MD. A new class of multivariate skew distributions with applications to Bayesian regression models. The Canadian Journal of Statistics. 2003;31:129–150. doi: 10.2307/3316064. [Cross Ref]
- Spiegelhalter DJ, Best NG, Carlin BP, Van der Linde A. Bayesian measures of model complexity and fit (with Discussion) Journal of the Royal Statistics Society, Series B. 2002;64:583–639. doi: 10.1111/1467-9868.00353. [Cross Ref]
- Verbeke G, Lesaffre E. A linear mixed-effects model with heterogeneity in random-effects population. Journal of the American Statistical Association. 1996;91:217–221. doi: 10.2307/2291398. [Cross Ref]
- Wu H, Ding AA. Population HIV-1 dynamics
*in vivo*: applicable models and inferential tools for virological data from AIDS clinical trials. Biometrics. 1999;55:410–418. doi: 10.1111/j.0006-341X.1999.00410.x. [PubMed] [Cross Ref] - Wu L. A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies. Journal of the American Statistical Association. 2002;97:955–964. doi: 10.1198/016214502388618744. [Cross Ref]
- Wu H, Zhang JT. The study of long-term HIV dynamics using semiparametric non-linear mixed-effects models. Statistics in Medicine. 2002;21:3655–3675. doi: 10.1002/sim.1317. [PubMed] [Cross Ref]
- Wu H, Lathey J, Ruan P, Douglas SD, Spector SA, Lindsey J, Hughes‘234 MD, Rudy BJ, Flynn PM, PACTG 381 TEAM Relationship of plasma HIV-1 RNA dynamics to baseline factors and virological responses to Highly Active Antiretroviral Therapy (HAART) in adolescents infected through risk behavior. Journal of Infectious Diseases. 2004;189:593–601. doi: 10.1086/381500. [PubMed] [Cross Ref]
- Wu H, Zhao C, Liang H. Comparison of linear, nonlinear and semiparametric mixed-effects models for estimating HIV dynamic parameters. Biometrical Journal. 2004;6:233–245. doi: 10.1002/bimj.200310019. [Cross Ref]
- Wu L. Simultaneous inference for longitudinal data with detection limits and covariates measured with errors, with application to AIDS studies. Statistics in Medicine. 2004;23:1715–1731. doi: 10.1002/sim.1748. [PubMed] [Cross Ref]

Articles from The International Journal of Biostatistics are provided here courtesy of **Berkeley Electronic Press**

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |