Comput Stat Data Anal. Author manuscript; available in PMC 2010 October 1.

Published in final edited form as:

Comput Stat Data Anal. 2009 October 1; 53(12): 3907–3915.

doi: 10.1016/j.csda.2009.04.017

PMCID: PMC2743512

NIHMSID: NIHMS117073


Pharmacokinetic/pharmacodynamic phenotypes are identified using nonlinear random effects models with finite mixture structures. A maximum *a posteriori* probability estimation approach is presented using an EM algorithm with importance sampling. Parameters for the conjugate prior densities can be based on prior studies or set to represent vague knowledge about the model parameters. A detailed simulation study illustrates the feasibility of the approach and evaluates its performance, including selecting the number of mixture components and proper subject classification.

The use of mathematical modeling is central to the study of the absorption, distribution and elimination of therapeutic drugs and to understanding how drugs produce their effects. From its inception the field of pharmacokinetics and pharmacodynamics has incorporated methods of mathematical modeling, simulation and computation in an effort to better understand and quantify the processes of uptake, disposition and action of therapeutic drugs. These methods for pharmacokinetic/pharmacodynamic (PK/PD) systems analysis impact all aspects of drug development including *in vitro*, animal and human testing, as well as drug therapy. Modeling methodologies developed for studying pharmacokinetic/pharmacodynamic processes confront many challenges related in part to the severe restrictions on the number and type of measurements that are available from laboratory experiments and clinical trials, as well as the variability in the experiments and the uncertainty associated with the processes themselves.

Since their initial application to pharmacokinetics in the 1970’s, Bayesian methods have provided a framework for PK/PD modeling in drug development that can address some of the above-mentioned challenges. Sheiner *et al.* (1975) applied Bayesian estimation (maximum *a posteriori* probability estimation, MAP) to combine population information with drug concentration measurements obtained in an individual, in order to determine a patient-specific dosage regimen. Katz, Azen and Schumitzky (1981) reported the first efforts to calculate the complete posterior distribution of model parameters in a nonlinear pharmacokinetic individual estimation problem, for which they used numerical quadrature methods to perform the needed multi-dimensional integrations.

The population PK/PD problem has also been cast in a Bayesian framework, initially by Racine-Poon (1985) using a two-stage approach, and more generally by Wakefield *et al.* (1994). The solution to this computationally demanding problem is accomplished through the application of Markov chain Monte Carlo methods pioneered by Gelfand and Smith (1990) and now available in general purpose software (see Lunn *et al.*, 2002 for a discussion relevant to PK/PD population modeling).

Population PK/PD modeling, including Bayesian approaches, is used in drug development to identify the influence of measured covariates (e.g., demographic factors and disease status) on drug kinetics and response. It is now recognized, however, that genetic polymorphisms in drug metabolism and in the molecular targets of drug therapy can also influence the efficacy and toxicity of medications (Evans and Relling, 1999). Population modeling approaches that can identify and model distinct subpopulations not related to available measured covariates may, therefore, help determine otherwise unknown genetic and other determinants of observed pharmacokinetic/pharmacodynamic phenotypes.

We have previously reported on a maximum likelihood approach using finite mixture models to identify subpopulations with distinct pharmacokinetic/pharmacodynamic properties (Wang *et al.*, 2007). Wakefield and Walker (1997) and Rosner and Mueller (1997) have introduced Bayesian approaches to address this problem within a nonparametric mixture model framework. In this paper a computationally practical maximum *a posteriori* probability estimation (MAP) approach is proposed using finite mixture models. Section 2 of this paper describes the finite mixture model within a nonlinear random effects framework and defines the MAP estimation problem. A solution via the EM algorithm is presented in Section 3. Subject classification and model selection issues are discussed in Section 4. An example and detailed simulation study are presented in Section 5 and Section 6 contains a discussion. The Appendix discusses important asymptotic properties of the MAP estimator, further motivating its use, and derives the formulas for the asymptotic covariance.

A two-stage nonlinear random effects model that incorporates a finite mixture model is given by

$${Y}_{i}\mid {\theta}_{i},\beta \sim N({h}_{i}({\theta}_{i}),{G}_{i}({\theta}_{i},\beta )),\phantom{\rule{0.38889em}{0ex}}i=1,\dots ,n$$

(1)

and

$${\theta}_{1},\dots ,{\theta}_{n}{\sim}_{i.i.d}\sum _{k=1}^{K}{w}_{k}N({\mu}_{k},{\mathrm{\sum}}_{k}),$$

(2)

where *i* =1, …, *n* indexes the individuals and *k* =1, …, *K* indexes the mixing components. The alternate problem formulation presented in Cruz-Mesía *et al.* (2008) and in Pauler and Laird (2000) also results in the solution outlined below.

At the first stage represented by (1), *Y _{i}* is the vector of *m _{i}* measurements obtained from the *i*th individual, *h _{i}*(*θ _{i}*) is the corresponding vector of model predictions given the subject-specific parameter vector *θ _{i}*, and *G _{i}*(*θ _{i}*, *β*) is the intraindividual covariance matrix.

At the second stage given by (2), a finite mixture model with *K* multivariate normal components is used to describe the population distribution. The weights {*w _{k}*} denote the relative size of each mixing component (subpopulation) and are nonnegative numbers for which ${\sum}_{k=1}^{K}{w}_{k}=1$.
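The second stage (2) can be illustrated with a short simulation sketch; the component means, covariances and weights below are arbitrary placeholders, not the values of the example in Section 5:

```python
import numpy as np

# Sketch of stage (2): subject-specific parameter vectors theta_i drawn i.i.d.
# from a K-component multivariate normal mixture. All numbers are illustrative.

def sample_mixture(n, weights, means, covs, rng):
    """Draw n vectors theta_i ~ sum_k w_k N(mu_k, Sigma_k); also return labels."""
    labels = rng.choice(len(weights), size=n, p=weights)  # latent components
    theta = np.array([rng.multivariate_normal(means[k], covs[k]) for k in labels])
    return theta, labels

rng = np.random.default_rng(0)
means = [np.array([1.0, 2.0]), np.array([3.0, 2.5])]
covs = [0.1 * np.eye(2), 0.2 * np.eye(2)]
theta, labels = sample_mixture(500, [0.3, 0.7], means, covs, rng)
```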

Letting *φ* represent the collection of parameters {*β*, (*w _{k}*, *μ _{k}*, Σ _{k}), *k* =1, …, *K*}, the posterior density of *φ* is

$$p(\phi \mid {Y}^{n})=\frac{p({Y}^{n}\mid \phi )p(\phi )}{p({Y}^{n})}=\frac{{\displaystyle \prod _{i=1}^{n}}p({Y}_{i}\mid \phi )p(\phi )}{p({Y}^{n})}.$$

where *Y ^{n}* = {*Y*_{1}, …, *Y _{n}*} denotes the data from all *n* individuals and *p*(*φ*) is the prior density defined below. The MAP estimate *φ ^{MAP}* is the mode of this posterior density.

For a mixture of normal distributions, a multivariate normal prior on the mean of each mixing component is given by:

$${\mu}_{k}\mid {\mathrm{\sum}}_{k}\sim N({\lambda}_{k},{\mathrm{\sum}}_{k}/{\tau}_{k}),k=1,\dots ,K,$$

(3)

where *λ _{k}* and *τ _{k}* are hyper-parameters specifying the prior mean and its precision scaling. The prior on each covariance matrix Σ _{k} is defined through a Wishart distribution on its inverse:

$${({\mathrm{\sum}}_{k})}^{-1}\sim \mathit{Wishart}({q}_{k},{\mathrm{\Psi}}_{k}),\phantom{\rule{0.16667em}{0ex}}k=1,\dots ,K.$$

(4)

The mixing weights have a Dirichlet distribution as the prior:

$$({w}_{1},\dots ,{w}_{K})\sim \mathit{Dirichlet}({a}_{1},\dots ,{a}_{K}).$$

(5)

We consider the model given by (1) and (2) for the important case *G _{i}*(*θ _{i}*, *β*) = *σ*^{2}*H _{i}*(*θ _{i}*), where *H _{i}*(*θ _{i}*) is a known function of *θ _{i}*. The prior is then specified through a gamma distribution on the precision:

$${({\sigma}^{2})}^{-1}\sim \mathit{Gamma}(a,b).$$

(6)

These densities are conjugate priors to the multivariate normal mixtures (see appendix for the parameterizations used).

The hyper-parameters {*a*, *b*, (*a _{k}*, *λ _{k}*, *τ _{k}*, *q _{k}*, Ψ _{k}), *k* =1, …, *K*} can be chosen to incorporate information from prior studies, or set to represent vague knowledge about the model parameters, as illustrated in the example below.
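As an illustration, one realization of the model parameters can be drawn from the priors (3)–(6). The hyper-parameter values and the Bartlett-decomposition Wishart sampler are illustrative assumptions; note that in the parameterization used here (density proportional to det(*S*)^{(q−d−1)/2} exp(−tr(Ψ*S*)/2)) the Wishart scale matrix is Ψ^{−1}:

```python
import numpy as np

# Sketch of one draw from the conjugate priors (3)-(6); all hyper-parameter
# values are illustrative placeholders, not from the paper.

def wishart_rvs(df, scale, rng):
    """Wishart(df, scale) draw via the Bartlett decomposition, E[W] = df*scale."""
    d = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((d, d))
    for i in range(d):
        A[i, i] = np.sqrt(rng.chisquare(df - i))
        for j in range(i):
            A[i, j] = rng.standard_normal()
    M = L @ A
    return M @ M.T

def draw_from_priors(lam, tau, q, Psi, a_dir, a, b, rng):
    w = rng.dirichlet(a_dir)                          # (5) mixing weights
    sigma2 = 1.0 / rng.gamma(shape=a, scale=1.0 / b)  # (6) gamma on 1/sigma^2, rate b
    mus, Sigmas = [], []
    for k in range(len(lam)):
        # (4): Sigma_k^{-1} ~ Wishart(q_k, Psi_k); scale matrix is inv(Psi_k)
        prec = wishart_rvs(q[k], np.linalg.inv(Psi[k]), rng)
        Sigma = np.linalg.inv(prec)
        # (3): mu_k | Sigma_k ~ N(lambda_k, Sigma_k / tau_k)
        mus.append(rng.multivariate_normal(lam[k], Sigma / tau[k]))
        Sigmas.append(Sigma)
    return w, mus, Sigmas, sigma2

rng = np.random.default_rng(1)
w, mus, Sigmas, sigma2 = draw_from_priors(
    lam=[np.zeros(2), np.ones(2)], tau=[0.01, 0.01], q=[3.0, 3.0],
    Psi=[np.eye(2), np.eye(2)], a_dir=[1.0, 1.0], a=1.0, b=1.0, rng=rng)
```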

The EM algorithm, originally introduced by Dempster, Laird and Rubin (1977), is used to perform iterative computation of maximum likelihood (ML) estimates. It is applied below to solve the MAP estimation problem defined in the previous section.

The component label vector *z _{i}* is introduced as a *K*-dimensional indicator vector for the *i*th subject, with *z _{i}*(*k*) = 1 if *θ _{i}* arises from the *k*th mixing component and 0 otherwise. Treating {*z _{i}*} and {*θ _{i}*} as the missing data, the E step at iteration *r* computes

$$Q(\phi ,{\phi}^{(r)})=E\{log{L}_{c}(\phi )\mid {Y}^{n},{\phi}^{(r)}\}+logp(\phi ),$$

where
$log{L}_{c}(\phi )={\displaystyle \sum _{i=1}^{n}}{\displaystyle \sum _{k=1}^{K}}{z}_{i}(k)log\left({w}_{k}p\left({Y}_{i},{\theta}_{i}\mid {\sigma}^{2},{\mu}_{k},{\mathrm{\sum}}_{k}\right)\right)$. In the M step, the updated estimate *φ*^{(*r*+1)} is obtained as the maximizer of *Q*(*φ*, *φ*^{(*r*)}), i.e.,
${\phi}^{(r+1)}=\underset{\phi}{argmax}\phantom{\rule{0.16667em}{0ex}}Q(\phi ,{\phi}^{(r)})$. Under the prior defined above, the updating equations of the M step are:

$$\begin{array}{l}{{w}_{k}}^{(r+1)}=\frac{{\displaystyle \sum _{i=1}^{n}}\int {g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+({a}_{k}-1)}{n-K+{\displaystyle \sum _{k=1}^{K}}{a}_{k}},\\ {\mu}_{k}^{(r+1)}=\frac{{\displaystyle \sum _{i=1}^{n}}\int {\theta}_{i}{g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+{\tau}_{k}{\lambda}_{k}}{{\displaystyle \sum _{i=1}^{n}}\int {g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+{\tau}_{k}},\\ {\mathbf{\sum}}_{k}^{(r+1)}=\frac{{\displaystyle \sum _{i=1}^{n}}\int ({\theta}_{i}-{{\mu}_{k}}^{(r+1)}){({\theta}_{i}-{{\mu}_{k}}^{(r+1)})}^{T}{g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+{\tau}_{k}({\lambda}_{k}-{{\mu}_{k}}^{(r+1)}){({\lambda}_{k}-{{\mu}_{k}}^{(r+1)})}^{T}+{\mathbf{\psi}}_{k}}{{\displaystyle \sum _{i=1}^{n}}\int {g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+{q}_{k}-d},\end{array}$$

and

$${({\sigma}^{2})}^{(r+1)}=\frac{{\displaystyle \sum _{i=1}^{n}}{\displaystyle \sum _{k=1}^{K}}\int {({Y}_{i}-{h}_{i}({\theta}_{i}))}^{T}{H}_{i}{({\theta}_{i})}^{-1}({Y}_{i}-{h}_{i}({\theta}_{i})){g}_{ik}({\theta}_{i},{\phi}^{(r)})d{\theta}_{i}+2b}{{\displaystyle \sum _{i=1}^{n}}{m}_{i}+2(a-1)},$$

where
${g}_{ik}({\theta}_{i},\phi )=\frac{{w}_{k}p({Y}_{i}\mid {\sigma}^{2},{\theta}_{i})p({\theta}_{i}\mid {\mu}_{k},{\mathrm{\sum}}_{k})}{{\displaystyle \sum _{k=1}^{K}}{w}_{k}\int p({Y}_{i}\mid {\sigma}^{2},{\theta}_{i})p({\theta}_{i}\mid {\mu}_{k},{\mathrm{\sum}}_{k})d{\theta}_{i}}\text{for}\phantom{\rule{0.16667em}{0ex}}k=1,\dots ,K$ and *d* = dim(*θ _{i}*).

The E and M steps are repeated until convergence. Discussions of sufficient conditions for convergence can be found in Dempster et al. (1977), Wu (1983) and Tseng (2005). In the appendix, an error analysis for *φ ^{MAP}* is presented.

In order to implement the algorithm, all the integrals in the updating equations must be evaluated at each iterative step. For the maximum likelihood mixture model problem we have successfully used importance sampling to calculate the corresponding integrals in the EM algorithm (see Wang *et al.* 2007). The same method can be applied here and was used in the examples presented below. In brief, an envelope function *p*_{e(k)} is constructed for each mixing component, *T* samples $\{{\theta}_{i(k)}^{(l)},l=1,\dots ,T\}$ are drawn from it for each subject, and each required integral is approximated by the weighted sum:

$$\int f({\mathbf{\theta}}_{i}){g}_{ik}({\mathbf{\theta}}_{i},\phi )d{\mathbf{\theta}}_{i}\cong \frac{{\displaystyle \sum _{l=1}^{T}}f\left({\mathbf{\theta}}_{i(k)}^{(l)}\right){w}_{k}p({\mathbf{Y}}_{i}\mid {\sigma}^{2},{\mathbf{\theta}}_{i(k)}^{(l)})p({\mathbf{\theta}}_{i(k)}^{(l)}\mid {\mathbf{\mu}}_{k},{\mathbf{\sum}}_{k})/{p}_{e(k)}({\mathbf{\theta}}_{i(k)}^{(l)})}{{\displaystyle \sum _{k=1}^{K}}{\displaystyle \sum _{l=1}^{T}}{w}_{k}p({\mathbf{Y}}_{i}\mid {\sigma}^{2},{\mathbf{\theta}}_{i(k)}^{(l)})p({\mathbf{\theta}}_{i(k)}^{(l)}\mid {\mathbf{\mu}}_{k},{\mathbf{\sum}}_{k})/{p}_{e(k)}({\mathbf{\theta}}_{i(k)}^{(l)})}.$$

The envelope distribution is taken to be a multivariate normal density using the subject’s previously estimated mixing-component specific conditional mean (i.e., $\int {\theta}_{i}{g}_{ik}({\theta}_{i},\phi )d{\theta}_{i}$) and the corresponding conditional covariance.
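To make the approximation concrete, the following toy sketch applies it to a hypothetical scalar model with *h _{i}*(*θ*) = *θ* and a single observation, for which the exact conditional mean is known in closed form; it is an illustration only, not the paper's multivariate implementation:

```python
import numpy as np

# Toy scalar importance-sampling illustration: hypothetical model h_i(theta)=theta,
# one observation y, normal envelope per component.

def npdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def is_expectation(f, y, sigma2, w, mu, Sig, env_mean, env_sd, T, rng):
    """Estimate sum_k int f(theta) g_ik(theta, phi) dtheta for one subject."""
    num, den = 0.0, 0.0
    for k in range(len(w)):
        th = rng.normal(env_mean[k], env_sd[k], size=T)   # envelope samples
        r = (w[k] * npdf(y, th, np.sqrt(sigma2))          # p(y | sigma^2, theta)
             * npdf(th, mu[k], np.sqrt(Sig[k]))           # p(theta | mu_k, Sigma_k)
             / npdf(th, env_mean[k], env_sd[k]))          # / envelope density
        num += np.sum(f(th) * r)
        den += np.sum(r)
    return num / den

rng = np.random.default_rng(2)
# One component, y = 1, sigma^2 = 1, theta ~ N(0, 1): exact posterior mean is 0.5.
est = is_expectation(lambda t: t, y=1.0, sigma2=1.0, w=[1.0], mu=[0.0], Sig=[1.0],
                     env_mean=[0.5], env_sd=[1.0], T=20000, rng=rng)
```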

Assigning each individual subject to a subpopulation follows the same method as presented in Wang *et al*. (2007) for ML estimation. The E-step computes the posterior probability that subject *i* belongs to the *k*th subgroup:

$${\tau}_{i}(k)=\frac{{w}_{k}p({Y}_{i}\mid {\sigma}^{2},{\mu}_{k},{\mathrm{\sum}}_{k})}{{\displaystyle \sum _{k=1}^{K}}{w}_{k}p({Y}_{i}\mid {\sigma}^{2},{\mu}_{k},{\mathrm{\sum}}_{k})}=\frac{{w}_{k}\int p({Y}_{i}\mid {\sigma}^{2},{\theta}_{i})p({\theta}_{i}\mid {\mu}_{k},{\mathrm{\sum}}_{k})d{\theta}_{i}}{{\displaystyle \sum _{k=1}^{K}}{w}_{k}\int p({Y}_{i}\mid {\sigma}^{2},{\theta}_{i})p({\theta}_{i}\mid {\mu}_{k},{\mathrm{\sum}}_{k})d{\theta}_{i}},$$

and each individual is classified to the subpopulation associated with the highest posterior probability of membership. That is, for each *i* (*i* =1, …, *n*), set *ẑ _{i}*(*k*) = 1 for *k* = argmax_{k′} *τ _{i}*(*k′*) and 0 otherwise.
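The classification rule amounts to a row-wise argmax over the membership probabilities; a minimal sketch:

```python
import numpy as np

# Assign each subject to the component with the largest posterior membership
# probability tau_i(k); tau is an (n, K) array from the final E step.

def classify(tau):
    return np.argmax(tau, axis=1)

tau = np.array([[0.80, 0.20],
                [0.10, 0.90],
                [0.55, 0.45]])
labels = classify(tau)  # subjects 1 and 3 to component 0, subject 2 to component 1
```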

Determining the number of mixing components *K* in a finite mixture model is an important but difficult problem. To compare two non-nested models, in contrast to likelihood ratio procedures for comparing nested models, several criteria have been proposed. For example, Lemenuel-diot *et al*. (2006) based such model selection on a Kullback-Leibler test, with the null hypothesis of *p*_{0} components versus the alternative hypothesis of *p*_{1} components (*p*_{1} > *p*_{0}). Other criteria are based on a penalized objective function. For two models A and B with different values of *K*, say *K*_{A} and *K*_{B}, one would compare objective functions Q_{A} and Q_{B}, and prefer the model with the larger value. The Akaike Information Criterion (AIC) takes the form *AIC* = *L* − *P,* where *L* is the maximized log-likelihood for the model and *P* is the total number of estimated parameters. The Bayesian information criterion (BIC) is defined as *BIC* = *L* − 1/2 *P* log *N,* where *N* is the number of observations in the data set. Fraley and Raftery (2005) reported using a BIC criterion for mixture model selection. Based on its simplicity and good behavior, their version of the BIC criterion is also adopted here, with *L* replaced by the log-likelihood evaluated at *φ ^{MAP}*.
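The BIC comparison can be sketched as follows; the parameter-counting helper reflects one plausible accounting for a *K*-component model (weights, means, covariances, plus *σ*^{2}) and is an illustrative assumption rather than the paper's exact bookkeeping:

```python
import numpy as np

# Paper's BIC convention: BIC = L - (1/2) P log N, larger is better.

def bic(log_lik, n_params, n_obs):
    return log_lik - 0.5 * n_params * np.log(n_obs)

def n_mixture_params(K, d):
    """(K-1) weights + K*d means + K*d(d+1)/2 covariance terms + 1 variance."""
    return (K - 1) + K * d + K * d * (d + 1) // 2 + 1

# Hypothetical comparison of K=1 vs K=2 for d=3 kinetic parameters: compute
# bic(L_K, n_mixture_params(K, 3), N) for each fit and keep the larger.
```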

A one-compartment model with first-order elimination and first-order absorption is used as the model of the drug’s plasma concentrations *y _{ji}*, where *j* indexes the sampling times *t _{j}* and *i* the subjects, following an oral dose *D*:

$${y}_{ji}=\frac{DK{a}_{i}}{{V}_{i}K{a}_{i}-C{L}_{i}}\left({e}^{-\frac{C{L}_{i}}{{V}_{i}}{t}_{j}}-{e}^{-K{a}_{i}{t}_{j}}\right)\phantom{\rule{0.16667em}{0ex}}\left(1+{\epsilon}_{ji}\right)$$

The model parameters *V* (*L*) and *K _{a}* (hr^{−1}) were simulated from lognormal distributions, while the clearance *CL _{i}* (*L*/hr) was simulated from the two-component mixture of lognormal densities:

$$C{L}_{i}{\sim}_{i.i.d.}{w}_{1}LN({\mu}_{C{L}_{1}}=5,{\sigma}_{C{L}_{1}}^{2}={(0.5{\mu}_{C{L}_{1}})}^{2})+{w}_{2}LN({\mu}_{C{L}_{2}}=10,{\sigma}_{C{L}_{2}}^{2}={(0.5{\mu}_{C{L}_{2}})}^{2})$$

with mixing weights *w*_{1} =0.3 and *w*_{2} =0.7. A sparse sampling schedule was used (*m* = 4) with *t*_{1} =2, *t*_{2} =8, *t*_{3} =12 and *t*_{4} = 24 hrs. This is a difficult mixture problem, as is evident from inspection of the density for *CL* shown in Fig. 1, and was taken from a study of the kinetics of the drug metoprolol by Kaila et al. (2007). The within individual error *ε _{ji}* is assumed to be i.i.d. normal with mean zero and variance *σ*^{2}.
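The simulation design can be sketched as follows; the dose, the *V* and *K _{a}* values, the residual error magnitude and the number of subjects are illustrative placeholders rather than the study's actual settings:

```python
import numpy as np

# Sketch of the example: one-compartment oral-dose model with proportional error
# and a two-component lognormal mixture on CL (means 5 and 10 L/hr, 50% CV,
# weights 0.3/0.7). D, V, Ka, sigma and n below are placeholders.

def one_compartment(D, CL, V, Ka, t):
    return D * Ka / (V * Ka - CL) * (np.exp(-CL / V * t) - np.exp(-Ka * t))

rng = np.random.default_rng(3)
times = np.array([2.0, 8.0, 12.0, 24.0])    # sparse schedule, m = 4
n = 100
comp = rng.choice(2, size=n, p=[0.3, 0.7])  # latent subpopulation labels
mu_cl = np.where(comp == 0, 5.0, 10.0)
# Lognormal with mean m and CV 0.5: sdlog^2 = log(1.25), mulog = log(m) - sdlog^2/2
sdlog = np.sqrt(np.log(1.25))
CL = np.exp(rng.normal(np.log(mu_cl) - sdlog ** 2 / 2, sdlog))
y = np.array([one_compartment(100.0, cl, 30.0, 1.5, times)
              * (1.0 + 0.1 * rng.standard_normal(times.size)) for cl in CL])
```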

The MAP estimates were obtained for each of the 300 population data sets using the EM algorithm with importance sampling as described above (the lognormal kinetic parameters were transformed). Very dispersed priors were assumed for the transformed parameters as follows:

$$\begin{array}{l}{\mu}_{1}\mid {\mathrm{\sum}}_{1}\sim N(log(5),\phantom{\rule{0.16667em}{0ex}}{\mathrm{\sum}}_{1}/0.01)\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}{\mu}_{2}\mid {\mathrm{\sum}}_{2}\sim N(log(10),{\mathrm{\sum}}_{2}/0.01)\\ {({\mathrm{\sum}}_{1})}^{-1}={({\mathrm{\sum}}_{2})}^{-1}\sim \mathit{Wishart}(1,0.25)\\ ({w}_{1},{w}_{2})\sim \mathit{Dirichlet}(1,1)\\ {({\sigma}^{2})}^{-1}\sim \mathit{Gamma}(1,0)\end{array}$$

For each of the estimated parameters, its percent prediction error was calculated for each population data set as:

$${\text{pe}}_{j}=100\times ({\phi}_{j}^{\text{MAP}}-{\phi}_{j})/{\phi}_{j},j=1,\dots ,300$$

These percent prediction errors were used to calculate the mean prediction error and root mean square prediction error (RMSE) for each parameter. In addition, the individual subject classification accuracy was evaluated for each population data set.
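These summaries can be computed directly; a minimal sketch with made-up estimates of a parameter whose true value is 5:

```python
import numpy as np

# Percent prediction error per replicate data set, then the mean error (bias)
# and root mean square error across replicates. Inputs are illustrative.

def percent_pe(estimates, truth):
    return 100.0 * (np.asarray(estimates, dtype=float) - truth) / truth

def summarize(pe):
    pe = np.asarray(pe, dtype=float)
    return pe.mean(), np.sqrt(np.mean(pe ** 2))

pe = percent_pe([4.5, 5.5, 5.0], truth=5.0)  # -> [-10., 10., 0.]
bias, rmse = summarize(pe)
```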

Figure 2 plots the estimated parameter population distributions of the 300 simulated data sets, as well as the true distributions used in the simulation, while Table 1 shows the mean estimates, estimation biases and root mean square errors (RMSE). The histogram of the percent correct classification is presented in Figure 3.

Figure 2. True (bold lines) and estimated (thin lines) population densities of CL (upper panel), V (middle panel) and Ka (lower panel).

Table 1. True population values and mean of parameter estimates (300 trials), mean percent prediction error (PE) and root mean square percent prediction error (RMSE).

The data were also analyzed assuming a single component model for *CL* (lognormal) as well as for *V* and *K _{a}* as above. Model selection was based on the Bayesian information criterion (BIC) to select between a one and a two component model for *CL*.

Figure 4. True (bold lines) and estimated (thin lines) population densities of CL for the single component analysis.

In this paper, a maximum *a posteriori* probability estimation approach is presented for nonlinear random effects finite mixture models that has application to identifying hidden subpopulations in pharmacokinetic/pharmacodynamic studies. Previously reported nonparametric Bayesian approaches to this problem (Wakefield and Walker (1997), Rosner and Mueller (1997)) have advantages over the MAP estimation approach presented herein, including calculation of the complete posterior distribution of model parameters and of the number of mixing components. However, the computational challenges associated with the proper solution to the nonparametric Bayesian mixture problem are considerable, whereas the MAP estimation approach using the EM algorithm with importance sampling presented here is straightforward in comparison.

In calculating the MAP estimator, the values of the parameters defining the conjugate prior densities may be available from previous studies. When no prior information is available, these parameters can be set to reflect very disperse priors (e.g., *τ _{k}* → 0 and other parameters set as illustrated in the example presented). For linear mixture models with diffuse priors, as noted by Fraley and Raftery (2005), it is expected that the MAP results will be similar to the MLE results when the latter can be calculated. The advantage of the MAP estimator in such cases is that it avoids the unboundedness that can be associated with maximum likelihood mixture model problems (Fraley and Raftery (2005), Ormoneit and Tresp (1998)).

For determining the number of components in mixture models, several measures have been suggested, including the BIC criterion used in the example presented in this paper. In addition, *a priori* knowledge or assumptions about the biological mechanism for the modeled PK/PD polymorphism can also facilitate the model selection procedure, when combined with a model selection measure. For example, for drugs primarily metabolized by the liver, the information on hepatic cytochrome P450 family can help to decide a reasonable range for the number of clearance subgroups (e.g., K=3 accounting for extensive, intermediate, and poor metabolizer subpopulations) thus limiting the number of competing models to be tested.

This work was supported in part by National Institutes of Health grants P41-EB001978 and R01-GM068968.

Let
${\phi}_{n}^{\mathit{MAP}}$ be the posterior mode of *φ* given *Y ^{n}*. Assuming that there is a “true” parameter ${\phi}^{\ast}$, then under suitable regularity conditions ${\phi}_{n}^{\mathit{MAP}}$ is a consistent estimator of ${\phi}^{\ast}$ and the posterior distribution of *φ* is asymptotically normal (White, 1994).

This result is now stated more precisely. Assume that the Hessian matrix
${A}_{n}(\phi )=\frac{{\partial}^{2}logp(\phi \mid {Y}^{n})}{\partial \phi \partial \phi}$ is invertible at
$\phi ={\phi}_{n}^{\mathit{MAP}}$. Then given the regularity conditions of Philippou and Roussas (1975) and Bernardo and Smith (2001), it can be shown that
${\mathrm{\Gamma}}_{n}^{-1/2}({\phi}_{n}-{\phi}_{n}^{\mathit{MAP}})$ converges in distribution to a standard multivariate normal random variable as *n* → ∞, where ${\phi}_{n}\sim p(\phi \mid {Y}^{n})$ and ${\mathrm{\Gamma}}_{n}=-{A}_{n}{({\phi}_{n}^{\mathit{MAP}})}^{-1}$.

The computation of Γ* _{n}* proceeds as follows:

$$\frac{{\partial}^{2}logp(\phi \mid {Y}^{n})}{\partial \phi \partial \phi}=\left({\displaystyle \sum _{i=1}^{n}}\frac{{\partial}^{2}logp({Y}_{i}\mid \phi )}{\partial \phi \partial \phi}\right)+\frac{{\partial}^{2}logp(\phi )}{\partial \phi \partial \phi}$$

and

$$-{\displaystyle \sum _{i=1}^{n}}\frac{{\partial}^{2}logp({Y}_{i}\mid {\phi}_{n}^{\mathit{MAP}})}{\partial \phi \partial \phi}\approx {\displaystyle \sum _{i=1}^{n}}{V}_{i}({\phi}_{n}^{\mathit{MAP}})$$

where

$${V}_{i}(\phi )=\frac{\partial logp({Y}_{i}\mid \phi )}{\partial \phi}{(\frac{\partial logp({Y}_{i}\mid \phi )}{\partial \phi})}^{T}$$

It follows that

$${\mathrm{\Gamma}}_{n}\approx {\left({\displaystyle \sum _{i=1}^{n}}{V}_{i}({\phi}_{n}^{\mathit{MAP}})-\frac{{\partial}^{2}logp({\phi}_{n}^{\mathit{MAP}})}{\partial \phi \partial \phi}\right)}^{-1}.$$

Note that in the maximum likelihood setting of Wang et al. (2007),
$\mathit{Cov}({\phi}_{n}^{ML})\approx {\left({\displaystyle \sum _{i=1}^{n}}{V}_{i}({\phi}_{n}^{ML})\right)}^{-1}$. So the difference between MAP and ML is the term
$\frac{{\partial}^{2}logp({\phi}_{n}^{\mathit{MAP}})}{\partial \phi \partial \phi}$. As *n* → ∞, the contribution of this term diminishes.
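Assembling this covariance estimate from per-subject scores is a single matrix computation; a sketch with hypothetical two-dimensional score vectors and prior Hessian:

```python
import numpy as np

# Gamma_n ~= (sum_i s_i s_i^T - d^2 log p(phi)/dphi dphi)^{-1}, evaluated at
# phi_MAP. Inputs below are hypothetical, for illustration only.

def map_covariance(scores, prior_hessian):
    """scores: (n, P) matrix of score vectors s_i; prior_hessian: (P, P)."""
    info = scores.T @ scores - prior_hessian  # observed information approximation
    return np.linalg.inv(info)

scores = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
prior_hessian = -np.eye(2)   # log-concave prior: negative definite Hessian
Gamma = map_covariance(scores, prior_hessian)
# With prior_hessian = 0 this reduces to the ML covariance of Wang et al. (2007).
```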

In this asymptotic analysis the precisions (*σ*^{2})^{−1} and
${\mathrm{\sum}}_{k}^{-1}$ are considered as primary variables instead of the variances *σ*^{2} and Σ _{k}. This is standard practice in the normal, gamma, Wishart model. Further, using
${\mathrm{\sum}}_{k}^{-1}$ instead of Σ _{k} simplifies the differentiation of the log prior carried out below.

We first calculate *V _{i}*(*φ*). By Fisher’s identity,

$$\frac{\partial}{\partial \phi}logp({Y}_{i}\mid \phi )={\displaystyle \sum _{k=1}^{K}}\int \left\{\frac{\partial}{\partial \phi}log\left({w}_{k}p({Y}_{i}\mid {\sigma}^{2},\theta )\eta (\theta \mid {\mu}_{k},{\mathrm{\sum}}_{k})\right)\right\}{g}_{ik}(\theta ,\phi )d\theta ,$$

We have:

$${s}_{{({\sigma}^{2})}^{-1}}=\frac{\partial}{\partial {({\sigma}^{2})}^{-1}}logp({Y}_{i}\mid \phi )={\displaystyle \sum _{k=1}^{K}}\int \{(1/2){m}_{i}{\sigma}^{2}-(1/2){({Y}_{i}-{h}_{i}(\theta ))}^{T}{H}_{i}{(\theta )}^{-1}({Y}_{i}-{h}_{i}(\theta ))\}{g}_{ik}(\theta ,\phi )d\theta $$

Next, using the constraint *w _{K}* = 1 − (*w*_{1} + ⋯ + *w*_{K−1}), we have for *k* = 1, …, *K* − 1:

$${s}_{{w}_{k}}=\frac{\partial}{\partial {w}_{k}}logp({Y}_{i}\mid \phi )=\frac{1}{{w}_{k}}\int {g}_{ik}(\theta ,\phi )d\theta -\frac{1}{{w}_{K}}\int {g}_{iK}(\theta ,\phi )d\theta .$$

And for *k* =1, …, *K*

$$\begin{array}{l}{s}_{{\mu}_{k}}=\frac{\partial}{\partial {\mu}_{k}}logp({Y}_{i}\mid \phi )=\int {g}_{ik}(\theta ,\phi )\{{{\mathrm{\sum}}_{k}}^{-1}(\theta -{\mu}_{k})\}d\theta ,\\ \frac{\partial}{\partial {\mathrm{\sum}}_{k}^{-1}}logp({Y}_{i}\mid \phi )=\int \{(1/2){\mathrm{\sum}}_{k}-(1/2)(\theta -{\mu}_{k}){(\theta -{\mu}_{k})}^{T}\}{g}_{ik}(\theta ,\phi )d\theta ,\end{array}$$

so that

$${s}_{\mathit{vech}({\mathrm{\sum}}_{k}^{-1})}=\frac{\partial}{\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})}logp({Y}_{i}\mid \phi )=\mathit{vech}[2\frac{\partial}{\partial {\mathrm{\sum}}_{k}^{-1}}logp({Y}_{i}\mid \phi )-\mathit{Diagonal}(\frac{\partial}{\partial {\mathrm{\sum}}_{k}^{-1}}logp({Y}_{i}\mid \phi ))]$$

where *vech*(*X)* is the vector of the lower triangular components of a symmetric matrix *X.* Putting these results together produces the vector

$${s}_{i}=({s}_{{({\sigma}^{2})}^{-1}},{s}_{{w}_{1}},\dots ,{s}_{{w}_{K-1}},{s}_{{\mu}_{1}},\dots ,{s}_{{\mu}_{K}},{s}_{\mathit{vech}({\mathrm{\sum}}_{1}^{-1})},\dots ,{s}_{\mathit{vech}({\mathrm{\sum}}_{K}^{-1})}),$$

and the formula

$${V}_{i}(\phi )={s}_{i}{{s}_{i}}^{T}.$$

All the above computations can be performed during the importance sampler calculation at the final iteration of the EM algorithm.

It remains to calculate
$\frac{{\partial}^{2}logp(\phi )}{\partial \phi \partial \phi}$. For this calculation we represent the parameter as
$\phi =\{{({\sigma}^{2})}^{-1},{w}_{1},\dots ,{w}_{K-1},({\mu}_{k},{\mathrm{\sum}}_{k}^{-1}),k=1,\dots ,K\}$. Now suppose *θ* has *p* components. Then each *μ _{k}* has *p* components and each $\mathit{vech}({\mathrm{\sum}}_{k}^{-1})$ has *p*(*p* + 1)/2 components. From the prior specification,

$$logp(\phi )=logp({\sigma}^{-2})+logp({w}_{1},\dots {w}_{K-1})+{\sum}_{k=1}^{K}logp({\mu}_{k}\mid {\mathrm{\sum}}_{k}^{-1})+{\sum}_{k=1}^{K}logp({\mathrm{\sum}}_{k}^{-1})$$

so that
$B=\frac{{\partial}^{2}logp(\phi )}{\partial \phi \partial \phi}$ is a block diagonal matrix of the form *B* = *Diag*(*B*_{*σ*^{−2}}, *B _{w}*, *B*_{(*μ*_{1},*v*_{1})}, …, *B*_{(*μ*_{K},*v*_{K})}), where *v _{k}* = $\mathit{vech}({\mathrm{\sum}}_{k}^{-1})$.

To calculate *B* the following parameterizations for the prior *p*() are assumed:

$$\begin{array}{l}p(\kappa )=c{\kappa}^{(a-1)}exp(-b\kappa ),\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\kappa ={\sigma}^{-2}\\ p({w}_{1},\dots ,{w}_{K-1})=c{({w}_{1})}^{{a}_{1}-1}\cdots {({w}_{K})}^{{a}_{K}-1},\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}{w}_{K}=1-({w}_{1}+\dots +{w}_{K-1})\\ p({\mu}_{k}\mid {\mathrm{\sum}}_{k}^{-1})=C{(det{\mathrm{\sum}}_{k}^{-1})}^{1/2}exp((-{\tau}_{k}/2){({\mu}_{k}-{\lambda}_{k})}^{T}{\mathrm{\sum}}_{k}^{-1}({\mu}_{k}-{\lambda}_{k}))\\ p({\mathrm{\sum}}_{k}^{-1})=C{(det{\mathrm{\sum}}_{k}^{-1})}^{\beta}exp(-tr({\mathrm{\Psi}}_{k}{\mathrm{\sum}}_{k}^{-1})/2),\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\phantom{\rule{0.16667em}{0ex}}\beta =({q}_{k}-d-1)/2\end{array}$$

It follows:

$$\begin{array}{l}{B}_{{\sigma}^{-2}}=\frac{{\partial}^{2}}{\partial {\sigma}^{-2}\partial {\sigma}^{-2}}logp({\sigma}^{-2})=-(a-1){\sigma}^{4}\\ {B}_{w}=\frac{{\partial}^{2}}{\partial w\partial w}logp(w),\phantom{\rule{0.16667em}{0ex}}w=({w}_{1},\dots ,{w}_{K-1})\\ {[{B}_{w}]}_{rr}=-({a}_{r}-1)/{({w}_{r})}^{2}-({a}_{K}-1)/{({w}_{K})}^{2},\phantom{\rule{0.16667em}{0ex}}r=1,\dots ,K-1\\ {[{B}_{w}]}_{rs}=-({a}_{K}-1)/{({w}_{K})}^{2},\phantom{\rule{0.16667em}{0ex}}1\le r,s\le K-1,\phantom{\rule{0.16667em}{0ex}}r\ne s\\ {B}_{({\mu}_{k},{v}_{k})}=\left[\begin{array}{cc}{G}_{p\times p}& {H}_{p\times p(p+1)/2}\\ {H}^{T}& {J}_{p(p+1)/2\times p(p+1)/2}\end{array}\right]\\ G=\frac{{\partial}^{2}}{\partial {\mu}_{k}\partial {\mu}_{k}}logp({\mu}_{k}\mid {\mathrm{\sum}}_{k}^{-1})=-{\tau}_{k}{\mathrm{\sum}}_{k}^{-1}\\ H=\frac{{\partial}^{2}}{\partial {\mu}_{k}\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})}logp({\mu}_{k}\mid {\mathrm{\sum}}_{k}^{-1})=-{\tau}_{k}({({\mu}_{k}-{\lambda}_{k})}^{T}\otimes I)D,\\ J={J}_{1}+{J}_{2}\\ {J}_{1}=\frac{{\partial}^{2}}{\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})}logp({\mu}_{k}\mid {\mathrm{\sum}}_{k}^{-1})=-(1/2){D}^{T}({\mathrm{\sum}}_{k}\otimes {\mathrm{\sum}}_{k})D\\ {J}_{2}=\frac{{\partial}^{2}}{\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})\partial \mathit{vech}({\mathrm{\sum}}_{k}^{-1})}logp({\mathrm{\sum}}_{k}^{-1})=-\beta {D}^{T}({\mathrm{\sum}}_{k}\otimes {\mathrm{\sum}}_{k})D\end{array}$$

In the above equations, *D* is the duplication matrix satisfying *D vech*(*X*) = *vec*(*X*), where *vec*(*X*) is the vector of the stacked columns of the symmetric matrix *X,* and ⊗ is the Kronecker product (see Minka (2000)).
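For reference, a matrix *D* satisfying *D vech*(*X*) = *vec*(*X*) can be constructed directly; a minimal sketch, assuming *vech* stacks the lower triangle column by column:

```python
import numpy as np

# Construct D with D @ vech(X) = vec(X) for symmetric p x p X, where vech
# stacks the lower triangle column by column and vec stacks full columns.

def vech(X):
    p = X.shape[0]
    return np.concatenate([X[j:, j] for j in range(p)])

def duplication_matrix(p):
    D = np.zeros((p * p, p * (p + 1) // 2))
    col = {}
    idx = 0
    for j in range(p):
        for i in range(j, p):
            col[(i, j)] = idx        # position of X[i, j] within vech(X)
            idx += 1
    for j in range(p):
        for i in range(p):
            # vec stacks columns, so the vec index of X[i, j] is j*p + i
            D[j * p + i, col[(max(i, j), min(i, j))]] = 1.0
    return D

X = np.array([[1.0, 2.0],
              [2.0, 3.0]])
# duplication_matrix(2) @ vech(X) reproduces vec(X) = [1, 2, 2, 3]
```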


- Bernardo JM, Smith AFM. Bayesian Theory, Section 5.3.2. Wiley; New York: 2001.
- Cruz-Mesía R, Quintana FA, Marshall G. Model-based clustering for longitudinal data. Computational Statistics & Data Analysis. 2008;52:1441–1457.
- Dempster AP, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc B. 1977;39:1–38.
- Evans WE, Relling MV. Pharmacogenomics: Translating functional genomics into rational therapeutics. Science. 1999;286:487–491. [PubMed]
- Fraley C, Raftery AE. Technical Report no. 486. Department of Statistics, University of Washington; 2005. Bayesian regularization for Normal mixture estimation and model-based clustering. http://handle.dtic.mil/100.2/ADA454825.
- Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J of the Am Stat Assoc. 1990;85:398–409.
- Katz D, Azen SP, Schumitzky A. Bayesian approach to the analysis of nonlinear models: Implementation and evaluation. Biometrics. 1981;37:137–142.
- Kaila N, Straka RJ, Brundage RC. Mixture models and subpopulation classification: a pharmacokinetic simulation study and application to metoprolol CYP2D6 phenotype. J Pharmacokinet Pharmacodyn. 2007;34:141–156. [PubMed]
- Lemenuel-diot A, Laveille C, Frey N, Jochemsen R, Mallet A. Mixture modeling for the detection of subpopulations in a pharmacokinetic/pharmacodynamic analysis. J Pharmacokinet Pharmacodyn. 2006;34:157–181. [PubMed]
- Lunn DJ, Best N, Thomas A, Wakefield J, Spiegelhalter D. Bayesian analysis of population PK/PD models: general concepts and software. J of Pharmacokin and Pharmacodyn. 2002;29:217–307. [PubMed]
- Mentré F, Mallet A, Baccar D. Optimal design in random effect regression models. Biometrika. 1997;84:429–442.
- Minka T. Old and new matrix algebra useful for statistics. MIT Media Lab. 2000. http://research.microsoft.com/minka/papers/matrix.
- Ormoneit D, Tresp V. Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks. 1998;9:639–650. [PubMed]
- Pauler DK, Laird NM. A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics. 2000;56:464–472. [PubMed]
- Racine-Poon A. A Bayesian approach to non-linear random effects models. Biometrics. 1985;41:1015–1024. [PubMed]
- Rosner GL, Muller P. Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models. J Pharmacokinet Biopharm. 1997;25:209–234. [PubMed]
- Sheiner LB, Halkin H, Peck C, Rosenberg B, Melmon KL. Improved computer assisted digoxin therapy: a method using feedback of measured serum digoxin concentrations. Ann Intern Med. 1975;82:619–627. [PubMed]
- Tseng P. An analysis of the EM algorithm and entropy-like proximal point methods. Math Oper Res. 2005;29:27–44.
- Wakefield JC, Smith AFM, Racine-Poon A, Gelfand AE. Bayesian analysis of linear and non-linear population models by using the Gibbs sampler. Applied Statistics. 1994;43:201–221.
- Wakefield JC, Walker SG. Bayesian nonparametric population models: formulation and comparison with likelihood approaches. J Pharmacokinet Biopharm. 1997;25:235–253. [PubMed]
- Wang X, Schumitzky A, D’Argenio DZ. Nonlinear random effects mixture models: Maximum likelihood estimation via the EM algorithm. Comput Stat Data Anal. 2007;51:6614–6623. [PMC free article] [PubMed]
- White H. Estimation, Inference and Specification Analysis. Cambridge University Press; 1994. Chapter 6.
- Wu CF. On the convergence properties of the EM algorithm. Ann Stat. 1983;11:95–103.
