Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Comput Stat Data Anal. Author manuscript; available in PMC 2010 October 1.
Published in final edited form as:
Comput Stat Data Anal. 2009 October 1; 53(12): 3907–3915.
doi:  10.1016/j.csda.2009.04.017
PMCID: PMC2743512

Population Pharmacokinetic/Pharmacodyanamic Mixture Models via Maximum a Posteriori Estimation


Pharmacokinetic/pharmacodynamic phenotypes are identified using nonlinear random effects models with finite mixture structures. A maximum a posteriori probability estimation approach is presented using an EM algorithm with importance sampling. Parameters for the conjugate prior densities can be based on prior studies or set to represent vague knowledge about the model parameters. A detailed simulation study illustrates the feasibility of the approach and evaluates its performance, including selecting the number of mixture components and proper subject classification.

Keywords: Nonlinear random effects, EM algorithm, Importance sampling, Bayesian Analysis

1. Introduction

The use of mathematical modeling is central to the study of the absorption, distribution and elimination of therapeutic drugs and to understanding how drugs produce their effects. From its inception the field of pharmacokinetics and pharmacodynamics has incorporated methods of mathematical modeling, simulation and computation in an effort to better understand and quantify the processes of uptake, disposition and action of therapeutic drugs. These methods for pharmacokinetic/pharmacodynamic (PK/PD) systems analysis impact all aspects of drug development including in vitro, animal and human testing, as well as drug therapy. Modeling methodologies developed for studying pharmacokinetic/pharmacodynamic processes confront many challenges related in part to the severe restrictions on the number and type of measurements that are available from laboratory experiments and clinical trials, as well as the variability in the experiments and the uncertainty associated with the processes themselves.

Since their initial application to pharmacokinetics in the 1970’s, Bayesian methods have provided a framework for PK/PD modeling in drug development that can address some of the above-mentioned challenges. Sheiner et al. (1975) applied Bayesian estimation (maximum a posteriori probability estimation, MAP) to combine population information with drug concentration measurements obtained in an individual, in order to determine a patient-specific dosage regimen. Katz, Azen and Schumitzky (1981) reported the first efforts to calculate the complete posterior distribution of model parameters in a nonlinear pharmacokinetic individual estimation problem, for which they used numerical quadrature methods to perform the needed multi-dimensional integrations.

The population PK/PD problem has also been cast in a Bayesian framework, initially by Racine-Poon (1985) using a two-stage approach, and more generally by Wakefield et al. (1994). Solution to this computationally demanding problem is accomplished through the application of Markov chain Monte Carlo methods pioneered by Gelfand and Smith (1990) and now available in general purpose software (see Lunn et al., 2002 for discuss relevant to PK/PD population modeling).

Population PK/PD modeling, including Bayesian approaches, are used in drug development to identify the influence of measured covariates (e.g., demographic factors and disease status) on drug kinetics and response. It is now recognized, however, that genetic polymorphisms in drug metabolism and in the molecular targets of drug therapy can also influence the efficacy and toxicity of medications (Evans and Relling, 1999). Population modeling approaches that can identify and model distinct subpopulations not related to available measured covariates may, therefore, help determine otherwise unknown genetic and other determinants of observed pharmacokinetic/pharmacodynamic phenotypes.

We have previously reported on a maximum likelihood approach using finite mixture models to identify subpopulations with distinct pharmacokinetic/pharmacodynamic properties (Wang et al., 2007). Wakefield and Walker (1997) and Rosner and Mueller (1997) have introduced Bayesian approaches to address this problem within a nonparametric mixture model framework. In this paper a computationally practical maximum a posteriori probability estimation (MAP) approach is proposed using finite mixture models. Section 2 of this paper describes the finite mixture model within a nonlinear random effects framework and defines the MAP estimation problem. A solution via the EM algorithm is presented in Section 3. Subject classification and model selection issues are discussed in Section 4. An example and detailed simulation study are presented in Section 5 and Section 6 contains a discussion. The Appendix discusses important asymptotic properties of the MAP estimator, further motivating its use, and derives the formulas for the asymptotic covariance.

2. Nonlinear Random Effects Finite Mixture Model and MAP Estimation Problem

A two-stage nonlinear random effects model that incorporates a finite mixture model is given by




where i =1, …, n indexes the individuals and k =1, …, K indexes the mixing components. The alternate problem formulation presented in Cruz-Mesía et al. (2008) and in Pauler and Laird (2000), also results in the solution outlined below.

At the first stage represented by (1), Yi = (y1i, …, ymii)T is the observation vector for the ith individual (Yi [set membership]Rmi); hi(θi) is the function defining the pharmacokinetic/pharmacodynamic (PK/PD) model, including subject specific variables (e.g., drug doses); θi is the vector of model parameters (random effects) (θi [set membership] Rp); and Gi(θi, β) is a positive definite covariance matrix (Gi [set membership] Rmi×mi) that may depend upon θi as well as on other parameters β (fixed effects) (β [set membership] Rq).

At the second stage given by (2), a finite mixture model with K multivariate normal components is used to describe the population distribution. The weights {wk}are nonnegative numbers summing to one, denoting the relative size of each mixing component (subpopulation), for which μk (μk [set membership] Rp) is the mean vector and Σkk [set membership] Rp×p) is the positive definite covariance matrix.

Letting [var phi] represent the collection of parameters {β, (wk, μk, Σk), k = 1, …, K}, the population problem involves estimating [var phi] given the observed data{Y1, …, Yn}. If [var phi] is regarded as a random variable with a known prior distribution p([var phi]), the maximum a posteriori probability (MAP) estimate ([var phi]MAP) is the mode of the posterior distribution p([var phi]|Yn):


where Yn={Y1, …, Yn}. The MAP estimator enjoys the same large sample properties as the ML estimator, namely consistency and asymptotic normality. In addition, the MAP objective function is a regularization of the likelihood function and as such avoids the well-documented singularities and degeneracies of mixture models (see, Fraley and Raftery (2005) and Ormoneit and Tresp(1998)).

For mixture of normal distributions, a multivariate normal prior on the mean for each mixing component is given by:


where λk and τ k can be viewed as the mean and shrinkage respectively. An inverse Wishart prior with degrees of freedom qk and scale matrix Ψk is assigned to each covariance component:


The mixing weights have a Dirichlet distribution as the prior:


We consider the model given by (1) and (2) for the important case Gi(θi, β) = σ2Hi(θi), where Hi(θi) is a known function and β = σ2. The prior for σ2 is an inverse Gamma distribution:


These densities are conjugate priors to the multivariate normal mixtures (see appendix for the parameterizations used).

The hyper-parameters {a, b, (ak,λk,τk,qkk), k = 1, …, K} can be based on prior studies or set to represent vague knowledge about the parameters (see below).

3. Solution via the EM Algorithm

The EM algorithm, originally introduced by Dempster, Laird and Rubin (1977), is used to perform iterative computation of maximum likelihood (ML) estimates. It is applied below to solve the MAP estimation problem defined in the previous section.

The component label vector zi is introduced as a K dimensional indicator such that zi(k) is one or zero depending on whether or not the parameter θi arises from the kth mixing component. Letting [var phi](r) ={β,(wk(r), μk(r), Σk(r), k = 1, …, K)} represent the parameters at the rth iteration of the EM algorithm, the E step computes a conditional posterior expectation, given by


where logLc(φ)=i=1nk=1Kzi(k)log(wkp(Yi,θiσ2,μk,k)). In the M step, the posterior mode [var phi](r+1) is estimated as the optimizer of Q([var phi],[var phi](r)) such that φ(r+1)=argmaxφQ(φ,φ(r)). Under the prior defined above, the updating process of the M step is:




where gik(θi,φ)=wkp(Yiσ2,θi)p(θiμk,k)k=1Kwkp(Yiσ2,θi)p(θiμk,k)dθifork=1,,K and d = dim(θi).

The E and M steps are repeated until convergence. Discussions on the sufficient conditions for the convergence can be found at Dempster et al. (1977), Wu (1983) and Tseng (2005). In the appendix, an error analysis for [var phi]MAP is presented.

In order to implement the algorithm, all the integrals in the updating equations must be evaluated at each iterative step. For the maximum likelihood mixture model problem we have successfully used importance sampling to calculate the corresponding integrals in the EM algorithm (see Wang et al. 2007). The same method can be applied here and was used in the examples presented below. In brief, an envelope function pe(k) is selected for each mixing component and for each subject, so that a number of individual specific random samples are taken from pe(k):θi(k)(1),,θi(k)(T)θi). As all the integrals in the algorithm share the form ∫f(θi)gik (θi, [var phi])dθi, they are approximated as follows:


The envelope distribution is taken to be a multivariate normal density using the subject’s previously estimated mixing-component specific conditional mean (i.e., ∫θigik (θi, [var phi])dθi) and conditional covariance (i.e., ∫(θiμk)(θiμk)T gik (θi, [var phi])dθi) as its mean and covariance. The number of independent samples (T) depends on the complexity of the model and the required accuracy in the integral approximations.

4. Subpopulation Classification and Model Selection

Assigning each individual subject to a subpopulation follows the same method as presented in Wang et al. (2007) for ML estimation. The E-step computes the posterior probability that subject i belongs to the kth subgroup:


and each individual is classified to the subpopulation associated with the highest posterior probability of membership. For example, for each i, (i =1, …, n), set zi (k) = 1 if k=argmaxcτi(c) or to zero otherwise. No additional computation is required since all the τ i (k) are evaluated at each EM step.

Determining the number of mixing components K in a finite mixture model is an important but difficult problem. To compare two non-nested models, in contrast to likelihood ratio procedures for comparing nested models, several criteria have been proposed. For example, Lemenuel-diot et al. (2006) based such model selection on the Kullback-Leibler test, with the null hypothesis of p0 components versus the alternative hypothesis of p1 components (p1 > p0). Other criteria are based on a penalized objective function. For two models A and B with different values of K, say KA and KB, one would compare objective functions QA and QB, and prefer the model with the larger value. The Akaike Information Criterion (AIC) takes the form AIC = LP, where L is the maximized log-likelihood for the model and P is the total number of estimated parameters. The Bayesian information criterion (BIC) is defined as BIC = L − 1/2P log N, where N is the number of observations in the data set. Fraley and Raftery (2005) reported using a BIC criterion for mixture model selection. Based on its simplicity and good behavior, their version of BIC criterion is also adopted here, with L replaced by the log-likelihood evaluated at [var phi]MAP.

5. Example

A one-compartment model with first-order elimination and first-order absorption is used as the model of the drug’s plasma concentrations yji, where


The model parameters V (L) and Ka (hr−1) were assumed to be independent with Vii.i.d.LN(μV=50,σV2=102) and Kaii.i.d.LN(μKa=1,σKa2=0.22). The drug’s clearance CL (L/hr) was assumed to be a mixture of two lognormal densities:


with mixing weights w1 =0.3 and w2 =0.7. A sparse sampling schedule was used (m=4) with t1 =2, t2 =8, t3 =12 and t24 = 24 hrs. This is a difficult mixture problem, as is evident from inspection of the density for CL shown in Fig. 1, and was taken from a study of the kinetics of the drug metoprolol used by Kaila et al. (2006). The within individual error εji is assumed to be i.i.d with σ2 =0.12. The number of samples was reduced from the six sample times used by Kaila et al. (1, 2, 4, 8, 12, 24 hrs) to the four sample times used here in order to provide an even more challenging test. The number and timing of samples can influence estimation results and formal approaches to sample schedule design for population studies are available (Mentré et al., (1997)). A total of 300 population data sets were simulated from this model each consisting of n =50 subjects.

Fig. 1
Simulated population density of clearance CL

The MAP estimates were obtained for each of the 300 population data sets using the EM algorithm with importance sampling as described above (the lognormal kinetic parameters were transformed). Very dispersed priors were assumed for the transformed parameters as follows:


For each of the estimated parameters[var phi], its percent prediction error was calculated for each population data set as:


These percent prediction errors were used to calculate the mean prediction error and root mean square prediction error (RMSE) for each parameter. In addition, the individual subject classification accuracy was evaluated for each population data set.

Figure 2 plots the estimated parameter population distributions of the 300 simulated data sets, as well as the true distributions used in the simulation, while Table 1 shows the mean estimates, estimation biases and root mean square errors (RMSE). The histogram of the percent correct classification is presented in Figure 3.

Fig. 2
True (bold lines) and estimated (thin lines) population densities of CL (upper panel), V (middle panel) and Ka (lower panel).
Fig. 3
Histogram of percent correct classification.
Table 1
True population values and mean of parameter estimates (300 trials), mean percent prediction error (PE) and root mean square percent prediction error (RMSE)

The data were also analyzed assuming a single component model for CL (lognormal) as well as for V and Ka as above. Model selection was based on the Bayesian information criterion (BIC) to select between a one or two component model for CL in each of the 300 population data sets. The estimated densities of clearance are displayed in Figure 4, while the detailed estimation results are given in Table 2. Based on BIC values, the two-component lognormal model for CL was correctly selected in 198 out of the 300 population trials.

Fig. 4
True (bold lines) and estimated (thin lines) population densities of CL for the single component analysis
Table 2
True population values and mean of parameter estimates (300 trials) for the single component model

6. Discussion

In this paper, a maximum a posteriori probability estimation approach is presented for nonlinear random effects finite mixtures models that has application to identifying hidden subpopulations in pharmacokinetic/pharmacodynamic studies. Previously reported nonparametric Bayesian approaches to this problem (Wakefield and Walker (1997), Rosner and Mueller (1997)) have advantages over the MAP estimation approach presented herein, including calculation of the complete posterior distribution of model parameters including the number of mixing components. However, the computational challenges associated with the proper solution to the nonparametric Bayesian mixture problem are considerable, whereas the MAP estimation approach using the EM algorithm with importance sampling presented here is straightforward in comparison.

In calculating the MAP estimator, the values of the parameters defining the conjugate prior densities may be available from previous studies. When no prior information is available, these parameters can be set to reflect very disperse priors (e.g, τk → 0 and other parameters set as illustrated in the example presented). For linear mixture models with diffuse priors, as noted by Fraley and Raftery (2005), it is expected that the MAP results will be similar to the MLE results when they latter can be calculated. The advantage of the MAP estimator in such cases is that it avoids the unboundedness that can be associated with maximum likelihood mixture model problems (Fraley and Raftery (2005), Ormoneit and Tresp (1998)).

For determining the number of components in mixture models, several measures have been suggested, including the BIC criterion used in the example presented in this paper. In addition, a priori knowledge or assumptions about the biological mechanism for the modeled PK/PD polymorphism can also facilitate the model selection procedure, when combined with a model selection measure. For example, for drugs primarily metabolized by the liver, the information on hepatic cytochrome P450 family can help to decide a reasonable range for the number of clearance subgroups (e.g., K=3 accounting for extensive, intermediate, and poor metabolizer subpopulations) thus limiting the number of competing models to be tested.


This work was supported in part by National Institute of Health grants P41-EB001978 and R01-GM068968.

Appendix: Asymptotic Properites

Let φnMAP be the posterior mode of [var phi] given Yn. Assuming that there is a “true” parameter [var phi]0 which lies in the interior of the support of p([var phi]), it can be shown under suitable hypotheses that φnMAP is consistent and asymptotically normal (White, 1994). However, from a Bayesian perspective, it is more natural to investigate the asymptotic behavior of the conditional density p([var phi]|Yn). Under the assumptions stated below, it is found that p([var phi]|Yn) is asymptotically normal with mean φnMAP. This gives another justification for calculating the MAP estimate.

This result is now stated more precisely. Assume that the Hessian matrix An(φ)=2logp(φYn)φφ is invertible at φ=φnMAP. Then given the regularity conditions of Philppou and Roussas (1975) and Bernardo and Smith (2001), it can be shown that Γn1/2(φnφnML) converges in distribution to a standard multivariate normal random variable as n → ∞, where [var phi]n ~ p([var phi]|Yn) and Γn1=An(φnMAP). It follows that asymptotically, φnMAP and Γn are the posterior mean and posterior covariance of p([var phi]|Yn). For any component of φnMAP, the posterior standard error is the corresponding diagonal element of Γn.

The computation of Γn proceeds as follows:






It follows that


Note that in the maximum likelihood setting of Wang et al (2007), Cov(φnML)(i=1nVi(φnMAP))1. So the difference between MAP and ML is the term 2logp(φnMAP)φφ. As n → ∞, the contribution of this term diminishes.

In this asymptotic analysis the precisions (σ2)−1 and k1 are considered as primary variables instead of the variances σ2 and Σk. This is standard practice in the normal, gamma, Wishart model. Further using k1 instead of Σk greatly simplifies the second derivative calculations.

We first calculate Vi([var phi]), i =1, …, n. The formulas below are similar to those given in given in Wang, et al (2007). The gradient components are calculated using the relation:


We have:


Next using the constraint: wK = 1 – w1 – … – wK−1, we have for k =1, …, K − 1


And for k =1, …, K


so that


where vech(X) is the vector of the lower triangular components of a symmetric matrix X. Putting these results together produces the vector


and the formula


All the above computations can be performed during the importance sampler calculation at the final iteration of the EM algorithm.

It remains to calculate 2logp(φ)φφ. For this calculation we represent the parameter [var phi] as φ={(σ2)1,w1,,wK1,(μk,k1),k=1,,K}. Now supposeθ has p components. Then each μk has p components and each k1 has p(p+1)/2 unique components. Therefore 2logp(φ)φφ is an R×R matrix, where R = 1 + (p − 1) + Kp + Kp(p+1)/2. Now


so that B=2logp(φ)φφ is a block diagonal matrix of the form B =Diag (Bσ−2; Bw; B(μkνk), k =1, …, K), νk=vech(k1).

To calculate B the following parameterizations for the prior p([var phi]) are assumed:


It follows:


In the above equations, D is the permutation matrix satisfying D vech (X) =vec (X), where vec (X) is the vector of the stacked columns of the symmetric matrix X, and [multiply sign in circle] is the Kronecker product (see Minka (2000)).


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


  • Bernardo JM, Smith AFM. Bayesian Theory, Section 5.3.2. Wiley; New York: 2001.
  • Cruz-Mesía R, Quintana FA, Marshall G. Model-based clustering for longitudinal data. Computational Statistics & Data Analysis. 2008;52:1441–1457.
  • Dempster AP, Laird N, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statist Soc B. 1977;39:1–38.
  • Evans WE, Relling MV. Pharmacogenomics: Translating functional genomics into rational therapeutics. Science. 1999;286:487–491. [PubMed]
  • Fraley C, Raftery AE. Technical Report no. 486. Department of Statistics, University of Washington; 2005. Bayesian regularization for Normal mixture estimation and model-based clustering.
  • Gelfand AE, Smith AFM. Sampling-based approaches to calculating marginal densities. J of the Am Stat Assoc. 1990;85:398–409.
  • Katz D, Azen SP, Schumitzky A. Bayesian approach to the analysis of nonlinear models: Implementation and evaluation. Biometrics. 1981;37:137–142.
  • Kaila N, Straka RJ, Brundage RC. Mixture models and subpopulation classification: a pharmacokinetic simulation study and application to metoprolol CYP2D6 phenotype. J Pharmacokinet Pharmacodyn. 2007;34:141–156. [PubMed]
  • Lemenuel-diot A, Laveille C, Frey N, Jochemsen R, Mallet A. Mixture modeling for the detection of subpopulations in a pharmacokinetic/pharmacodynamic analysis. J Pharmacokinet Pharmacodyn. 2006;34:157–181. [PubMed]
  • Lunn DJ, Best N, Thomas A, Wakefield J, Spiegelhalter D. Bayesian analysis of population PK/PD models: general concepts and software. J of Pharmacokin and Pharmacodyn. 2002;29:217–307. [PubMed]
  • Mentré F, Mallet A, Baccar D. Optimal design in random effect regression models. Biometrika. 1997;84:429–442.
  • Minka T. Old and new matrix algebra useful for statistics. MIT Media Lab. 2000.
  • Ormoneit D, Tresp V. Averaging, maximum penalized likelihood and Bayesian estimation for improving Gaussian mixture probability density estimates. IEEE Transactions on Neural Networks. 1998;9:639–650. [PubMed]
  • Pauler DK, Laird NM. A mixture model for longitudinal data with application to assessment of noncompliance. Biometrics. 2000;56:464–472. [PubMed]
  • Racine-Poon A. A Bayesian approach to non-linear random effects models. Biometrics. 1985;41:1015–1024. [PubMed]
  • Rosner GL, Muller P. Bayesian population pharmacokinetic and pharmacodynamic analyses using mixture models. J Pharmacokinet Biopharm. 1997;2:209–234. [PubMed]
  • Sheiner LB, Halkin H, Peck C, Rosenberg B, Melmon KL. Improved computer assisted digoxin therapy: a method using feedback of measured serum digoxin concentrations. Ann Intern Med. 1975;82:619–627. [PubMed]
  • Tseng P. An analysis of the EM algorithm and entropy-like proximal point methods. Math Oper Res. 2005;29:27–44.
  • Wakefield JC, Smith AFM, Racine-Poon A, Gelfand AE. Bayesian analysis of linear and non-linear population models by using the Gibbs sampler. Applied Statistics. 1994;43:201–221.
  • Wakefield JC, Walker SG. Bayesian nonparametric population models: formulation and comparison with likelihood approaches. J Pharmacokinet Biopharm. 1997;25:235–253. [PubMed]
  • Wang X, Schumitzky A, D’Argenio DZ. Nonlinear random effects mixture models: Maximum likelihood estimation via the EM algorithm. Comput Stat Data Anal. 2007;51:6614–6623. [PMC free article] [PubMed]
  • White H. Estimation, Inference and Specification Analysis. Cambridge University Press; 1994. Chapter 6.
  • Wu CF. On the convergence properties of the EM algorithm. Ann Stat. 1983;11:95–103.