|Home | About | Journals | Submit | Contact Us | Français|
Empirical Bayes (“post hoc”) estimates (EBEs) of ηs provide modelers with diagnostics: the EBEs themselves, individual prediction (IPRED), and residual errors (individual weighted residual (IWRES)). When data are uninformative at the individual level, the EBE distribution will shrink towards zero (η-shrinkage, quantified as 1-SD(ηEBE)/ω), IPREDs towards the corresponding observations, and IWRES towards zero (ε-shrinkage, quantified as 1-SD(IWRES)). These diagnostics are widely used in pharmacokinetic (PK) pharmacodynamic (PD) modeling; we investigate here their usefulness in the presence of shrinkage. Datasets were simulated from a range of PK PD models, EBEs estimated in non-linear mixed effects modeling based on the true or a misspecified model, and desired diagnostics evaluated both qualitatively and quantitatively. Identified consequences of η-shrinkage on EBE-based model diagnostics include non-normal and/or asymmetric distribution of EBEs with their mean values (“ETABAR”) significantly different from zero, even for a correctly specified model; EBE–EBE correlations and covariate relationships may be masked, falsely induced, or the shape of the true relationship distorted. Consequences of ε-shrinkage included low power of IPRED and IWRES to diagnose structural and residual error model misspecification, respectively. EBE-based diagnostics should be interpreted with caution whenever substantial η- or ε-shrinkage exists (usually greater than 20% to 30%). Reporting the magnitude of η- and ε-shrinkage will facilitate the informed use and interpretation of EBE-based diagnostics.
Model-based analysis has lately become a method of choice for analyzing the outcome from clinical trials in drug development. The model(s) explaining underlying pharmacokinetics (PK) and pharmacodynamics (PD) of a therapeutic agent as well as disease progression is used for data description, interpretation, and simulation of future scenarios, as well as decision-making. Thus, the choice of an adequate model is a crucial but often not straightforward procedure. The model building process is difficult and involves testing, evaluating, and diagnosing a range of plausible models with a major aim to make an adequate inference from the observed data, summarized as parameter estimates, variability estimates, and significant covariates (1).There are several tools assisting modelers in a model building process, including graphical diagnostics (1–3).
Graphical displays are extensively used during the model building process and are considered an essential tool for data visualization, inspection of model adequacy, and assumption testing. Graphical diagnostics are considered a powerful and intuitive visual method to be used not only by modelers but also as a communicating tool. The US Food and Drug Administration and the European Committee for Medicinal Products for Human use guidances explicitly mention the utility of graphical diagnostics. Specifically mentioned in both guidances are graphical diagnostics based on individual parameter estimates (4–6). The individual parameter estimates in non-linear mixed effects are estimated using the Bayesian methodology, and they are generally referred to as an empirical Bayes estimates (EBEs). The use of the phrase ‘empirical Bayes’ emphasizes that the parameters for the prior distribution are estimated from the data and are used as if they were known to obtain the posterior distribution. At one extreme, with no observations available, the patient will be regarded as a typical patient. At the other extreme, when data for an individual goes towards infinity, the prior will have marginal impact; in between these extremes, both factors will contribute, and depending on the relative variability (including both, between, and within subject variability), individual estimates could be closer either to the population mean or to the true individual parameter value (7–9).
Although the advantages and strengths of graphical diagnostics have been justified and stressed on several occasions, shortcomings have not been systematically explored (10). In this work, we investigate the informative value of the EBE-based diagnostics and how their usefulness depends on the individual data richness. Additionally, circumstances which are likely to misguide modelers towards making erroneous decisions in model development, relating to choice of structural, covariate, and stochastic model components, are underlined.
Derivation of individual parameter estimates (EBEs), denoted for an individual j as a vector of s model parameters Pj=(pj1,pj2…pjs), requires quantities such as (1) typical individual estimates denoted as a vector of θ=(θ1,…θs); (2) the magnitude of inter-individual variability of parameters in question, variances (ω12,…,ωs2), and covariances of the Ω-matrix; as well as (3) the magnitude (variance) of residual variability (σ2) to be known. The number of variance parameters (ω2) does not necessarily need to be the same as the number of typical parameter estimates (θ). Then, in individual j, the kth parameter’s deviation from the typical parameter value in the population can be denoted ηjk. In parametric non-linear mixed effect modeling, the shape of the distribution of the ηs is assumed to be normal. Commonly, to avoid negative parameter values, the parameter distribution is log-normally transformed using the relationship described in Eq. 1 where ηjk enters nonlinearly into the expression for the parameter in order to appropriately mimic the parameter distribution shape in the population. Also, as Eq. 1 is straightforward, the ηjk value is often referred to as an individual parameter. Thus, considering Eq. 1, the vector ηk will be used as an indicator for the individual parameter throughout the remainder of this communication.
For individual j, individual parameter can be estimated from the observed data vector yji=(yj1, yj2,…,yjn), n being the number of observations within an individual and known prior parameter distribution. Detailed derivation of this estimation procedure is available elsewhere (7,11).
With ŷji, we denote a model prediction of yji, defined as a function of parameter vector Pj and Xji, Xji being the vector of independent variables (such as time and dose covariates) related to yji (Eq. 2).
In this communication, ŷji is referred to as an individual prediction (IPRED) and to ηEBE,jk as the EBE. Another useful term, the individual weighted residual (IWRES), is calculated following Eq. 3. By using the standard deviation of the residual variability as a weighting factor, the IWRES distribution is standardized to have a zero mean and unit variance expectation when the model and parameters are correct at the individual level.
If the population model is adequate, the quality of the individual parameter estimate will depend heavily on the observed data available. When data are sparse and less informative on individual parameters, it is expected that the EBEs will deviate less from the population mean, which will result in a difference in the EBE and priors distribution, mainly in terms of decreased variance of EBEs but also with possible distortions of the distribution shape. In an extreme case of no data available on a particular individual, the individual’s EBE will be equal to the population value. So, the variance of EBE distribution is shrinking towards zero as the quantity of information at the individual level diminishes, a phenomenon defined as η-shrinkage (shη). Similarly, as individual parameter estimate tend towards the true individual parameter in case of informative data, IWRES distribution approaches a normal distribution with zero mean and unit variance. Conversely, as data diminishes, the IWRES distribution shrinks towards zero. We define this phenomenon as ε-shrinkage (shε) and is sometimes also called “overfitting”.
η- and ε-shrinkage are calculated using Eqs. 4 and 5. A shrinkage magnitude of zero, i.e., no shrinkage, corresponds to the situation when the model is correct and individual data is sufficiently abundant so that individual parameter estimates mimic the true individual parameter estimates, and a magnitude of one is the opposite case when data contain virtually no information about the parameters in question and the individual parameter values approach the typical parameter value.
The influence of shrinkage on EBE-based diagnostics was assessed via ordinary (n≤10) or Monte Carlo (MC; n=100) PK or PD model simulation–estimation procedures using non-linear mixed effects modeling (NONMEM; 12). The model summary is given in Table I. Data sets with different information content were created by altering the number (sample frequency) and location (sampling time) of samples. Each scenario defined with a certain model, and study design underwent multiple simulations. If not specified otherwise below, each simulated dataset contained 1,000 subjects. Thereafter, the EBEs were estimated using either the true and/or an intentionally misspecified model for each dataset using Bayesian estimation (i.e., MAXEVAL=0 option) in NONMEM. In this estimation, the prior information, i.e., population parameter estimates, variances, covariances, and residual variability were fixed to the same values used for simulations of the data sets, whenever misspecification was not introduced.
Based on estimated EBEs, IPREDs, and IWRES, graphical diagnostics were created and subsequently inspected for trends or indications of model misspecification. Also, the extent of η- and ε-shrinkage was calculated for these estimated EBE or IWRES distributions. Overall, the relationship between the extent of shrinkage and informativeness of diagnostics was assessed both qualitatively, i.e., by visual inspection of whether or not the resulting plot indicated the expected pattern, and quantitatively, i.e., to determine at what shrinkage magnitude a particular diagnostics lose its utility.
Commonly used EBE diagnostics were explored. A brief description of these diagnostics as well as the methodology used to study their informativeness for each case is given below.
A visual diagnostic often used to inspect the EBE distribution and to confirm the normality assumption is either a histogram or, preferably, a quantile–quantile-plot (qq-plot), where deviations from normality can be detected.
Multiple datasets (n=10 for each unique design) were simulated from a PK model (one compartment model with first order absorption, model 4) using different study designs with respect to sampling frequency (2–10) and sampling times. The EBEs were estimated using the true model for each dataset. EBE and IWRES distributions were visualized and compared to the true normal distribution using qq-plots.
A hypothesis test if the mean value of EBEs is significantly different from zero is performed at each run, and the p value of this test is reported in the so-called “ETABAR outcome” from NONMEM. A significant ETABAR is considered as an indication for a potential model misspecification.
Data sets were simulated from one compartment intravenous bolus model (model 1) and one compartment model with first order absorption (model 4) to study this diagnostic. The simulated datasets comprised two and three samples each for models 1 and 2, respectively. Two conditions were studied: (1) dependence of ηCL ETABAR on the time of the last sample in model 1 and (2) dependence of the ηka ETABAR on the time of the first sample in model 2. For condition 1, seven (7) MC simulation–estimation studies were performed using different study designs in which the last sample was positioned at the time equal to the 0.67, 1, 2, 3, 4, 6, and 8 of the disposition half-life (t1/2=3 h) for seven MC studies. Similarly, for condition 2, eight MC simulation–estimation studies was performed in which the first samples was positioned to be equal to the 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, and 4 absorption half-lives (t1/2abs=1 h). Each MC involved 100 replicates of simulated datasets. The true model was fitted to the simulated data using First Order Conditional Estimation method (FOCE) in NONMEM VI.
This diagnostic is often used for inspection of the correlation between individual parameter estimates. If the plot indicates a markedly non-zero correlation for a model where the population correlation value was fixed to zero, the correlation is generally tested in the model for its significance. If the plot indicates zero correlation for a model where the population correlation value was also zero, the correlation is often not tested for its significance.
Two conditions were studied: the value of EBEs to (1) indicate correlation if it is truly present and/or (2) correctly determining when the correlation is not present. Condition 1 was studied using model 1, in which the true correlation between individual parameters of ηCL and ηV was 0.75. Datasets were simulated using this model and different study designs. For each design, ten replicates were simulated. The model with true parameter estimates but no correlation was used to estimate EBEs. The expectation is that EBE versus EBE plot will indicate parameter correlation. Correlation between EBEs values for ηCL and ηV was computed for each EBE versus EBE plot. Condition 2 was studied using model 4 and model 9 (an Emax model). In these models, the true correlation between all individual parameters was zero. All study conditions were equivalent to the study described above; however, the true model with no correlation was used to estimate EBEs.
This diagnostic is used for inspection of relationships between parameters and covariates. When a large number of covariates are available, this diagnostic enables a quick screening of relationships. Only indicated relationships are often subsequently estimated directly in the population model. EBEs are also used in automated procedures such as the generalized additive modeling (13) and to investigate the shape of the covariate-parameter relationship.
The utility of EBEs to correctly indicate the EBE-covariate relationship when (1) it is truly present and (2) when the covariate relationship is not present were studied using model 4. A direct relationship between typical value of volume of distribution and weight was used for simulations. Multiple datasets were simulated under different study designs, where early observations were censored sequentially. A model with no covariate relationship was fitted to the data and used to estimate EBEs which were subsequently plotted versus the covariate (weight). Correlation between EBEs and weight as well as the slope of EBE versus weight relationships were calculated for each data set.
This is a commonly used diagnostic for detection of (structural) model misspecification. Its usefulness, compared to the DV versus population prediction (PRED) plot, comes from separating different sources of variability.
A range of models was used to simulate multiple datasets, which were subsequently analyzed with structurally misspecified models. The examples involve the following pairs of models (first model in a pair is used for simulation, second for estimation): (1) sigmoidal Emax model–Emax model (models 9 and 8), (2) transit compartment absorption model–first order absorption model (model 6 and 4), (3) two compartment model–one compartment model (models 3 and 1), (4) one compartment model with first order absorption–one compartment model with zero order absorption (models 4 and 5), and (5) one compartment model with Michaelis–Menten elimination–one compartment model with linear elimination (models 2 and 1). For each condition, multiple datasets with different number of observations were simulated. IPREDs were derived from the misspecified model.
This is a commonly used diagnostic for assessing the residual error model. An appropriate residual error model should be associated with a horizontal regression line with no slope.
Multiple datasets based on different study designs where observations were censored randomly were simulated using model 8 with combined proportional and additive error model and subsequently fitted with the misspecified model containing proportional error model only. Absolute values of IWRES were plotted versus IPRED for determination of residual model misspecification. The expectation was that IWRES values would indicate model misspecification visualized as a linear regression line being not horizontal (e.g., having slope different than 0). The slope of this regression line was calculated for each case.
Relationship between η- and ε-shrinkage was studies using model 7. First, rich datasets were simulated (number of observations=25). Then, sparse datasets were created by censoring observations within subjects either randomly or uniformly (same observation deleted for all subjects), keeping the total number of observations the same for all subjects. This procedure was repeated until two observations per subject remained. For each dataset, EBEs were estimated using the true model and true parameter estimates. η- and ε-shrinkage were calculated and inspected for relationships.
In general, in the absence of shrinkage, all EBE-based diagnostics are powerful model evaluation tools. These diagnostics separate different sources of variability, thus graphs become easier to interpret. However, in the presence of shrinkage, all the discussed diagnostics begin losing their informativeness and could potentially become misleading as indicated in our detailed evaluation of (1) consequences of η-shrinkage, (2) consequences of ε-shrinkage, and (3) relationship between η- and ε-shrinkage.
EBEs may, in addition to shrinkage, show non-normal distribution even when the true underlying η distribution is normal. A pattern of increasing deviations from normality in the qq-plot with increasing shrinkage, exemplified in Fig. 1 (left panels) for EBE of ηCL, was evident for all parameters of the studied model (model 4). The larger the shrinkage, the larger are the observed deviations from normality.
As a result of asymmetric η-shrinkage, mean values of EBEs (“ETABAR”) may be significantly different from zero, even for a correctly specified model. The relationship between ETABAR for ηka and ηCL distributions and the time of the first or last sample, respectively, is shown in Fig. 2. For ηka, when the first sample is taken early enough, i.e., informative on this parameter, ETABAR is close to zero, as expected. If the first sample is taken later in time, this sample is not particularly informative for individuals that have fast absorption because their absorption process is essentially complete by that time, but it will still be informative for individuals that have slow absorption; thus, shrinkage is expected to happen mainly for individuals with high ka values, which leads to asymmetric shrinkage resulting in ETABAR values significantly different from zero. A similar phenomenon is evident for the ηCL EBE distribution (right panel of Fig. 2).
When EBEs are used for inspections for parameter correlations, they can indicate correlation when correlation truly does not exist (upper panel of Fig. 3), and oppositely, correlation when truly present would sometimes not been detected by EBEs (lower panel of Fig. 3). The upper-left corner and lower-left corner panels of Fig. 3 represent the true parameter relationships, while other panels show relationships based on the EBEs estimated under the true model for different study designs. As Fig. 4 indicates, commonly induced correlations include those between ηEC50~ηEmax and ηka~ηV. The quantitative relationship between shrinkage extent and informativeness of these plots is shown in Fig. 4. Both phenomena, induced and hidden correlations, become apparent when shrinkage is higher than 20–30%.
Similarly, when EBEs are used for a search of significant covariates, they can obscure relationships, show distorted shape, or even in certain circumstances falsely indicate relationship, when relationship truly does not exist (Figs. 5 and and6).6). The left panel of Fig. 6 shows falsely induced relationship between ηka and weight, when the true covariate relationship was between volume and weight. The right panel of Fig. 6 shows how an existing relationship between weight and ηV is less evident, the higher the shrinkage in ηV. The observed pattern appears due to induced correlation between ηka and ηV.
The IWRES distribution may in the presence of ε-shrinkage show non-normal distribution even when the underlying model is correct (Fig. 1, right panel).
The power of IPREDs to detect model misspecification is reduced in the presence of ε-shrinkage. Figure 7 shows the DV versus IPRED diagnostic when a misspecified model (zero order absorption, model 5) was applied to data simulated with first order absorption (model 5) in two scenarios: with rich data resulting with low ε-shrinkage and with sparse data resulting in high η- and ε-shrinkage. When ε-shrinkage is low, the plot clearly indicates the model misspecification; however, when shrinkage is substantial, signs of model misspecification are absent, and the plot indicates a perfect fit. Similar phenomena were observed with all other scenarios tested. Already, at the ε-shrinkage extent of 20–30%, it was hard to visually detect model misspecification.
The power of IWRES to detect residual model misspecification diminishes with an increase in ε-shrinkage. Figure 8 (left panel) shows IWRES diagnostics produced based on misspecified model (proportional residual error only) fitted to the data simulated with the model that had combined error model (additive + proportional residual error). A quantitative relationship between ε-shrinkage and model misspecification indication, expressed as the slope of the regression line for |IWRES| versus IPRED, is shown in Fig. 8 (right panel).
Relationships between η- and ε-shrinkage are shown in Fig. 9. As expected, these two terms are positively correlated. The shape of this relationship is however different for different study designs.
In this paper, we have evaluated the usefulness of EBE-based diagnostics in different circumstances and their dependencies on the information content in the datasets. These diagnostics are powerful when the individual’s data provide a substantial amount of information on the model parameters. However, with decreased information content per individual, these diagnostics become not only uninformative but potentially also misleading. The EBE-based diagnosis is a central part of the model building process, but if a modeler would accept such diagnostics without considering shrinkage, it may impede decision-making, increase time for data analysis, decrease trust in adequate models, and accept inappropriate models.
Although it is difficult to give a rule of thumb at what shrinkage extent diagnostics becomes misleading, we observed in our examples that already at a level of 20–30% for both, η- and ε-shrinkage, EBE-based diagnostics generally lost their power, and false indications started to appear. Therefore, it seems rational to provide values for shrinkage whenever EBE-based diagnostics are used for communicating model quality. It is important to clarify that all mentioned patterns are associated with the graphical diagnostics solely. When both models were tested directly in NONMEM, correct and misspecified, the correct model would be selected for every example, based on standard model building procedures using the objective function values. Thus, significant shrinkage in most of the cases does not indicate any problem with the dataset or with the model; it only affects the diagnostics based on EBE. Thus, whenever shrinkage is present, a model building process involving more direct testing and less or no reliance on EBE-based diagnostics should be used. Additionally, other types of diagnostics ought to be used in these cases, for example simulation-based diagnostic (10) or conditional weighted residuals (14).
An interesting finding was that EBEs may in the presence of shrinkage indicate false relationships or hide true relationships when used for covariate screening. Here, this was a consequence of the induced correlation between two parameters, ηka and ηV. (in general, sign of such an induced correlation may depend on the design). This finding is of special interest when EBEs are used for selection of covariates that will be further tested in the model for their significance. If only certain parameters are screened for covariates, it may happen that EBEs would indicate false parameter-covariate relationships, which may even turn out to be significant when tested directly in the model, while the covariate was truly related to other parameter. Therefore, whenever this is the case (i.e., single parameter is screened for covariate relationships and a certain covariate appears to be significant), the modeler should also test the relationship between the significant covariate and other parameters in order to determine the appropriate relationship.
We have observed that η- and ε-shrinkages are correlated, which is intuitive. However, while η-shrinkage may appear substantial even when observations per subject are numerous, ε-shrinkage remains low in these circumstances. ε-Shrinkage becomes substantial when the number of observations is equal or less than the number of random effects. As a general principle, the lower the residual variability, the more informative are observations, and the lower will the η-shrinkage be. Also, as a general principle, a higher number of between-subject random effect and higher variability in these will lead to a higher flexibility in the individual model to approach to observed values and therefore lead to higher ε-shrinkage.
We have so far only discussed shrinkage magnitudes between zero and one. Based on theoretical considerations, η- and ε-shrinkage should, in the absence of model misspecification, be positive (15). Indeed, whenever the shrinkage was computed under the correct model, shrinkage values were in this range. For a number of scenarios when data were simulated and model parameters re-estimated with misspecifications of the parameter distribution, shrinkage was non-negative. However, shrinkage may also take negative values in certain circumstances; we found negative shrinkage in some real data examples, typically with rich data and a small number of subjects. Also, in situations where a parameter variance is fixed to a lower value than the true value, negative shrinkage may occur. Thus, negative shrinkage may be a useful signal for possible model misspecification.
This exercise was performed on simulation examples, and one can argue that such scenarios would not occur often in model building processes using data from real clinical trials. However, when inspecting 29 published PK and PD models, we identified that shrinkage was present in all of them (Table II). Substantial shrinkage is often present in PD parameters and PK parameters describing absorption and distribution. Even the most well-informed parameter, clearance (CL), showed shrinkage above 20% in about one third of the PK studies. A large number of observations do not guarantee the absence of η-shrinkage as it can occur even in cases when the number of samples is as large as 20 observations per individual. This is simply because the studies are not designed to provide perfect information on all model parameters. In this work, we explored implications of EBE shrinkage for model diagnostics. However, EBEs are also used in other aspects of non-linear mixed effect analysis, for example, in FOCE, sequential PK-PD analysis or nonparametric estimation in NONMEM (14,16,17). Shrinkage would affect these estimation procedures too; however, this is outside the scope of the present investigation.
In conclusion, when shrinkage is higher than about 20–30%, EBE-based diagnostics lack informativeness and may be misleading. Therefore, it is desirable to report the extent of ε- and η-shrinkage to assess the relevance of diagnostics employing EBEs, IPRED, and IWRES. When shrinkage is high, other diagnostics and more direct population model estimation need to be employed in model building and evaluation.
The authors would like to thank Dr. Siv Jönsson for valuable comments on the manuscript, Paul Baverel for his help with reviewing shrinkage in real models and datasets, and the anonymous reviewers for their valuable comments on the manuscript.