We hypothesized that a single-objective, hybrid genetic algorithm (SOHGA) approach to selecting covariates and functional forms in pharmacokinetic model building would accurately identify covariates, models, and appropriate initial estimates of model parameters, with fits to clinical data equal or superior to those of a manual stepwise model building approach. For covariate identification in a simulated dataset, we found that a SOHGA with a 10 point penalty per covariate correctly identified as many true covariates as automated stepwise covariate modeling (SCM) (three of four covariates) but with fewer spurious covariate relationships (1 vs. 2 spurious covariates). When applied to clinical datasets, the best SOHGA candidate models outperformed the final manual stepwise models by at least 10 points in the AIC for four of seven compounds, with nominal differences for three compounds and no compound for which the SOHGA model was worse by 10 points or more. These results are consistent with the finding of a previous single-compound pilot study in which the genetic algorithm approach to pharmacokinetic model building resulted in a better fit to clinical data than a stepwise modeling approach [3].
For the fits to simulated data, as expected, models with more stringent criteria for covariate incorporation had fewer spurious covariates. Changing the p value for exclusion from 0.05 to 0.01 in the automated SCM method led to one fewer spurious covariate, and increasing the point penalty in the SOHGA from 3.84 to 10 points per covariate led to two fewer spurious covariates. While more stringent criteria for covariate incorporation could lead to exclusion of true covariates, this was not the case in this dataset, as all automated SCM and SOHGA options identified three of four true covariates. However, all automated SCM and SOHGA options failed to identify the effect of body surface area on the volume of distribution. This was unexpected because the change in volume due to body surface area was expected to be equivalent to the change due to gender, and gender was identified as a significant covariate by all automated approaches. It is possible that individual volumes of distribution happened to be selected from the random distribution such that the effect of body surface area was diminished from the expected value.
The typical values of parameters estimated by the automated SCM, Lasso, and SOHGA methods were all similar to those of the true model. This suggests that the SOHGA can accurately identify parameter values. However, this result comes from a single simulation study and is not generalizable; it does not imply that SOHGA parameter estimates will be accurate for all possible datasets and model scenarios.
It is important to note that, although both the SCM and SOHGA included the spurious covariate of height as a predictor of volume rather than the true covariate of body-surface area, the objective function values for these models are lower than for the true model (3.3 points lower for SOHGA and 10.1 points lower for SCM). This suggests that, based on the change in objective function, both methods selected the better covariate although it was not the true (used in the simulation model) covariate. The correlation between height and body-surface area was high (0.92) so the effect of height was likely due to its correlation with body-surface area.
For the clinical datasets, the best SOHGA candidate models of all seven compounds successfully converged, while the stepwise approach fixed the absorption rate constant for four compounds to achieve convergence. In addition, the SOHGA approach identified a candidate model for risperidone that successfully converged using the FOCE with interaction method, which the stepwise approach did not. This suggests that the SOHGA approach identified regions of the solution space in which models converged that the stepwise approach did not reach. This is most likely because the SOHGA approach can search the error, covariate, and initial estimate space for regions that converge and select candidate models from within those regions. Fixing the absorption rate constant in the stepwise approach restricted the models to a particular region of the solution space. The improvement in AIC with the SOHGA approach to modeling olanzapine (ΔAIC = −468.5) is likely due to the extra degree of modeling freedom provided by the absorption rate parameter, which the stepwise approach fixed based on literature values. This parameter was fixed early in the manual model building process because the model failed to converge, and it was not revisited after subsequent changes were made to the model. This inability to estimate a specific parameter at one point in the modeling process, even though the parameter is identifiable when other features are present, is an example of the interdependence of model features and is similar to the documented interaction between compartments, variance terms, and covariates [2]. In contrast, the SOHGA method revisits and retests such a “decision” multiple times and in different combinations with other model features. This reduces the risk of missing important features and makes it more likely that the numerically optimal combination of features is selected.
The lower AIC with the SOHGA approach to modeling citalopram (ΔAIC = −22.3) is most likely due to the inclusion of additional covariate terms in the models. The best SOHGA candidate model for citalopram captures two of the three covariates of the stepwise model along with four additional covariate relationships, and the error structures of the two models are nearly identical. These model improvements suggest that the SOHGA identified interactions between model components that were not recognized by the stepwise approach. The lower AIC for risperidone with the SOHGA (ΔAIC = −278.1) is likely due to a combination of a variable absorption rate and additional covariates; the best SOHGA candidate model for risperidone included four covariate terms versus zero for the final stepwise model, although the best SOHGA model has one fewer mixture compartment for clearance. Finally, the lower AIC for DMAG with the SOHGA (ΔAIC = −22.3) is most likely due to application of inter-occasion variability on clearance as opposed to the central volume.
However, the minor differences in RMSE between the final stepwise models and the best SOHGA candidates suggest that the improvements seen in AIC and in the model description of the data may not necessarily translate to benefits in predictive value.
Although a condition number penalty was not included in the SOHGA fitness function, the best SOHGA candidate models had condition numbers <1,000 for all compounds except citalopram and risperidone. This suggests that the data were capable of supporting the best SOHGA candidate models except for those of citalopram and risperidone. However, with the addition of the condition number penalty, the SOHGA approach identified candidate models for citalopram and risperidone with condition numbers <1,000.
Over the seven compounds, the best SOHGA candidate models included more covariates than the stepwise approach (23 vs. 13) while including fewer variability terms (15 vs. 22); the SOHGA tended to describe variability with covariates rather than unknown variability relative to the stepwise approach. This was despite the penalty for fixed effect (THETA) and inter-individual variability (OMEGA) terms being equal (10 points). However, the best SOHGA candidate models did not include half of the covariates identified by the stepwise approach. This raises concerns about whether these covariates are an artifact of the model building approach [2].
It should be noted that the criterion for inclusion of a covariate in the stepwise regression was 3.84 points, based on the likelihood ratio test, while the criterion for inclusion in the SOHGA approach was 10 points. This higher threshold for inclusion of a covariate effect in the SOHGA may be responsible for the lack of inclusion of other covariate relationships that were found using the stepwise regression approach. However, as demonstrated in the simulated dataset, this higher threshold for inclusion also provides some protection from inclusion of spurious covariates.
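To put the two thresholds on a common scale, the following minimal sketch converts a drop in objective function value into the nominal p value of a likelihood ratio test for one added parameter (chi-square, 1 degree of freedom). The function name is hypothetical and the calculation assumes nested models that differ by a single parameter; it is an illustration only and not part of the SOHGA software.

```python
import math


def lrt_p_value_1df(delta_ofv: float) -> float:
    """Nominal p value for a drop of delta_ofv points with one added parameter.

    For 1 degree of freedom, the chi-square survival function reduces to
    erfc(sqrt(x / 2)), so no external statistics library is needed.
    """
    if delta_ofv < 0:
        raise ValueError("delta_ofv must be non-negative")
    return math.erfc(math.sqrt(delta_ofv / 2.0))


# Thresholds mentioned in the text (3.84 and 10 points) plus the
# value corresponding to p = 0.01 for comparison.
for threshold in (3.84, 6.63, 10.0):
    print(f"{threshold:5.2f} points -> p = {lrt_p_value_1df(threshold):.4f}")
```

Under these assumptions, 3.84 points corresponds to a nominal p of about 0.05 and 10 points to a nominal p of about 0.0016, which illustrates how much stricter the SOHGA inclusion criterion was.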
The 26.7 % median change in pharmacokinetic model parameters between the final stepwise and best SOHGA candidate models suggests that the model parameters are generally comparable. Much of this difference may be due to the fixed absorption rate constants in the final stepwise models and the differing covariate, variability, and error structures between the final stepwise and best SOHGA candidate models.
A strength of this study is that the SOHGA method was tested on multiple compounds with data from multiple sites. The SOHGA was also applied under varied conditions: the data for the different compounds covered various time scales (months for SZ patients, weeks for AD patients, and hours for citalopram and DMAG), both oral and intravenous administration, and a range of patient sample sizes and degrees of data sparseness.
This study has several limitations. First, all models were one-, two-, or three-compartment models, so the generalizability of the effectiveness of the SOHGA approach to other model structures is uncertain. However, the SOHGA approach is entirely general and can be used for models described by ordinary differential equations, mixture models (such as the mixture model on clearance of risperidone implemented in this study), and odd-type data. Another limitation is that the study involved mostly schizophrenia and Alzheimer’s patients and medications. The extent to which these results are generalizable to other medications and patients is uncertain.
There are also general limitations to using a genetic algorithm. While genetic algorithms quickly converge to “good” regions of the solution space, convergence to the “best” local and global solutions is much slower due to the random nature of the method. The hybrid technique aids in identifying local optima, but convergence to the global optimum within the simulation cannot be guaranteed. Another consequence is that the genetic algorithm approach could result in multiple, equally valid solutions from different regions of the solution space. If these regions have different characteristics, it would be difficult to draw conclusions about covariates and model structure from strictly numerical results. This emphasizes the important role of the modeler in assessing biological plausibility, graphic diagnostics, and the clinical importance of model features. While genetic algorithms can identify globally optimal solutions, the robustness of these solutions was not considered here. Finally, genetic algorithms are most efficient when the candidate models are evaluated in parallel rather than in series, so multi-core computers are recommended. The SOHGA models presented here took 6–150 h on a 24-processor computer server. More computational power, as is readily available with cloud computing, grid computing, or other shared-resource methods, would decrease the run times. However, given this modest hardware configuration and the weeks to months needed to develop a model using manual approaches, the computational costs are not overly burdensome.
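The point about parallel evaluation can be sketched as follows. This is a minimal illustration under stated assumptions: `evaluate_generation` and `toy_fitness` are hypothetical names, the candidates are toy bit strings, and the stand-in fitness function replaces what would in practice be an external estimation run for each candidate model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[int, ...]  # hypothetical encoding: one bit per model feature


def evaluate_generation(
    candidates: Sequence[Candidate],
    run_model: Callable[[Candidate], float],
    max_workers: int = 24,
) -> List[Tuple[float, Candidate]]:
    """Score every candidate of one generation concurrently.

    Returns (fitness, candidate) pairs sorted so the best (lowest) is first.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with candidates.
        scores = list(pool.map(run_model, candidates))
    return sorted(zip(scores, candidates), key=lambda pair: pair[0])


def toy_fitness(candidate: Candidate) -> float:
    """Stand-in for a real model run: fewer active features scores lower."""
    return float(sum(candidate))


ranked = evaluate_generation([(1, 1, 0), (0, 0, 0), (1, 0, 0)], toy_fitness)
print(ranked[0])
```

Because candidates within a generation do not depend on one another, the wall-clock time per generation is governed by the slowest single run rather than by the sum of all runs, provided enough workers are available.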
Also, the use of a single objective function can potentially introduce biases due to the ad hoc weighting. In this analysis, a ten point penalty for each model parameter was chosen based on the “Sheiner criteria.” (Lewis Sheiner, personal communication with one of the authors (MS). Dr. Sheiner explained that he often used 10 points as his criterion for including a parameter to correct for multiple comparisons, and because he rarely could see any difference in plots if the change in OBV was less than 10 points.) A smaller penalty would likely lead to more covariates in the model (and better overall fit but with a higher chance of spurious covariates), and a higher penalty would lead to inclusion of fewer covariates (and worse overall fit but with a lower chance of spurious covariates). The large penalty assigned for failure of convergence and failure of the covariance step (and the condition number, when implemented) was intended to give a high priority to these outcomes, as some modelers feel strongly that a successful covariance step is a necessary (but not sufficient) condition for a “final” model. A potential solution to these ad hoc penalties is the use of a multi-objective function, with a dimension for each penalty term in the single-objective algorithm, which would yield a front of non-dominated optimal solutions [9]. However, both of these approaches (single-objective and multi-objective optimization) to the covariance step rely on a pass/fail criterion. Information from the covariance step output can often be used to understand additional opportunities to improve the model. However, hypothesis generation is the responsibility of the user, and additional hypotheses to improve the model can be used to expand the search space in subsequent SOHGA analyses or tested by traditional forward addition/backward elimination.
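The penalized single-objective fitness discussed above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the class and function names are hypothetical, and the size of the failure penalty is a placeholder assumption because only the 10 point per-parameter penalty is stated here.

```python
from dataclasses import dataclass

PARAMETER_PENALTY = 10.0   # points per estimated parameter (from the text)
FAILURE_PENALTY = 1000.0   # placeholder magnitude; an assumption, not a reported value


@dataclass
class CandidateResult:
    ofv: float             # objective function value of the fitted candidate
    n_parameters: int      # number of estimated model parameters
    converged: bool        # did the estimation step converge?
    covariance_ok: bool    # did the covariance step succeed?


def fitness(result: CandidateResult) -> float:
    """Lower is better: OFV plus a parsimony penalty and pass/fail penalties."""
    score = result.ofv + PARAMETER_PENALTY * result.n_parameters
    if not result.converged:
        score += FAILURE_PENALTY
    if not result.covariance_ok:
        score += FAILURE_PENALTY
    return score


simple = CandidateResult(ofv=1250.0, n_parameters=6, converged=True, covariance_ok=True)
richer = CandidateResult(ofv=1235.0, n_parameters=8, converged=True, covariance_ok=True)
print(fitness(simple), fitness(richer))
```

In this toy comparison the richer model lowers the OFV by 15 points but adds two parameters (20 penalty points), so the simpler model is preferred. The pass/fail penalties are large relative to typical OFV differences, which is how such a weighting gives priority to convergence and a successful covariance step.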
The SOHGA algorithm presented here also does not consider the feasibility of or prior knowledge about model parameters. However, these factors could be included by introducing additional weighting in the fitness function. That is, an additional bonus could be applied for covariates that the modeler feels should be included (e.g., a weight effect on volume of distribution), and a penalty could be imposed for undesired interactions (e.g., having both body-mass index and weight modify a variable). While this weighting cannot guarantee the inclusion or exclusion of model parameters, it does shift the likelihood based on the modeler's input.
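One way such prior-knowledge weighting might look is sketched below. Everything here is hypothetical: the function name, the feature labels, and the bonus and penalty sizes are illustrative assumptions, since no such weighting was implemented in this study.

```python
from typing import Dict, FrozenSet, Iterable, Set


def adjusted_fitness(
    base_fitness: float,
    features: Set[str],
    bonuses: Dict[str, float],
    conflicts: Iterable[FrozenSet[str]],
    conflict_penalty: float = 10.0,
) -> float:
    """Shift a candidate's fitness (lower is better) by modeler preferences."""
    score = base_fitness
    # Reward features the modeler expects on mechanistic grounds.
    for feature, bonus in bonuses.items():
        if feature in features:
            score -= bonus
    # Penalize combinations the modeler considers redundant or implausible.
    for group in conflicts:
        if group <= features:
            score += conflict_penalty
    return score


bonuses = {"WT_on_V": 5.0}                           # weight effect on volume
conflicts = [frozenset({"BMI_on_CL", "WT_on_CL"})]   # both modify clearance

print(adjusted_fitness(1310.0, {"WT_on_V"}, bonuses, conflicts))
print(adjusted_fitness(1310.0, {"WT_on_V", "BMI_on_CL", "WT_on_CL"}, bonuses, conflicts))
```

Because the adjustments are finite, a sufficiently large change in the objective function can still override them, consistent with the statement that weighting shifts the likelihood of inclusion without guaranteeing it.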
Finally, using the SOHGA approach may tempt the user to explore model parameter relationships and potential bias inadequately with graphics during pharmacokinetic model building. That is, the user may be tempted to spend less time looking at diagnostic plots and generating biologically sound hypotheses. The SOHGA cannot replace thorough examination of the data and hypothesis generation; SOHGA only automates the actual search part of the stepwise regression (e.g., construction of control files, running the model, quantifying results, and construction of new control files). All intellectual input into the model selection process, namely the examination of graphics and hypothesis generation, remains the responsibility of the modeler. In a manner similar to traditional stepwise analysis, the best models from a SOHGA analysis can be examined using traditional diagnostic plots or other means, and additional hypotheses can be generated to explain observed bias. All models run by SOHGA are available, and the interface makes it convenient to identify models with desirable characteristics such as a lower value of the objective function or parsimony. The search space can then be expanded to include these new hypotheses before SOHGA is run again.
Further, these results do not suggest that SOHGA, or any automated search algorithm, is likely to provide the final model for an analysis. A more practical approach to arriving at the “final” model may be that SOHGA would provide something comparable to initial parameter estimates for non-linear regression. That is, it is common practice to start a model building exercise with a trivial model (e.g., one compartment, no covariates, no inter-occasion variability, no mixture models, etc.), but a model from SOHGA, while probably not the final model, is likely to be closer to the true global minimum in the search space than the traditional trivial model. Given what is known about the failure of the assumption of monotonicity of the model search space [2], having a better starting place in the search space may result in a higher likelihood of finding the global minimum, rather than a local minimum, using traditional forward addition and backward elimination model building methods. A likely use scenario might be that the user assesses a number of the “best” models selected by SOHGA (e.g., those with low OBV, the most parsimonious, etc.), and the strengths and biases of each model are assessed using standard diagnostic graphics and methods as well as biological plausibility. The features from different models can be recombined based on these assessments. As is the case for non-linear regression, the number of iterations may be fewer if a better starting point is provided.
It is likely that, initially at least, the SOHGA approach is best suited for compounds in which the biology is fairly well understood, and not for highly exploratory analyses. There are two reasons for this. First, experience suggests that hypotheses in poorly understood compounds typically come in small numbers, often one at a time, after examination of plots. The SOHGA approach requires that many (although not all) hypotheses be available initially. As discussed above, it is reasonable to perform a SOHGA search, examine the results and conclude that additional hypotheses are required, and then perform another SOHGA search with an expanded search space. But if the drug is very poorly understood, few hypotheses may be available prior to the start of the model building. As a result, a process of running SOHGA with only a few hypotheses, examining the results, and generating new hypotheses may become even more tedious than stepwise regression. Second, SOHGA is very computationally intensive. Complex models often require ordinary differential equation solutions, which can be very computationally intensive as well, making SOHGA impractical.
In conclusion, our results suggest that a single-objective, hybrid genetic algorithm can be used to fit pharmacokinetic model structures and covariates to data. This approach could be used either stand-alone or to identify regions of the solution space that could be explored further manually. Further additions to the genetic algorithm could include other objective measures of model quality and multi-objective optimization. Genetic algorithms provide a systematic way to identify covariates, interactions, initial parameter estimates, and the model structure for pharmacokinetic models accounting for interactions among model components that may otherwise be difficult to identify.