Virus capsid assembly has been a key model system for understanding the principles of complicated molecular self-assembly systems [1]. Simulation studies have been central to these efforts, providing a way to study emergent properties and conduct in silico experiments that would be infeasible in practice and intractable to analyze theoretically. For example, simulation studies have made it possible to infer emergent properties of different hypothetical models of assembly [3], explore the effects of parameter variations on assembly progress [5], and examine details of reaction pathways and assembly mechanisms implied by theoretical models of assembly [7]. There are, however, substantial obstacles to the successful use of simulation methods, one of the most important being the need for accurate parameters of assembly. For capsid assembly, these parameters would typically correspond to the rates of the reactions involved in the assembly or disassembly of coat proteins. While estimates of binding free energies have been derived for some viruses [14], precise rate constants for coat-coat binding are not known for any real virus system. As a result, studies have typically either assumed literature-derived “best guesses” for unknown parameters [16], attempted to infer approximate rate parameters from structural models [17], or scanned parameter spaces to identify the range of possible behaviors available to a viral system [5]. Such simulation studies have, however, suggested that relatively small changes in parameters can lead to large changes in assembly mechanisms [13], calling into question how much one can learn about a particular virus even from models whose parameter values are well constrained. Moving simulations of capsid assembly from generalities about ranges of possible behaviors to predictive models of particular viruses will require new methods for learning the rate parameters underlying actual viral assembly systems.
Fortunately, the simulation methods used to study capsid assembly in the abstract also, in principle, provide a mechanism for learning parameters consistent with any given experimental measure of assembly progress. Parameter estimation for a computational or mathematical model of a system of interest is generally posed as an optimization problem: the goal is to minimize an objective function that measures the deviation between simulated and real assembly data, viewed as a function of the vector of model parameters. While many generic optimization methods can in principle be applied to such problems, the appropriate method for any particular system depends on many characteristics of the system to be fit. Virus assembly systems present several special challenges to optimization-based parameter fitting. Individual simulations, and hence objective function evaluations, are often computationally costly; a single trajectory can require days or weeks of compute time to cover the time scale of a typical assembly reaction. Furthermore, these computational costs can change rapidly with parameter values. In addition, simulations of more complicated virus models are usually conducted with stochastic methods (either stochastic simulation algorithm (SSA) models [18] or Brownian dynamics and related particle models [5]), which makes it difficult to evaluate the objective function accurately and even more difficult to evaluate the derivatives needed by most optimization methods.
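To make the noise problem concrete, the following is a minimal sketch, not the model used in this work: a Gillespie stochastic simulation algorithm (SSA) for a hypothetical toy reaction 2A <-> D, wrapped in a sum-of-squared-errors objective that compares the mean simulated curve with a target curve. The reaction, the function names `ssa_dimerization` and `objective`, and all rate values are illustrative assumptions, and the target curve is synthetic (generated from the same toy model), not experimental data. Evaluating the objective at the same parameters with different random seeds returns different values, which is the difficulty described above.

```python
import numpy as np


def ssa_dimerization(kf, kb, n0, t_obs, rng):
    """Gillespie SSA for a hypothetical toy reaction 2A <-> D.

    kf: binding rate per unordered pair of free subunits
    kb: dissociation rate per dimer
    n0: initial number of free subunits A
    t_obs: increasing observation times, starting at or after t = 0
    Returns the fraction of subunits bound in dimers at each time in t_obs.
    """
    a, d, t = n0, 0, 0.0
    out = np.empty(len(t_obs))
    i = 0
    while i < len(t_obs):
        # Propensities of the two reaction channels
        p_bind = kf * a * (a - 1) / 2.0
        p_unbind = kb * d
        p_total = p_bind + p_unbind
        # Waiting time to the next event (infinite if nothing can fire)
        tau = rng.exponential(1.0 / p_total) if p_total > 0 else np.inf
        # The state is constant until t + tau, so record it at every
        # observation time that falls before the next event
        while i < len(t_obs) and t_obs[i] < t + tau:
            out[i] = 2.0 * d / n0
            i += 1
        if i == len(t_obs):
            break
        t += tau
        # Choose which reaction fires, weighted by its propensity
        if rng.random() * p_total < p_bind:
            a, d = a - 2, d + 1
        else:
            a, d = a + 2, d - 1
    return out


def objective(theta, t_obs, data, n0=200, n_runs=5, seed=None):
    """Sum of squared deviations between the mean simulated curve and data."""
    kf, kb = theta
    rng = np.random.default_rng(seed)
    runs = [ssa_dimerization(kf, kb, n0, t_obs, rng) for _ in range(n_runs)]
    return float(np.sum((np.mean(runs, axis=0) - data) ** 2))


t_obs = np.linspace(0.0, 5.0, 11)
# Synthetic target from many runs at assumed "true" rates (not real data)
rng = np.random.default_rng(0)
data = np.mean([ssa_dimerization(0.01, 0.1, 200, t_obs, rng) for _ in range(50)], axis=0)
# Same parameters, different random streams: the objective itself is noisy
print(objective((0.01, 0.1), t_obs, data, seed=1))
print(objective((0.01, 0.1), t_obs, data, seed=2))
```

Averaging more trajectories per evaluation (`n_runs`) reduces this noise but multiplies the cost of every evaluation, which is the trade-off that makes stochastic assembly models expensive to fit.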
Since closed-form expressions are not available for non-trivial models of capsid assembly, the parameter fitting problem falls under the class of simulation optimization, in which the objective function must be evaluated through simulation [23]. Reviews of methods for optimizing stochastic simulation systems can be found elsewhere [23]. Optimization algorithms for stochastic systems (for example, quasi-gradient methods and Kiefer-Wolfowitz-type algorithms) do not work directly with the exact gradient of the objective function, because the noise inherent in such systems introduces errors into any gradient estimate. Various techniques have been developed to approximate gradients in these methods, such as specialized finite difference schemes and infinitesimal perturbation analysis. These techniques, however, may impose restrictive conditions on the form of the potential surface to be fit. Another important class of methods is response surface methodology, which fits a smooth regression model (a metamodel) to the potential surface and optimizes the regression model instead; the minimum of the fitted function is then taken as the estimate of the minimum over the search space [26]. Although such a procedure requires fewer simulations than gradient-based methods, response surface methods can perform poorly when the metamodel is not a good approximation of the potential surface, when the search space is inadequately sampled, or when the surface has very sharp ridges and large valleys with nearly zero curvature [26].
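The two classes of methods above can be contrasted on a toy problem. The sketch below, under stated assumptions, applies a Kiefer-Wolfowitz stochastic approximation (central finite-difference gradient estimates with decaying gains a_k = a/(k + A) and c_k = c/k^(1/3)) and a single response-surface step (a least-squares quadratic metamodel whose stationary point is taken as the estimated minimum) to a hypothetical noisy quadratic objective. The test function, the gain constants, and the function names are illustrative choices, not the method developed in this work.

```python
import numpy as np


def noisy_quadratic(theta, rng, sigma=0.1):
    """Hypothetical test objective: a quadratic bowl with minimum at (1, -0.5)
    plus Gaussian noise, standing in for a stochastic simulation."""
    t1, t2 = theta
    return (t1 - 1.0) ** 2 + 10.0 * (t2 + 0.5) ** 2 + sigma * rng.standard_normal()


def kiefer_wolfowitz(f, theta0, rng, n_iter=2000, a=0.5, c=0.5, A=10.0):
    """Kiefer-Wolfowitz stochastic approximation with central differences."""
    theta = np.asarray(theta0, dtype=float).copy()
    d = theta.size
    for k in range(1, n_iter + 1):
        a_k = a / (k + A)           # decaying step size (A stabilizes early steps)
        c_k = c / k ** (1.0 / 3.0)  # decaying finite-difference width
        grad = np.empty(d)
        for i in range(d):
            e = np.zeros(d)
            e[i] = c_k
            # Two independent noisy evaluations per coordinate
            grad[i] = (f(theta + e, rng) - f(theta - e, rng)) / (2.0 * c_k)
        theta -= a_k * grad
    return theta


def quad_features(X):
    """Design matrix for a full quadratic model: 1, x_i, and x_i * x_j (i <= j)."""
    n, d = X.shape
    cols = [np.ones(n)] + [X[:, i] for i in range(d)]
    cols += [X[:, i] * X[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack(cols)


def response_surface_step(f, center, half_width, n_samples, rng):
    """Fit a quadratic metamodel to noisy samples in a box and return its minimizer."""
    center = np.asarray(center, dtype=float)
    d = center.size
    X = center + rng.uniform(-half_width, half_width, size=(n_samples, d))
    y = np.array([f(x, rng) for x in X])
    coef, *_ = np.linalg.lstsq(quad_features(X), y, rcond=None)
    # Gradient at the origin and Hessian of the fitted quadratic
    g = coef[1:1 + d]
    H = np.zeros((d, d))
    idx = 1 + d
    for i in range(d):
        for j in range(i, d):
            if i == j:
                H[i, i] = 2.0 * coef[idx]
            else:
                H[i, j] = H[j, i] = coef[idx]
            idx += 1
    # Trust the stationary point only if the metamodel is convex
    if np.all(np.linalg.eigvalsh(H) > 0):
        return np.linalg.solve(H, -g)
    return X[np.argmin(y)]  # otherwise fall back to the best sampled point


rng = np.random.default_rng(42)
print(kiefer_wolfowitz(noisy_quadratic, [0.0, 0.0], rng))
print(response_surface_step(noisy_quadratic, [0.0, 0.0], 2.0, 50, rng))
```

Because this toy objective is exactly quadratic, the response-surface step approximately recovers the minimum from 50 evaluations, whereas the Kiefer-Wolfowitz run uses 8,000. On surfaces that a quadratic fits poorly, such as those with sharp ridges, the metamodel's stationary point can lie far from the true minimum, which is the failure mode noted above.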
In the present work, we develop a computational strategy for parameter fitting of capsid assembly systems that is designed to address the particular computational challenges these systems present. The method interpolates between response surface and quasi-gradient approximations, combining fast handling of smooth regions of the objective function with robust handling of more difficult regions. We develop the method specifically for light scattering data, a widely used approach for monitoring in vitro capsid assembly [29], although the algorithm is generic with respect to the data used. We demonstrate the approach on the problem of fitting rate parameters to light scattering data from a human papillomavirus (HPV) in vitro assembly system [30]. We show that the method can achieve a good-quality fit to the real data that shows moderate sensitivity to uncertainty in the estimates or the experimental data. The parameter fit further provides specific suggestions about the mechanisms of assembly in the HPV in vitro system. This work provides a first step toward using prevailing coarse-grained models of capsid assembly to make inferences about assembly mechanisms for specific viruses and about how these mechanisms might be altered under different assembly conditions or by hypothetical experimental or therapeutic interventions.
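Before simulated trajectories can be compared with light scattering measurements, they must be mapped onto the same observable. A common simplifying assumption, used here only for illustration, is the Rayleigh regime, in which each particle scatters in proportion to the square of its mass, so the signal is proportional to the sum over intermediate sizes s of n_s * s^2. The sketch below computes that proxy from a table of intermediate counts and normalizes it to the fully assembled state. The function names and the example numbers (720 subunits, complete capsids of 72 subunits, chosen because HPV L1 capsids contain 72 pentameric capsomeres) are hypothetical, and this is not necessarily the observable model used in this work.

```python
import numpy as np


def light_scattering_signal(counts_by_size):
    """Relative static light-scattering intensity from intermediate counts.

    Assumes the Rayleigh regime: each particle scatters in proportion to the
    square of its mass, so I ~ sum_s n_s * s**2, where s is the number of
    subunits in an intermediate.
    counts_by_size: array of shape (n_times, max_size); column s - 1 holds
    the number of intermediates containing s subunits.
    """
    counts = np.asarray(counts_by_size, dtype=float)
    sizes = np.arange(1, counts.shape[1] + 1).astype(float)
    return counts @ sizes ** 2


def normalized_signal(counts_by_size, n_subunits, capsid_size):
    """Scale the signal so that all subunits in complete capsids gives 1.0."""
    full = (n_subunits / capsid_size) * float(capsid_size) ** 2
    return light_scattering_signal(counts_by_size) / full


# Hypothetical example: 720 subunits, complete capsids of 72 subunits
counts = np.zeros((3, 72))
counts[0, 0] = 720                      # all free subunits
counts[1, 0], counts[1, 71] = 360, 5    # half of the mass in complete capsids
counts[2, 71] = 10                      # fully assembled
print(normalized_signal(counts, n_subunits=720, capsid_size=72))
```

In this example the free subunits alone give 1/72 of the final signal, which illustrates that, under this assumption, light scattering mostly reports on large intermediates and complete capsids rather than on the earliest assembly steps.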