

J Comput Graph Stat. Author manuscript; available in PMC 2017 March 9.

Published in final edited form as:

J Comput Graph Stat. 2016; 25(1): 225–245.

Published online 2016 March 9. doi: 10.1080/10618600.2014.983642. PMCID: PMC5033129

NIHMSID: NIHMS779487

Bruce D. Bugbee, Postdoctoral Fellow

Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX 77030 (Email: bdbugbee@mdanderson.org)

F. Jay Breidt, Professor

Department of Statistics, Colorado State University, Fort Collins, CO 80523 (Email: jbreidt@stat.colostate.edu)

Mark J. van der Woerd, Research Scientist

Department of Biochemistry & Molecular Biology, Colorado State University, Fort Collins, CO 80523 (Email: Mark.van der Woerd@colostate.edu)

October 12, 2014

**Abstract**

Variational approximations provide fast, deterministic alternatives to Markov Chain Monte Carlo for Bayesian inference on the parameters of complex, hierarchical models. Variational approximations are often limited in practicality in the absence of conjugate posterior distributions. Recent work has focused on the application of variational methods to models with only partial conjugacy, such as in semiparametric regression with heteroskedastic errors. Here, both the mean and log variance functions are modeled as smooth functions of covariates. For this problem, we derive a mean field variational approximation with an embedded Laplace approximation to account for the non-conjugate structure. Empirical results with simulated and real data show that our approximate method has significant computational advantages over traditional Markov Chain Monte Carlo; in this case, a delayed rejection adaptive Metropolis algorithm. The variational approximation is much faster and eliminates the need for tuning parameter selection, achieves good fits for both the mean and log variance functions, and reasonably reflects the posterior uncertainty. We apply the methods to log-intensity data from a small angle X-ray scattering experiment, in which properly accounting for the smooth heteroskedasticity leads to significant improvements in posterior inference for key physical characteristics of an organic molecule.

**1 Introduction**

Variational approximations allow for the approximate computation of posterior parameter distributions of complex models in a deterministic fashion (Wainwright and Jordan, 2008). The term *mean field variational Bayes* (MFVB) refers to a class of variational approximations to posterior distributions under a nonparametric product density constraint (Bishop, 2006; Ormerod and Wand, 2010). Applications of MFVB approximations include mixtures (Attias, 2000), linear mixed models (Ormerod and Wand, 2010), and nonparametric regression with missing data (Faes et al., 2011). Given the connection between MFVB and full posterior parameter conditionals, variational approximations can be directly implemented in situations where all parameters have known posterior conditionals and Gibbs sampling is appropriate (Casella and George, 1992). This has led to useful software packages such as Infer.NET that allow for “black box” computation of variational approximations given fully known posterior conditionals (Minka et al., 2010; Wang and Wand, 2011).

Recent studies have considered *non-conjugate models* where one or more of the posterior conditionals are not of known form (Carlin and Polson, 1991). Examples of such models include logistic regression (Jaakkola and Jordan, 1997), nonparametric regression with measurement error (Pham et al., 2013), generalized linear mixed models (Tan and Nott, 2013), and correlated topic models (Blei and Lafferty, 2007). Variational approximations of non-conjugate models often have to be derived on a case by case basis and can be difficult to handle. Wand et al. (2011) highlight the use of auxiliary variables, univariate quadrature methods, and finite mixture approximations for addressing hierarchical models with complex distributions. Knowles and Minka (2011) focus on generalizing non-conjugate variational inference for exponential family cases arising from multinomial and binary regression. Salimans and Knowles (2013) and Nott et al. (2013) have recently developed alternatives to standard mean field variational approximations, called structural or fixed-form variational Bayes. These methods have shown promise for addressing non-conjugate problems through the assumption of exponential forms and application of Monte Carlo integration.

One such non-conjugate model arises from semiparametric regression with heteroskedastic errors. Models of this type have been explored in Ruppert et al. (2003), Chan et al. (2006), and Crainiceanu et al. (2007). Using the mixed model formulation of penalized spline regression described in Ruppert et al. (2003), models of this type allow for parametric and nonparametric terms in both the mean and variance levels. Nott et al. (2012) look at the case of a linear fixed effect model with heteroskedastic errors. The work we present here is a novel application of variational approximation methods to the case of semiparametric regression in the presence of heteroskedastic errors, represented as multilevel linear mixed effect models. We use an embedded Laplace approximation to handle the non-conjugacy found in our variational approximation (Wang and Blei, 2013; Nott et al., 2012). Results of Salimans and Knowles (2013) and Nott et al. (2013) could also be applied in this context, though our method uses analytic formulas for the variational moments needed in our special case rather than relying on Monte Carlo integration.

Our work is motivated by data arising from small angle X-ray scattering (SAXS), an imaging technique used in the estimation of physical shape characteristics of complex macromolecules. The general experimental methodology consists of bombarding a sample of the molecule in solution with a beam of X-rays and measuring the resulting scattering pattern on a two-dimensional detector orthogonal to the X-ray beam. The scattering data are reduced from two dimensions to one by averaging intensity (photon counts on the detector) across all of the pixels at a given radial distance from a fixed center. The radii in the detector plane are equivalent to scattering angles relative to the incident X-ray beam: zero radius corresponds to the beam itself, and larger angles correspond to increasing radii. The resulting “intensity curve” is then the set of ordered pairs (angle, radial average intensity at that angle). See Figure 4. The one-dimensional data from SAXS experiments are heteroskedastic, due to characteristics of the molecule in solution and the measurement mechanism; see for example Breidt et al. (2012). SAXS is a low-resolution method in which physical parameters such as maximum linear particle dimension and squared radius of gyration $\left({R}_{g}^{2}\right)$ can be estimated. The squared radius of gyration of a particle is defined as the ratio of the particle’s moment of inertia and mass. Appropriate interval estimates for such parameters are of great importance to experimental scientists, who develop theoretical models consistent with the experimental scattering data. Accounting for the heteroskedasticity in the SAXS data is essential for proper inference on these physical parameters, which in turn allows experimental scientists to accurately determine the class of molecular structures that is consistent with a given SAXS dataset.

Figure 4: Constant error (top) and heteroskedastic (bottom) variational approximation fits of experimental SAXS data from a single 7-second exposure of a 4 mg/ml sample of the H2AH2B complex. The gray bands represent 95% pointwise credible intervals generated from the fitted variational distributions.

Fast analytic methods are desirable for SAXS data. Exposure times on the order of one second produce a single SAXS image and data set, and experiments generating many images across a range of experimental conditions are common. We will demonstrate that our variational/Laplace method is substantially faster than a Markov Chain Monte Carlo alternative.

Unless otherwise stated, all random vectors **z** are assumed to be continuous with densities of the form *p*(**z**). Conditional probability densities will be written as *p*(**z** | **w**), with *p*(**z** | ·) representing the density conditional on all other random quantities in the model. Let $(x-\kappa)_{+}^{p} = \max\{0, (x-\kappa)^{p}\}$. Diagonal matrices are written as diag(*a*_{1}, …, *a*_{m}) and block diagonal matrices as blockdiag(**A**_{1}, …, **A**_{m}).

**2 Variational Bayes**

We first provide a brief background on the theory of variational approximations. Let **y** be a response vector and *ψ* be a vector of parameters. Direct evaluation of the posterior density

$$p(\psi \mid \mathbf{y}) = \frac{p(\mathbf{y}, \psi)}{p(\mathbf{y})}$$

(1)

generally requires high dimensional integration, which may not be analytically tractable. A standard approach is to use Markov Chain Monte Carlo (MCMC) methods to approximate *p*(*ψ* | **y**) by sampling; variational approximations instead construct a deterministic approximation to this posterior.

Let *q*(*ψ*) be an arbitrary density function over the parameter space Ψ. Then

$$\begin{aligned}
\log p(\mathbf{y}) &= \int_{\Psi} q(\psi)\log p(\mathbf{y})\, d\psi \\
&\ge \int_{\Psi} q(\psi)\log\left\{\frac{p(\mathbf{y},\psi)}{q(\psi)}\right\} d\psi \\
&=: \log \underline{p}(\mathbf{y}; q),
\end{aligned}$$

(2)

where the inequality follows from Kullback and Leibler (1951), who show that the Kullback–Leibler (K-L) divergence $\int q\left(\psi \right)\text{log}\left\{\frac{q\left(\psi \right)}{p(\psi \mid \mathbf{y})}\right\}d\psi \ge 0$, with equality if and only if *q*(*ψ*) = *p*(*ψ* | **y**). Maximizing the lower bound log *p̲*(**y**; *q*) over a restricted class of densities *q* is therefore equivalent to minimizing the K-L divergence between *q*(*ψ*) and the exact posterior.
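The nonnegativity of the K-L divergence is easy to check numerically. The following standalone Python sketch (an illustration, not part of the paper) uses the closed-form K-L divergence between two univariate Gaussians:

```python
import math

def kl_gaussian(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) in closed form."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5

# Zero exactly when the two densities coincide, positive otherwise.
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))  # 0.0
print(kl_gaussian(1.0, 2.0, 0.0, 1.0))  # positive
```

The divergence vanishes only when *q* matches the target density, mirroring the equality condition above.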

The most common restriction imposed on the density *q*(*ψ*), which we use here, is that for some partition {*ψ*_{1}, *ψ*_{2}, …, *ψ*_{L}} of the parameter vector *ψ*,

$$q(\psi) = \prod_{l=1}^{L} q_{l}(\psi_{l}).$$

(3)

Variational approximations that impose such a product-type density restriction on *q*(*ψ*) are referred to as *mean field* variational approximations.

Adopting the notation in Ormerod and Wand (2010), the optimal variational densities, ${q}_{1}^{\ast}\left({\psi}_{1}\right),{q}_{2}^{\ast}\left({\psi}_{2}\right),\dots ,{q}_{L}^{\ast}\left({\psi}_{L}\right)$, that minimize the K-L divergence described in (2) can be found using the relationship

$$q_{l}^{\ast}(\psi_{l}) \propto \exp\left[E_{-\psi_{l}}\left\{\log p(\psi_{l} \mid \cdot)\right\}\right].$$

(4)

Here *p*(* ψ_{l}* | ·) is the posterior conditional distribution

$$p({\psi}_{l}\mid \mathbf{y},{\psi}_{1},\dots ,{\psi}_{l-1},{\psi}_{l+1},\dots ,{\psi}_{L}).$$

(5)

The expectation *E*_{−ψl} is with respect to all *q*-densities except for *q*(*ψ_{l}*). The dependence of the optimal *q*-densities on one another suggests an iterative scheme in which each density is updated in turn, holding the others fixed, until the updates converge.

Implementing a variational Bayes approximation under these conditions amounts to determining a set of *variational parameters* that fully define each *q*-density. Once these variational parameters are determined, the estimates of the model parameters *ψ*_{1}*, …, ψ _{L}* are taken from the densities ${q}_{1}^{\ast},\dots ,{q}_{L}^{\ast}$, usually as the means associated with each density.
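As a concrete illustration of the update cycle implied by (4), the following Python sketch runs mean field coordinate updates for a standard toy model (unknown Normal mean and precision with a Normal-Gamma prior, as in Bishop, 2006), not the model of this paper; the data, priors, and variable names are illustrative:

```python
import random

# Toy data: y_i ~ N(2, 1)
random.seed(1)
y = [random.gauss(2.0, 1.0) for _ in range(200)]
N, ybar = len(y), sum(y) / len(y)

# Model: mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, b0)
mu0, lam0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Variational factors: q(mu) = N(mu_N, 1/lam_N), q(tau) = Gamma(a_N, b_N)
mu_N = (lam0 * mu0 + N * ybar) / (lam0 + N)  # fixed by conjugacy
a_N = a0 + (N + 1) / 2                       # fixed by conjugacy
lam_N, b_N = 1.0, 1.0
for _ in range(50):
    E_tau = a_N / b_N
    lam_N = (lam0 + N) * E_tau               # update q(mu)
    E_sq = sum((yi - mu_N) ** 2 + 1 / lam_N for yi in y)
    b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
print(mu_N, a_N / b_N)  # variational posterior means: roughly 2 and 1
```

Each pass updates one factor given the current moments of the others, exactly the pattern (4) prescribes; the full model of Section 4 follows the same template with more elaborate factors.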

**3 Penalized Splines With Heteroskedastic Errors**

A simple case of regression models in which both the response and the associated variances are smoothly varying across some underlying set of covariates is

$$y_{i} = f(x_{i}) + \epsilon_{i}, \quad \epsilon_{i} \sim \mathcal{N}(0, \sigma_{i}^{2}), \quad \log(\sigma_{i}^{2}) = g(x_{i}) \quad (i = 1, 2, \dots, N).$$

(6)

Adopting a penalized spline formulation as described in Ruppert et al. (2003), the nonparametric model takes the form for all *i* = 1, 2, …, *N*:

$$\begin{aligned}
y_{i} &= [1, x_{i}, \dots, x_{i}^{p}, (x_{i}-\kappa_{1})_{+}^{p}, \dots, (x_{i}-\kappa_{K})_{+}^{p}]\begin{bmatrix}\beta \\ \mathbf{b}\end{bmatrix} + \epsilon_{i} = \mathbf{C}_{i}^{T}\begin{bmatrix}\beta \\ \mathbf{b}\end{bmatrix} + \epsilon_{i} \\
\mathbf{b} &= (b_{1}, b_{2}, \dots, b_{K})^{T} \sim \mathcal{N}_{K}(0, \sigma_{\mathbf{b}}^{2}\mathcal{I}_{K}) \\
\epsilon_{i} &\sim \mathcal{N}(0, \sigma_{i}^{2}) \\
\log(\sigma_{i}^{2}) &= [1, x_{i}, \dots, x_{i}^{r}, (x_{i}-\kappa_{1}^{V})_{+}^{r}, \dots, (x_{i}-\kappa_{K_{V}}^{V})_{+}^{r}]\begin{bmatrix}\delta \\ \mathbf{c}\end{bmatrix} = \mathbf{C}_{\mathbf{V}i}^{T}\begin{bmatrix}\delta \\ \mathbf{c}\end{bmatrix} \\
\mathbf{c} &= (c_{1}, c_{2}, \dots, c_{K_{V}})^{T} \sim \mathcal{N}_{K_{V}}(0, \sigma_{\mathbf{c}}^{2}\mathcal{I}_{K_{V}}),
\end{aligned}$$

(7)

where $\beta = (\beta_{0}, \beta_{1}, \dots, \beta_{p})^{T}$ and $\delta = (\delta_{0}, \delta_{1}, \dots, \delta_{r})^{T}$ are the fixed effect coefficient vectors for the mean and log variance levels, $\kappa_{1} < \cdots < \kappa_{K}$ and $\kappa_{1}^{V} < \cdots < \kappa_{K_{V}}^{V}$ are fixed knot locations, and **b** and **c** are the corresponding random effect coefficient vectors.

Define the *N* × (*p* + *K* + 1) and *N* × (*r* + *K_{V}* + 1) design matrices $\mathbf{C} = {\left[{\mathbf{C}}_{i}^{T}\right]}_{i=1}^{N}$ and ${\mathbf{C}}_{V} = {\left[{\mathbf{C}}_{\mathbf{V}i}^{T}\right]}_{i=1}^{N}$, where the polynomial columns form the fixed effect design matrices **X** and **X**_{V} and the truncated polynomial columns form the random effect design matrices **Z** and **Z**_{V}. Collect the coefficients at each level as $\theta = (\beta^{T}, \mathbf{b}^{T})^{T}$ and $\theta_{V} = (\delta^{T}, \mathbf{c}^{T})^{T}$.
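The design matrices follow directly from the truncated polynomial basis in (7). A minimal Python sketch (the covariate grid and quantile-based knot placement are illustrative choices, mirroring the simulation of Section 5):

```python
import numpy as np

def truncated_poly_design(x, p, knots):
    """Rows [1, x, ..., x^p, (x-k1)_+^p, ..., (x-kK)_+^p] for each x_i."""
    X = np.vander(x, p + 1, increasing=True)                  # fixed effect part
    Z = np.maximum(x[:, None] - np.asarray(knots), 0.0) ** p  # random effect part
    return np.hstack([X, Z])

x = np.linspace(0.0, 10.0, 200)
knots = np.quantile(x, np.linspace(0, 1, 12)[1:-1])  # K = 10 interior knots
C = truncated_poly_design(x, 2, knots)               # mean-level design
C_V = truncated_poly_design(x, 2, knots)             # log variance design
print(C.shape)  # (200, 13): N x (p + K + 1)
```

With *p* = 2 and *K* = 10 the matrix has 13 columns, matching the *N* × (*p* + *K* + 1) dimension above.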

For simplicity, we present our results in the context of the nonparametric model of (7). Minimal changes are required to handle more complex semiparametric structures: additional parametric covariates can be included by expanding the fixed effect parameter vector, and additional nonparametric components or parametric-by-nonparametric interactions can be handled through the inclusion of new random effects with corresponding variance terms.

The truncated polynomial spline basis is chosen both for illustrative purposes and for its usefulness in estimation of a key molecular property from SAXS data, as described in §6. The lack of orthogonality in this basis can cause computational problems, particularly for MCMC methods. Often, P-splines (Eilers and Marx, 1996) or O’Sullivan splines (Wand and Ormerod, 2008) are used as alternatives for their computational stability. Other common choices of basis functions include radial basis functions, B-splines, and thin plate splines, depending on the application. These basis choices affect the definition of the **X** and **Z** matrices but not the variational approximation method presented here.

Equation (7) can be considered a submodel of the class of spatially adaptive penalized spline regression with heteroskedastic errors described in Crainiceanu et al. (2007). These models allow for both the error variance (*σ*^{2}) and the random effects variance (${\sigma}_{\mathbf{b}}^{2}$) to be smoothly varying over some set of covariates. Our model is a simplification that holds ${\sigma}_{\mathbf{b}}^{2}$ as a univariate parameter to be estimated. A limitation of a model of this form is that it can have trouble fitting data in regions where dramatic increases in variation coincide with strong changes in curvature. The inclusion of a spatially adaptive random effect variance allows the model to handle changes in the curvature of the underlying function *f*(*x*) better. The work presented here focuses on cases where the underlying first derivative of *f*(*x*) appears to vary slowly and does not coincide with sharp increases of variation, as is the case for our motivating application to SAXS data.

Assume $\beta ~{\mathcal{\mathcal{N}}}_{p}(0,{\sigma}_{\beta}^{2}{\mathcal{\mathcal{I}}}_{p})$ and $\delta ~{\mathcal{\mathcal{N}}}_{r}(0,{\sigma}_{\delta}^{2}{\mathcal{\mathcal{I}}}_{r})$ where ${\sigma}_{\beta}^{2}$ and ${\sigma}_{\delta}^{2}$ are given hyperparameters. Assume Inverse Gamma priors ${\sigma}_{\mathbf{b}}^{2}~\mathcal{\mathcal{I}\mathcal{G}}({A}_{\mathbf{b}},{B}_{\mathbf{b}})$ and ${\sigma}_{\mathbf{c}}^{2}~\mathcal{\mathcal{I}\mathcal{G}}({A}_{\mathbf{c}},{B}_{\mathbf{c}})$ with hyperparameters *A*_{b}, *A*_{c}, *B*_{b}, and *B*_{c}. Define ${\Sigma}_{\theta}\phantom{\rule{thickmathspace}{0ex}}=\phantom{\rule{thickmathspace}{0ex}}\text{blockdiag}\{{\sigma}_{\beta}^{2}{\mathcal{\mathcal{I}}}_{p},{\sigma}_{\mathbf{b}}^{2}{\mathcal{\mathcal{I}}}_{K}\}$ and ${\Sigma}_{{\theta}_{V}}\phantom{\rule{thickmathspace}{0ex}}=\phantom{\rule{thickmathspace}{0ex}}\text{blockdiag}\{{\sigma}_{\delta}^{2}{\mathcal{\mathcal{I}}}_{r},{\sigma}_{\mathbf{c}}^{2}{\mathcal{\mathcal{I}}}_{{K}_{V}}\}$.

As in Crainiceanu et al. (2007), the model structure and conjugate nature of the priors described here lend themselves to a hybrid Gibbs sampling computational approach to fitting our model. Writing $\Sigma = \text{diag}(\sigma_{1}^{2}, \dots, \sigma_{N}^{2})$ for the error covariance matrix, the posterior conditionals of the parameters of interest are

$$\begin{aligned}
\theta \mid \cdot &\sim \mathcal{N}(\mathbf{M}\mathbf{C}^{T}\Sigma^{-1}\mathbf{y},\, \mathbf{M}), \qquad \mathbf{M} = (\mathbf{C}^{T}\Sigma^{-1}\mathbf{C} + \Sigma_{\theta}^{-1})^{-1} \\
\sigma_{\mathbf{b}}^{2} \mid \cdot &\sim \mathcal{IG}\left(A_{\mathbf{b}} + \frac{K}{2},\; B_{\mathbf{b}} + \frac{\|\mathbf{b}\|^{2}}{2}\right) \\
\sigma_{\mathbf{c}}^{2} \mid \cdot &\sim \mathcal{IG}\left(A_{\mathbf{c}} + \frac{K_{V}}{2},\; B_{\mathbf{c}} + \frac{\|\mathbf{c}\|^{2}}{2}\right) \\
p(\theta_{V} \mid \cdot) &\propto \exp\left[-\frac{1}{2}\left\{\sum_{i=1}^{N} \mathbf{C}_{\mathbf{V}i}^{T}\theta_{V} + \sum_{i=1}^{N} (y_{i} - \mathbf{C}_{i}^{T}\theta)^{2}\exp(-\mathbf{C}_{\mathbf{V}i}^{T}\theta_{V}) + \theta_{V}^{T}\Sigma_{\theta_{V}}^{-1}\theta_{V}\right\}\right].
\end{aligned}$$

(8)

The challenge of fitting this model is due to the non-standard distribution of *θ_{V}* | ·. Crainiceanu et al. (2007) address this problem by introducing a latent error term to the variance model that allows for sampling from standard conditional distributions. We instead sample the non-standard conditional *θ_{V}* | · directly, using a delayed rejection adaptive Metropolis (DRAM) step within the Gibbs sampler.

**4 Variational Approximation of Heteroskedastic Model**

We now derive an approximation method for fitting the model (7) via variational Bayes. Nott et al. (2012) describe a variational approximation for the standard linear model in the presence of heteroskedasticity of known parametric form. Our approximation applies more broadly, to linear mixed models generally and to semiparametric regression under heteroskedasticity specifically. Unlike Nott et al. (2012), we do not assume a parametric form for the heteroskedasticity, but model it flexibly with penalized splines.

Let $\psi \phantom{\rule{thickmathspace}{0ex}}=\phantom{\rule{thickmathspace}{0ex}}{(\theta ,{\theta}_{V},{\sigma}_{\mathbf{b}}^{2},{\sigma}_{\mathbf{c}}^{2})}^{T}$ be a vector containing all parameters of interest. Using the product density constraint from (3), we assume that the variational density *q*(* ψ*) is of the form

$$q(\psi) = q_{1}(\theta)\, q_{2}(\theta_{V})\, q_{3}(\sigma_{\mathbf{b}}^{2})\, q_{4}(\sigma_{\mathbf{c}}^{2}).$$

(9)

For notational simplicity, the subscript index on each component density of *q*(*ψ*) will be dropped in future references (e.g., *q*(*θ*) in place of *q*_{1}(*θ*)).

Define **Γ** = diag(*γ*_{1}, *γ*_{2}, …, *γ_{N}*), where ${\gamma}_{i} = {E}_{-\theta}\left[\text{exp}(-{\mathbf{C}}_{\mathbf{V}i}^{T}{\theta}_{V})\right]$ corresponds to the moment generating function of $\mathbf{C}_{\mathbf{V}i}^{T}\theta_{V}$ under its variational distribution, evaluated at −1. Applying (4) to each factor of (9) yields the optimal densities
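Because the variational distribution of *θ_{V}* turns out to be Gaussian (see (12) below), each *γ_{i}* is available in closed form from the Gaussian moment generating function: if $\theta_{V} \sim \mathcal{N}(\mu, \Sigma)$, then $E[\exp(-\mathbf{c}^{T}\theta_{V})] = \exp(-\mathbf{c}^{T}\mu + \tfrac{1}{2}\mathbf{c}^{T}\Sigma\mathbf{c})$. A Python sketch with illustrative values:

```python
import numpy as np

def gamma_i(c_vi, mu_qv, Sigma_qv):
    """E[exp(-c^T theta_V)] when theta_V ~ N(mu, Sigma): a lognormal moment."""
    return np.exp(-c_vi @ mu_qv + 0.5 * c_vi @ Sigma_qv @ c_vi)

# Scalar sanity check: theta_V ~ N(0.5, 0.04), c = 1,
# so gamma = exp(-0.5 + 0.5 * 0.04) = exp(-0.48).
val = gamma_i(np.array([1.0]), np.array([0.5]), np.array([[0.04]]))
```

This is the analytic moment formula referred to in the introduction, avoiding any Monte Carlo integration.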

$$\begin{aligned}
\theta &\overset{q^{\ast}}{\sim} \mathcal{N}(\mu_{q(\theta)}, \Sigma_{q(\theta)}) \\
\sigma_{\mathbf{b}}^{2} &\overset{q^{\ast}}{\sim} \mathcal{IG}\left(A_{\mathbf{b}} + \frac{K}{2},\; B_{\mathbf{b}} + \frac{1}{2}\|\mu_{q(\mathbf{b})}\|^{2} + \frac{1}{2}\operatorname{trace}(\Sigma_{q(\mathbf{b})})\right) \\
\sigma_{\mathbf{c}}^{2} &\overset{q^{\ast}}{\sim} \mathcal{IG}\left(A_{\mathbf{c}} + \frac{K_{V}}{2},\; B_{\mathbf{c}} + \frac{1}{2}\|\mu_{q(\mathbf{c})}\|^{2} + \frac{1}{2}\operatorname{trace}(\Sigma_{q(\mathbf{c})})\right) \\
q^{\ast}(\theta_{V}) &\propto \exp\biggl[-\frac{1}{2}\biggl\{\sum_{i=1}^{N} \mathbf{C}_{\mathbf{V}i}^{T}\theta_{V} + \sum_{i=1}^{N} \bigl((y_{i} - \mathbf{C}_{i}^{T}\mu_{q(\theta)})^{2} + \mathbf{C}_{i}^{T}\Sigma_{q(\theta)}\mathbf{C}_{i}\bigr)\exp(-\mathbf{C}_{\mathbf{V}i}^{T}\theta_{V}) \\
&\qquad\quad + \theta_{V}^{T}\operatorname{blockdiag}\Bigl(\frac{1}{\sigma_{\delta}^{2}}\mathcal{I}_{r},\; \mu_{q(1/\sigma_{\mathbf{c}}^{2})}\mathcal{I}_{K_{V}}\Bigr)\theta_{V}\biggr\}\biggr] = \exp\{-h(\theta_{V} \mid \cdot)\}.
\end{aligned}$$

(10)

Here **μ**_{q(**b**)} and **Σ**_{q(**b**)} refer to the elements of the variational mean and covariance of *θ* corresponding to the random effects vector **b**, with **μ**_{q(**c**)} and **Σ**_{q(**c**)} defined analogously from the variational moments of *θ_{V}*, and $\mu_{q(1/\sigma_{\mathbf{c}}^{2})}$ denoting the variational mean of $1/\sigma_{\mathbf{c}}^{2}$.

The lack of a conjugate form for *θ_{V}* | · from (8) carries through to the form of $q^{\ast}(\theta_{V})$ in (10), which is not a recognizable density. To obtain a tractable Gaussian form, let *α* be the minimizer of *h*(*θ_{V}* | ·) and expand *h* in a second-order Taylor series about *α*:

$$\begin{aligned}
h(\theta_{V} \mid \cdot) &\approx h(\alpha \mid \cdot) + (\theta_{V}-\alpha)^{T}\mathbf{J}_{\theta_{V}}(\alpha) + \frac{1}{2}(\theta_{V}-\alpha)^{T}\mathbf{H}_{\theta_{V}}(\alpha)(\theta_{V}-\alpha), \\
\mathbf{J}_{\theta_{V}}(\alpha) &= \left.\frac{\partial}{\partial \mathbf{x}} h(\mathbf{x} \mid \cdot)\right|_{\mathbf{x}=\alpha}, \qquad \mathbf{H}_{\theta_{V}}(\alpha) = \left.\frac{\partial^{2}}{\partial \mathbf{x}\,\partial \mathbf{x}^{T}} h(\mathbf{x} \mid \cdot)\right|_{\mathbf{x}=\alpha}.
\end{aligned}$$

Here $\mathbf{J}_{\theta_{V}}(\alpha)$ corresponds to the Jacobian (gradient) vector of first derivatives of *h* and $\mathbf{H}_{\theta_{V}}(\alpha)$ to its Hessian matrix of second derivatives, both evaluated at *α*. Since *α* minimizes *h*(*θ_{V}* | ·), the gradient term vanishes, leaving

$$\tilde{q}^{\ast}(\theta_{V}) \propto \exp\left\{-\frac{1}{2}(\theta_{V}-\alpha)^{T}\mathbf{H}_{\theta_{V}}(\alpha)(\theta_{V}-\alpha)\right\}.$$

(11)

This implies that * θ_{V}* has an approximate variational distribution of the form

$$\theta_{V} \overset{\tilde{q}^{\ast}}{\sim} \mathcal{N}\left(\alpha, \left(\mathbf{H}_{\theta_{V}}(\alpha)\right)^{-1}\right).$$

(12)

Alternatives to the Laplace approximation, particularly the delta method, could be used to address the non-conjugate problem surrounding *θ_{V}* (Wang and Blei, 2013). We choose Laplace approximations primarily for simplicity of implementation, since they require only second-order derivative information whereas the delta method requires third-order.
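To make the Laplace step concrete, the following standalone Python sketch applies it to a toy objective (a Gamma-type negative log density, not the paper's *h*): minimize *h*, then read the Gaussian mean off the mode and the variance off the inverse curvature. Newton's method stands in for the BFGS/`optim` optimizer used in the paper:

```python
# Toy stand-in for h(theta_V | .): negative log Gamma(a, b) density,
# h(t) = -(a - 1) * log(t) + b * t, up to an additive constant.
a, b = 5.0, 1.0
h_prime = lambda t: -(a - 1) / t + b      # first derivative of h
h_second = lambda t: (a - 1) / t ** 2     # second derivative of h

t = 1.0                                   # starting point
for _ in range(50):                       # Newton's method to minimize h
    t -= h_prime(t) / h_second(t)

alpha, var = t, 1 / h_second(t)           # Laplace mean and variance
print(alpha, var)  # mode (a - 1)/b = 4.0 and inverse curvature 4.0
```

The resulting N(*α*, *H*(*α*)⁻¹) is exactly the form of (12), here in one dimension.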

Using the *q*-densities described in (10) and (12), constructing a variational Bayes approximation of our model consists of updating the parameters associated with each *q*-density until *q*(*ψ*) sufficiently approximates the posterior, as measured by the lower bound log *p*(**y**; *q*); the full update cycle is summarized in Algorithm 1.

The calculation of * α* at each iteration requires the use of a numerical optimization routine. The speed of this optimization procedure determines the overall speed of the algorithm since the other updates have closed forms. For the results presented here, an implementation of the BFGS Quasi-Newton method found in the built-in optim function in R was used for the multivariate optimization problem (Wright and Nocedal, 1999).

The matrices to be inverted in steps 5 and 7 of Algorithm 1 may not be positive definite, perhaps due to poor initial conditions. If an invalid covariance matrix is proposed in a step, we use a ridge adjustment, adding to each diagonal entry of the matrix to be inverted twice the absolute value of its smallest eigenvalue. None of the results presented here required these adjustments.
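The ridge adjustment described above can be sketched as follows (illustrative Python; the function name is ours):

```python
import numpy as np

def ridge_adjust(M):
    """If M is not positive definite, add twice the absolute value of its
    smallest eigenvalue to each diagonal entry before inverting."""
    lam_min = np.linalg.eigvalsh(M).min()
    if lam_min <= 0:
        M = M + 2 * abs(lam_min) * np.eye(M.shape[0])
    return M

M = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues -1 and 3: not PD
M_adj = ridge_adjust(M)
print(np.linalg.eigvalsh(M_adj))  # [1., 5.]: now positive definite
```

Shifting the diagonal by 2|λ_min| moves the smallest eigenvalue from −|λ_min| to +|λ_min|, guaranteeing a valid covariance matrix.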

In fully conjugate variational models such as those in Ormerod and Wand (2010), iteratively updating the variational parameters according to (4) guarantees monotonic increases of log *p*(**y**; *q*) as it converges to a (local) maximum. Embedding a Laplace approximation to handle the non-conjugate structure of *q*(*θ_{V}*) removes this monotonic behavior. Use of an embedded approximation procedure in the context of a variational approximation is generally accepted, however, and can be thought of as analogous to using Metropolis-Hastings within a Gibbs sampler. Nott et al. (2013) formally justify convergence of such an embedded optimization in the context of a fixed form variational Bayes algorithm within a mean field variational Bayes approximation. This is quite similar to our current setting and we therefore conjecture that parameter updates from Algorithm 1 will converge. Empirical results described below support this conjecture.

The evidence lower bound log *p*(**y**; *q*) is the objective function monitored to assess convergence. Wang and Blei (2013) suggest monitoring **μ**_{q(θV)} to assess convergence as well. Appendix B contains the derivation of log *p*(**y**; *q*).

**5 Simulation Example**

To illustrate our methodology, consider two simulated datasets of size *N* = 200, both with true mean function *m*(*x*) = −(1/8)(*x* − 5)^{3} + *x* and with true variance functions *v*_{1}(*x*) = (1/4)*x* + (1/2) or *v*_{2}(*x*) = exp{(1/5)(*x* − 5)^{2}}, with *x* ∈ [0, 10]. We fit a model of the form described in (7) using a piecewise quadratic basis (*p* = *r* = 2). For both the mean and variance levels, *K* = *K_{V}* = 10 knots were placed at equally spaced quantiles of the observed covariate values.
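Data of this form can be generated as in the following Python sketch; the uniform design for *x* and the random seed are assumptions, since the sampling design of the covariates is not specified here:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
x = np.sort(rng.uniform(0.0, 10.0, N))
m = -(1 / 8) * (x - 5) ** 3 + x          # true mean function m(x)
v1 = (1 / 4) * x + (1 / 2)               # linearly increasing variance
v2 = np.exp((1 / 5) * (x - 5) ** 2)      # variance largest at the endpoints
y1 = m + rng.normal(0.0, np.sqrt(v1))    # dataset under v1
y2 = m + rng.normal(0.0, np.sqrt(v2))    # dataset under v2
```

The two variance regimes exercise the model differently: *v*₁ grows steadily with *x*, while *v*₂ is large at both ends of the range and small in the middle.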

Before the algorithm is started, an initial fitting of **y** under a homoskedastic assumption is performed using the same model structure. The corresponding $\widehat{\theta}$ is used as an initial value of **μ**_{q(θ)}. The matrix **Σ**_{q(θ)} is initialized using a diagonal matrix of appropriate scale for the data, and ${B}_{q\left({\sigma}_{\mathbf{b}}^{2}\right)}$ and ${B}_{q\left({\sigma}_{\mathbf{c}}^{2}\right)}$ are initialized to 1/100.

Figure 1 shows the resulting fit of the simulated data under both variance structures using our variational approximation. The approximation closely follows the true mean function in both examples but encounters increased difficulty in areas with higher noise. The shaded areas represent 95% pointwise credible bounds determined by taking a sample of size 10000 from the converged variational distribution $q^{\ast}(\theta)$. This measure of uncertainty reflects the non-constant variance structure, with wider bands associated with areas of higher variance. The true mean function is overlaid for comparison.

Figure 2 shows the estimated log variance functions log(*v*_{1}(*x*)) and log(*v*_{2}(*x*)) associated with the data simulated from *m*(*x*). The estimated curves associated with our variational approximation (solid lines) closely follow the true log variance functions (dashed lines). The 95% pointwise credible bounds calculated using 10000 draws from the variational distribution $q^{\ast}(\theta_{V})$ cover the true variance function over the range of *x*.

We are using an approximate form of the optimal variational density associated with *θ_{V}*, and this variational method is itself an approximation. We now consider the quality of our approximation, as compared to the posterior distribution evaluated using a combination of Gibbs sampling and an appropriate multivariate posterior sampling algorithm; in this case, DRAM (see Section 3.1). Figure 3 compares the 95% pointwise credible intervals for both the mean and log variance curves using a simulated sample of 200 points with true curves *m*(*x*) and *v*_{1}(*x*).

Figure 3: The solid light gray regions represent the pointwise 95% credible bounds drawn from the approximating variational distributions ${q}_{\theta}$ and ${\stackrel{~}{q}}_{{\theta}_{V}}$ for a simulated data set of 200 observations with *m*(*x*) and *v*_{1}(*x*) as the true mean and variance functions.

When comparing the coverage of the true mean curve, the credible bands from the variational approximation and the MCMC method are very close. This is not surprising given the conjugate structure of the *θ* parameter that controls the estimate of the mean function.

Convergence of Algorithm 1 is assessed by monitoring the relative change of log *p*(**y**; *q*). Variational algorithms are known to converge very quickly to appropriate estimates (Ormerod and Wand, 2010). Table 1 contains the run times for the variational approximations associated with each variance curve at increasing sample sizes, *N*. All computations were performed on a MacBook Pro with a 2.3 GHz Intel Core i5 processor and 4 GB of RAM. For all results in Table 1, seven or eight iterations of the algorithm were sufficient to achieve convergence. For comparison, the run time for 10000 iterations of the MCMC procedure described previously is included in Table 1. Our algorithm is slower than those that operate in fully conjugate situations, since we have a numerical optimization embedded at each step to estimate *α* from the most current variational parameters; even so, it remains substantially faster than the MCMC alternative.

**6 Application to Inference from SAXS**

The model (7) is directly motivated by the analysis of one-dimensional scattering intensity curves from small angle X-ray scattering (SAXS) experiments (reviewed in Putnam et al. (2007)). The variance of SAXS data depends on angle, due to both characteristics of the molecule and characteristics of the detector. The boundary regions of the detector, associated with larger angles of X-ray scattering, have a smaller signal-to-noise ratio since less scattering occurs at those large angles. This causes greater pixel-to-pixel variation for the intensity values in these regions.

Figure 4 shows data from a single SAXS exposure and the resulting variational approximation fits of the nonparametric mean function, using the approach of Ormerod and Wand (2010) for a model with constant errors (top panel) and using our approach for a model with heteroskedastic errors (bottom panel). The data correspond to the log intensity values of a 7-second exposure of a 4 mg/ml sample of the H2AH2B complex (Isenberg, 1979). The *x*-axis of Figure 4 represents the radial distance from the center of the sensor. For this problem, we used *K* = 40 equally-spaced knots *κ*_{1}, …, *κ*_{40} over [0.03, 0.27] to evaluate spline functions for both the mean and log variance curves.

The nonparametric regression model uses a truncated quadratic polynomial basis as in (7), with the exception that there is no linear fixed effect term in the mean (*β*_{1} = 0). This is theoretically justified from physical considerations. Provided the first knot *κ*_{1} is sufficiently close to the origin, a classical analysis yields that expected log intensity at distance *x* is ${\beta}_{0}-(1\u22153)\left({R}_{g}^{2}\right){x}^{2}$, where ${R}_{g}^{2}$ is the squared radius of gyration of the molecule (Guinier, 1939). The use of a truncated quadratic basis is therefore particularly convenient for the estimation of ${R}_{g}^{2} = -3{\beta}_{2}$, where *β*_{2} is the coefficient of the *x*^{2} fixed effect. While truncated polynomial basis functions are highly collinear, we did not experience noticeable computational or numerical problems arising from our choice of this basis for the SAXS data. As mentioned in Eilers and Marx (2010), the singular value decomposition of the random effects matrix **Z** can aid in understanding the numerical stability of this basis.
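The Guinier relation makes estimation of ${R}_{g}^{2}$ near the origin a regression of log intensity on *x*^{2}. An illustrative Python sketch with hypothetical values (synthetic data, not the experimental SAXS measurements):

```python
import numpy as np

# Guinier behavior near x = 0: log-intensity ~ beta0 - (Rg2 / 3) * x^2.
rng = np.random.default_rng(1)
beta0, Rg2 = 5.0, 900.0                  # hypothetical true values
x = np.linspace(0.005, 0.05, 100)        # small-angle region only
logI = beta0 - (Rg2 / 3) * x ** 2 + rng.normal(0, 0.01, x.size)

# Regress log-intensity on x^2; the slope is beta2, and Rg2 = -3 * beta2.
beta2, beta0_hat = np.polyfit(x ** 2, logI, 1)
Rg2_hat = -3 * beta2
print(Rg2_hat)  # close to the true value 900
```

The quadratic fixed effect coefficient of the spline model plays exactly the role of the slope `beta2` here, which is why the truncated quadratic basis makes ${R}_{g}^{2}$ directly available from the fit.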

The mean estimates for the constant-error and heteroskedastic models fit the data equally well, with mean squared errors of 17.36 and 17.02, respectively. As shown in Figure 4, the uncertainty bands for the constant-error model are roughly uniform across *x*. The heteroskedastic model, in contrast, has narrow uncertainty bounds in the low- to mid-range of *x*, where the data are smooth, and wider bounds at large *x*. This behavior better reflects the true uncertainty about the underlying smooth mean function across the covariate *x*, and translates into more appropriate uncertainty estimates for the regression coefficients comprising *θ*.

Appropriately accounting for the error structure of SAXS data can impact inference on important physical parameters. Here we consider estimation of the squared radius of gyration, $R_g^2 = -3\beta_2$, and the intercept of the log intensity curve, *β*_{0}, which is known to be linearly related to log molecular weight, given fixed concentration and other experimental parameters (Fischer et al., 2009). Our variational method yields the approximate distributions

Variational approximations are increasingly popular tools for handling computationally difficult Bayesian inference problems in an efficient manner. Recent work in the literature has focused on the application of mean field variational Bayes methods to non-conjugate problems, including the problem presented here: semiparametric regression with heteroskedastic errors. Our work provides a fast variational approximation by leveraging a Laplace approximation to replace the non-conjugate portion of our model with an appropriate Gaussian distribution. Simulation results show that our approximation performs well at fitting both the mean and variance functions, providing posterior inferences comparable to those of a sophisticated MCMC at much lower computational cost. The ability to fit variance functions accurately is important in accounting for the true uncertainty in nonparametric fits of mean-level data, as we have shown in obtaining corrected interval estimates for key physical parameters of H2AH2B using SAXS imaging data.

Under the mixed model framework, extending our variational approximation requires only the inclusion of additional random effect variance terms. While not shown explicitly here, our variational approximation readily handles more complex heteroskedastic models. An area of future interest is the use of our variational method for model comparison in a semiparametric regression context. Nott et al. (2012) highlight the use of the variational lower bound log *p*(**y**; *q*) for model comparison. We are interested in using log *p*(**y**; *q*) to interpret the significance of parametric-by-nonparametric interaction terms, which have scientific meaning in SAXS data analysis. For example, a significant interaction between parametric exposure time and nonparametric angle is suggestive of radiation damage to the molecule in solution.

Radial variance data are typically available alongside radial average data in a SAXS experiment. We are currently extending our variational approximations to handle joint models of radial average data and radial variance data in a complete SAXS experiment, consisting of repeated samples with varying concentrations and exposure times. The goal is to provide a fast, complete framework for the analysis of SAXS data and provide more appropriate uncertainty estimates for inference on structural features.

This research was supported by the Joint NSF/NIGMS Initiative to Support Research in the Area of Mathematical Biology (R01GM096192). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health. This work was completed as a part of Bruce Bugbee’s Ph.D. dissertation work while a student at Colorado State University.

The following section contains the derivation of the *q*-densities described in (10).

Using the form of *p*(* θ* | ·) from (8), we can derive

$$\begin{aligned}
q_\theta &\propto \exp\left[E_{-\theta}\{\log p(\theta \mid \cdot)\}\right] \\
&\propto \exp\left[E_{-\theta}\left\{-\tfrac{1}{2}(\theta - \mathbf{M}\mathbf{C}^T\Sigma^{-1}\mathbf{y})^T\mathbf{M}^{-1}(\theta - \mathbf{M}\mathbf{C}^T\Sigma^{-1}\mathbf{y})\right\}\right] \\
&\propto \exp\left\{-\tfrac{1}{2}E_{-\theta}\left(\theta^T\mathbf{M}^{-1}\theta - \mathbf{y}^T\Sigma^{-1}\mathbf{C}\theta - \theta^T\mathbf{C}^T\Sigma^{-1}\mathbf{y}\right)\right\}.
\end{aligned}$$

The quadratic term is

$$\begin{aligned}
E_{-\theta}\left(\theta^T\mathbf{M}^{-1}\theta\right) &= E_{-\theta}\left\{\theta^T\left(\Sigma_\theta^{-1} + \mathbf{C}^T\Sigma^{-1}\mathbf{C}\right)\theta\right\} \\
&= \theta^T\left\{\mathrm{blockdiag}\left(\tfrac{1}{\sigma_\beta^2}\mathbf{I}_p,\; \mu_{q(1/\sigma_{\mathbf{b}}^2)}\mathbf{I}_K\right) + \mathbf{C}^T\Gamma\mathbf{C}\right\}\theta.
\end{aligned}$$

Define $\Sigma_{q(\theta)} = \left\{\mathrm{blockdiag}\left(\tfrac{1}{\sigma_\beta^2}\mathbf{I}_p,\; \mu_{q(1/\sigma_{\mathbf{b}}^2)}\mathbf{I}_K\right) + \mathbf{C}^T\Gamma\mathbf{C}\right\}^{-1}$, where $\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_N)$. In this context $\gamma_i = E_{-\theta}\left\{\exp(-\mathbf{C}_{\mathbf{V}i}^T\theta_V)\right\}$ corresponds to the moment generating function of $q_{\theta_V}$ evaluated at the vector $-\mathbf{C}_{\mathbf{V}i}$.
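Because $q_{\theta_V}$ is Gaussian under the embedded Laplace approximation, each $\gamma_i$ has the closed-form Gaussian-MGF value $\exp\left(-\mathbf{C}_{\mathbf{V}i}^T\mu_{q(\theta_V)} + \tfrac{1}{2}\mathbf{C}_{\mathbf{V}i}^T\Sigma_{q(\theta_V)}\mathbf{C}_{\mathbf{V}i}\right)$, the same expression that appears in the lower bound later in this appendix. A minimal Python sketch (the function name `gamma_weights` is ours):

```python
import numpy as np

def gamma_weights(CV, mu_qV, Sigma_qV):
    """gamma_i = E_q[exp(-C_Vi^T theta_V)], the Gaussian moment generating
    function of q(theta_V) evaluated at -C_Vi.  CV is N x r_V with rows
    C_Vi^T; mu_qV and Sigma_qV are the variational mean and covariance."""
    quad = np.einsum('ij,jk,ik->i', CV, Sigma_qV, CV)  # C_Vi^T Sigma C_Vi
    return np.exp(-CV @ mu_qV + 0.5 * quad)
```

These weights form the diagonal of Γ and feed directly into the $q_\theta$ update.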

The expected value of the linear term is

$$E_{-\theta}\left(\mathbf{y}^T\Sigma^{-1}\mathbf{C}\theta\right) = E_{-\theta}\left\{\sum_{i=1}^N y_i\left(\mathbf{C}_i^T\theta\right)\frac{1}{\sigma_i^2}\right\} = \mathbf{y}^T\Gamma\mathbf{C}\theta.$$

Using these results and completing the square, we arrive at the form of the optimal *q _{θ}*:

$$\begin{aligned}
q_\theta &\propto \exp\left\{-\tfrac{1}{2}\left(\theta - \Sigma_{q(\theta)}\mathbf{C}^T\Gamma\mathbf{y}\right)^T\Sigma_{q(\theta)}^{-1}\left(\theta - \Sigma_{q(\theta)}\mathbf{C}^T\Gamma\mathbf{y}\right)\right\} \\
\theta &\overset{q}{\sim} \mathcal{N}\left(\mu_{q(\theta)} = \Sigma_{q(\theta)}\mathbf{C}^T\Gamma\mathbf{y},\; \Sigma_{q(\theta)}\right).
\end{aligned}$$
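The $q_\theta$ update is an ordinary weighted ridge-type computation. The sketch below (function name ours, not from the supplemental code) forms the prior precision block-diagonal, adds $\mathbf{C}^T\Gamma\mathbf{C}$, and solves for the variational mean and covariance:

```python
import numpy as np

def update_q_theta(y, C, gamma, p, K, sigma2_beta, mu_q_inv_sigma2_b):
    """Mean field update for q(theta) = N(mu, Sigma), where
    Sigma = {blockdiag(I_p/sigma_beta^2, mu_q(1/sigma_b^2) I_K) + C^T Gamma C}^{-1}
    and mu = Sigma C^T Gamma y.  gamma holds the diagonal of Gamma."""
    prior_prec = np.diag(np.r_[np.full(p, 1.0 / sigma2_beta),
                               np.full(K, mu_q_inv_sigma2_b)])
    A = prior_prec + C.T @ (gamma[:, None] * C)   # precision of q(theta)
    Sigma = np.linalg.inv(A)
    mu = Sigma @ (C.T @ (gamma * y))
    return mu, Sigma
```

When all $\gamma_i = 1$ (homoskedastic errors) this reduces to the usual penalized least squares solution, which makes a convenient correctness check.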

The variational density for ${\sigma}_{\mathbf{b}}^{2}$ is

$$\begin{aligned}
q_{\sigma_{\mathbf{b}}^2} &\propto \exp\left[E_{-\sigma_{\mathbf{b}}^2}\left\{\log p(\sigma_{\mathbf{b}}^2 \mid \cdot)\right\}\right] \\
&\propto \exp\left[E_{-\sigma_{\mathbf{b}}^2}\left\{-\left(A_{\mathbf{b}} + \frac{K}{2} + 1\right)\log\left(\sigma_{\mathbf{b}}^2\right) - \frac{1}{\sigma_{\mathbf{b}}^2}\left(B_{\mathbf{b}} + \frac{1}{2}\|\mathbf{b}\|^2\right)\right\}\right] \\
&\propto \left(\sigma_{\mathbf{b}}^2\right)^{-A_{\mathbf{b}} - K/2 - 1}\exp\left[-\frac{1}{\sigma_{\mathbf{b}}^2}\left\{B_{\mathbf{b}} + \frac{1}{2}E_{-\sigma_{\mathbf{b}}^2}\left(\|\mathbf{b}\|^2\right)\right\}\right] \\
&\propto \left(\sigma_{\mathbf{b}}^2\right)^{-A_{\mathbf{b}} - K/2 - 1}\exp\left[-\frac{1}{\sigma_{\mathbf{b}}^2}\left\{B_{\mathbf{b}} + \frac{1}{2}\left(\|\mu_{q(\mathbf{b})}\|^2 + \mathrm{trace}\left(\Sigma_{q(\mathbf{b})}\right)\right)\right\}\right] \\
\sigma_{\mathbf{b}}^2 &\overset{q}{\sim} \mathcal{IG}\left(A_{\mathbf{b}} + \frac{K}{2},\; B_{q(\sigma_{\mathbf{b}}^2)} = B_{\mathbf{b}} + \frac{1}{2}\left\{\|\mu_{q(\mathbf{b})}\|^2 + \mathrm{trace}\left(\Sigma_{q(\mathbf{b})}\right)\right\}\right).
\end{aligned}$$

Here **μ**_{q(b)} and **Σ**_{q(b)} refer to the portion of **μ**_{q(θ)} and **Σ**_{q(θ)} associated with the random effects parameters. Under this variational distribution,

$$E\left[\frac{1}{\sigma_{\mathbf{b}}^2}\right] = \frac{A_{\mathbf{b}} + K/2}{B_{q(\sigma_{\mathbf{b}}^2)}}.$$

Given the similar structure between *p*(${\sigma}_{\mathbf{b}}^{2}$ | ·) and *p*(${\sigma}_{\mathbf{c}}^{2}$ | ·), the derivation of ${q}_{{\sigma}_{\mathbf{c}}^{2}}$ is the same:

$$\begin{aligned}
q_{\sigma_{\mathbf{c}}^2} &\propto \left(\sigma_{\mathbf{c}}^2\right)^{-A_{\mathbf{c}} - K_V/2 - 1}\exp\left[-\frac{1}{\sigma_{\mathbf{c}}^2}\left\{B_{\mathbf{c}} + \frac{1}{2}\left(\|\mu_{q(\mathbf{c})}\|^2 + \mathrm{trace}\left(\Sigma_{q(\mathbf{c})}\right)\right)\right\}\right] \\
\sigma_{\mathbf{c}}^2 &\overset{q}{\sim} \mathcal{IG}\left(A_{\mathbf{c}} + \frac{K_V}{2},\; B_{q(\sigma_{\mathbf{c}}^2)} = B_{\mathbf{c}} + \frac{1}{2}\left\{\|\mu_{q(\mathbf{c})}\|^2 + \mathrm{trace}\left(\Sigma_{q(\mathbf{c})}\right)\right\}\right).
\end{aligned}$$

As before, **μ**_{q(c)} and **Σ**_{q(c)} refer to the portion of **μ**_{q(θV)} and **Σ**_{q(θV)} associated with the random effects parameters. As with $1\u2215{\sigma}_{\mathbf{b}}^{2}$,

$$E\left[\frac{1}{\sigma_{\mathbf{c}}^2}\right] = \frac{A_{\mathbf{c}} + K_V/2}{B_{q(\sigma_{\mathbf{c}}^2)}}.$$
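Both inverse-gamma updates share the same shape, so a single routine covers $\sigma_{\mathbf{b}}^2$ and $\sigma_{\mathbf{c}}^2$. A sketch (names ours; `mu_q` and `Sigma_q` are the random-effect block of the relevant variational mean and covariance):

```python
import numpy as np

def update_q_sigma2(A_prior, B_prior, mu_q, Sigma_q):
    """Inverse-Gamma variational update: q(sigma^2) = IG(A_q, B_q) with
    A_q = A_prior + K/2 and B_q = B_prior + 0.5*(||mu_q||^2 + trace(Sigma_q)).
    Returns (A_q, B_q, E[1/sigma^2]); the last value is the mu_q(1/sigma^2)
    quantity consumed by the Gaussian updates."""
    K = len(mu_q)
    A_q = A_prior + K / 2.0
    B_q = B_prior + 0.5 * (mu_q @ mu_q + np.trace(Sigma_q))
    return A_q, B_q, A_q / B_q
```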

As highlighted in (8), the posterior conditional *p*(*θ_V* | ·) does not have a known form. Using (4), the optimal $q_{\theta_V}$ satisfies

$$\begin{aligned}
q_{\theta_V} &\propto \exp\left[E_{-\theta_V}\left\{\log p(\theta_V \mid \cdot)\right\}\right] \\
&\propto \exp\Bigg(-\frac{1}{2}\Bigg[\sum_{i=1}^N \mathbf{C}_{\mathbf{V}i}^T\theta_V + \sum_{i=1}^N\left\{\left(y_i - \mathbf{C}_i^T\mu_{q(\theta)}\right)^2 + \mathbf{C}_i^T\Sigma_{q(\theta)}\mathbf{C}_i\right\}\exp\left(-\mathbf{C}_{\mathbf{V}i}^T\theta_V\right) \\
&\qquad\qquad + \theta_V^T\,\mathrm{blockdiag}\left(\frac{1}{\sigma_\delta^2}\mathbf{I}_r,\; \mu_{q(1/\sigma_{\mathbf{c}}^2)}\mathbf{I}_{K_V}\right)\theta_V\Bigg]\Bigg).
\end{aligned}$$
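This density is not of known form, which is where the embedded Laplace approximation enters: one way to realize it is to Newton-minimize the (convex) negative exponent above and take $q_{\theta_V}$ Gaussian with mean at the mode and covariance equal to the inverse Hessian there. The sketch below makes those assumptions explicit; the weights `w` are $w_i = (y_i - \mathbf{C}_i^T\mu_{q(\theta)})^2 + \mathbf{C}_i^T\Sigma_{q(\theta)}\mathbf{C}_i$ and `D` is the block-diagonal prior-precision term. It is an illustration, not the paper's exact implementation.

```python
import numpy as np

def laplace_q_thetaV(CV, w, D, n_iter=100, tol=1e-10):
    """Laplace approximation to q(theta_V): minimize
    f(theta) = 0.5*[sum_i C_Vi^T theta + sum_i w_i exp(-C_Vi^T theta)]
               + 0.5 * theta^T D theta
    by Newton's method, then return (mode, inverse Hessian at the mode)."""
    theta = np.zeros(CV.shape[1])
    ones = np.ones(CV.shape[0])
    for _ in range(n_iter):
        e = np.exp(-CV @ theta)
        grad = 0.5 * (CV.T @ ones - CV.T @ (w * e)) + D @ theta
        H = 0.5 * CV.T @ ((w * e)[:, None] * CV) + D   # positive definite
        step = np.linalg.solve(H, grad)
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    e = np.exp(-CV @ theta)
    H = 0.5 * CV.T @ ((w * e)[:, None] * CV) + D
    return theta, np.linalg.inv(H)
```

The resulting Gaussian then supplies the $\mu_{q(\theta_V)}$ and $\Sigma_{q(\theta_V)}$ used in the γ-weights and the lower bound.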

Let Γ(*x*) denote the standard gamma function. Recall that the density *q*(* ψ*) is assumed to have the product density structure described in Section 4. Then

$$\begin{aligned}
\log \underline{p}(\mathbf{y}; q) &= \int_\Psi q(\psi)\left[\log p(\mathbf{y}, \psi) - \log q(\psi)\right]d\psi \\
&= \int_\Psi q(\psi)\left[\log p(\mathbf{y}\mid\psi) + \log p(\psi) - \log q(\psi)\right]d\psi \\
&= \int_\Psi q(\psi)\left[\log p(\mathbf{y}\mid\psi) + \log p(\theta) + \log p(\sigma_{\mathbf{b}}^2) + \log p(\theta_V) + \log p(\sigma_{\mathbf{c}}^2) - \log q(\psi)\right]d\psi \\
&= -\frac{N}{2}\log(2\pi) - \frac{p}{2}\log(\sigma_\beta^2) - \frac{r}{2}\log(\sigma_\delta^2) + A_{\mathbf{b}}\log(B_{\mathbf{b}}) - \log\Gamma(A_{\mathbf{b}}) \\
&\quad + A_{\mathbf{c}}\log(B_{\mathbf{c}}) - \log\Gamma(A_{\mathbf{c}}) + \frac{1}{2}\log\left|\Sigma_{q(\theta)}\right| + \frac{1}{2}\log\left|\Sigma_{q(\theta_V)}\right| \\
&\quad - \left(A_{\mathbf{b}} + \frac{K}{2}\right)\log\left(B_{q(\sigma_{\mathbf{b}}^2)}\right) - \left(A_{\mathbf{c}} + \frac{K_V}{2}\right)\log\left(B_{q(\sigma_{\mathbf{c}}^2)}\right) \\
&\quad + \log\Gamma\left(A_{\mathbf{c}} + \frac{K_V}{2}\right) + \log\Gamma\left(A_{\mathbf{b}} + \frac{K}{2}\right) \\
&\quad - \frac{1}{2\sigma_\beta^2}\left(\|\mu_{q(\beta)}\|^2 + \mathrm{trace}\left(\Sigma_{q(\beta)}\right)\right) - \frac{1}{2\sigma_\delta^2}\left(\|\mu_{q(\delta)}\|^2 + \mathrm{trace}\left(\Sigma_{q(\delta)}\right)\right) \\
&\quad - \frac{1}{2}\sum_{i=1}^N\left\{\left(y_i - \mathbf{C}_i^T\mu_{q(\theta)}\right)^2 + \mathbf{C}_i^T\Sigma_{q(\theta)}\mathbf{C}_i\right\}\exp\left(-\mathbf{C}_{\mathbf{V}i}^T\mu_{q(\theta_V)} + \frac{1}{2}\mathbf{C}_{\mathbf{V}i}^T\Sigma_{q(\theta_V)}\mathbf{C}_{\mathbf{V}i}\right) \\
&\quad + \frac{p+K}{2} + \frac{r+K_V}{2}.
\end{aligned}$$
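The *q*-density updates derived above are cycled to convergence, with log *p̲*(**y**; *q*) serving as the natural stopping criterion. A generic coordinate-ascent skeleton (names and structure ours, not the supplemental code) might look as follows; note that with the embedded Laplace step, strict monotonicity of the bound is not guaranteed, so the increment is checked in absolute value.

```python
def mfvb_fit(state, updates, elbo, tol=1e-6, max_iter=500):
    """Coordinate-ascent loop for a mean field scheme: apply each q-density
    update in turn and stop once the variational lower bound stabilizes.
    `updates` is a list of callables mutating the shared `state` dict in
    place; `elbo` maps the state to the current bound.  Hypothetical
    skeleton only."""
    prev = float('-inf')
    history = []
    for _ in range(max_iter):
        for update in updates:        # one full pass of all q updates
            update(state)
        cur = elbo(state)
        history.append(cur)
        if abs(cur - prev) < tol:     # bound increment small: converged
            break
        prev = cur
    return state, history
```

In the model of this paper, `updates` would hold the $q_\theta$, $q_{\sigma_{\mathbf{b}}^2}$, $q_{\theta_V}$ (Laplace), and $q_{\sigma_{\mathbf{c}}^2}$ steps in sequence.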

**Supplemental Materials: Variational Method Code.** R code for performing the variational approximation method described in this paper is included. Also with this code are instructions for creating the simulated data sets described here as well as the experimental SAXS data used. (Zip file)

- Attias H. A Variational Bayesian Framework for Graphical Models. Advances in Neural Information Processing Systems. 2000;12:209–215.
- Bishop CM. Pattern Recognition and Machine Learning. Information Science and Statistics, Springer; 2006.
- Blei DM, Lafferty JD. A Correlated Topic Model of Science. The Annals of Applied Statistics. 2007;1:17–35.
- Breidt FJ, Erciulescu A, van der Woerd M. Autocovariance Structures for Radial Averages in Small-Angle X-Ray Scattering Experiments. Journal of Time Series Analysis. 2012;33:704–717. [PMC free article] [PubMed]
- Carlin BP, Polson NG. Inference for Nonconjugate Bayesian Models Using the Gibbs Sampler. Canadian Journal of Statistics. 1991;19:399–405.
- Casella G, George EI. Explaining the Gibbs Sampler. The American Statistician. 1992;46:167–174.
- Chan D, Kohn R, Nott D, Kirby C. Locally Adaptive Semiparametric Estimation of the Mean and Variance Functions in Regression Models. Journal of Computational and Graphical Statistics. 2006;15:915–936.
- Crainiceanu CM, Ruppert D, Carroll RJ, Joshi A, Goodner B. Spatially Adaptive Bayesian Penalized Splines with Heteroscedastic Errors. Journal of Computational and Graphical Statistics. 2007;16:265–288.
- Eilers PH, Marx BD. Splines, Knots, and Penalties. Wiley Interdisciplinary Reviews: Computational Statistics. 2010;2:637–653.
- Eilers PHC, Marx BD. Flexible Smoothing with B-splines and Penalties. Statistical Science. 1996;11:89–102.
- Faes C, Ormerod J, Wand M. Variational Bayesian Inference for Parametric and Nonparametric Regression with Missing Data. Journal of the American Statistical Association. 2011;106:959–971.
- Fischer H, Oliveira Neto M. d., Napolitano H, Polikarpov I, Craievich A. Determination of the Molecular Weight of Proteins in Solution from a Single Small-Angle X-Ray Scattering Measurement on a Relative Scale. Journal of Applied Crystallography. 2009;43:101–109.
- Guinier A. La Diffraction des Rayons X aux Très Petits Angles: Application à l'Étude de Phénomènes Ultramicroscopiques. Annales de Physique. 1939;12:161–237.
- Haario H, Laine M, Mira A, Saksman E. DRAM: Efficient Adaptive MCMC. Statistics and Computing. 2006;16:339–354.
- Isenberg I. Histones. Annual Review of Biochemistry. 1979;48:159–191. [PubMed]
- Jaakkola T, Jordan M. A Variational Approach to Bayesian Logistic Regression Models and Their Extensions. Sixth International Workshop on Artificial Intelligence and Statistics.1997.
- Knowles DA, Minka T. Non-Conjugate Variational Message Passing for Multinomial and Binary Regression. Advances in Neural Information Processing Systems. 2011:1701–1709.
- Kullback S, Leibler RA. On Information and Sufficiency. The Annals of Mathematical Statistics. 1951;22:79–86.
- Minka T, Winn J, Guiver J, Knowles D. Infer.NET 2.4. Microsoft Research Cambridge. 2010 http://research.microsoft.com/infernet.
- Nott DJ, Tran M-N, Kuk AY, Kohn R. Efficient Variational Inference for Generalized Linear Mixed Models with Large Datasets. arXiv preprint arXiv:1307.7963. 2013
- Nott DJ, Tran M-N, Leng C. Variational Approximation for Heteroscedastic Linear Models and Matching Pursuit Algorithms. Statistics and Computing. 2012;22:497–512.
- Ormerod J, Wand M. Explaining Variational Approximations. The American Statistician. 2010;64:140–153.
- Pham TH, Ormerod JT, Wand M. Mean Field Variational Bayesian Inference for Nonparametric Regression with Measurement Error. Computational Statistics & Data Analysis. 2013;68:375–387.
- Putnam C, Hammel M, Hura G, Tainer J. X-Ray Solution Scattering (SAXS) Combined with Crystallography and Computation: Defining Accurate Macromolecular Structures, Conformations and Assemblies in Solution. Quarterly Reviews of Biophysics. 2007;40:191–285. [PubMed]
- Rue H, Martino S, Chopin N. Approximate Bayesian Inference for Latent Gaussian models by Using Integrated Nested Laplace Approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 2009;71:319–392.
- Ruppert D, Wand MP, Carroll RJ. Semiparametric Regression. Vol. 12. Cambridge University Press; 2003.
- Salimans T, Knowles DA. Fixed-Form Variational Posterior Approximation Through Stochastic Linear Regression. Bayesian Analysis. 2013;8:837–882.
- Tan LS, Nott DJ. Variational Inference for Generalized Linear Mixed Models Using Partially Noncentered Parametrizations. Statistical Science. 2013;28:168–188.
- Titterington D. Bayesian Methods for Neural Networks and Related Models. Statistical Science. 2004;19:128–139.
- Wainwright MJ, Jordan MI. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning. 2008;1:1–305.
- Wand M, Ormerod J. On Semiparametric Regression with O’Sullivan Penalized Splines. Australian & New Zealand Journal of Statistics. 2008;50:179–198.
- Wand MP, Ormerod JT, Padoan SA, Frühwirth R. Mean Field Variational Bayes for Elaborate Distributions. Bayesian Analysis. 2011;6:1–48. [PubMed]
- Wang C, Blei DM. Variational Inference in Nonconjugate Models. The Journal of Machine Learning Research. 2013;14:1005–1031.
- Wang S, Wand M. Using Infer.NET for Statistical Analyses. The American Statistician. 2011;65:115.
- Wright S, Nocedal J. Numerical Optimization. Vol. 2. Springer; New York: 1999.
