Longitudinal data are routinely collected in biomedical research studies. A natural model describing longitudinal data decomposes an individual’s outcome as the sum of a population mean function and random subject-specific deviations. When parametric assumptions are too restrictive, methods modeling the population mean function and the random subject-specific functions nonparametrically are in demand. In some applications, it is desirable to estimate a covariance function of random subject-specific deviations. In this work, flexible yet computationally efficient methods are developed for a general class of semiparametric mixed effects models, where the functional forms of the population mean and the subject-specific curves are unspecified. We estimate the nonparametric components of the model by penalized splines (P-splines), and reparametrize the random curve covariance function by a modified Cholesky decomposition, which allows for unconstrained estimation of a positive semidefinite matrix. To provide smooth estimates, we penalize the roughness of fitted curves and derive closed-form solutions in the maximization step of an EM algorithm. In addition, we present models and methods for longitudinal family data where subjects in a family are correlated, and we decompose the covariance function into a subject-level source and an observation-level source. We apply these methods to the multi-level Framingham Heart Study data to estimate age-specific heritability of systolic blood pressure (SBP) nonparametrically.
Multi-level functional data; Cholesky decomposition; Age-specific heritability; Framingham Heart Study
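The modified Cholesky reparameterization mentioned above can be sketched as follows: write the covariance as Sigma = L D L' with L unit lower triangular and D diagonal and positive, so that any unconstrained parameter vector maps to a valid covariance matrix. This is a minimal illustration of the idea, not the paper's implementation; the parameter layout is an assumption.

```python
import numpy as np

def chol_to_cov(phi, log_d):
    """Map unconstrained parameters to a positive definite covariance matrix
    via the modified Cholesky decomposition Sigma = L D L', where L is unit
    lower triangular and D is diagonal with positive entries."""
    q = len(log_d)
    L = np.eye(q)
    L[np.tril_indices(q, k=-1)] = phi       # free below-diagonal entries
    D = np.diag(np.exp(log_d))              # positivity via the exponential
    return L @ D @ L.T

# Any unconstrained parameter vector yields a valid covariance matrix:
rng = np.random.default_rng(0)
q = 4
Sigma = chol_to_cov(rng.normal(size=q * (q - 1) // 2), rng.normal(size=q))
assert np.all(np.linalg.eigvalsh(Sigma) > 0)   # positive definite
assert np.allclose(Sigma, Sigma.T)
```

Because the map is unconstrained, estimation (e.g., inside an M-step) can proceed by plain unrestricted optimization over (phi, log_d).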
In this work, we propose penalized spline based methods for functional mixed effects models with varying coefficients. We decompose longitudinal outcomes as a sum of several terms: a population mean function, covariates with time-varying coefficients, functional subject-specific random effects and residual measurement error processes. Using penalized splines, we propose nonparametric estimation of the population mean function, the varying coefficients, the random subject-specific curves and the associated covariance function, which represents between-subject variation, and the variance function of the residual measurement errors, which represents within-subject variation. The proposed methods offer flexible estimation of both the population-level and subject-level curves. In addition, decomposing the variability of the outcomes into a between-subject and a within-subject source is useful in identifying the dominant variance component and therefore in optimally modeling a covariance function. We use a likelihood based method to select multiple smoothing parameters. Furthermore, we study the asymptotics of the baseline P-spline estimator with longitudinal data. We conduct simulation studies to investigate the performance of the proposed methods. The benefit of the between- and within-subject covariance decomposition is illustrated through an analysis of the Berkeley growth data, where we identified clearly distinct patterns of the between- and within-subject covariance functions of children's heights. We also apply the proposed methods to estimate the effect of anti-hypertensive treatment from the Framingham Heart Study data.
Multi-level functional data; Functional random effects; Semiparametric longitudinal data analysis
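The penalized-spline machinery underlying these models can be illustrated with a basic P-spline fit: a B-spline design matrix combined with a difference penalty on adjacent coefficients, solved in closed form. This is a sketch of the standard technique, not the authors' estimator; the basis construction, smoothing parameter, and test function are illustrative choices.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x via the Cox-de Boor
    recursion; `knots` includes the repeated boundary knots."""
    x = np.asarray(x, float)
    m = len(knots) - 1
    B = np.zeros((len(x), m))
    for i in range(m):                      # degree-0: interval indicators
        if knots[i] < knots[i + 1]:
            right = x <= knots[i + 1] if knots[i + 1] == knots[-1] else x < knots[i + 1]
            B[:, i] = ((knots[i] <= x) & right).astype(float)
    for d in range(1, degree + 1):
        Bn = np.zeros((len(x), m - d))
        for i in range(m - d):
            if knots[i + d] > knots[i]:
                Bn[:, i] += (x - knots[i]) / (knots[i + d] - knots[i]) * B[:, i]
            if knots[i + d + 1] > knots[i + 1]:
                Bn[:, i] += (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * B[:, i + 1]
        B = Bn
    return B

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=x.size)

deg, k = 3, 20                              # cubic basis with k coefficients
knots = np.concatenate([np.zeros(deg), np.linspace(0, 1, k - deg + 1), np.ones(deg)])
B = bspline_basis(x, knots, deg)            # design matrix, shape (200, k)
D = np.diff(np.eye(k), n=2, axis=0)         # second-order difference penalty
lam = 1.0                                   # smoothing parameter (fixed here;
                                            # the paper selects it by likelihood)
alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
fit = B @ alpha
assert np.allclose(B.sum(axis=1), 1.0)      # B-splines form a partition of unity
```

The same penalized least-squares solve reappears, with different design matrices, in the mean, varying-coefficient, and random-curve components of such models.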
Large point referenced datasets occur frequently in the environmental and natural sciences. Use of Bayesian hierarchical spatial models for analyzing these datasets is undermined by onerous computational burdens associated with parameter estimation. Low-rank spatial process models attempt to resolve this problem by projecting spatial effects to a lower-dimensional subspace. This subspace is determined by a judicious choice of “knots” or locations that are fixed a priori. One such representation yields a class of predictive process models (e.g., Banerjee et al., 2008) for spatial and spatial-temporal data. Our contribution here expands upon predictive process models with fixed knots to models that accommodate stochastic modeling of the knots. We view the knots as emerging from a point pattern and investigate how such adaptive specifications can yield more flexible hierarchical frameworks that lead to automated knot selection and substantial computational benefits.
Bayesian hierarchical models; Gaussian process; Intensity surfaces; Low-rank models; Markov chain Monte Carlo; Predictive process
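The low-rank projection behind predictive process models can be sketched as follows: given knots K, the full covariance C(s, s') is replaced by c(s, K) C(K, K)^{-1} c(K, s'). The example below is a one-dimensional illustration with an exponential covariance and fixed knots (the paper's contribution is to make the knots stochastic); the covariance family and decay parameter are assumptions.

```python
import numpy as np

def exp_cov(a, b, phi=3.0):
    # exponential covariance between 1-D location sets a and b (illustrative choice)
    return np.exp(-phi * np.abs(np.subtract.outer(a, b)))

s = np.linspace(0, 1, 500)        # observation locations
knots = np.linspace(0, 1, 25)     # fixed knots chosen a priori
C_sk = exp_cov(s, knots)
C_kk = exp_cov(knots, knots)
# predictive process: rank-25 approximation to the 500 x 500 covariance
C_low = C_sk @ np.linalg.solve(C_kk, C_sk.T)
C_full = exp_cov(s, s)
# the projection never over-states variance (a known property of this construction):
assert np.all(np.diag(C_low) <= np.diag(C_full) + 1e-8)
```

The computational benefit comes from inverting only the 25 x 25 knot covariance rather than the full 500 x 500 matrix.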
With many predictors, choosing an appropriate subset of the covariates is a crucial, and difficult, step in nonparametric regression. We propose a Bayesian nonparametric regression model for curve-fitting and variable selection. We use the smoothing spline ANOVA framework to decompose the regression function into interpretable main effect and interaction functions. Stochastic search variable selection via MCMC sampling is used to search for models that fit the data well. Also, we show that variable selection is highly sensitive to hyperparameter choice and develop a technique to select hyperparameters that control the long-run false positive rate. The method is used to build an emulator for a complex computer model for two-phase fluid flow.
Bayesian hierarchical modeling; Nonparametric regression; Markov Chain Monte Carlo; Smoothing splines ANOVA; Variable selection
Latent class models (LCMs) are used increasingly for addressing a broad variety of problems, including sparse modeling of multivariate and longitudinal data, model-based clustering, and flexible inferences on predictor effects. Typical frequentist LCMs require estimation of a single finite number of classes, which does not increase with the sample size, and have a well-known sensitivity to parametric assumptions on the distributions within a class. Bayesian nonparametric methods have been developed to allow an infinite number of classes in the general population, with the number represented in a sample increasing with sample size. In this article, we propose a new nonparametric Bayes model that allows predictors to flexibly impact the allocation to latent classes, while limiting sensitivity to parametric assumptions by allowing class-specific distributions to be unknown subject to a stochastic ordering constraint. An efficient MCMC algorithm is developed for posterior computation. The methods are validated using simulation studies and applied to the problem of ranking medical procedures in terms of the distribution of patient morbidity.
Factor analysis; Latent variables; Mixture model; Model-based clustering; Nested Dirichlet process; Order restriction; Random probability measure; Stick breaking
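The stick-breaking construction named in the keywords, which underlies nonparametric Bayes mixture models of this kind, can be sketched in a few lines: break a unit stick with Beta(1, alpha) fractions to obtain class weights. This is the standard truncated construction, shown for illustration; the truncation level and concentration parameter are arbitrary choices.

```python
import numpy as np

def stick_breaking(alpha, K, rng):
    """Truncated stick-breaking weights: V_k ~ Beta(1, alpha),
    pi_k = V_k * prod_{j<k} (1 - V_j); the last break absorbs the remainder."""
    V = rng.beta(1.0, alpha, size=K)
    V[-1] = 1.0                       # truncate so the weights sum to one
    pi = V * np.concatenate([[1.0], np.cumprod(1 - V)[:-1]])
    return pi

rng = np.random.default_rng(0)
pi = stick_breaking(alpha=2.0, K=25, rng=rng)
assert np.isclose(pi.sum(), 1.0)
assert np.all(pi >= 0)
```

Larger alpha spreads mass over more classes, which is how the number of classes represented in a sample can grow with sample size.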
We present a novel cosine series representation for encoding fiber bundles consisting of multiple 3D curves. The coordinates of curves are parameterized as coefficients of cosine series expansion. We address the issue of registration, averaging and statistical inference on curves in a unified Hilbert space framework. Unlike traditional splines, the proposed method does not have internal knots and explicitly represents curves as a linear combination of cosine basis functions. This simplicity in the representation enables us to design statistical models, register curves and perform subsequent analysis in a more unified statistical framework than splines.
The proposed representation is applied in characterizing abnormal shape of white matter fiber tracts passing through the splenium of the corpus callosum in autistic subjects. For an arbitrary tract, a 19-degree expansion is usually found to be sufficient to reconstruct the tract with 60 parameters.
Cosine series representation; Curve registration; Curve modeling; Fourier descriptor; Diffusion tensor imaging; White matter tracts
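The representation above can be sketched directly: each coordinate function of a 3D curve is regressed on cosine basis functions, so a 19-degree expansion gives 20 coefficients per coordinate, or 60 parameters per tract. The basis normalization and toy curve below are illustrative assumptions, not the paper's data.

```python
import numpy as np

def cosine_design(t, L):
    """Design matrix of the cosine basis psi_0(t) = 1, psi_l(t) = sqrt(2) cos(pi l t)
    on [0, 1]; this is the usual L2[0, 1] normalization (the paper's exact
    scaling may differ)."""
    l = np.arange(L + 1)
    Psi = np.cos(np.pi * np.outer(t, l))
    Psi[:, 1:] *= np.sqrt(2.0)
    return Psi

t = np.linspace(0, 1, 100)                    # arc-length parameterization
curve = np.column_stack([np.sin(t), t ** 2, np.cos(3 * t)])   # toy 3D curve
Psi = cosine_design(t, L=19)                  # 20 basis functions
coef, *_ = np.linalg.lstsq(Psi, curve, rcond=None)
assert coef.shape == (20, 3)                  # 60 parameters in total
recon = Psi @ coef
assert np.mean((recon - curve) ** 2) < 1e-3   # smooth curves reconstruct well
```

With all curves encoded in the same fixed basis, averaging and inference reduce to operations on the coefficient vectors, which is the "unified Hilbert space framework" the abstract refers to.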
We previously developed a flexible specification of the UNAIDS Estimation and Projection Package (EPP) that relied on splines to generate time-varying values for the force of infection parameter. Here, we test the feasibility of this approach for concentrated HIV/AIDS epidemics with very sparse data and compare two methods for making short-term future projections with the spline-based model.
Penalised B-splines are used to model the average infection risk over time within the EPP 2011 modelling framework, which includes antiretroviral treatment effects and CD4 cell count progression, and is fit to sentinel surveillance prevalence data with a Bayesian algorithm. We compare two approaches for future projections: (1) an informative prior related to equilibrium prevalence and (2) a random walk formulation.
The spline-based model produced plausible fits across a range of epidemics, which included 87 subpopulations from 14 countries with concentrated epidemics and 75 subpopulations from 33 countries with generalised epidemics. The equilibrium prior and random walk approaches to future projections yielded similar prevalence estimates, and both performed well in tests of out-of-sample predictive validity for prevalence. In contrast, in some cases the two approaches varied substantially in estimates of incidence, with the random walk formulation avoiding extreme changes in incidence.
A spline-based approach to allowing the force of infection parameter to vary over time within EPP 2011 is robust across a diverse array of epidemics, including concentrated ones with limited surveillance data. Future work on the EPP model should consider the impact that different modelling approaches have on estimates of HIV incidence.
HIV; Surveillance; Mathematical Model
An overview is provided of the methodologies used in determining the time to steady state for Phase 1 multiple dose studies. These methods include NOSTASOT (no-statistical-significance-of-trend), Helmert contrasts, spline (quadratic) regression, effective half-life for accumulation, nonlinear mixed effects modeling, and a Bayesian approach using Markov Chain Monte Carlo (MCMC) methods. For each methodology we describe its advantages and disadvantages. The first two methods do not require any distributional assumptions for the pharmacokinetic (PK) parameters and are limited to an average assessment of steady state. Spline regression, which provides both average and individual assessments of time to steady state, likewise does not require any distributional assumptions for the PK parameters. On the other hand, nonlinear mixed effects modeling and Bayesian hierarchical modeling, which allow for the estimation of both population and subject-specific estimates of time to steady state, do require distributional assumptions on the PK parameters. The current investigation presents eight case studies for which the time to steady state was assessed using the above-mentioned methodologies. The time to steady state estimates obtained from nonlinear mixed effects modeling, the Bayesian hierarchical approach, effective half-life, and spline regression were generally similar.
effective half-life; Helmert contrasts; nonlinear mixed effect modeling; no-statistical-significance-of-trend; steady state
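The effective half-life approach admits a one-line worked example: under first-order accumulation, the fraction of steady state reached by time t is 1 - 2^(-t / t_half), so the time to a target fraction follows by inversion. The numbers below are illustrative, not from the case studies.

```python
import numpy as np

def time_to_fraction_ss(t_half_eff, fraction=0.90):
    """Time to reach a given fraction of steady state under first-order
    accumulation: 1 - 2**(-t / t_half) = f  =>  t = t_half * log2(1 / (1 - f))."""
    return t_half_eff * np.log2(1.0 / (1.0 - fraction))

# e.g., an effective half-life of 12 h gives ~90% of steady state after:
t90 = time_to_fraction_ss(12.0, 0.90)
assert abs(t90 - 12.0 * np.log2(10.0)) < 1e-9   # about 3.32 half-lives, ~39.9 h
```

This is why "3 to 5 half-lives" is the usual rule of thumb: 90% of steady state takes log2(10), roughly 3.32, effective half-lives.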
We sought to investigate the effect of serum uric acid (SUA) levels on the risk of cancer incidence in men and to flexibly determine the shape of this association by using a novel analytical approach.
A population-based cohort of 78,850 Austrian men who received 264,347 serial SUA measurements was prospectively followed up for a median of 12.4 years. Data were collected between 1985 and 2003. Penalized splines (P-splines) in extended Cox-type additive hazard regression were used to flexibly model the association between SUA, as a time-dependent covariate, and risk of overall and site-specific cancer incidence and to calculate adjusted hazard ratios with their 95% confidence intervals.
During follow-up, 5189 incident cancers were observed. P-spline models optimized by restricted maximum likelihood revealed a moderately J-shaped effect of SUA on the risk of overall cancer incidence, with statistically significantly increased hazard ratios in the upper third of the SUA distribution. Increased SUA (≥8.00 mg/dL) further significantly increased the risk for several site-specific malignancies, with P-spline analyses providing detailed insight into the shape of the association with these outcomes.
Our study is the first to demonstrate a dose–response association between SUA and cancer incidence in men, simultaneously reporting on the usefulness of a novel methodological framework in epidemiologic research.
Cancer incidence; Epidemiology; Extended Cox-type additive hazard regression; Men; Penalized splines; Risk factor; Serum uric acid
Flexible multilevel models are proposed to allow for cluster-specific smooth estimation of growth curves in a mixed-effects modeling format that includes subject-specific random effects on the growth parameters. Attention is then focused on models that examine between-cluster comparisons of the effects of an ecologic covariate of interest (e.g. air pollution) on nonlinear functionals of growth curves (e.g. maximum rate of growth). A Gibbs sampling approach is used to get posterior mean estimates of nonlinear functionals along with their uncertainty estimates. A second-stage ecologic random-effects model is used to examine the association between a covariate of interest (e.g. air pollution) and the nonlinear functionals. A unified estimation procedure is presented along with its computational and theoretical details. The models are motivated by, and illustrated with, lung function and air pollution data from the Southern California Children's Health Study.
Air pollution; Correlated data; Growth curves; Mixed-effects; Splines
When model parameters in systems biology are not available from experiments, they need to be inferred so that the resulting simulation reproduces the experimentally known phenomena. For this purpose, Bayesian statistics with Markov chain Monte Carlo (MCMC) is a useful method. Conventional MCMC needs a likelihood to evaluate a posterior distribution of acceptable parameters, while approximate Bayesian computation (ABC) MCMC evaluates a posterior distribution using a qualitative fitness measure. However, neither of these algorithms can deal with a mixture of quantitative (i.e., likelihood) and qualitative fitness measures simultaneously. Here, to deal with this mixture, we formulated a Bayesian formula for hybrid fitness measures (HFM) and implemented it in MCMC (MCMC-HFM). We tested MCMC-HFM first on a kinetic toy model with a positive feedback. Inferring kinetic parameters mainly related to the positive feedback, we found that MCMC-HFM reliably infers them using both qualitative and quantitative fitness measures. We then applied MCMC-HFM to a previously proposed apoptosis signal transduction network. For kinetic parameters related to implicit positive feedbacks, which are important for the bistability and irreversibility of the output, MCMC-HFM reliably inferred these kinetic parameters. In particular, some kinetic parameters that have experimental estimates were inferred without using these data, and the results were consistent with experiments. Moreover, for some parameters, the mixed use of quantitative and qualitative fitness measures narrowed down the acceptable range of parameters.
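A schematic of combining a quantitative likelihood with a qualitative, indicator-style fitness measure inside a Metropolis step might look as follows. This is an illustrative sketch, not the paper's MCMC-HFM algorithm: the model, the qualitative constraint, and the flat prior are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_like(theta, y):
    # quantitative fitness: Gaussian log-likelihood of the observed data
    return -0.5 * np.sum((y - theta) ** 2)

def qualitative_ok(theta):
    # qualitative fitness: an indicator constraint, e.g. "the parameter is positive"
    return theta > 0

y = rng.normal(loc=1.5, size=50)
theta, chain = 1.0, []
for _ in range(5000):
    prop = theta + 0.3 * rng.normal()
    # hybrid rule: a proposal failing the qualitative measure is rejected outright;
    # otherwise a standard Metropolis step on the quantitative likelihood is taken
    if qualitative_ok(prop):
        if np.log(rng.uniform()) < log_like(prop, y) - log_like(theta, y):
            theta = prop
    chain.append(theta)
post = np.array(chain[1000:])
assert np.all(post > 0)          # every retained draw satisfies the constraint
```

The qualitative measure acts like an ABC-style accept/reject filter, while the quantitative measure drives the usual likelihood-based acceptance ratio.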
The recovery of gradients of sparsely observed functional data is a challenging ill-posed inverse problem. Given observations of smooth curves (e.g., growth curves) at isolated time points, the aim is to provide estimates of the underlying gradients (or growth velocities). To address this problem, we develop a Bayesian inversion approach that models the gradient in the gaps between the observation times by a tied-down Brownian motion, conditionally on its values at the observation times. The posterior mean and covariance kernel of the growth velocities are then found to have explicit and computationally tractable representations in terms of quadratic splines. The hyperparameters in the prior are specified via nonparametric empirical Bayes, with the prior precision matrix at the observation times estimated by constrained ℓ1 minimization. The infinitesimal variance of the Brownian motion prior is selected by cross-validation. The approach is illustrated using both simulated and real data examples.
Growth trajectories; Functional data analysis; Ill-posed inverse problem; Nonparametric Empirical Bayes; Tied-down Brownian motion
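A tied-down Brownian motion on a single gap, rescaled to [0, 1], is a Brownian bridge with covariance sigma2 * (min(s, t) - s t), pinned to fixed values at both endpoints. The sketch below builds that kernel and samples one path; the unit interval, unit variance, and jitter are illustrative choices, not the paper's settings.

```python
import numpy as np

def bridge_kernel(s, t, sigma2=1.0):
    """Covariance of a tied-down Brownian motion (Brownian bridge) on [0, 1]:
    k(s, t) = sigma2 * (min(s, t) - s * t); the process is 0 at both ends."""
    return sigma2 * (np.minimum.outer(s, t) - np.outer(s, t))

grid = np.linspace(0, 1, 101)
K = bridge_kernel(grid, grid)
assert np.allclose(K, K.T)
assert np.allclose(K[0], 0) and np.allclose(K[-1], 0)   # tied down at the ends
# sample one bridge path (jitter keeps the Cholesky factor numerically stable)
rng = np.random.default_rng(0)
path = np.linalg.cholesky(K + 1e-8 * np.eye(len(grid))) @ rng.normal(size=len(grid))
assert abs(path[0]) < 1e-3 and abs(path[-1]) < 1e-3
```

Conditioning each gap on the observed endpoint values in this way is what makes the posterior mean and covariance available in closed form.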
Gene duplication with subsequent interaction divergence is one of the primary driving forces in the evolution of genetic systems. Yet little is known about the precise mechanisms and the role of duplication divergence in the evolution of protein networks from the prokaryote and eukaryote domains. We developed a novel, model-based approach for Bayesian inference on biological network data that centres on approximate Bayesian computation, or likelihood-free inference. Instead of computing the intractable likelihood of the protein network topology, our method summarizes key features of the network and, based on these, uses an MCMC algorithm to approximate the posterior distribution of the model parameters. This allowed us to reliably fit a flexible mixture model that captures hallmarks of evolution by gene duplication and subfunctionalization to protein interaction network data of Helicobacter pylori and Plasmodium falciparum. The 80% credible intervals for the duplication–divergence component are [0.64, 0.98] for H. pylori and [0.87, 0.99] for P. falciparum. The remaining parameter estimates are not inconsistent with sequence data. An extensive sensitivity analysis showed that incompleteness of PIN data does not largely affect the analysis of models of protein network evolution, and that the degree sequence alone barely captures the evolutionary footprints of protein networks relative to other statistics. Our likelihood-free inference approach enables a fully Bayesian analysis of a complex and highly stochastic system that is otherwise intractable at present. Modelling the evolutionary history of PIN data, it transpires that only the simultaneous analysis of several global aspects of protein networks enables credible and consistent inference to be made from available datasets.
Our results indicate that gene duplication has played a larger part in the network evolution of the eukaryote than in the prokaryote, and suggest that single gene duplications with immediate divergence alone may explain more than 60% of biological network data in both domains.
The importance of gene duplication to biological evolution has been recognized since the 1930s. For more than a decade, substantial evidence has been collected from genomic sequence data in order to elucidate the importance and the mechanisms of gene duplication; however, most biological characteristics arise from complex interactions between the cell's numerous constituents. Recently, preliminary descriptions of the protein interaction networks have become available for species of different domains. Adapting novel techniques in stochastic simulation, the authors demonstrate that evolutionary inferences can be drawn from large-scale, incomplete network data by fitting a stochastic model of network growth that captures hallmarks of evolution by duplication and divergence. They have also analyzed the effect of summarizing protein networks in different ways, and show that a reliable and consistent analysis requires many aspects of network data to be considered jointly, in contrast to what is commonly done in practice. Their results indicate that duplication and divergence have played a larger role in the network evolution of the eukaryote P. falciparum than in the prokaryote H. pylori, and emphasize, at least for the eukaryote, the potential importance of subfunctionalization in network evolution.
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.
Mapping multiple quantitative trait loci (QTL) is commonly viewed as a problem of model selection. Various model selection criteria have been proposed, primarily in the non-Bayesian framework. The deviance information criterion (DIC) is the most popular criterion for Bayesian model selection and model comparison but has not been applied to Bayesian multiple QTL mapping. A derivation of the DIC is presented for multiple interacting QTL models and calculation of the DIC is demonstrated using posterior samples generated by Markov chain Monte Carlo (MCMC) algorithms. The DIC measures posterior predictive error by penalizing the fit of a model (deviance) by its complexity, determined by the effective number of parameters. The effective number of parameters simultaneously accounts for the sample size, the cross design, the number and lengths of chromosomes, covariates, the number of QTL, the type of QTL effects, and QTL effect sizes. The DIC provides a computationally efficient way to perform sensitivity analysis and can be used to quantitatively evaluate if including environmental effects, gene-gene interactions, and/or gene-environment interactions in the prior specification is worth the extra parameterization. The DIC has been implemented in the freely available package R/qtlbim, which greatly facilitates the general usage of Bayesian methodology for genome-wide interacting QTL analysis.
complex trait; deviance; DIC; model selection and comparison; quantitative trait loci
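The DIC computation from posterior samples described above reduces to two quantities: the posterior mean deviance and the deviance at the posterior mean, whose difference is the effective number of parameters p_D. The toy model below (a normal mean with known variance, with exact conjugate draws standing in for MCMC output) is an illustration of the bookkeeping, not the QTL model.

```python
import numpy as np

rng = np.random.default_rng(0)

def deviance(theta, y):
    # D(theta) = -2 * log-likelihood; here a normal model with known unit variance
    return np.sum((y - theta) ** 2) + len(y) * np.log(2 * np.pi)

y = rng.normal(loc=2.0, size=100)
# stand-in for MCMC output: exact posterior draws under a flat prior
theta_draws = rng.normal(loc=y.mean(), scale=1 / np.sqrt(len(y)), size=4000)

D_bar = np.mean([deviance(t, y) for t in theta_draws])  # posterior mean deviance
D_hat = deviance(theta_draws.mean(), y)                 # deviance at posterior mean
p_D = D_bar - D_hat                                     # effective number of parameters
DIC = D_hat + 2 * p_D                                   # equivalently D_bar + p_D
assert 0.8 < p_D < 1.2       # one free parameter in this toy model
```

In the QTL setting the same two averages are computed over the MCMC draws of the full interacting-QTL model, so p_D automatically reflects the model complexity described in the abstract.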
The assumption of proportional hazards (PH) fundamental to the Cox PH model sometimes may not hold in practice. In this paper, we propose a generalization of the Cox PH model in terms of the cumulative hazard function taking a form similar to the Cox PH model, with the extension that the baseline cumulative hazard function is raised to a power function. Our model allows for interaction between covariates and the baseline hazard and it also includes, for the two sample problem, the case of two Weibull distributions and two extreme value distributions differing in both scale and shape parameters. The partial likelihood approach cannot be applied here to estimate the model parameters. We use the full likelihood approach via a cubic B-spline approximation for the baseline hazard to estimate the model parameters. A semi-automatic procedure for knot selection based on Akaike’s Information Criterion is developed. We illustrate the applicability of our approach using real-life data.
censored survival data analysis; crossing hazards; Frailty model; maximum likelihood; regression; spline function; Akaike information criterion; Weibull distribution; extreme value distribution
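The two-Weibull case mentioned in the abstract gives a concrete picture of why PH fails: two Weibull hazards with different shape parameters have a hazard ratio that is monotone in t and passes through 1, i.e., the hazards cross. The shapes and scales below are illustrative values chosen to show the crossing.

```python
import numpy as np

def weibull_hazard(t, shape, scale):
    # h(t) = (k / lam) * (t / lam)**(k - 1) for a Weibull(shape=k, scale=lam)
    return (shape / scale) * (t / scale) ** (shape - 1)

t = np.linspace(0.05, 5, 500)
h1 = weibull_hazard(t, shape=0.8, scale=1.0)   # decreasing hazard
h2 = weibull_hazard(t, shape=1.5, scale=1.0)   # increasing hazard
ratio = h1 / h2                                # proportional to t**(0.8 - 1.5)
assert ratio[0] > 1 and ratio[-1] < 1          # hazards cross: ratio passes through 1
```

A model whose baseline cumulative hazard is raised to a covariate-dependent power can accommodate exactly this kind of non-constant hazard ratio.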
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error.
covariance function; growth; beef cattle; random regression; B-splines
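The covariance-function idea in random regression can be sketched compactly: if phi(t) is the basis evaluated at age t and K is the covariance matrix of the random regression coefficients, the covariance of weights at ages s and t is G(s, t) = phi(s)' K phi(t), and eigenfunctions come from decomposing G on a grid. The example below uses a quadratic Legendre basis and a hypothetical K for simplicity (the paper uses B-splines and estimated components).

```python
import numpy as np

ages = np.linspace(0, 1, 200)                    # ages rescaled to [0, 1]
Phi = np.polynomial.legendre.legvander(ages, 2)  # quadratic basis, 3 columns
K = np.array([[1.0, 0.3, 0.0],                   # hypothetical covariance matrix
              [0.3, 0.5, 0.1],                   # of the random regression
              [0.0, 0.1, 0.2]])                  # coefficients
G = Phi @ K @ Phi.T                              # covariance function on the grid
# leading eigenfunction of G (cf. the paper's first-eigenfunction comparisons):
vals, vecs = np.linalg.eigh(G)
lead = vecs[:, -1] * np.sqrt(vals[-1])
assert np.allclose(G, G.T)
assert vals[-1] > 0 and np.all(vals > -1e-8)     # G is positive semidefinite
```

Because G inherits its rank from K, a model with q basis functions can never produce more than q non-trivial eigenfunctions, which is why the choice of basis and knots matters for the estimated covariance structure.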
Covariate-specific ROC curves are often used to evaluate the classification accuracy of a medical diagnostic test or a biomarker, when the accuracy of the test is associated with certain covariates. In many large-scale screening tests, the gold standard is subject to missingness due to high cost or harmfulness to the patient. In this paper, we propose a semiparametric estimation of the covariate-specific ROC curves with a partial missing gold standard. A location-scale model is constructed for the test result to model the covariates’ effect, but the residual distributions are left unspecified. Thus the baseline and link functions of the ROC curve both have flexible shapes. With the gold standard missing at random (MAR) assumption, we consider weighted estimating equations for the location-scale parameters, and weighted kernel estimating equations for the residual distributions. Three ROC curve estimators are proposed and compared, namely, imputation-based, inverse probability weighted and doubly robust estimators. We derive the asymptotic normality of the estimated ROC curve, as well as the analytical form of the standard error estimator. The proposed method is motivated by, and applied to, data from an Alzheimer's disease research study.
Alzheimer's disease; covariate-specific ROC curve; ignorable missingness; verification bias; weighted estimating equations
Nonparametric regression models are proposed in the framework of ecological inference for exploratory modeling of disease prevalence rates adjusted for variables, such as age, ethnicity/race, and socio-economic status. Ecological inference is needed when a response variable and covariate are not available at the subject level because only summary statistics are available for the reporting unit, for example, in the form of R × C tables. In this article, only the marginal counts are assumed available in the sample of R × C contingency tables for modeling the joint distribution of counts. A general form for the ecological regression model is proposed, whereby certain covariates are included as a varying coefficient regression model, whereas others are included as a functional linear model. The nonparametric regression curves are modeled as splines fit by penalized weighted least squares. A data-driven selection of the smoothing parameter is proposed using the pointwise maximum squared bias computed from averaging kernels (explained by O’Sullivan, 1986, Statistical Science 1, 502–517). Analytic expressions for bias and variance are provided that could be used to study the rates of convergence of the estimators. Instead, this article focuses on demonstrating the utility of the estimators in a study of disparity in health outcomes by ethnicity/race.
Ecological inference; Incomplete R × C tables; P-splines; Randomized response
Increasingly, scientific studies yield functional data, in which the ideal units of observation are curves and the observed data consist of sets of curves that are sampled on a fine grid. We present new methodology that generalizes the linear mixed model to the functional mixed model framework, with model fitting done by using a Bayesian wavelet-based approach. This method is flexible, allowing functions of arbitrary form and the full range of fixed effects structures and between-curve covariance structures that are available in the mixed model framework. It yields nonparametric estimates of the fixed and random-effects functions as well as the various between-curve and within-curve covariance matrices. The functional fixed effects are adaptively regularized as a result of the non-linear shrinkage prior that is imposed on the fixed effects’ wavelet coefficients, and the random-effect functions experience a form of adaptive regularization because of the separately estimated variance components for each wavelet coefficient. Because we have posterior samples for all model quantities, we can perform pointwise or joint Bayesian inference or prediction on the quantities of the model. The adaptiveness of the method makes it especially appropriate for modelling irregular functional data that are characterized by numerous local features like peaks.
Bayesian methods; Functional data analysis; Mixed models; Model averaging; Nonparametric regression; Proteomics; Wavelets
In clinical studies, longitudinal biomarkers are often used to monitor disease progression and failure time. Joint modeling of longitudinal and survival data has certain advantages and has emerged as an effective way to mutually enhance information. Typically, a parametric longitudinal model is assumed to facilitate the likelihood approach. However, the choice of a proper parametric model turns out to be more elusive than models for standard longitudinal studies in which no survival endpoint occurs. In this article, we propose a nonparametric multiplicative random effects model for the longitudinal process, which has many applications and leads to a flexible yet parsimonious nonparametric random effects model. A proportional hazards model is then used to link the biomarkers and event time. We use B-splines to represent the nonparametric longitudinal process, and select the number of knots and degrees based on a version of the Akaike information criterion (AIC). Unknown model parameters are estimated through maximizing the observed joint likelihood, which is iteratively maximized by the Monte Carlo Expectation Maximization (MCEM) algorithm. Due to the simplicity of the model structure, the proposed approach has good numerical stability and compares well with the competing parametric longitudinal approaches. The new approach is illustrated with primary biliary cirrhosis (PBC) data, aiming to capture nonlinear patterns of serum bilirubin time courses and their relationship with survival time of PBC patients.
B-splines; EM algorithm; Functional data analysis; Missing data; Monte Carlo integration
The multinomial probit model has emerged as a useful framework for modeling nominal categorical data, but extending such models to multivariate measures presents computational challenges. Following a Bayesian paradigm, we use a Markov chain Monte Carlo (MCMC) method to analyze multivariate nominal measures through multivariate multinomial probit models. As with a univariate version of the model, identification of model parameters requires restrictions on the covariance matrix of the latent variables that are introduced to define the probit specification. To sample the covariance matrix with restrictions within the MCMC procedure, we use a parameter-extended Metropolis-Hastings algorithm that incorporates artificial variance parameters to transform the problem into a set of simpler tasks including sampling an unrestricted covariance matrix. The parameter-extended algorithm also allows for flexible prior distributions on covariance matrices. The prior specification in the method described here generalizes earlier approaches to analyzing univariate nominal data, and the multivariate correlation structure in the method described here generalizes the autoregressive structure proposed in previous multiperiod multinomial probit models. Our methodology is illustrated through a simulated example and an application to a cancer-control study aiming to achieve early detection of breast cancer.
multinomial multiperiod probit model; MCMC; Metropolis-Hastings; covariance matrix; breast cancer
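The identification step the abstract refers to can be shown in miniature: draw an unrestricted covariance matrix (easy to sample) and map it to the identified scale by dividing out artificial standard deviations. This sketch illustrates only the rescaling idea behind parameter expansion, with an assumed inverse-Wishart draw; the paper's full Metropolis-Hastings sampler is not reproduced.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(1)
p = 4

# Unrestricted covariance draw, then rescale to the identified (correlation)
# scale -- the artificial variance parameters are the diagonal of Sigma.
Sigma = invwishart.rvs(df=p + 2, scale=np.eye(p), random_state=rng)
d = 1.0 / np.sqrt(np.diag(Sigma))   # inverse artificial scale parameters
R = np.outer(d, d) * Sigma          # restricted matrix: unit diagonal, PD
```

Working on the expanded (unrestricted) space and rescaling afterward is what lets the sampler avoid proposing directly on the constrained set of correlation matrices.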
We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster-specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. This yields a likelihood-like statistical criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value, motivating its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and that the estimators of the random coefficients are shrunk toward the population-level parameter in the first-order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.
Empirical likelihood; Markov Chain Monte Carlo; Quasi-posterior distribution
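For intuition on the estimating-equations formulation, the sketch below fits a population-level median regression by minimizing the standard check loss, whose subgradient is the quantile estimating equation. The simulated design, t-distributed errors, and Nelder-Mead optimizer are illustrative assumptions; the paper's random effects and empirical likelihood machinery are not reproduced.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(beta, X, y, tau):
    """Koenker-Bassett check loss; its subgradient is the quantile
    estimating equation sum_i x_i * (tau - 1{y_i < x_i' beta})."""
    r = y - X @ beta
    return np.sum(r * (tau - (r < 0)))

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + 0.3 * rng.standard_t(df=3, size=n)  # error median is 0

# population-level median regression (tau = 0.5); no distributional
# assumption on the errors is needed beyond the quantile restriction
res = minimize(check_loss, x0=np.zeros(2), args=(X, y, 0.5),
               method="Nelder-Mead")
```

Because the criterion depends on the errors only through the indicator inside the estimating equation, no error-density parameters appear, which mirrors the flexibility claimed in the abstract.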
The projected normal distribution is an under-utilized model for directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality, along with a convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class, and show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented, as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion.
bivariate normal distribution; circular data; concentration; latent variables; Markov chain Monte Carlo; mean direction
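A minimal sketch of the projected normal construction: draw bivariate normals and project onto the unit circle by keeping only the angle; circular summaries then estimate the mean direction and a concentration proxy. The mean vector and covariance below are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.5, 0.5])                       # illustrative linear mean
Sigma = np.array([[1.0, 0.4], [0.4, 0.8]])      # general covariance allowed
Z = rng.multivariate_normal(mu, Sigma, size=10000)

# projection onto the unit circle: keep only the angle of each draw
theta = np.arctan2(Z[:, 1], Z[:, 0]) % (2.0 * np.pi)

# circular summaries: mean direction and mean resultant length
C, S = np.cos(theta).mean(), np.sin(theta).mean()
mean_dir = np.arctan2(S, C) % (2.0 * np.pi)
R_bar = np.hypot(C, S)   # in (0, 1]; closer to 1 means higher concentration
```

In the Bayesian fit described in the abstract, the unobserved radial lengths of the latent bivariate normals play the role of the "suitable latent variables" that make MCMC tractable.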
It was predicted recently that sufficiently complex knots on a linear wormlike chain can have a metastable size, preventing their spontaneous expansion. We tested this prediction via computer simulations for the 7₁ and 10₁₅₁ knots. We calculated the equilibrium distributions of knot size S for both knots. By using umbrella sampling we were able to obtain the distributions over a wide range of S values. The distributions were converted into the dependencies of knot free energy on S. The obtained free energy profiles have no pronounced local minima, so there are no metastable knot sizes for these knots. We also performed a Brownian dynamics simulation of 7₁ knot relaxation that started from a very tight knot conformation. The simulation showed that knot expansion is a fast process compared with knot displacement along the chain contour by diffusion.
knots in polymers; knot relaxation; Brownian dynamics of polymer
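The conversion from an equilibrium size distribution to a free-energy profile used above is simply F(S) = −k_BT ln P(S), up to an additive constant. The sketch below applies it to a toy gamma-distributed stand-in for sampled knot sizes (not actual simulation output); a metastable knot size would appear as a local minimum in F away from the global one.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy stand-in for an equilibrium knot-size distribution (NOT simulation data):
S = rng.gamma(shape=4.0, scale=5.0, size=200_000)

hist, edges = np.histogram(S, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = hist > 0                    # avoid log(0) in empty bins

F = -np.log(hist[mask])            # free energy in units of k_B T
F -= F.min()                       # anchor the global minimum at zero
# this unimodal toy profile has a single minimum and no metastable size,
# matching the qualitative conclusion reported for the 7_1 and 10_151 knots
```

In practice umbrella sampling is what makes the tails of P(S) accessible, so that F(S) can be resolved over a wide range of S; that reweighting step is omitted here.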