Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary, and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structures. As an alternative, we propose a novel class of simplex factor models. In the single-factor case, the model treats the different categorical outcomes as independent with unknown marginals. The model can characterize flexible dependence structures parsimoniously with few factors, and as factors are added, any multivariate categorical data distribution can be accurately approximated. Using a Bayesian approach for computation and inferences, a Markov chain Monte Carlo (MCMC) algorithm is proposed that scales well with increasing dimension, with the number of factors treated as unknown. We develop an efficient proposal for updating the base probability vector in hierarchical Dirichlet models. Theoretical properties are described, and we evaluate the approach through simulation examples. Applications are described for modeling dependence in nucleotide sequences and prediction from high-dimensional categorical features.
Classification; Contingency table; Factor analysis; Latent variable; Nonparametric Bayes; Nonnegative tensor factorization; Mutual information; Polytomous regression
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets of multiple traits of interest. Direct application of such multivariate models to large spatial datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several thousand iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects we discuss a multivariate predictive process that reduces the computational burden by projecting the original process onto a subspace generated by realizations of the original process at a specified set of locations (or knots). We illustrate the proposed methods using a synthetic dataset with multivariate additive and dominant genetic effects and anisotropic spatial residuals, and a large dataset from a scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
Bayesian inference; Cross-covariance functions; Genetic trait models; Heredity; Hierarchical spatial models; Markov chain Monte Carlo; Multivariate spatial process; Spatial predictive process
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets are, however, computationally infeasible because of cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negate the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein–Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive, dominance, genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
Bayesian inference; Genetic variance; Markov chain Monte Carlo; Ornstein-Uhlenbeck process; Spatial predictive process; Spatial process
When model parameters in systems biology are not available from experiments, they need to be inferred so that the resulting simulation reproduces the experimentally known phenomena. For the purpose, Bayesian statistics with Markov chain Monte Carlo (MCMC) is a useful method. Conventional MCMC needs likelihood to evaluate a posterior distribution of acceptable parameters, while the approximate Bayesian computation (ABC) MCMC evaluates posterior distribution with use of qualitative fitness measure. However, none of these algorithms can deal with mixture of quantitative, i.e., likelihood, and qualitative fitness measures simultaneously. Here, to deal with this mixture, we formulated Bayesian formula for hybrid fitness measures (HFM). Then we implemented it to MCMC (MCMC-HFM). We tested MCMC-HFM first for a kinetic toy model with a positive feedback. Inferring kinetic parameters mainly related to the positive feedback, we found that MCMC-HFM reliably infer them using both qualitative and quantitative fitness measures. Then, we applied the MCMC-HFM to an apoptosis signal transduction network previously proposed. For kinetic parameters related to implicit positive feedbacks, which are important for bistability and irreversibility of the output, the MCMC-HFM reliably inferred these kinetic parameters. In particular, some kinetic parameters that have experimental estimates were inferred without using these data and the results were consistent with experiments. Moreover, for some parameters, the mixed use of quantitative and qualitative fitness measures narrowed down the acceptable range of parameters.
The MCMC procedure in SAS (called PROC MCMC) is particularly designed for Bayesian analysis using the Markov chain Monte Carlo (MCMC) algorithm. The program is sufficiently general to handle very complicated statistical models and arbitrary prior distributions. This study introduces the SAS/MCMC procedure and demonstrates the application of the program to quantitative trait locus (QTL) mapping. A real life QTL mapping experiment in wheat female fertility trait was used as an example for the demonstration. The fertility trait phenotypes were described under three different models: (1) the Poisson model, (2) the Bernoulli model and (3) the zero-truncated Poisson model. One QTL was identified on the second chromosome. This QTL appears to control the switch of seed-producing ability of female plants but does not affect the number of seeds produced once the switch is turned on.
Bayes; Markov chain Monte Carlo; quantitative trait locus; SAS
The ideal observer (IO) employs complete knowledge of the available data statistics and sets an upper limit on observer performance on a binary classification task. However, the IO test statistic cannot be calculated analytically, except for cases where object statistics are extremely simple. Kupinski et al. have developed a Markov chain Monte Carlo (MCMC) based technique to compute the IO test statistic for, in principle, arbitrarily complex objects and imaging systems. In this work, we applied MCMC to estimate the IO test statistic in the context of myocardial perfusion SPECT (MPS). We modeled the imaging system using an analytic SPECT projector with attenuation, distant-dependent detector-response modeling and Poisson noise statistics. The object is a family of parameterized torso phantoms with variable geometric and organ uptake parameters. To accelerate the imaging simulation process and thus enable the MCMC IO estimation, we used discretized anatomic parameters and continuous uptake parameters in defining the objects. The imaging process simulation was modeled by precomputing projections for each organ for a finite number of discretely-parameterized anatomic parameters and taking linear combinations of the organ projections based on continuous sampling of the organ uptake parameters. The proposed method greatly reduces the computational burden and allows MCMC IO estimation for a realistic MPS imaging simulation. We validated the proposed IO estimation technique by estimating IO test statistics for a large number of input objects. The properties of the first- and second-order statistics of the IO test statistics estimated using the MCMC IO estimation technique agreed well with theoretical predictions. Further, as expected, the IO had better performance, as measured by the receiver operating characteristic (ROC) curve, than the Hotelling observer. This method is developed for SPECT imaging. However, it can be adapted to any linear imaging system.
Ideal observer; Markov chain Monte Carlo (MCMC)
GEE and mixed models are powerful tools to compare treatment effects in longitudinal smoking cessation trials. However, they are not capable of assessing the relapse (from abstinent back to smoking) simultaneously with cessation, which can be studied by transition models.
We apply a first-order Markov chain model to analyze the transition of smoking status measured every 6 months in a 2-year randomized smoking cessation trial, and to identify what factors are associated with the transition from smoking to abstinent and from abstinent to smoking. Missing values due to non-response are assumed non-ignorable and handled by the selection modeling approach.
Smokers receiving high-intensity disease management (HDM), of male gender, lower daily cigarette consumption, higher motivation and confidence to quit, and having serious attempts to quit were more likely to become abstinent (OR = 1.48, 1.66, 1.03, 1.15, 1.09 and 1.34, respectively) in the next 6 months. Among those who were abstinent, lower income and stronger nicotine dependence (OR = 1.72 for ≤ vs. > 40 K and OR = 1.75 for first cigarette ≤ vs. > 5 min) were more likely to have relapse in the next 6 months.
Markov chain models allow investigation of dynamic smoking-abstinence behavior and suggest that relapse is influenced by different factors than cessation. The knowledge of treatments and covariates in transitions in both directions may provide guidance for designing more effective interventions on smoking cessation and relapse prevention.
clinicaltrials.gov identifier: NCT00440115
Sewall Wright's threshold model has been used in modelling discrete traits that may have a continuous trait underlying them, but it has proven difficult to make efficient statistical inferences with it. The availability of Markov chain Monte Carlo (MCMC) methods makes possible likelihood and Bayesian inference using this model. This paper discusses prospects for the use of the threshold model in morphological systematics to model the evolution of discrete all-or-none traits. There the threshold model has the advantage over 0/1 Markov process models in that it not only accommodates polymorphism within species, but can also allow for correlated evolution of traits with far fewer parameters that need to be inferred. The MCMC importance sampling methods needed to evaluate likelihood ratios for the threshold model are introduced and described in some detail.
threshold model; quantitative genetics; morphology; systematics
Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process.
In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests.
Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
The analysis of point-level (geostatistical) data has historically been plagued by computational difficulties, owing to the high dimension of the nondiagonal spatial covariance matrices that need to be inverted. This problem is greatly compounded in hierarchical Bayesian settings, since these inversions need to take place at every iteration of the associated Markov chain Monte Carlo (MCMC) algorithm. This paper offers an approach for modeling the spatial correlation at two separate scales. This reduces the computational problem to a collection of lower-dimensional inversions that remain feasible within the MCMC framework. The approach yields full posterior inference for the model parameters of interest, as well as the fitted spatial response surface itself. We illustrate the importance and applicability of our methods using a collection of dense point-referenced breast cancer data collected over the mostly rural northern part of the state of Minnesota. Substantively, we wish to discover whether women who live more than a 60-mile drive from the nearest radiation treatment facility tend to opt for mastectomy over breast conserving surgery (BCS, or “lumpectomy”), which is less disfiguring but requires 6 weeks of follow-up radiation therapy. Our hierarchical multiresolution approach resolves this question while still properly accounting for all sources of spatial association in the data.
Aggregated geographic data; Big N problem; Breast cancer; Conditionally autoregressive (CAR) model; Hierarchical modeling; Kriging
Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient.
Joint longitudinal-survival model; Online calculator; Predicted probability; Prostate cancer; PSA
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regression structure which allows the incorporation of covariates to explain the variance in speed and accuracy between individuals and groups of test takers. A Bayesian approach with Markov chain Monte Carlo (MCMC) computation enables straightforward estimation of all model parameters. Model-specific implementations of a Bayes factor (BF) and deviance information criterium (DIC) for model selection are proposed which are easily calculated as byproducts of the MCMC computation. Both results from simulation studies and real-data examples are given to illustrate several novel analyses possible with this modeling framework.
speed; accuracy; IRT; response times
Understanding human sexual behaviors is essential for the effective prevention of sexually transmitted infections. Analysis of longitudinally measured sexual behavioral data, however, is often complicated by zero-inflation of event counts, nonlinear time trend, time-varying covariates, and informative dropouts. Ignoring these complicating factors could undermine the validity of the study findings. In this paper, we put forth a unified joint modeling structure that accommodates these features of the data. Specifically, we propose a pair of simultaneous models for the zero-inflated event counts: Each of these models contains an auto-regressive structure for the accommodation of the effect of recent event history, and a nonparametric component for the modeling of nonlinear time effect. Informative dropout and time varying covariates are modeled explicitly in the process. Model fitting and parameter estimation are carried out in a Bayesian paradigm by the use of a Markov Chain Monte Carlo (MCMC) method. Analytical results showed that adolescent sexual behaviors tended to evolve nonlinearly over time and they were strongly influenced by the day-to-day variations in mood and sexual interests. These findings suggest that adolescent sex is to a large extent driven by intrinsic factors rather than being compelled by circumstances, thus highlighting the need of education on self protective measures against infection risks.
Joint modeling; Markov Chain Monte Carlo; Mood; Sexually transmitted infections; Zero-inflated Poisson
Joint modeling of longitudinal and survival data has been increasingly considered in clinical trials, notably in cancer and AIDS. In critically ill patients admitted to an intensive care unit (ICU), such models also appear to be of interest in the investigation of the effect of treatment on severity scores due to the likely association between the longitudinal score and the dropout process, either caused by deaths or live discharges from the ICU. However, in this competing risk setting, only cause-specific hazard sub-models for the multiple failure types data have been used.
We propose a joint model that consists of a linear mixed effects submodel for the longitudinal outcome, and a proportional subdistribution hazards submodel for the competing risks survival data, linked together by latent random effects. We use Markov chain Monte Carlo technique of Gibbs sampling to estimate the joint posterior distribution of the unknown parameters of the model. The proposed method is studied and compared to joint model with cause-specific hazards submodel in simulations and applied to a data set that consisted of repeated measurements of severity score and time of discharge and death for 1,401 ICU patients.
Time by treatment interaction was observed on the evolution of the mean SOFA score when ignoring potentially informative dropouts due to ICU deaths and live discharges from the ICU. In contrast, this was no longer significant when modeling the cause-specific hazards of informative dropouts. Such a time by treatment interaction persisted together with an evidence of treatment effect on the hazard of death when modeling dropout processes through the use of the Fine and Gray model for sub-distribution hazards.
In the joint modeling of competing risks with longitudinal response, differences in the handling of competing risk outcomes appear to translate into the estimated difference in treatment effect on the longitudinal outcome. Such a modeling strategy should be carefully defined prior to analysis.
Coalescent-based Bayesian Markov chain Monte Carlo (MCMC) inference generates estimates of evolutionary parameters and their posterior probability distributions. As the number of sequences increases, the length of time taken to complete an MCMC analysis increases as well. Here, we investigate an approach to distribute the MCMC analysis across a cluster of computers. To do this, we use bootstrapped topologies as fixed genealogies, perform a single MCMC analysis on each genealogy without topological rearrangements, and pool the results across all MCMC analyses. We show, through simulations, that although the standard MCMC performs better than the bootstrap-MCMC at estimating the effective population size (scaled by mutation rate), the bootstrap-MCMC returns better estimates of growth rates. Additionally, we find that our bootstrap-MCMC analyses are, on average, 37 times faster for equivalent effective sample sizes.
coalescent; Bayesian inference; MCMC; bootstrap; effective population size
In this paper, various Bayesian Monte Carlo Markov Chain (MCMC) methods and the proposed algorithm, Gibbs maximum a posteriori (GMAP) algorithm, are compared for implementing the nonlinear mixed-effects model in pharmacokinetics (PK) studies. An intravenous two-compartmental PK model is adopted to fit the PK data from the midazolam (MDZ) studies, which recruited 24 individuals with 9 different time points per subject. The three-stage hierarchical nonlinear mixed model is constructed. Data analysis and model performance comparisons show that GMAP converges the fastest, and provides reliable results. At the mean time, data augmentation (DA) methods are used for the Random-walk Metropolis method. Data analysis shows that the speed of the convergence of Random-walk Metropolis can be improved by DA, but all of them are not as fast as GMAP. The performance of GMAP and various MCMC algorithms are compared through Midazolam data analysis and simulation.
Bayesian model; Monte Carlo Markov Chain (MCMC); Pharmacokinetics (PK); Gibbs Maximum a Posteriori (GMAP)
Markov Chain Monte Carlo (MCMC) methods are often used to sample from intractable target distributions. Some MCMC variants aim to improve the performance by running a population of MCMC chains. In this paper, we investigate the use of techniques from Evolutionary Computation (EC) to design population-based MCMC algorithms that exchange useful information between the individual chains. We investigate how one can ensure that the resulting class of algorithms, called Evolutionary MCMC (EMCMC), samples from the target distribution as expected from any MCMC algorithm. We analytically and experimentally show—using examples from discrete search spaces—that the proposed EMCMCs can outperform standard MCMCs by exploiting common partial structures between the more likely individual states. The MCMC chains in the population interact through recombination and selection. We analyze the required properties of recombination operators and acceptance (or selection) rules in EMCMCs. An important issue is how to preserve the detailed balance property which is a sufficient condition for an irreducible and aperiodic EMCMC to converge to a given target distribution. Transferring EC techniques to population-based MCMCs should be done with care. For instance, we prove that EMCMC algorithms with an elitist acceptance rule do not sample the target distribution correctly.
Evolutionary Markov chain Monte Carlo; Detailed balance; Recombination; Acceptance rules
A Markov chain Monte Carlo (MCMC) algorithm to sample an exchangeable covariance matrix, such as the one of the error terms (R0) in a multiple trait animal model with missing records under normal-inverted Wishart priors is presented. The algorithm (FCG) is based on a conjugate form of the inverted Wishart density that avoids sampling the missing error terms. Normal prior densities are assumed for the 'fixed' effects and breeding values, whereas the covariance matrices are assumed to follow inverted Wishart distributions. The inverted Wishart prior for the environmental covariance matrix is a product density of all patterns of missing data. The resulting MCMC scheme eliminates the correlation between the sampled missing residuals and the sampled R0, which in turn has the effect of decreasing the total amount of samples needed to reach convergence. The use of the FCG algorithm in a multiple trait data set with an extreme pattern of missing records produced a dramatic reduction in the size of the autocorrelations among samples for all lags from 1 to 50, and this increased the effective sample size from 2.5 to 7 times and reduced the number of samples needed to attain convergence, when compared with the 'data augmentation' algorithm.
FCG algorithm; multiple traits; missing data; conjugate priors; normal-inverted Wishart
The multinomial probit model has emerged as a useful framework for modeling nominal categorical data, but extending such models to multivariate measures presents computational challenges. Following a Bayesian paradigm, we use a Markov chain Monte Carlo (MCMC) method to analyze multivariate nominal measures through multivariate multinomial probit models. As with a univariate version of the model, identification of model parameters requires restrictions on the covariance matrix of the latent variables that are introduced to define the probit specification. To sample the covariance matrix with restrictions within the MCMC procedure, we use a parameter-extended Metropolis-Hastings algorithm that incorporates artificial variance parameters to transform the problem into a set of simpler tasks including sampling an unrestricted covariance matrix. The parameter-extended algorithm also allows for flexible prior distributions on covariance matrices. The prior specification in the method described here generalizes earlier approaches to analyzing univariate nominal data, and the multivariate correlation structure in the method described here generalizes the autoregressive structure proposed in previous multiperiod multinomial probit models. Our methodology is illustrated through a simulated example and an application to a cancer-control study aiming to achieve early detection of breast cancer.
multinomial multiperiod probit model; MCMC; Metropolis-Hastings; covariance matrix; breast cancer
RNA exosomes are large multi-subunit protein complexes involved in controlled and processive 3′ to 5′ RNA degradation. Exosomes form large molecular chambers and harbor multiple nuclease sites as well as RNA binding regions. This makes a quantitative kinetic analysis of RNA degradation with reliable parameter and error estimates challenging. For instance, recent quantitative biochemical assays revealed that degradation speed and efficiency depend on various factors, such as the type of RNA binding caps and the RNA length. We propose the combination of a differential equation model with Bayesian Markov Chain Monte Carlo (MCMC) sampling for a more robust and reliable analysis of such complex kinetic systems. Using the exosome as a paradigm, it is shown that conventional “best fit” approaches to parameter estimation are outperformed by the MCMC method. The parameter distribution returned by MCMC sampling allows for a reliable and meaningful comparison of the data from different time series. In the case of the exosome, we find that the cap structures of the exosome have a direct effect on the recruitment and degradation of RNA, and that these effects are RNA length-dependent. The described approach can be widely applied to any processive reaction with a similar kinetics like the XRN1-dependent RNA degradation, RNA/DNA synthesis by polymerases and protein synthesis by the ribosome.
RNA degradation; exosome structure; processive enzymes; enzyme kinetics; bayesian inference; parameter estimation; Markov Chain Monte Carlo sampling
The analysis of genetic and environmental contributions to preterm birth is not straightforward in family studies, as etiology could involve both maternal and fetal genes. Markov Chain Monte Carlo (MCMC) methods are presented as a flexible approach for defining user-specified covariance structures to handle multiple random effects and hierarchical dependencies inherent in children of twin (COT) studies of pregnancy outcomes. The proposed method is easily modified to allow for the study of gestational age as a continuous trait and as a binary outcome reflecting the presence or absence of preterm birth. Estimation of fetal and maternal genetic factors and the effect of the environment are demonstrated using MCMC methods implemented in WinBUGS and maximum likelihood methods in a Virginia COT sample comprising 7,061 births. In summary, although the contribution of maternal and fetal genetic factors was supported using both outcomes, additional births and/or extended relationships are required to precisely estimate both genetic effects simultaneously. We anticipate the flexibility of MCMC methods to handle increasingly complex models to be of particular relevance for the study of birth outcomes.
preterm birth; fetal; maternal; genetic; environment; MCMC; ML
In this paper, we study Bayesian analysis of nonlinear hierarchical mixture models with a finite but unknown number of components. Our approach is based on Markov chain Monte Carlo (MCMC) methods. One of the applications of our method is directed to the clustering problem in gene expression analysis. From a mathematical and statistical point of view, we discuss the following topics: theoretical and practical convergence problems of the MCMC method; determination of the number of components in the mixture; and computational problems associated with likelihood calculations. In the existing literature, these problems have mainly been addressed in the linear case. One of the main contributions of this paper is developing a method for the nonlinear case. Our approach is based on a combination of methods including Gibbs sampling, random permutation sampling, birth-death MCMC, and Kullback-Leibler distance.
Gibbs sampling; permutation sampling; birth-death MCMC; Kullback-Leibler distance; gene expression; clustering; nonlinear mixture models
A significant interest in spatial epidemiology lies in identifying associated risk factors which enhances the risk of infection. Most studies, however, make no, or limited use of the spatial structure of the data, as well as possible nonlinear effects of the risk factors.
We develop a Bayesian Structured Additive Regression model for cholera epidemic data. Model estimation and inference is based on fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulations. The model is applied to cholera epidemic data in the Kumasi Metropolis, Ghana. Proximity to refuse dumps, density of refuse dumps, and proximity to potential cholera reservoirs were modeled as continuous functions; presence of slum settlers and population density were modeled as fixed effects, whereas spatial references to the communities were modeled as structured and unstructured spatial effects.
We observe that the risk of cholera is associated with slum settlements and high population density. The risk of cholera is equal and lower for communities with fewer refuse dumps, but variable and higher for communities with more refuse dumps. The risk is also lower for communities distant from refuse dumps and potential cholera reservoirs. The results also indicate distinct spatial variation in the risk of cholera infection.
The study highlights the usefulness of Bayesian semi-parametric regression model analyzing public health data. These findings could serve as novel information to help health planners and policy makers in making effective decisions to control or prevent cholera epidemics.
Bayesian; Cholera; Cholera reservoir; Refuse dumps; Slums
There has been great interest in developing nonlinear structural equation models and associated statistical inference procedures, including estimation and model selection methods. In this paper a general semiparametric structural equation model (SSEM) is developed in which the structural equation is composed of nonparametric functions of exogenous latent variables and fixed covariates on a set of latent endogenous variables. A basis representation is used to approximate these nonparametric functions in the structural equation and the Bayesian Lasso method coupled with a Markov Chain Monte Carlo (MCMC) algorithm is used for simultaneous estimation and model selection. The proposed method is illustrated using a simulation study and data from the Affective Dynamics and Individual Differences (ADID) study. Results demonstrate that our method can accurately estimate the unknown parameters and correctly identify the true underlying model.
Bayesian Lasso; Latent variable; Spline; Structural equation model
Prediction accuracies of estimated breeding values for economically important traits are expected to benefit from genomic information. Single nucleotide polymorphism (SNP) panels used in genomic prediction are increasing in density, but the Markov Chain Monte Carlo (MCMC) estimation of SNP effects can be quite time consuming or slow to converge when a large number of SNPs are fitted simultaneously in a linear mixed model. Here we present an EM algorithm (termed “fastBayesA”) without MCMC. This fastBayesA approach treats the variances of SNP effects as missing data and uses a joint posterior mode of effects compared to the commonly used BayesA which bases predictions on posterior means of effects. In each EM iteration, SNP effects are predicted as a linear combination of best linear unbiased predictions of breeding values from a mixed linear animal model that incorporates a weighted marker-based realized relationship matrix. Method fastBayesA converges after a few iterations to a joint posterior mode of SNP effects under the BayesA model. When applied to simulated quantitative traits with a range of genetic architectures, fastBayesA is shown to predict GEBV as accurately as BayesA but with less computing effort per SNP than BayesA. Method fastBayesA can be used as a computationally efficient substitute for BayesA, especially when an increasing number of markers bring unreasonable computational burden or slow convergence to MCMC approaches.