1.  A joint-modeling approach to assess the impact of biomarker variability on the risk of developing clinical outcome 
In some clinical trials and epidemiologic studies, investigators are interested in knowing whether the variability of a biomarker is independently predictive of clinical outcomes. This question is often addressed via a naïve approach where a sample-based estimate (e.g., standard deviation) is calculated as a surrogate for the “true” variability and then used in regression models as a covariate assumed to be free of measurement error. However, it is well known that measurement error in covariates causes underestimation of the true association. The underestimation can be substantial when the precision is low because of the limited number of measures per subject. The joint analysis of survival data and longitudinal data enables one to account for the measurement error in longitudinal data and has received substantial attention in recent years. In this paper we propose a joint model to assess the predictive effect of biomarker variability. The joint model consists of two linked sub-models: a linear mixed model with patient-specific variance for the longitudinal data and a fully parametric Weibull model for the survival data, with the association between the two sub-models induced by a latent Gaussian process. Parameters in the joint model are estimated under a Bayesian framework and implemented using Markov chain Monte Carlo (MCMC) methods with the WinBUGS software. The method is illustrated in the Ocular Hypertension Treatment Study to assess whether the variability of intraocular pressure is an independent risk factor for primary open-angle glaucoma. The performance of the method is also assessed by simulation studies.
PMCID: PMC3039885  PMID: 21339862
Patient-specific variance; Survival data; Longitudinal data; Joint model; Markov chain Monte Carlo (MCMC); WinBUGS
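The paper's WinBUGS implementation is not reproduced here, but the core machinery, Bayesian estimation of a parametric Weibull survival model by MCMC, can be sketched in miniature. The following fits only the Weibull piece to right-censored data with a random-walk Metropolis sampler; the data, censoring time, and flat log-scale priors are all illustrative assumptions, not the article's specification.

```python
import numpy as np

def weibull_loglik(k, lam, t, event):
    """Log-likelihood of right-censored Weibull data.
    event[i] = 1 if the i-th time is an observed event, 0 if censored."""
    if k <= 0 or lam <= 0:
        return -np.inf
    logpdf = np.log(k / lam) + (k - 1) * np.log(t / lam) - (t / lam) ** k
    logsurv = -(t / lam) ** k
    return np.sum(event * logpdf + (1 - event) * logsurv)

def metropolis_weibull(t, event, n_iter=20000, step=0.1, seed=0):
    """Random-walk Metropolis on (log k, log lambda), flat priors on the log scale."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)                      # start at k = 1, lambda = 1
    ll = weibull_loglik(np.exp(theta[0]), np.exp(theta[1]), t, event)
    samples = []
    for _ in range(n_iter):
        prop = theta + step * rng.standard_normal(2)
        ll_prop = weibull_loglik(np.exp(prop[0]), np.exp(prop[1]), t, event)
        if np.log(rng.random()) < ll_prop - ll:   # Metropolis accept/reject
            theta, ll = prop, ll_prop
        samples.append(np.exp(theta))
    return np.array(samples)

# Simulated data: Weibull(shape 1.5, scale 2.0), administratively censored at t = 3.5
rng = np.random.default_rng(1)
t_true = 2.0 * rng.weibull(1.5, size=300)
event = (t_true < 3.5).astype(float)
t_obs = np.minimum(t_true, 3.5)
draws = metropolis_weibull(t_obs, event)
post_mean = draws[5000:].mean(axis=0)        # discard burn-in
```

With 300 observations the posterior means should land close to the generating shape (1.5) and scale (2.0); the joint model in the article additionally links this hazard to the longitudinal sub-model, which this sketch omits.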
2.  Toward Realistic and Practical Ideal Observer (IO) Estimation for the Optimization of Medical Imaging Systems 
Ieee Transactions on Medical Imaging  2008;27(10):1535-1543.
The ideal observer (IO) employs complete knowledge of the available data statistics and sets an upper limit on observer performance on a binary classification task. However, the IO test statistic cannot be calculated analytically, except for cases where object statistics are extremely simple. Kupinski et al. have developed a Markov chain Monte Carlo (MCMC) based technique to compute the IO test statistic for, in principle, arbitrarily complex objects and imaging systems. In this work, we applied MCMC to estimate the IO test statistic in the context of myocardial perfusion SPECT (MPS). We modeled the imaging system using an analytic SPECT projector with attenuation, distance-dependent detector-response modeling and Poisson noise statistics. The object is a family of parameterized torso phantoms with variable geometric and organ uptake parameters. To accelerate the imaging simulation process and thus enable the MCMC IO estimation, we used discretized anatomic parameters and continuous uptake parameters in defining the objects. The imaging process simulation was modeled by precomputing projections for each organ for a finite number of discretely-parameterized anatomic parameters and taking linear combinations of the organ projections based on continuous sampling of the organ uptake parameters. The proposed method greatly reduces the computational burden and allows MCMC IO estimation for a realistic MPS imaging simulation. We validated the proposed IO estimation technique by estimating IO test statistics for a large number of input objects. The properties of the first- and second-order statistics of the IO test statistics estimated using the MCMC IO estimation technique agreed well with theoretical predictions. Further, as expected, the IO had better performance, as measured by the receiver operating characteristic (ROC) curve, than the Hotelling observer. This method is developed for SPECT imaging. However, it can be adapted to any linear imaging system.
PMCID: PMC2739397  PMID: 18815105
Ideal observer; Markov chain Monte Carlo (MCMC)
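The acceleration trick described above, precompute one projection per organ, then form each simulated image as a linear combination weighted by continuously sampled uptakes plus Poisson noise, can be sketched as follows. The organ count, image size, and uptake values are illustrative assumptions, and a random matrix stands in for the analytic SPECT projector output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 organs (e.g., myocardium, liver, background), each with
# a precomputed noise-free projection for one fixed anatomy.
n_pixels = 64 * 64
organ_projections = rng.random((3, n_pixels))   # stand-in for projector output

def simulate_projection(uptake, organ_projections, rng):
    """Mean projection = linear combination of precomputed organ projections
    weighted by the sampled organ uptakes; Poisson noise models photon counting."""
    mean_counts = uptake @ organ_projections
    return rng.poisson(mean_counts)

uptake = np.array([5.0, 2.0, 0.5])              # continuously sampled uptakes
noisy = simulate_projection(uptake, organ_projections, rng)
```

Because the expensive projector runs only once per organ (per discrete anatomy), each MCMC iteration reduces to a matrix-vector product and a Poisson draw.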
3.  Prediction of transplant-free survival in idiopathic pulmonary fibrosis patients using joint models for event times and mixed multivariate longitudinal data 
Journal of applied statistics  2014;41(10):2192-2205.
We implement a joint model for mixed multivariate longitudinal measurements, applied to the prediction of time until lung transplant or death in idiopathic pulmonary fibrosis. Specifically, we formulate a unified Bayesian joint model for the mixed longitudinal responses and time-to-event outcomes. For the longitudinal model of continuous and binary responses, we investigate multivariate generalized linear mixed models using shared random effects. Longitudinal and time-to-event data are assumed to be independent conditional on available covariates and shared parameters. A Markov chain Monte Carlo (MCMC) algorithm, implemented in OpenBUGS, is used for parameter estimation. To illustrate practical considerations in choosing a final model, we fit 37 different candidate models using all possible combinations of random effects and use the Deviance Information Criterion (DIC) to select the best-fitting model. We demonstrate the prediction of future event probabilities within a fixed time interval for patients utilizing baseline data, post-baseline longitudinal responses, and the time-to-event outcome. The performance of our joint model is also evaluated in simulation studies.
PMCID: PMC4157686  PMID: 25214700
Idiopathic Pulmonary Fibrosis; Joint model; Mixed continuous and binary data; Multivariate longitudinal data; Prediction model; Shared parameter model; Survival analysis
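The DIC used above to choose among the 37 candidate models can be computed directly from MCMC output: DIC = Dbar + pD, where Dbar is the posterior mean deviance and pD = Dbar − D(θ̄) is the effective number of parameters. A minimal sketch on a toy normal-mean model, with stand-in posterior draws (in the article the draws come from the OpenBUGS sampler):

```python
import numpy as np

def dic(loglik_fn, posterior_draws, data):
    """Deviance Information Criterion from MCMC output.
    DIC = Dbar + pD, with pD = Dbar - D(theta_bar); lower is better."""
    dev = np.array([-2.0 * loglik_fn(th, data) for th in posterior_draws])
    dbar = dev.mean()
    dev_at_mean = -2.0 * loglik_fn(posterior_draws.mean(axis=0), data)
    pd = dbar - dev_at_mean
    return dbar + pd, pd

# Toy example: normal mean model with known unit variance
def loglik(theta, y):
    return -0.5 * np.sum((y - theta[0]) ** 2)   # up to an additive constant

rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=50)
# Stand-in posterior draws for the mean (normally produced by the sampler)
draws = rng.normal(y.mean(), 1.0 / np.sqrt(50), size=(2000, 1))
dic_value, p_d = dic(loglik, draws, y)
```

For this one-parameter model pD comes out close to 1, which is the sanity check one would also apply when comparing random-effects structures.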
4.  Imputation strategies for missing binary outcomes in cluster randomized trials 
Attrition, which leads to missing data, is a common problem in cluster randomized trials (CRTs), where groups of patients rather than individuals are randomized. Standard multiple imputation (MI) strategies may not be appropriate to impute missing data from CRTs since they assume independent data. In this paper, under the assumptions of missing completely at random and covariate-dependent missingness, we compared six MI strategies that account for the intra-cluster correlation of missing binary outcomes in CRTs with standard imputation strategies and a complete-case analysis approach, using a simulation study.
We considered three within-cluster and three across-cluster MI strategies for missing binary outcomes in CRTs. The three within-cluster MI strategies are the logistic regression method, the propensity score method, and the Markov chain Monte Carlo (MCMC) method, which apply standard MI strategies within each cluster. The three across-cluster MI strategies are the propensity score method, the random-effects (RE) logistic regression approach, and logistic regression with cluster as a fixed effect. Based on the Community Hypertension Assessment Trial (CHAT), which has complete data, we designed a simulation study to investigate the performance of the above MI strategies.
The estimated treatment effect and its 95% confidence interval (CI) from the generalized estimating equations (GEE) model based on the CHAT complete dataset are 1.14 (0.76, 1.70). When 30% of the binary outcomes are missing completely at random, a simulation study shows that the estimated treatment effects and the corresponding 95% CIs from the GEE model are 1.15 (0.76, 1.75) if complete-case analysis is used, 1.12 (0.72, 1.73) if the within-cluster MCMC method is used, 1.21 (0.80, 1.81) if across-cluster RE logistic regression is used, and 1.16 (0.82, 1.64) if standard logistic regression, which does not account for clustering, is used.
When the percentage of missing data is low or the intra-cluster correlation coefficient is small, the different approaches for handling missing binary outcome data generate quite similar results. When the percentage of missing data is large, standard MI strategies, which do not take the intra-cluster correlation into account, underestimate the variance of the treatment effect. Within-cluster and across-cluster MI strategies (except for the random-effects logistic regression MI strategy), which take the intra-cluster correlation into account, appear more appropriate for handling missing outcomes in CRTs. Under the same imputation strategy and percentage of missingness, the estimates of the treatment effect from the GEE and RE logistic regression models are similar.
PMCID: PMC3055218  PMID: 21324148
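Whichever imputation strategy produces the m completed datasets, the per-dataset estimates are conventionally pooled with Rubin's rules: average the point estimates, and combine within- and between-imputation variance. A minimal sketch; the five log-odds-ratio estimates and variances below are hypothetical, and a refined interval would use the Barnard-Rubin degrees of freedom rather than the normal quantile.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m completed-data analyses with Rubin's rules.
    Returns the pooled estimate, total variance, and a large-sample 95% CI.
    For a log-odds-ratio, exponentiate the CI to return to the OR scale."""
    est = np.asarray(estimates, float)
    var = np.asarray(variances, float)
    m = len(est)
    qbar = est.mean()                  # pooled point estimate
    ubar = var.mean()                  # within-imputation variance
    b = est.var(ddof=1)                # between-imputation variance
    t = ubar + (1 + 1 / m) * b         # total variance
    half = 1.96 * np.sqrt(t)
    return qbar, t, (qbar - half, qbar + half)

# Five hypothetical log-odds-ratio estimates from five imputed datasets
q, t, ci = pool_rubin([0.12, 0.15, 0.10, 0.18, 0.13],
                      [0.04, 0.05, 0.04, 0.05, 0.04])
```

The between-imputation term is exactly what the standard (non-clustered) strategies understate here: imputing without regard to cluster makes the m completed datasets too similar, shrinking b and hence the variance of the treatment effect.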
5.  Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT) 
The objective of the CARRECT software is to make cutting edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users.
Analyses produced by epidemiologists and public health practitioners are susceptible to bias from a number of sources including missing data, confounding variables, and statistical model selection. It often requires a great deal of expertise to understand and apply the multitude of tests, corrections, and selection rules, and these tasks can be time-consuming and burdensome. To address this challenge, Aptima began development of CARRECT, the Collaborative Automation Reliably Remediating Erroneous Conclusion Threats system. When complete, CARRECT will provide an expert system that can be embedded in an analyst’s workflow. CARRECT will support statistical bias reduction and improved analyses and decision making by engaging the user in a collaborative process in which the technology is transparent to the analyst.
Older approaches to imputing missing data, including mean imputation and single imputation regression methods, have steadily given way to a class of methods known as “multiple imputation” (hereafter “MI”; Rubin 1987). Rather than making the restrictive assumption that the data are missing completely at random (MCAR), MI typically assumes the data are missing at random (MAR).
There are two key innovations behind MI. First, the observed values can be useful in predicting the missing cells, and thus specifying a joint distribution of the data is the first step in implementing the models. Second, single imputation methods will likely fail not only because of the inherent uncertainty in the missing values but also because of the estimation uncertainty associated with generating the parameters in the imputation procedure itself. By contrast, drawing the missing values multiple times, thereby generating m complete datasets along with the estimated parameters of the model, properly accounts for both types of uncertainty (Rubin 1987; King et al. 2001). As a result, MI leads to valid standard errors and confidence intervals along with unbiased point estimates.
To compute the joint distribution, CARRECT uses a bootstrapping-based algorithm that gives essentially the same answers as the standard Bayesian Markov chain Monte Carlo (MCMC) or expectation-maximization (EM) approaches, is usually considerably faster than existing approaches, and can handle many more variables.
Tests were conducted on one of the proposed methods with an epidemiological dataset from the Integrated Health Interview Series (IHIS) producing verifiably unbiased results despite high missingness rates. In addition, mockups (Figure 1) were created of an intuitive data wizard that guides the user through the analysis processes by analyzing key features of a given dataset. The mockups also show prompts for the user to provide additional substantive knowledge to improve the handling of imperfect datasets, as well as the selection of the most appropriate algorithms and models.
Our approach and program were designed to make bias mitigation accessible to far more users than only the statistical elite. We hope that it will have a wide impact on reducing bias in epidemiological studies and provide more accurate information to policymakers.
PMCID: PMC3692841
Bias reduction; Missing data; Statistical model selection
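The bootstrap-based alternative to MCMC/EM imputation described above can be illustrated in miniature: for each of the m imputations, refit the imputation model on a bootstrap resample of the observed cases, so that the estimation uncertainty of the model parameters propagates into the imputed values (the role the bootstrap plays in lieu of posterior draws). This is a simplified sketch with a single continuous covariate, not the CARRECT algorithm itself.

```python
import numpy as np

def bootstrap_impute(x, y, m=5, seed=0):
    """Bootstrap-based multiple imputation for missing values in y given x."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(y)
    obs = ~miss
    completed = []
    for _ in range(m):
        # Refit the imputation model on a bootstrap resample of observed cases
        idx = rng.choice(np.flatnonzero(obs), size=obs.sum(), replace=True)
        X = np.column_stack([np.ones(idx.size), x[idx]])
        beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        resid_sd = np.std(y[idx] - X @ beta, ddof=2)
        y_imp = y.copy()
        # Draw imputations from the predictive distribution, not just the mean
        y_imp[miss] = (beta[0] + beta[1] * x[miss]
                       + rng.normal(0, resid_sd, size=miss.sum()))
        completed.append(y_imp)
    return completed

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, size=200)
y[rng.random(200) < 0.3] = np.nan          # ~30% missing completely at random
datasets = bootstrap_impute(x, y, m=5)
```

The m completed datasets would then each be analyzed and pooled; observed values are left untouched in every dataset.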
6.  A scalable, knowledge-based analysis framework for genetic association studies 
BMC Bioinformatics  2013;14:312.
Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.
We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
PMCID: PMC4015032  PMID: 24152222
7.  Gradient-based MCMC samplers for dynamic causal modelling 
Neuroimage  2016;125:1107-1118.
In this technical note, we derive two MCMC (Markov chain Monte Carlo) samplers for dynamic causal models (DCMs). Specifically, we use (a) Hamiltonian MCMC (HMC-E) where sampling is simulated using Hamilton’s equation of motion and (b) Langevin Monte Carlo algorithm (LMC-R and LMC-E) that simulates the Langevin diffusion of samples using gradients either on a Euclidean (E) or on a Riemannian (R) manifold. While LMC-R requires minimal tuning, the implementation of HMC-E is heavily dependent on its tuning parameters. These parameters are therefore optimised by learning a Gaussian process model of the time-normalised sample correlation matrix. This allows one to formulate an objective function that balances tuning parameter exploration and exploitation, furnishing an intervention-free inference scheme. Using neural mass models (NMMs)—a class of biophysically motivated DCMs—we find that HMC-E is statistically more efficient than LMC-R (with a Riemannian metric); yet both gradient-based samplers are far superior to the random walk Metropolis algorithm, which proves inadequate to steer away from dynamical instability.
• We compare two gradient-based MCMC methods for parameter inference in neural mass models.
• These methods are Hamiltonian and Langevin Monte Carlo (HMC-E and LMC-R).
• LMC-R is computationally more efficient, while HMC-E proves to be statistically most efficient.
• We propose combining gradient-based and gradient-free samplers for efficient Bayesian inference.
PMCID: PMC4692453  PMID: 26213349
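The Euclidean Langevin sampler discussed above proposes moves along the gradient of the log target and then applies a Metropolis accept/reject step so the chain targets the posterior exactly (the Metropolis-adjusted Langevin algorithm, MALA). A minimal sketch on a toy Gaussian target; the step size and target are illustrative, and the paper's Riemannian variant and DCM likelihoods are not reproduced.

```python
import numpy as np

def mala(grad_logp, logp, x0, eps=0.1, n_iter=5000, seed=0):
    """Metropolis-adjusted Langevin (Euclidean): propose
    x' = x + (eps^2/2) * grad log p(x) + eps * N(0, I), then accept/reject."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, float)
    out = []
    for _ in range(n_iter):
        noise = rng.standard_normal(x.shape)
        xp = x + 0.5 * eps**2 * grad_logp(x) + eps * noise
        # Log proposal densities q(x'|x) and q(x|x') for the MH correction
        fwd = -np.sum((xp - x - 0.5 * eps**2 * grad_logp(x))**2) / (2 * eps**2)
        rev = -np.sum((x - xp - 0.5 * eps**2 * grad_logp(xp))**2) / (2 * eps**2)
        log_alpha = logp(xp) - logp(x) + rev - fwd
        if np.log(rng.random()) < log_alpha:
            x = xp
        out.append(x.copy())
    return np.array(out)

# Standard 2-D Gaussian target: log p(x) = -||x||^2 / 2, so grad log p(x) = -x
samples = mala(lambda x: -x, lambda x: -0.5 * np.sum(x**2),
               x0=np.array([3.0, -3.0]), eps=0.9, n_iter=8000)
mean = samples[2000:].mean(axis=0)
```

Unlike random-walk Metropolis, the drift term steers proposals toward high-density regions, which is what lets gradient-based samplers avoid the dynamical instabilities the note reports for the random-walk algorithm.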
8.  A Spatio-Temporal Nonparametric Bayesian Variable Selection Model of fMRI Data for Clustering Correlated Time Courses 
NeuroImage  2014;95:162-175.
In this paper we present a novel wavelet-based Bayesian nonparametric regression model for the analysis of functional magnetic resonance imaging (fMRI) data. Our goal is to provide a joint analytical framework that allows us to detect regions of the brain which exhibit neuronal activity in response to a stimulus and, simultaneously, infer the association, or clustering, of spatially remote voxels that exhibit fMRI time series with similar characteristics. We start by modeling the data with a hemodynamic response function (HRF) with a voxel-dependent shape parameter. We detect regions of the brain activated in response to a given stimulus by using mixture priors with a spike at zero on the coefficients of the regression model. We account for the complex spatial correlation structure of the brain by using a Markov random field (MRF) prior on the parameters guiding the selection of the activated voxels, therefore capturing correlation among nearby voxels. In order to infer association of the voxel time courses, we assume correlated errors, in particular long memory, and exploit the whitening properties of discrete wavelet transforms. Furthermore, we achieve clustering of the voxels by imposing a Dirichlet process (DP) prior on the parameters of the long memory process. For inference, we use Markov chain Monte Carlo (MCMC) sampling techniques that combine Metropolis-Hastings schemes employed in Bayesian variable selection with sampling algorithms for nonparametric DP models. We explore the performance of the proposed model on simulated data, with both block- and event-related design, and on real fMRI data.
PMCID: PMC4076058  PMID: 24650600
Bayesian nonparametric; Dirichlet process prior; Discrete wavelet transform; fMRI; Long memory errors; Markov random field prior
9.  A Method for Efficiently Sampling From Distributions With Correlated Dimensions 
Psychological methods  2013;18(3):368-384.
Bayesian estimation has played a pivotal role in the understanding of individual differences. However, for many models in psychology, Bayesian estimation of model parameters can be difficult. One reason for this difficulty is that conventional sampling algorithms, such as Markov chain Monte Carlo (MCMC), can be inefficient and impractical when little is known about the target distribution—particularly the target distribution’s covariance structure. In this article, we highlight some reasons for this inefficiency and advocate the use of a population MCMC algorithm, called differential evolution Markov chain Monte Carlo (DE-MCMC), as a means of efficient proposal generation. We demonstrate in a simulation study that the performance of the DE-MCMC algorithm is unaffected by the correlation of the target distribution, whereas conventional MCMC performs substantially worse as the correlation increases. We then show that the DE-MCMC algorithm can be used to efficiently fit a hierarchical version of the linear ballistic accumulator model to response time data, which has proven to be a difficult task when conventional MCMC is used.
PMCID: PMC4140408  PMID: 23646991
differential evolution; optimal transition kernel; hierarchical Bayesian estimation; linear ballistic accumulator model; response time
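The DE-MCMC proposal mechanism advocated above is simple to state: each chain proposes a move along the difference of two other randomly chosen chains, scaled by gamma ≈ 2.38/sqrt(2d), so proposals automatically align with the target's correlation structure. A sketch on a strongly correlated Gaussian target (chain count, dimensions, and target are illustrative assumptions):

```python
import numpy as np

def de_mcmc(logp, n_chains=10, n_dim=2, n_iter=4000, gamma=None, seed=0):
    """Differential evolution MCMC: proposals follow differences of other
    chains, adapting to the target's covariance without manual tuning."""
    rng = np.random.default_rng(seed)
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * n_dim)      # standard DE-MC scaling
    x = rng.standard_normal((n_chains, n_dim)) * 3
    lp = np.array([logp(xi) for xi in x])
    keep = []
    for _ in range(n_iter):
        for i in range(n_chains):
            a, b = rng.choice([j for j in range(n_chains) if j != i],
                              size=2, replace=False)
            # Move along the difference of two other chains, plus small jitter
            prop = x[i] + gamma * (x[a] - x[b]) + rng.normal(0, 1e-4, n_dim)
            lp_prop = logp(prop)
            if np.log(rng.random()) < lp_prop - lp[i]:
                x[i], lp[i] = prop, lp_prop
        keep.append(x.copy())
    return np.array(keep)

# Strongly correlated 2-D Gaussian target (rho = 0.95)
cov = np.array([[1.0, 0.95], [0.95, 1.0]])
prec = np.linalg.inv(cov)
samples = de_mcmc(lambda z: -0.5 * z @ prec @ z)
flat = samples[1000:].reshape(-1, 2)
rho_hat = np.corrcoef(flat.T)[0, 1]
```

Because the difference vectors themselves are (approximately) draws from the target's covariance, performance is insensitive to the correlation, exactly the property the simulation study in the article demonstrates against conventional MCMC.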
10.  Joint Analysis of Stochastic Processes with Application to Smoking Patterns and Insomnia 
Statistics in medicine  2013;32(29):10.1002/sim.5906.
This article proposes a joint modeling framework for longitudinal insomnia measurements and a stochastic smoking cessation process in the presence of a latent permanent quitting state (i.e., “cure”). A generalized linear mixed-effects model is used for the longitudinal measurements of insomnia symptom and a stochastic mixed-effects model is used for the smoking cessation process. These two models are linked together via the latent random effects. A Bayesian framework and Markov Chain Monte Carlo algorithm are developed to obtain the parameter estimates. The likelihood functions involving time-dependent covariates are formulated and computed. The within-subject correlation between insomnia and smoking processes is explored. The proposed methodology is applied to simulation studies and the motivating dataset, i.e., the Alpha-Tocopherol, Beta-Carotene (ATBC) Lung Cancer Prevention study, a large longitudinal cohort study of smokers from Finland.
PMCID: PMC3856619  PMID: 23913574
Cure Model; MCMC; Mixed-effects Model; Joint Modeling; Recurrent Events; Bayes
11.  No Control Genes Required: Bayesian Analysis of qRT-PCR Data 
PLoS ONE  2013;8(8):e71448.
Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process.
In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests.
Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
PMCID: PMC3747227  PMID: 23977043
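The mean-variance point above is easy to demonstrate: placing a log-normal random effect on a Poisson rate produces counts whose variance exceeds the Poisson baseline (variance = mean), which is the overdispersion the model is built to capture. A small simulation sketch with illustrative parameters, not the MCMC.qpcr fitting machinery itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Poisson-lognormal model for qPCR molecule counts: the log-normal random
# effect on the Poisson rate creates the overdispersion seen in real counts.
n = 200_000
log_mu = 2.0                       # fixed effect on the log scale
sigma = 0.5                        # sd of the log-normal random effect
rate = np.exp(log_mu + sigma * rng.standard_normal(n))
counts = rng.poisson(rate)
```

For a pure Poisson sample the variance would track the mean; here the extra term scales with the square of the mean, so low-abundance targets (small counts, including zeros) and high-abundance targets get correctly different weights in the model fit.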
12.  Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons 
PLoS Computational Biology  2011;7(11):e1002211.
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, for both discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons.
Author Summary
It is well-known that neurons communicate with short electric pulses, called action potentials or spikes. But how can spiking networks implement complex computations? Attempts to relate spiking network activity to results of deterministic computation steps, like the output bits of a processor in a digital computer, conflict with findings from cognitive science and neuroscience, which indicate that neural spike output changes from trial to trial in identical experiments, i.e., neurons are “unreliable”. Therefore, it has been recently proposed that neural activity should rather be regarded as samples from an underlying probability distribution over many variables which, e.g., represent a model of the external world incorporating prior knowledge, memories as well as sensory input. This hypothesis assumes that networks of stochastically spiking neurons are able to emulate powerful algorithms for reasoning in the face of uncertainty, i.e., to carry out probabilistic inference. In this work we propose a detailed neural network model that indeed fulfills these computational requirements and we relate the spiking dynamics of the network to concrete probabilistic computations. Our model suggests that neural systems are suitable to carry out probabilistic inference by using stochastic, rather than deterministic, computing elements.
PMCID: PMC3207943  PMID: 22096452
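The link between stochastic spiking and MCMC can be caricatured with the simplest reversible case the article contrasts itself against: Gibbs sampling in a binary network whose "firing probability" is a sigmoid of the net input, which targets a Boltzmann distribution. This is a stand-in illustration only; the paper's contribution is precisely a non-reversible construction consistent with spiking dynamics, which this sketch does not implement.

```python
import numpy as np

def neural_gibbs(W, b, n_steps=200_000, seed=0):
    """Gibbs sampling in a binary network: each 'neuron' takes state 1 with a
    sigmoid probability of its net input; the stationary distribution is the
    Boltzmann distribution p(z) proportional to exp(b.z + z'Wz/2)."""
    rng = np.random.default_rng(seed)
    n = len(b)
    z = np.zeros(n)
    states = np.empty((n_steps, n))
    for t in range(n_steps):
        i = t % n                              # update neurons in turn
        u = b[i] + W[i] @ z - W[i, i] * z[i]   # net input from the others
        z[i] = rng.random() < 1.0 / (1.0 + np.exp(-u))
        states[t] = z
    return states

# Two mutually excitatory 'neurons': under the Boltzmann distribution they
# should fire together more often than two independent units would.
W = np.array([[0.0, 1.5], [1.5, 0.0]])
b = np.array([-1.0, -1.0])
s = neural_gibbs(W, b)
corr = np.corrcoef(s[10_000:].T)[0, 1]
```

Reading off the long-run firing statistics as samples from a distribution is the core of the "neural dynamics as sampling" interpretation; the article replaces the Gibbs update with dynamics that respect refractoriness and temporal structure.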
13.  Smoking Cessation for Patients With Chronic Obstructive Pulmonary Disease (COPD) 
Executive Summary
In July 2010, the Medical Advisory Secretariat (MAS) began work on a Chronic Obstructive Pulmonary Disease (COPD) evidentiary framework, an evidence-based review of the literature surrounding treatment strategies for patients with COPD. This project emerged from a request by the Health System Strategy Division of the Ministry of Health and Long-Term Care that MAS provide them with an evidentiary platform on the effectiveness and cost-effectiveness of COPD interventions.
After an initial review of health technology assessments and systematic reviews of COPD literature, and consultation with experts, MAS identified the following topics for analysis: vaccinations (influenza and pneumococcal), smoking cessation, multidisciplinary care, pulmonary rehabilitation, long-term oxygen therapy, noninvasive positive pressure ventilation for acute and chronic respiratory failure, hospital-at-home for acute exacerbations of COPD, and telehealth (including telemonitoring and telephone support). Evidence-based analyses were prepared for each of these topics. For each technology, an economic analysis was also completed where appropriate. In addition, a review of the qualitative literature on patient, caregiver, and provider perspectives on living and dying with COPD was conducted, as were reviews of the qualitative literature on each of the technologies included in these analyses.
The Chronic Obstructive Pulmonary Disease Mega-Analysis series is made up of the following reports, which can be publicly accessed at the MAS website at:
Chronic Obstructive Pulmonary Disease (COPD) Evidentiary Framework
Influenza and Pneumococcal Vaccinations for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Smoking Cessation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Community-Based Multidisciplinary Care for Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Pulmonary Rehabilitation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Long-term Oxygen Therapy for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Acute Respiratory Failure Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Chronic Respiratory Failure Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Hospital-at-Home Programs for Patients With Acute Exacerbations of Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Home Telehealth for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Cost-Effectiveness of Interventions for Chronic Obstructive Pulmonary Disease Using an Ontario Policy Model
Experiences of Living and Dying With COPD: A Systematic Review and Synthesis of the Qualitative Empirical Literature
For more information on the qualitative review, please contact Mita Giacomini at: member_giacomini.htm.
For more information on the economic analysis, please visit the PATH website:
The Toronto Health Economics and Technology Assessment (THETA) collaborative has produced an associated report on patient preference for mechanical ventilation. For more information, please visit the THETA website:
The objective of this evidence-based analysis was to determine the effectiveness and cost-effectiveness of smoking cessation interventions in the management of chronic obstructive pulmonary disease (COPD).
Clinical Need: Condition and Target Population
Tobacco smoking is the main risk factor for COPD. It is estimated that 50% of older smokers develop COPD and more than 80% of COPD-associated morbidity is attributed to tobacco smoking. According to the Canadian Community Health Survey, 38.5% of Ontarians who smoke have COPD. In patients with a significant history of smoking, COPD usually presents with symptoms of progressive dyspnea (shortness of breath), cough, and sputum production. Patients with COPD who smoke have a particularly high level of nicotine dependence, and about 30.4% to 43% of patients with moderate to severe COPD continue to smoke. Despite the severe symptoms that COPD patients suffer, the majority of patients with COPD are unable to quit smoking on their own; each year only about 1% of smokers succeed in quitting on their own initiative.
Smoking cessation is the process of discontinuing the practice of inhaling a smoked substance. Smoking cessation can help to slow or halt the progression of COPD. Smoking cessation programs mainly target tobacco smoking, but may also encompass other substances that can be difficult to stop smoking due to the development of strong physical addictions or psychological dependencies resulting from their habitual use.
Smoking cessation strategies include both pharmacological and nonpharmacological (behavioural or psychosocial) approaches. The basic components of smoking cessation interventions include simple advice, written self-help materials, individual and group behavioural support, telephone quit lines, nicotine replacement therapy (NRT), and antidepressants. As nicotine addiction is a chronic, relapsing condition that usually requires several attempts to overcome, cessation support is often tailored to individual needs, while recognizing that in general, the more intensive the support, the greater the chance of success. Success at quitting smoking decreases in relation to:
a lack of motivation to quit,
a history of smoking more than a pack of cigarettes a day for more than 10 years,
a lack of social support, such as from family and friends, and
the presence of mental health disorders (such as depression).
Research Question
What are the effectiveness and cost-effectiveness of smoking cessation interventions compared with usual care for patients with COPD?
Research Methods
Literature Search
Search Strategy
A literature search was performed on June 24, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations (1950 to June Week 3 2010), EMBASE (1980 to 2010 Week 24), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), the Cochrane Library, and the Centre for Reviews and Dissemination for studies published between 1950 and June 2010. A single reviewer reviewed the abstracts and obtained full-text articles for those studies meeting the eligibility criteria. Reference lists were also examined for any additional relevant studies not identified through the search. Data were extracted using a standardized data abstraction form.
Inclusion Criteria
English-language, full reports from 1950 to week 3 of June, 2010;
either randomized controlled trials (RCTs), systematic reviews and meta-analyses, or non-RCTs with controls;
a proven diagnosis of COPD;
adult patients (≥ 18 years);
a smoking cessation intervention that comprised at least one of the treatment arms;
≥ 6 months’ abstinence as an outcome; and
patients followed for ≥ 6 months.
Exclusion Criteria
case reports
case series
Outcomes of Interest
≥ 6 months’ abstinence
Quality of Evidence
The quality of each included study was assessed taking into consideration allocation concealment, randomization, blinding, power/sample size, withdrawals/dropouts, and intention-to-treat analyses.
The quality of the body of evidence was assessed as high, moderate, low, or very low according to the GRADE Working Group criteria. The following definitions of quality were used in grading the quality of the evidence:
Summary of Findings
Nine RCTs were identified from the literature search. The sample sizes ranged from 74 to 5,887 participants. A total of 8,291 participants were included in the nine studies. The mean age of the patients in the studies ranged from 54 to 64 years. The majority of studies used the Global Initiative for Chronic Obstructive Lung Disease (GOLD) COPD staging criteria to stage the disease in study subjects. Studies included patients with mild COPD (2 studies), mild–moderate COPD (3 studies), moderate–severe COPD (1 study), and severe–very severe COPD (1 study). One study included persons at risk of COPD in addition to those with mild, moderate, or severe COPD, and 1 study did not define the stages of COPD. The individual quality of the studies was high. Smoking cessation interventions varied across studies and included counselling, pharmacotherapy, or a combination of both. Two studies were delivered in a hospital setting, whereas the remaining 7 studies were delivered in an outpatient setting. All studies reported a usual care group or a placebo-controlled group (for the drug-only trials). The follow-up periods ranged from 6 months to 5 years. Due to excessive clinical heterogeneity in the interventions, studies were first grouped into categories of similar interventions; statistical pooling was subsequently performed, where appropriate. When possible, pooled estimates using relative risks for abstinence rates with 95% confidence intervals were calculated. The remaining studies were reported separately.
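The fixed-effect, inverse-variance pooling of relative risks described above can be sketched as follows. This is a generic illustration with hypothetical study counts, not the review's actual data; the function name and inputs are assumptions for the example.

```python
import math

def pooled_relative_risk(studies):
    """Fixed-effect inverse-variance pooling of log relative risks.

    Each study is a tuple (events_tx, n_tx, events_ctrl, n_ctrl).
    Returns the pooled RR and its 95% confidence interval."""
    num, den = 0.0, 0.0
    for a, n1, c, n2 in studies:
        log_rr = math.log((a / n1) / (c / n2))
        # Approximate variance of the log relative risk
        var = 1 / a - 1 / n1 + 1 / c - 1 / n2
        w = 1 / var                      # inverse-variance weight
        num += w * log_rr
        den += w
    log_pooled = num / den
    se = math.sqrt(1 / den)
    rr = math.exp(log_pooled)
    ci = (math.exp(log_pooled - 1.96 * se),
          math.exp(log_pooled + 1.96 * se))
    return rr, ci
```

For example, two hypothetical trials with abstinence counts `(30, 100, 15, 100)` and `(25, 120, 12, 118)` would pool to a relative risk near 2 with a confidence interval excluding 1.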
Abstinence Rates
Table ES1 provides a summary of the pooled estimates for abstinence, at longest follow-up, from the trials included in this review. It also shows the respective GRADE qualities of evidence.
Summary of Results*
Abbreviations: CI, confidence interval; NRT, nicotine replacement therapy.
Statistically significant (P < 0.05).
One trial used in this comparison had 2 treatment arms each examining a different antidepressant.
Based on a moderate quality of evidence, compared with usual care, abstinence rates are significantly higher in COPD patients receiving intensive counselling or a combination of intensive counselling and NRT.
Based on limited and moderate quality of evidence, abstinence rates are significantly higher in COPD patients receiving NRT compared with placebo.
Based on a moderate quality of evidence, abstinence rates are significantly higher in COPD patients receiving the antidepressant bupropion compared to placebo.
PMCID: PMC3384371  PMID: 23074432
14.  Inference of regulatory networks with a convergence improved MCMC sampler 
BMC Bioinformatics  2015;16(1):306.
One of the goals of the Systems Biology community is to have a detailed map of all biological interactions in an organism. One small yet important step in this direction is the creation of biological networks from post-genomic data. Bayesian networks are a very promising model for the inference of regulatory networks in Systems Biology. Usually, Bayesian networks are sampled with a Markov Chain Monte Carlo (MCMC) sampler in the structure space. Unfortunately, conventional MCMC sampling schemes are often slow in mixing and convergence. To improve MCMC convergence, an alternative method is proposed and tested with different sets of data. Moreover, the proposed method is compared with the traditional MCMC sampling scheme.
In the proposed method, a simpler and faster method for the inference of regulatory networks, Graphical Gaussian Models (GGMs), is integrated into the Bayesian network inference through a hierarchical Bayesian model. In this manner, information about the structure obtained from the data with GGMs is taken into account in the MCMC scheme, thus improving mixing and convergence. The proposed method is tested with three types of data, two from simulated models and one from real data. The results are compared with the results of the traditional MCMC sampling scheme in terms of network recovery accuracy and convergence. The results show that when compared with a traditional MCMC scheme, the proposed method presents improved convergence, leading to better network reconstruction with fewer MCMC iterations.
The proposed method is a viable alternative for improving the mixing and convergence of traditional MCMC schemes. It allows the use of Bayesian networks with an MCMC sampler with fewer iterations. The proposed method always converged earlier than the traditional MCMC scheme. We observe an improvement in the accuracy of the recovered networks for the Gaussian simulated data, but this improvement is absent for both the real data and the data simulated from ODEs.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0734-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4581096  PMID: 26399857
Bayesian networks; Genetic regulatory networks; Hierarchical Bayesian modelling
15.  Profile-Based LC-MS Data Alignment—A Bayesian Approach 
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets.
PMCID: PMC3993096  PMID: 23929872
Alignment; Bayesian inference; block Metropolis-Hastings algorithm; liquid chromatography-mass spectrometry (LC-MS); Markov chain Monte Carlo (MCMC); stochastic search variable selection (SSVS)
16.  A computationally efficient algorithm for genomic prediction using a Bayesian model 
Genomic prediction of breeding values from dense single nucleotide polymorphisms (SNP) genotypes is used for livestock and crop breeding, and can also be used to predict disease risk in humans. For some traits, the most accurate genomic predictions are achieved with non-linear estimates of SNP effects from Bayesian methods that treat SNP effects as random effects from a heavy-tailed prior distribution. These Bayesian methods are usually implemented via Markov chain Monte Carlo (MCMC) schemes to sample from the posterior distribution of SNP effects, which is computationally expensive. Our aim was to develop an efficient expectation–maximisation algorithm (emBayesR) that gives estimates of SNP effects and accuracies of genomic prediction similar to those from the MCMC implementation of BayesR (a Bayesian method for genomic prediction), but with greatly reduced computation time.
emBayesR is an approximate EM algorithm that retains the BayesR model assumption with SNP effects sampled from a mixture of normal distributions with increasing variance. emBayesR differs from other proposed non-MCMC implementations of Bayesian methods for genomic prediction in that it estimates the effect of each SNP while allowing for the error associated with estimation of all other SNP effects. emBayesR was compared to BayesR using simulated data, and real dairy cattle data with 632 003 SNPs genotyped, to determine if the MCMC and the expectation-maximisation approaches give similar accuracies of genomic prediction.
We were able to demonstrate that allowing for the error associated with estimation of other SNP effects when estimating the effect of each SNP in emBayesR improved the accuracy of genomic prediction over emBayesR without including this error correction, with both simulated and real data. When averaged over nine dairy traits, the accuracy of genomic prediction with emBayesR was only 0.5% lower than that from BayesR. However, emBayesR reduced computing time up to 8-fold compared to BayesR.
The emBayesR algorithm described here achieved similar accuracies of genomic prediction to BayesR for a range of simulated and real 630 K dairy SNP data. emBayesR needs less computing time than BayesR, which will allow it to be applied to larger datasets.
Electronic supplementary material
The online version of this article (doi:10.1186/s12711-014-0082-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4415253  PMID: 25926276
17.  Fast joint detection-estimation of evoked brain activity in event-related FMRI using a variational approach 
In standard within-subject analyses of event-related fMRI data, two steps are usually performed separately: detection of brain activity and estimation of the hemodynamic response. Because these two steps are inherently linked, we adopt the so-called region-based Joint Detection-Estimation (JDE) framework that addresses this joint issue using a multivariate inference for detection and estimation. JDE is built by making use of a regional bilinear generative model of the BOLD response and constraining the parameter estimation by physiological priors using temporal and spatial information in a Markovian model. In contrast to previous works that use Markov Chain Monte Carlo (MCMC) techniques to sample the resulting intractable posterior distribution, we recast the JDE into a missing data framework and derive a Variational Expectation-Maximization (VEM) algorithm for its inference. A variational approximation is used to approximate the Markovian model in the unsupervised spatially adaptive JDE inference, which allows automatic fine-tuning of spatial regularization parameters. It provides a new algorithm that exhibits interesting properties in terms of estimation error and computational cost compared to the previously used MCMC-based approach. Experiments on artificial and real data show that VEM-JDE is robust to model mis-specification and provides computational gain while maintaining good performance in terms of activation detection and hemodynamic shape recovery.
PMCID: PMC4020803  PMID: 23096056
Biomedical signal detection-estimation; functional MRI; brain imaging; Joint Detection-Estimation; Markov random field; EM algorithm; Variational approximation; fMRI; VEM; Mean-field
18.  Model Discrimination in Dynamic Molecular Systems: Application to Parotid De-differentiation Network 
Journal of Computational Biology  2013;20(7):524-539.
In modern systems biology the modeling of longitudinal data, such as changes in mRNA concentrations, is often of interest. Fully parametric, ordinary differential equations (ODE)-based models are typically developed for the purpose, but their lack of fit in some examples indicates that more flexible Bayesian models may be beneficial, particularly when there are relatively few data points available. However, under such sparse data scenarios it is often difficult to identify the most suitable model. The process of falsifying inappropriate candidate models is called model discrimination. We propose here a formal method of discrimination between competing Bayesian mixture-type longitudinal models that is both sensitive and sufficiently flexible to account for the complex variability of the longitudinal molecular data. The ideas from the field of Bayesian analysis of computer model validation are applied, along with modern Markov Chain Monte Carlo (MCMC) algorithms, in order to derive an appropriate Bayes discriminant rule. We restrict attention to the two-model comparison problem and present the application of the proposed rule to the mRNA data in the de-differentiation network of three mRNA concentrations in mammalian salivary glands as well as to a large synthetic dataset derived from the model used in the recent DREAM6 competition.
PMCID: PMC3704053  PMID: 23829652
parotid dedifferentiation; ODE model; parameter estimation; Bayes factor
19.  Joint modeling of multivariate longitudinal data and the dropout process in a competing risk setting: application to ICU data 
Joint modeling of longitudinal and survival data has been increasingly considered in clinical trials, notably in cancer and AIDS. In critically ill patients admitted to an intensive care unit (ICU), such models also appear to be of interest in the investigation of the effect of treatment on severity scores due to the likely association between the longitudinal score and the dropout process, either caused by deaths or live discharges from the ICU. However, in this competing risk setting, only cause-specific hazard sub-models for the multiple failure types data have been used.
We propose a joint model that consists of a linear mixed effects submodel for the longitudinal outcome, and a proportional subdistribution hazards submodel for the competing risks survival data, linked together by latent random effects. We use the Markov chain Monte Carlo technique of Gibbs sampling to estimate the joint posterior distribution of the unknown parameters of the model. The proposed method is studied and compared to a joint model with a cause-specific hazards submodel in simulations, and applied to a data set that consisted of repeated measurements of severity score and time of discharge and death for 1,401 ICU patients.
Time by treatment interaction was observed on the evolution of the mean SOFA score when ignoring potentially informative dropouts due to ICU deaths and live discharges from the ICU. In contrast, this was no longer significant when modeling the cause-specific hazards of informative dropouts. Such a time by treatment interaction persisted together with an evidence of treatment effect on the hazard of death when modeling dropout processes through the use of the Fine and Gray model for sub-distribution hazards.
In the joint modeling of competing risks with longitudinal response, differences in the handling of competing risk outcomes appear to translate into the estimated difference in treatment effect on the longitudinal outcome. Such a modeling strategy should be carefully defined prior to analysis.
PMCID: PMC2923158  PMID: 20670425
20.  Simplex Factor Models for Multivariate Unordered Categorical Data 
Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary, and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structures. As an alternative, we propose a novel class of simplex factor models. In the single-factor case, the model treats the different categorical outcomes as independent with unknown marginals. The model can characterize flexible dependence structures parsimoniously with few factors, and as factors are added, any multivariate categorical data distribution can be accurately approximated. Using a Bayesian approach for computation and inferences, a Markov chain Monte Carlo (MCMC) algorithm is proposed that scales well with increasing dimension, with the number of factors treated as unknown. We develop an efficient proposal for updating the base probability vector in hierarchical Dirichlet models. Theoretical properties are described, and we evaluate the approach through simulation examples. Applications are described for modeling dependence in nucleotide sequences and prediction from high-dimensional categorical features.
PMCID: PMC3728016  PMID: 23908561
Classification; Contingency table; Factor analysis; Latent variable; Nonparametric Bayes; Nonnegative tensor factorization; Mutual information; Polytomous regression
21.  Hierarchical Spatial Process Models for Multiple Traits in Large Genetic Trials 
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets of multiple traits of interest. Direct application of such multivariate models to large spatial datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several thousand iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects we discuss a multivariate predictive process that reduces the computational burden by projecting the original process onto a subspace generated by realizations of the original process at a specified set of locations (or knots). We illustrate the proposed methods using a synthetic dataset with multivariate additive and dominance genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
PMCID: PMC2911798  PMID: 20676229
Bayesian inference; Cross-covariance functions; Genetic trait models; Heredity; Hierarchical spatial models; Markov chain Monte Carlo; Multivariate spatial process; Spatial predictive process
22.  Hierarchical Spatial Modeling of Additive and Dominance Genetic Variance for Large Spatial Trial Datasets 
Biometrics  2009;65(2):441-451.
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets is, however, computationally infeasible because of cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein–Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive and dominance genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
PMCID: PMC2775095  PMID: 18759829
Bayesian inference; Genetic variance; Markov chain Monte Carlo; Ornstein-Uhlenbeck process; Spatial predictive process; Spatial process
23.  On the stability of the Bayenv method in assessing human SNP-environment associations 
Human Genomics  2014;8(1):1.
Phenotypic variation along environmental gradients has been documented among and within many species, and in some cases, genetic variation has been shown to be associated with these gradients. Bayenv is a relatively new method developed to detect patterns of polymorphisms associated with environmental gradients. Using a Bayesian Markov Chain Monte Carlo (MCMC) approach, Bayenv evaluates whether a linear model relating population allele frequencies to environmental variables is more probable than a null model based on observed frequencies of neutral markers. Although this method has been used to detect environmental adaptation in a number of species, including humans, plants, fish, and mosquitoes, stability between independent runs of this MCMC algorithm has not been characterized. In this paper, we explore the variability of results between runs and the factors contributing to it.
Independent runs of the Bayenv program were carried out using genome-wide single-nucleotide polymorphism (SNP) data from samples from 60 worldwide human populations following previous applications of the Bayenv method. To assess factors contributing to the method's stability, we used varying numbers of MCMC iterations and also analyzed a second modified data set that excluded two Siberian populations with extreme climate variables. Between any two runs, correlations between Bayes factors and the overlap of SNPs in the empirical p value tails were surprisingly low. Enrichments of genic versus non-genic SNPs in the empirical tails were more robust than the empirical p values; however, the significance of the enrichments for some environmental variables still varied among runs, contradicting previously published conclusions. Runs with a greater number of MCMC iterations slightly reduced run-to-run variability, and excluding the Siberian populations did not have a large effect on the stability of the runs.
Because of high run-to-run variability, we advise against making conclusions about genome-wide patterns of adaptation based on only one run of the Bayenv algorithm and recommend caution in interpreting previous studies that have used only one run. Moving forward, we suggest carrying out multiple independent runs of Bayenv and averaging Bayes factors between runs to produce more stable and reliable results. With these modifications, future discoveries of environmental adaptation within species using the Bayenv method will be more accurate, interpretable, and easily compared between studies.
PMCID: PMC3896655  PMID: 24405978
Environmental adaptation; Positive selection; Genome-wide scans; Human adaptation; Markov chain Monte Carlo; Natural selection
24.  Hamiltonian Monte Carlo methods for efficient parameter estimation in steady state dynamical systems 
BMC Bioinformatics  2014;15(1):253.
Parameter estimation for differential equation models of intracellular processes is a highly relevant but challenging task. The available experimental data do not usually contain enough information to identify all parameters uniquely, resulting in ill-posed estimation problems with often highly correlated parameters. Sampling-based Bayesian statistical approaches are appropriate for tackling this problem. The samples are typically generated via Markov chain Monte Carlo; however, such methods are computationally expensive and their convergence may be slow, especially if there are strong correlations between parameters. Monte Carlo methods based on Euclidean or Riemannian Hamiltonian dynamics have been shown to outperform other samplers by making proposal moves that take the local sensitivities of the system’s states into account and accepting these moves with high probability. However, the high computational cost involved with calculating the Hamiltonian trajectories prevents their widespread use for all but the smallest differential equation models. The further development of efficient sampling algorithms is therefore an important step towards improving the statistical analysis of predictive models of intracellular processes.
We show how state of the art Hamiltonian Monte Carlo methods may be significantly improved for steady state dynamical models. We present a novel approach for efficiently calculating the required geometric quantities by tracking steady states across the Hamiltonian trajectories using a Newton-Raphson method and employing local sensitivity information. Using our approach, we compare both Euclidean and Riemannian versions of Hamiltonian Monte Carlo on three models for intracellular processes with real data and demonstrate at least an order of magnitude improvement in the effective sampling speed. We further demonstrate the wider applicability of our approach to other gradient based MCMC methods, such as those based on Langevin diffusions.
Our approach is strictly beneficial in all test cases. The Matlab sources implementing our MCMC methodology are available from
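The core of any Hamiltonian Monte Carlo sampler, including the steady-state variants described above, is the leapfrog integrator plus a Metropolis accept step. A minimal generic sketch on a standard Gaussian target is shown below; it is an illustration of plain Euclidean HMC, not the authors' steady-state tracking method, and all names and tuning values are assumptions.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """Leapfrog integration of Hamiltonian dynamics for L steps."""
    p = p - 0.5 * eps * grad_U(q)          # half step for momentum
    for _ in range(L - 1):
        q = q + eps * p                    # full step for position
        p = p - eps * grad_U(q)            # full step for momentum
    q = q + eps * p
    p = p - 0.5 * eps * grad_U(q)          # final half step
    return q, -p                           # negate p for reversibility

def hmc(U, grad_U, q0, n_samples, eps=0.1, L=20, seed=0):
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)   # resample momentum
        H0 = U(q) + 0.5 * p @ p
        q_new, p_new = leapfrog(q, p, grad_U, eps, L)
        H1 = U(q_new) + 0.5 * p_new @ p_new
        if rng.random() < np.exp(H0 - H1): # Metropolis accept
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

# Standard bivariate Gaussian target: U(q) = q.q / 2, grad U = q
draws = hmc(lambda q: 0.5 * q @ q, lambda q: q, np.zeros(2), 2000)
```

The expensive part for ODE models is `grad_U`, which requires the system's sensitivities; the paper's contribution is making those geometric quantities cheap to track along the trajectory.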
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-253) contains supplementary material, which is available to authorized users.
PMCID: PMC4262080  PMID: 25066046
MCMC methods; Parameter estimation; Hybrid Monte Carlo; Steady state data; Systems biology
25.  Markov Chain Monte Carlo: an introduction for epidemiologists 
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with the interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit.
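The claim that MCMC results match MLE under a flat prior can be demonstrated in a few lines. Below, a random-walk Metropolis sampler targets the posterior of a normal mean with known standard deviation; the posterior mean recovers the sample mean, which is the MLE. The data are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(5.0, 2.0, size=200)
mle = data.mean()  # maximum-likelihood estimate of the mean

def log_post(mu):
    # Flat prior: log-posterior equals the Gaussian log-likelihood
    # (known sd = 2, so variance = 4); constants dropped
    return -0.5 * np.sum((data - mu) ** 2) / 4.0

mu, chain = 0.0, []
for _ in range(20000):
    prop = mu + rng.normal(0, 0.5)         # random-walk proposal
    # Metropolis acceptance on the log scale
    if np.log(rng.random()) < log_post(prop) - log_post(mu):
        mu = prop
    chain.append(mu)

post_mean = np.mean(chain[2000:])          # discard burn-in
```

After burn-in, `post_mean` agrees with `mle` to well within Monte Carlo error, which is the correspondence the tutorial describes.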
PMCID: PMC3619958  PMID: 23569196
