1.  A joint-modeling approach to assess the impact of biomarker variability on the risk of developing clinical outcome 
In some clinical trials and epidemiologic studies, investigators are interested in knowing whether the variability of a biomarker is independently predictive of clinical outcomes. This question is often addressed via a naïve approach in which a sample-based estimate (e.g., the standard deviation) is calculated as a surrogate for the “true” variability and then used in regression models as a covariate assumed to be free of measurement error. However, it is well known that measurement error in covariates causes underestimation of the true association. The underestimation can be substantial when precision is low because of a limited number of measures per subject. The joint analysis of survival data and longitudinal data makes it possible to account for the measurement error in longitudinal data and has received substantial attention in recent years. In this paper we propose a joint model to assess the predictive effect of biomarker variability. The joint model consists of two linked sub-models: a linear mixed model with patient-specific variance for the longitudinal data and a fully parametric Weibull model for the survival data, with the association between the two sub-models induced by a latent Gaussian process. Parameters in the joint model are estimated under a Bayesian framework implemented via Markov chain Monte Carlo (MCMC) methods in the WinBUGS software. The method is illustrated using the Ocular Hypertension Treatment Study to assess whether the variability of intraocular pressure is an independent risk factor for primary open-angle glaucoma. The performance of the method is also assessed by simulation studies.
doi:10.1007/s10260-010-0150-z
PMCID: PMC3039885  PMID: 21339862
Patient-specific variance; Survival data; Longitudinal data; Joint model; Markov chain Monte Carlo (MCMC); WinBUGS
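The abstract above estimates a Weibull survival sub-model under a Bayesian framework with MCMC in WinBUGS. As a minimal sketch of that general idea (not the authors' joint model, and using simulated data with illustrative parameters), a random-walk Metropolis sampler for the Weibull shape and scale can be written in a few lines of pure Python:

```python
import math
import random

random.seed(1)

# Simulate event times from a Weibull(shape=1.5, scale=2.0) distribution
# via inverse-CDF sampling: t = scale * (-log U)^(1/shape).
true_shape, true_scale = 1.5, 2.0
times = [true_scale * (-math.log(random.random())) ** (1.0 / true_shape)
         for _ in range(300)]

def log_post(shape, scale):
    """Weibull log-likelihood with flat priors on (0, inf)."""
    if shape <= 0 or scale <= 0:
        return -math.inf
    return sum(math.log(shape) - shape * math.log(scale)
               + (shape - 1) * math.log(t) - (t / scale) ** shape
               for t in times)

# Random-walk Metropolis over (shape, scale).
shape, scale = 1.0, 1.0
lp = log_post(shape, scale)
draws = []
for i in range(5000):
    s_new = shape + random.gauss(0, 0.1)
    c_new = scale + random.gauss(0, 0.1)
    lp_new = log_post(s_new, c_new)
    if math.log(random.random()) < lp_new - lp:  # Metropolis accept/reject
        shape, scale, lp = s_new, c_new, lp_new
    if i >= 2500:  # keep draws after burn-in
        draws.append((shape, scale))

post_shape = sum(d[0] for d in draws) / len(draws)
post_scale = sum(d[1] for d in draws) / len(draws)
```

With enough data the posterior means should land near the simulated shape of 1.5 and scale of 2.0; the paper's actual model additionally links this survival piece to a longitudinal sub-model through a latent Gaussian process.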
2.  Toward Realistic and Practical Ideal Observer (IO) Estimation for the Optimization of Medical Imaging Systems 
IEEE Transactions on Medical Imaging  2008;27(10):1535-1543.
The ideal observer (IO) employs complete knowledge of the available data statistics and sets an upper limit on observer performance on a binary classification task. However, the IO test statistic cannot be calculated analytically, except for cases where object statistics are extremely simple. Kupinski et al. have developed a Markov chain Monte Carlo (MCMC) based technique to compute the IO test statistic for, in principle, arbitrarily complex objects and imaging systems. In this work, we applied MCMC to estimate the IO test statistic in the context of myocardial perfusion SPECT (MPS). We modeled the imaging system using an analytic SPECT projector with attenuation, distance-dependent detector-response modeling and Poisson noise statistics. The object is a family of parameterized torso phantoms with variable geometric and organ uptake parameters. To accelerate the imaging simulation process and thus enable the MCMC IO estimation, we used discretized anatomic parameters and continuous uptake parameters in defining the objects. The imaging process simulation was modeled by precomputing projections for each organ for a finite number of discretely-parameterized anatomic parameters and taking linear combinations of the organ projections based on continuous sampling of the organ uptake parameters. The proposed method greatly reduces the computational burden and allows MCMC IO estimation for a realistic MPS imaging simulation. We validated the proposed IO estimation technique by estimating IO test statistics for a large number of input objects. The properties of the first- and second-order statistics of the IO test statistics estimated using the MCMC IO estimation technique agreed well with theoretical predictions. Further, as expected, the IO had better performance, as measured by the receiver operating characteristic (ROC) curve, than the Hotelling observer. This method is developed for SPECT imaging. However, it can be adapted to any linear imaging system.
doi:10.1109/TMI.2008.924641
PMCID: PMC2739397  PMID: 18815105
Ideal observer; Markov chain Monte Carlo (MCMC)
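The comparison above ranks observers by their ROC curves. As a minimal sketch (illustrative Gaussian test statistics, not the SPECT simulation), the empirical area under the ROC curve can be computed directly from samples of an observer's test statistic via the Mann-Whitney form:

```python
import random

random.seed(6)

# Test-statistic samples for signal-absent and signal-present cases:
# a toy observer whose statistic shifts upward when the signal is present.
absent = [random.gauss(0.0, 1.0) for _ in range(1000)]
present = [random.gauss(1.0, 1.0) for _ in range(1000)]

# Empirical AUC via the Mann-Whitney form: the fraction of
# (absent, present) pairs that the observer ranks correctly.
pairs = sum(1 for a in absent for p in present if p > a)
ties = sum(1 for a in absent for p in present if p == a)
auc = (pairs + 0.5 * ties) / (len(absent) * len(present))
```

For equal-variance Gaussian statistics separated by one standard deviation, the true AUC is Φ(1/√2) ≈ 0.76, so the empirical value should land close to that.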
3.  Prediction of transplant-free survival in idiopathic pulmonary fibrosis patients using joint models for event times and mixed multivariate longitudinal data 
Journal of Applied Statistics  2014;41(10):2192-2205.
SUMMARY
We implement a joint model for mixed multivariate longitudinal measurements, applied to the prediction of time until lung transplant or death in idiopathic pulmonary fibrosis. Specifically, we formulate a unified Bayesian joint model for the mixed longitudinal responses and time-to-event outcomes. For the longitudinal model of continuous and binary responses, we investigate multivariate generalized linear mixed models using shared random effects. Longitudinal and time-to-event data are assumed to be independent conditional on available covariates and shared parameters. A Markov chain Monte Carlo (MCMC) algorithm, implemented in OpenBUGS, is used for parameter estimation. To illustrate practical considerations in choosing a final model, we fit 37 different candidate models using all possible combinations of random effects and employ a Deviance Information Criterion (DIC) to select a best fitting model. We demonstrate the prediction of future event probabilities within a fixed time interval for patients utilizing baseline data, post-baseline longitudinal responses, and the time-to-event outcome. The performance of our joint model is also evaluated in simulation studies.
doi:10.1080/02664763.2014.909784
PMCID: PMC4157686  PMID: 25214700
Idiopathic Pulmonary Fibrosis; Joint model; Mixed continuous and binary data; Multivariate longitudinal data; Prediction model; Shared parameter model; Survival analysis
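The abstract above selects among 37 candidate models with the Deviance Information Criterion (DIC). A toy sketch of the computation, using a normal-mean model where the posterior is known in closed form (an assumed example, not the paper's joint model):

```python
import math
import random

random.seed(8)

# Toy setting: y_i ~ N(theta, 1) with a flat prior, so the posterior for
# theta is N(ybar, 1/n) in closed form and the DIC pieces are easy to verify.
y = [random.gauss(1.0, 1.0) for _ in range(50)]
n = len(y)
ybar = sum(y) / n

def deviance(theta):
    # D(theta) = -2 * log-likelihood under the N(theta, 1) model.
    return sum((yi - theta) ** 2 + math.log(2 * math.pi) for yi in y)

# Posterior draws (sampled exactly here; an MCMC chain would do the same job).
draws = [random.gauss(ybar, math.sqrt(1.0 / n)) for _ in range(5000)]

d_bar = sum(deviance(t) for t in draws) / len(draws)  # posterior mean deviance
p_d = d_bar - deviance(ybar)                          # effective no. of parameters
dic = d_bar + p_d
```

For this one-parameter model p_d comes out close to 1, and by construction DIC equals D(θ̄) + 2·p_d; among candidate models, the one with the smallest DIC is preferred.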
4.  Imputation strategies for missing binary outcomes in cluster randomized trials 
Background
Attrition, which leads to missing data, is a common problem in cluster randomized trials (CRTs), where groups of patients rather than individuals are randomized. Standard multiple imputation (MI) strategies may not be appropriate for imputing missing data from CRTs, since they assume independent data. In this paper, under the assumptions of missing completely at random and covariate-dependent missingness, we used a simulation study to compare six MI strategies that account for the intra-cluster correlation of missing binary outcomes in CRTs against standard imputation strategies and a complete case analysis approach.
Method
We considered three within-cluster and three across-cluster MI strategies for missing binary outcomes in CRTs. The three within-cluster MI strategies are the logistic regression method, the propensity score method, and the Markov chain Monte Carlo (MCMC) method, which apply standard MI strategies within each cluster. The three across-cluster MI strategies are the propensity score method, the random-effects (RE) logistic regression approach, and logistic regression with cluster as a fixed effect. Based on the community hypertension assessment trial (CHAT), which has complete data, we designed a simulation study to investigate the performance of the above MI strategies.
Results
The estimated treatment effect and its 95% confidence interval (CI) from a generalized estimating equations (GEE) model based on the complete CHAT dataset are 1.14 (0.76, 1.70). When 30% of the binary outcomes are missing completely at random, the simulation study shows that the estimated treatment effects and corresponding 95% CIs from the GEE model are 1.15 (0.76, 1.75) if complete case analysis is used, 1.12 (0.72, 1.73) if the within-cluster MCMC method is used, 1.21 (0.80, 1.81) if across-cluster RE logistic regression is used, and 1.16 (0.82, 1.64) if standard logistic regression, which does not account for clustering, is used.
Conclusion
When the percentage of missing data is low or intra-cluster correlation coefficient is small, different approaches for handling missing binary outcome data generate quite similar results. When the percentage of missing data is large, standard MI strategies, which do not take into account the intra-cluster correlation, underestimate the variance of the treatment effect. Within-cluster and across-cluster MI strategies (except for random-effects logistic regression MI strategy), which take the intra-cluster correlation into account, seem to be more appropriate to handle the missing outcome from CRTs. Under the same imputation strategy and percentage of missingness, the estimates of the treatment effect from GEE and RE logistic regression models are similar.
doi:10.1186/1471-2288-11-18
PMCID: PMC3055218  PMID: 21324148
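As a deliberately simplified stand-in for the within-cluster strategies compared above (far simpler than the logistic-regression, propensity-score, or MCMC methods in the paper, and using toy data), missing binary outcomes can be drawn within each cluster from that cluster's observed proportion, and a summary pooled over m imputations:

```python
import random

random.seed(7)

# Toy CRT outcomes by cluster; None marks a missing binary outcome.
clusters = {
    1: [1, 0, 1, None, 1, 1],
    2: [0, 0, None, 0, 1, None],
    3: [1, 1, 1, 0, None, 1],
}
n_total = sum(len(ys) for ys in clusters.values())

def impute_once(clusters):
    """One within-cluster imputation: draw each missing outcome from a
    Bernoulli with that cluster's observed success proportion."""
    completed = {}
    for cid, ys in clusters.items():
        obs = [y for y in ys if y is not None]
        p = sum(obs) / len(obs)
        completed[cid] = [y if y is not None else int(random.random() < p)
                          for y in ys]
    return completed

m = 5
imputed = [impute_once(clusters) for _ in range(m)]

# Pool a simple summary (overall event proportion) across the m datasets.
props = [sum(sum(ys) for ys in d.values()) / n_total for d in imputed]
pooled = sum(props) / m
```

Because each missing value is drawn from its own cluster's distribution, the between-cluster outcome differences (the intra-cluster correlation the paper is concerned with) are preserved in a way that pooling all clusters together would not achieve.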
5.  Collaborative Automation Reliably Remediating Erroneous Conclusion Threats (CARRECT) 
Objective
The objective of the CARRECT software is to make cutting edge statistical methods for reducing bias in epidemiological studies easy to use and useful for both novice and expert users.
Introduction
Analyses produced by epidemiologists and public health practitioners are susceptible to bias from a number of sources including missing data, confounding variables, and statistical model selection. It often requires a great deal of expertise to understand and apply the multitude of tests, corrections, and selection rules, and these tasks can be time-consuming and burdensome. To address this challenge, Aptima began development of CARRECT, the Collaborative Automation Reliably Remediating Erroneous Conclusion Threats system. When complete, CARRECT will provide an expert system that can be embedded in an analyst’s workflow. CARRECT will support statistical bias reduction and improved analyses and decision making by engaging the user in a collaborative process in which the technology is transparent to the analyst.
Methods
Older approaches to imputing missing data, including mean imputation and single imputation regression methods, have steadily given way to a class of methods known as “multiple imputation” (hereafter “MI”; Rubin 1987). Rather than making the restrictive assumption that the data are missing completely at random (MCAR), MI typically assumes the data are missing at random (MAR).
There are two key innovations behind MI. First, the observed values can be useful in predicting the missing cells, so specifying a joint distribution of the data is the first step in implementing the models. Second, single imputation methods will likely fail not only because of the inherent uncertainty in the missing values but also because of the estimation uncertainty associated with generating the parameters in the imputation procedure itself. By contrast, drawing the missing values multiple times, thereby generating m complete datasets along with the estimated parameters of the model, properly accounts for both types of uncertainty (Rubin 1987; King et al. 2001). As a result, MI will lead to valid standard errors and confidence intervals along with unbiased point estimates.
To compute the joint distribution, CARRECT uses a bootstrapping-based algorithm that gives essentially the same answers as the standard Bayesian Markov chain Monte Carlo (MCMC) or expectation maximization (EM) approaches, is usually considerably faster, and can handle many more variables.
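However the m completed datasets are produced, the combination step is standard. A sketch of Rubin's combining rules with hypothetical per-imputation estimates and variances (illustrative numbers only):

```python
import math

# Hypothetical point estimates and variances of a quantity of interest
# from m = 5 completed datasets.
estimates = [1.02, 0.95, 1.10, 0.99, 1.05]
variances = [0.040, 0.038, 0.045, 0.041, 0.039]
m = len(estimates)

q_bar = sum(estimates) / m                 # pooled point estimate
u_bar = sum(variances) / m                 # within-imputation variance
b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
t = u_bar + (1 + 1 / m) * b                # total variance (Rubin 1987)
se = math.sqrt(t)                          # pooled standard error
```

The between-imputation term b is exactly the "estimation uncertainty" the paragraph above describes: it inflates the total variance t above the naive within-imputation average u_bar.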
Results
Tests were conducted on one of the proposed methods with an epidemiological dataset from the Integrated Health Interview Series (IHIS) producing verifiably unbiased results despite high missingness rates. In addition, mockups (Figure 1) were created of an intuitive data wizard that guides the user through the analysis processes by analyzing key features of a given dataset. The mockups also show prompts for the user to provide additional substantive knowledge to improve the handling of imperfect datasets, as well as the selection of the most appropriate algorithms and models.
Conclusions
Our approach and program were designed to make bias mitigation accessible to a much broader audience than statistical experts alone. We hope that it will have a wide impact on reducing bias in epidemiological studies and will provide more accurate information to policymakers.
PMCID: PMC3692841
Bias reduction; Missing data; Statistical model selection
6.  A scalable, knowledge-based analysis framework for genetic association studies 
BMC Bioinformatics  2013;14:312.
Background
Testing for marginal associations between numerous genetic variants and disease may miss complex relationships among variables (e.g., gene-gene interactions). Bayesian approaches can model multiple variables together and offer advantages over conventional model building strategies, including using existing biological evidence as modeling priors and acknowledging that many models may fit the data well. With many candidate variables, Bayesian approaches to variable selection rely on algorithms to approximate the posterior distribution of models, such as Markov-Chain Monte Carlo (MCMC). Unfortunately, MCMC is difficult to parallelize and requires many iterations to adequately sample the posterior. We introduce a scalable algorithm called PEAK that improves the efficiency of MCMC by dividing a large set of variables into related groups using a rooted graph that resembles a mountain peak. Our algorithm takes advantage of parallel computing and existing biological databases when available.
Results
By using graphs to manage a model space with more than 500,000 candidate variables, we were able to improve MCMC efficiency and uncover the true simulated causal variables, including a gene-gene interaction. We applied PEAK to a case-control study of childhood asthma with 2,521 genetic variants. We used an informative graph for oxidative stress derived from Gene Ontology and identified several variants in ERBB4, OXR1, and BCL2 with strong evidence for associations with childhood asthma.
Conclusions
We introduced an extremely flexible analysis framework capable of efficiently performing Bayesian variable selection on many candidate variables. The PEAK algorithm can be provided with an informative graph, which can be advantageous when considering gene-gene interactions, or a symmetric graph, which simply divides the model space into manageable regions. The PEAK framework is compatible with various model forms, allowing for the algorithm to be configured for different study designs and applications, such as pathway or rare-variant analyses, by simple modifications to the model likelihood and proposal functions.
doi:10.1186/1471-2105-14-312
PMCID: PMC4015032  PMID: 24152222
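PEAK itself partitions the variables with a rooted graph, but the underlying model-space MCMC it accelerates can be illustrated with a much simpler toggle-one-variable Metropolis sampler over subsets, using a made-up additive score in place of a real marginal likelihood:

```python
import math
import random

random.seed(2)

# Stand-in for a log marginal likelihood over models (subsets of 10
# variables): variables 3 and 7 are "causal", and every included
# variable pays a complexity penalty.
def log_score(model):
    s = -1.0 * len(model)
    if 3 in model:
        s += 4.0
    if 7 in model:
        s += 3.0
    return s

p = 10
model = set()
n_iter = 20000
visits = [0] * p
for _ in range(n_iter):
    j = random.randrange(p)          # propose toggling one variable
    prop = set(model) ^ {j}
    if math.log(random.random()) < log_score(prop) - log_score(model):
        model = prop                 # Metropolis accept
    for v in model:
        visits[v] += 1

inclusion = [v / n_iter for v in visits]  # posterior inclusion frequencies
```

Over the run, the chain visits models containing the high-scoring variables far more often than the rest; with hundreds of thousands of variables this single-chain scheme mixes poorly, which is the bottleneck PEAK's grouping and parallelism are designed to relieve.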
7.  A Method for Efficiently Sampling From Distributions With Correlated Dimensions 
Psychological Methods  2013;18(3):368-384.
Bayesian estimation has played a pivotal role in the understanding of individual differences. However, for many models in psychology, Bayesian estimation of model parameters can be difficult. One reason for this difficulty is that conventional sampling algorithms, such as Markov chain Monte Carlo (MCMC), can be inefficient and impractical when little is known about the target distribution—particularly the target distribution’s covariance structure. In this article, we highlight some reasons for this inefficiency and advocate the use of a population MCMC algorithm, called differential evolution Markov chain Monte Carlo (DE-MCMC), as a means of efficient proposal generation. We demonstrate in a simulation study that the performance of the DE-MCMC algorithm is unaffected by the correlation of the target distribution, whereas conventional MCMC performs substantially worse as the correlation increases. We then show that the DE-MCMC algorithm can be used to efficiently fit a hierarchical version of the linear ballistic accumulator model to response time data, which has proven to be a difficult task when conventional MCMC is used.
doi:10.1037/a0032222
PMCID: PMC4140408  PMID: 23646991
differential evolution; optimal transition kernel; hierarchical Bayesian estimation; linear ballistic accumulator model; response time
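A minimal version of the DE-MCMC proposal discussed above: each chain in a population proposes a move along the difference of two other randomly chosen chains, so the proposal automatically adapts to the target's correlation structure. This sketch samples a 2-D Gaussian with correlation 0.9 (a toy target, not the linear ballistic accumulator model):

```python
import math
import random

random.seed(3)

# Toy target: 2-D Gaussian with correlation 0.9 (log-density up to a constant).
rho = 0.9
def log_target(x, y):
    return -(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho ** 2))

n_chains = 10
gamma = 2.38 / math.sqrt(2 * 2)  # standard DE-MC scaling for dimension d = 2
chains = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n_chains)]
samples = []
for it in range(3000):
    for i in range(n_chains):
        # Propose along the difference of two other randomly chosen chains,
        # plus a tiny jitter to keep the chain irreducible.
        a, b = random.sample([j for j in range(n_chains) if j != i], 2)
        prop = [chains[i][d] + gamma * (chains[a][d] - chains[b][d])
                + random.gauss(0, 1e-4) for d in range(2)]
        if math.log(random.random()) < log_target(*prop) - log_target(*chains[i]):
            chains[i] = prop
    if it >= 1000:  # collect after burn-in
        samples.extend(chains)

xs = [s[0] for s in samples]
ys = [s[1] for s in samples]
n = len(samples)
mx, my = sum(xs) / n, sum(ys) / n
corr = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys)))
```

The empirical correlation of the samples should recover the target's 0.9; a fixed spherical random-walk proposal on the same target would need a much smaller step size and mix far more slowly, which is the inefficiency the article documents.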
8.  Joint Analysis of Stochastic Processes with Application to Smoking Patterns and Insomnia 
Statistics in Medicine  2013;32(29).
This article proposes a joint modeling framework for longitudinal insomnia measurements and a stochastic smoking cessation process in the presence of a latent permanent quitting state (i.e., “cure”). A generalized linear mixed-effects model is used for the longitudinal measurements of insomnia symptoms, and a stochastic mixed-effects model is used for the smoking cessation process. These two models are linked via latent random effects. A Bayesian framework and a Markov chain Monte Carlo algorithm are developed to obtain the parameter estimates. The likelihood functions involving time-dependent covariates are formulated and computed. The within-subject correlation between the insomnia and smoking processes is explored. The proposed methodology is applied in simulation studies and to the motivating dataset, the Alpha-Tocopherol, Beta-Carotene (ATBC) Lung Cancer Prevention Study, a large longitudinal cohort study of smokers from Finland.
doi:10.1002/sim.5906
PMCID: PMC3856619  PMID: 23913574
Cure Model; MCMC; Mixed-effects Model; Joint Modeling; Recurrent Events; Bayes
9.  No Control Genes Required: Bayesian Analysis of qRT-PCR Data 
PLoS ONE  2013;8(8):e71448.
Background
Model-based analysis of data from quantitative reverse-transcription PCR (qRT-PCR) is potentially more powerful and versatile than traditional methods. Yet existing model-based approaches cannot properly deal with the higher sampling variances associated with low-abundant targets, nor do they provide a natural way to incorporate assumptions about the stability of control genes directly into the model-fitting process.
Results
In our method, raw qPCR data are represented as molecule counts, and described using generalized linear mixed models under Poisson-lognormal error. A Markov Chain Monte Carlo (MCMC) algorithm is used to sample from the joint posterior distribution over all model parameters, thereby estimating the effects of all experimental factors on the expression of every gene. The Poisson-based model allows for the correct specification of the mean-variance relationship of the PCR amplification process, and can also glean information from instances of no amplification (zero counts). Our method is very flexible with respect to control genes: any prior knowledge about the expected degree of their stability can be directly incorporated into the model. Yet the method provides sensible answers without such assumptions, or even in the complete absence of control genes. We also present a natural Bayesian analogue of the “classic” analysis, which uses standard data pre-processing steps (logarithmic transformation and multi-gene normalization) but estimates all gene expression changes jointly within a single model. The new methods are considerably more flexible and powerful than the standard delta-delta Ct analysis based on pairwise t-tests.
Conclusions
Our methodology expands the applicability of the relative-quantification analysis protocol all the way to the lowest-abundance targets, and provides a novel opportunity to analyze qRT-PCR data without making any assumptions concerning target stability. These procedures have been implemented as the MCMC.qpcr package in R.
doi:10.1371/journal.pone.0071448
PMCID: PMC3747227  PMID: 23977043
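The key distributional assumption above, a Poisson count whose rate is itself lognormal, can be checked in a few lines: simulated Poisson-lognormal counts show variance well above the mean (overdispersion), which a plain Poisson model cannot capture. Parameters here are illustrative, not fitted qPCR values:

```python
import math
import random

random.seed(4)

def poisson(lam):
    """Knuth's Poisson sampler (adequate for moderate rates)."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

# Poisson counts with a lognormal rate: each count's rate is exp(N(mu, sigma^2)).
mu, sigma = 2.0, 0.5
counts = [poisson(math.exp(random.gauss(mu, sigma))) for _ in range(20000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
```

A plain Poisson sample would give var ≈ mean; here the lognormal rate inflates the variance to several times the mean, which is exactly the extra sampling variability the model attributes to low-abundance targets.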
10.  Neural Dynamics as Sampling: A Model for Stochastic Computation in Recurrent Networks of Spiking Neurons 
PLoS Computational Biology  2011;7(11):e1002211.
The organization of computations in networks of spiking neurons in the brain is still largely unknown, in particular in view of the inherently stochastic features of their firing activity and the experimentally observed trial-to-trial variability of neural systems in the brain. In principle there exists a powerful computational framework for stochastic computations, probabilistic inference by sampling, which can explain a large number of macroscopic experimental data in neuroscience and cognitive science. But it has turned out to be surprisingly difficult to create a link between these abstract models for stochastic computations and more detailed models of the dynamics of networks of spiking neurons. Here we create such a link and show that under some conditions the stochastic firing activity of networks of spiking neurons can be interpreted as probabilistic inference via Markov chain Monte Carlo (MCMC) sampling. Since common methods for MCMC sampling in distributed systems, such as Gibbs sampling, are inconsistent with the dynamics of spiking neurons, we introduce a different approach based on non-reversible Markov chains that is able to reflect inherent temporal processes of spiking neuronal activity through a suitable choice of random variables. We propose a neural network model and show by a rigorous theoretical analysis that its neural activity implements MCMC sampling of a given distribution, for both discrete and continuous time. This provides a step towards closing the gap between abstract functional models of cortical computation and more detailed models of networks of spiking neurons.
Author Summary
It is well known that neurons communicate with short electric pulses, called action potentials or spikes. But how can spiking networks implement complex computations? Attempts to relate spiking network activity to the results of deterministic computation steps, like the output bits of a processor in a digital computer, conflict with findings from cognitive science and neuroscience, the latter indicating that neural spike output in identical experiments changes from trial to trial, i.e., neurons are “unreliable”. Therefore, it has recently been proposed that neural activity should instead be regarded as samples from an underlying probability distribution over many variables which, e.g., represent a model of the external world incorporating prior knowledge, memories, and sensory input. This hypothesis assumes that networks of stochastically spiking neurons are able to emulate powerful algorithms for reasoning in the face of uncertainty, i.e., to carry out probabilistic inference. In this work we propose a detailed neural network model that indeed fulfills these computational requirements, and we relate the spiking dynamics of the network to concrete probabilistic computations. Our model suggests that neural systems are suitable for carrying out probabilistic inference by using stochastic, rather than deterministic, computing elements.
doi:10.1371/journal.pcbi.1002211
PMCID: PMC3207943  PMID: 22096452
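The Gibbs sampling that the authors contrast with their non-reversible neural chains looks like this on a tiny Boltzmann machine: each binary unit is resampled from a sigmoid of its local field, and long-run state frequencies approximate the target marginals. A generic sketch with made-up weights, not the paper's spiking network model:

```python
import math
import random

random.seed(5)

# A 3-unit Boltzmann machine: p(z) is proportional to
# exp(sum_i b_i z_i + sum_{i<j} W_ij z_i z_j) for binary z_i.
b = [0.5, -0.2, 0.1]
W = {(0, 1): 0.8, (0, 2): -0.4, (1, 2): 0.3}

def local_field(z, i):
    """Total input to unit i given the current states of the others."""
    f = b[i]
    for (u, v), w in W.items():
        if u == i:
            f += w * z[v]
        elif v == i:
            f += w * z[u]
    return f

z = [0, 0, 0]
counts = [0, 0, 0]
n_sweeps = 20000
for _ in range(n_sweeps):
    for i in range(3):  # Gibbs update: z_i ~ Bernoulli(sigmoid(local field))
        p_on = 1.0 / (1.0 + math.exp(-local_field(z, i)))
        z[i] = int(random.random() < p_on)
    for i in range(3):
        counts[i] += z[i]

marginals = [c / n_sweeps for c in counts]  # estimated P(z_i = 1)
```

The estimated marginals converge to the values obtained by enumerating all eight states (approximately 0.68, 0.62, and 0.50 for these weights). The paper's point is that this update schedule, with its instantaneous reversible flips, does not match spiking dynamics, motivating their non-reversible alternative.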
11.  Smoking Cessation for Patients With Chronic Obstructive Pulmonary Disease (COPD) 
Executive Summary
In July 2010, the Medical Advisory Secretariat (MAS) began work on a Chronic Obstructive Pulmonary Disease (COPD) evidentiary framework, an evidence-based review of the literature surrounding treatment strategies for patients with COPD. This project emerged from a request by the Health System Strategy Division of the Ministry of Health and Long-Term Care that MAS provide them with an evidentiary platform on the effectiveness and cost-effectiveness of COPD interventions.
After an initial review of health technology assessments and systematic reviews of COPD literature, and consultation with experts, MAS identified the following topics for analysis: vaccinations (influenza and pneumococcal), smoking cessation, multidisciplinary care, pulmonary rehabilitation, long-term oxygen therapy, noninvasive positive pressure ventilation for acute and chronic respiratory failure, hospital-at-home for acute exacerbations of COPD, and telehealth (including telemonitoring and telephone support). Evidence-based analyses were prepared for each of these topics. For each technology, an economic analysis was also completed where appropriate. In addition, a review of the qualitative literature on patient, caregiver, and provider perspectives on living and dying with COPD was conducted, as were reviews of the qualitative literature on each of the technologies included in these analyses.
The Chronic Obstructive Pulmonary Disease Mega-Analysis series is made up of the following reports, which can be publicly accessed at the MAS website at: http://www.hqontario.ca/en/mas/mas_ohtas_mn.html.
Chronic Obstructive Pulmonary Disease (COPD) Evidentiary Framework
Influenza and Pneumococcal Vaccinations for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Smoking Cessation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Community-Based Multidisciplinary Care for Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Pulmonary Rehabilitation for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Long-term Oxygen Therapy for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Acute Respiratory Failure Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Noninvasive Positive Pressure Ventilation for Chronic Respiratory Failure Patients With Stable Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Hospital-at-Home Programs for Patients With Acute Exacerbations of Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Home Telehealth for Patients With Chronic Obstructive Pulmonary Disease (COPD): An Evidence-Based Analysis
Cost-Effectiveness of Interventions for Chronic Obstructive Pulmonary Disease Using an Ontario Policy Model
Experiences of Living and Dying With COPD: A Systematic Review and Synthesis of the Qualitative Empirical Literature
For more information on the qualitative review, please contact Mita Giacomini at: http://fhs.mcmaster.ca/ceb/faculty member_giacomini.htm.
For more information on the economic analysis, please visit the PATH website: http://www.path-hta.ca/About-Us/Contact-Us.aspx.
The Toronto Health Economics and Technology Assessment (THETA) collaborative has produced an associated report on patient preference for mechanical ventilation. For more information, please visit the THETA website: http://theta.utoronto.ca/static/contact.
Objective
The objective of this evidence-based analysis was to determine the effectiveness and cost-effectiveness of smoking cessation interventions in the management of chronic obstructive pulmonary disease (COPD).
Clinical Need: Condition and Target Population
Tobacco smoking is the main risk factor for COPD. It is estimated that 50% of older smokers develop COPD and more than 80% of COPD-associated morbidity is attributed to tobacco smoking. According to the Canadian Community Health Survey, 38.5% of Ontarians who smoke have COPD. In patients with a significant history of smoking, COPD is usually present with symptoms of progressive dyspnea (shortness of breath), cough, and sputum production. Patients with COPD who smoke have a particularly high level of nicotine dependence, and about 30.4% to 43% of patients with moderate to severe COPD continue to smoke. Despite the severe symptoms that COPD patients suffer, the majority of patients with COPD are unable to quit smoking on their own; each year only about 1% of smokers succeed in quitting on their own initiative.
Technology
Smoking cessation is the process of discontinuing the practice of inhaling a smoked substance. Smoking cessation can help to slow or halt the progression of COPD. Smoking cessation programs mainly target tobacco smoking, but may also encompass other smoked substances that are difficult to give up because of the strong physical addictions or psychological dependencies that develop from their habitual use.
Smoking cessation strategies include both pharmacological and nonpharmacological (behavioural or psychosocial) approaches. The basic components of smoking cessation interventions include simple advice, written self-help materials, individual and group behavioural support, telephone quit lines, nicotine replacement therapy (NRT), and antidepressants. As nicotine addiction is a chronic, relapsing condition that usually requires several attempts to overcome, cessation support is often tailored to individual needs, while recognizing that in general, the more intensive the support, the greater the chance of success. Success at quitting smoking decreases in relation to:
a lack of motivation to quit,
a history of smoking more than a pack of cigarettes a day for more than 10 years,
a lack of social support, such as from family and friends, and
the presence of mental health disorders (such as depression).
Research Question
What are the effectiveness and cost-effectiveness of smoking cessation interventions compared with usual care for patients with COPD?
Research Methods
Literature Search
Search Strategy
A literature search was performed on June 24, 2010 using OVID MEDLINE, MEDLINE In-Process and Other Non-Indexed Citations (1950 to June Week 3 2010), EMBASE (1980 to 2010 Week 24), the Cumulative Index to Nursing and Allied Health Literature (CINAHL), the Cochrane Library, and the Centre for Reviews and Dissemination for studies published between 1950 and June 2010. A single reviewer reviewed the abstracts and obtained full-text articles for those studies meeting the eligibility criteria. Reference lists were also examined for any additional relevant studies not identified through the search. Data were extracted using a standardized data abstraction form.
Inclusion Criteria
English-language, full reports from 1950 to week 3 of June, 2010;
either randomized controlled trials (RCTs), systematic reviews and meta-analyses, or non-RCTs with controls;
a proven diagnosis of COPD;
adult patients (≥ 18 years);
a smoking cessation intervention that comprised at least one of the treatment arms;
≥ 6 months’ abstinence as an outcome; and
patients followed for ≥ 6 months.
Exclusion Criteria
case reports
case series
Outcomes of Interest
≥ 6 months’ abstinence
Quality of Evidence
The quality of each included study was assessed taking into consideration allocation concealment, randomization, blinding, power/sample size, withdrawals/dropouts, and intention-to-treat analyses.
The quality of the body of evidence was assessed as high, moderate, low, or very low according to the GRADE Working Group criteria. The following definitions of quality were used in grading the quality of the evidence:
High: Further research is very unlikely to change confidence in the estimate of effect.
Moderate: Further research is likely to have an important impact on confidence in the estimate of effect and may change the estimate.
Low: Further research is very likely to have an important impact on confidence in the estimate of effect and is likely to change the estimate.
Very low: Any estimate of effect is very uncertain.
Summary of Findings
Nine RCTs were identified from the literature search. The sample sizes ranged from 74 to 5,887 participants, with a total of 8,291 participants across the nine studies. The mean age of the patients in the studies ranged from 54 to 64 years. The majority of studies used the Global Initiative for Chronic Obstructive Lung Disease (GOLD) COPD staging criteria to stage the disease in study subjects. Studies included patients with mild COPD (2 studies), mild–moderate COPD (3 studies), moderate–severe COPD (1 study), and severe–very severe COPD (1 study). One study included persons at risk of COPD in addition to those with mild, moderate, or severe COPD, and 1 study did not define the stages of COPD. The individual quality of the studies was high. Smoking cessation interventions varied across studies and included counselling, pharmacotherapy, or a combination of both. Two studies were delivered in a hospital setting, whereas the remaining 7 studies were delivered in an outpatient setting. All studies reported a usual care group or a placebo-controlled group (for the drug-only trials). The follow-up periods ranged from 6 months to 5 years. Due to excessive clinical heterogeneity in the interventions, studies were first grouped into categories of similar interventions; statistical pooling was subsequently performed, where appropriate. When possible, pooled estimates of relative risks for abstinence rates, with 95% confidence intervals, were calculated. The remaining studies were reported separately.
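The pooled relative risks with 95% confidence intervals mentioned above follow the usual log-scale construction for a single trial's 2×2 table. A sketch with hypothetical arm counts (not numbers from this review):

```python
import math

# Hypothetical counts: abstinent / total in each arm.
a, n1 = 40, 200   # intervention arm: 40 of 200 abstinent
c, n2 = 15, 210   # usual-care arm: 15 of 210 abstinent

rr = (a / n1) / (c / n2)                              # relative risk
se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)   # SE of log(RR)
lo = math.exp(math.log(rr) - 1.96 * se_log)           # 95% CI lower bound
hi = math.exp(math.log(rr) + 1.96 * se_log)           # 95% CI upper bound
```

The interval is built on the log scale because log(RR) is approximately normal; an interval that excludes 1 corresponds to a statistically significant effect at the 5% level.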
Abstinence Rates
Table ES1 provides a summary of the pooled estimates for abstinence, at longest follow-up, from the trials included in this review. It also shows the respective GRADE qualities of evidence.
Summary of Results*
Abbreviations: CI, confidence interval; NRT, nicotine replacement therapy.
Statistically significant (P < 0.05).
One trial used in this comparison had 2 treatment arms, each examining a different antidepressant.
Conclusions
Based on a moderate quality of evidence, compared with usual care, abstinence rates are significantly higher in COPD patients receiving intensive counselling or a combination of intensive counselling and NRT.
Based on limited and moderate quality of evidence, abstinence rates are significantly higher in COPD patients receiving NRT compared with placebo.
Based on a moderate quality of evidence, abstinence rates are significantly higher in COPD patients receiving the antidepressant bupropion compared to placebo.
PMCID: PMC3384371  PMID: 23074432
12.  Profile-Based LC-MS Data Alignment—A Bayesian Approach 
A Bayesian alignment model (BAM) is proposed for alignment of liquid chromatography-mass spectrometry (LC-MS) data. BAM belongs to the category of profile-based approaches, which are composed of two major components: a prototype function and a set of mapping functions. Appropriate estimation of these functions is crucial for good alignment results. BAM uses Markov chain Monte Carlo (MCMC) methods to draw inference on the model parameters and improves on existing MCMC-based alignment methods through 1) the implementation of an efficient MCMC sampler and 2) an adaptive selection of knots. A block Metropolis-Hastings algorithm that mitigates the problem of the MCMC sampler getting stuck at local modes of the posterior distribution is used for the update of the mapping function coefficients. In addition, a stochastic search variable selection (SSVS) methodology is used to determine the number and positions of knots. We applied BAM to a simulated data set, an LC-MS proteomic data set, and two LC-MS metabolomic data sets, and compared its performance with the Bayesian hierarchical curve registration (BHCR) model, the dynamic time-warping (DTW) model, and the continuous profile model (CPM). The advantage of applying appropriate profile-based retention time correction prior to performing a feature-based approach is also demonstrated through the metabolomic data sets.
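The block Metropolis-Hastings update described above proposes all coefficients of a mapping function jointly, so accept/reject decisions respect the correlation between coefficients rather than getting stuck updating them one at a time. The following is a minimal illustrative sketch on a toy correlated bivariate target, not the BAM model itself; the target density and tuning values are assumptions for demonstration.

```python
import math
import random

def log_target(beta):
    # Toy stand-in for the posterior of mapping-function coefficients:
    # a strongly correlated bivariate normal, where coordinate-wise
    # updates mix poorly but a block proposal can move diagonally.
    b0, b1 = beta
    rho = 0.95
    return -(b0 * b0 - 2.0 * rho * b0 * b1 + b1 * b1) / (2.0 * (1.0 - rho * rho))

def block_mh(n_iter=5000, step=0.5, seed=7):
    random.seed(seed)
    beta = [0.0, 0.0]
    accepted = 0
    for _ in range(n_iter):
        # Block update: perturb the whole coefficient vector at once.
        proposal = [b + random.gauss(0.0, step) for b in beta]
        # Symmetric proposal, so the acceptance ratio is a density ratio.
        if math.log(random.random()) < log_target(proposal) - log_target(beta):
            beta = proposal
            accepted += 1
    return accepted / n_iter

rate = block_mh()
```

In practice one tunes `step` so the acceptance rate is moderate; a rate near 0 or near 1 signals proposals that are too bold or too timid.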
doi:10.1109/TCBB.2013.25
PMCID: PMC3993096  PMID: 23929872
Alignment; Bayesian inference; block Metropolis-Hastings algorithm; liquid chromatography-mass spectrometry (LC-MS); Markov chain Monte Carlo (MCMC); stochastic search variable selection (SSVS)
13.  Joint modeling of multivariate longitudinal data and the dropout process in a competing risk setting: application to ICU data 
Background
Joint modeling of longitudinal and survival data has been increasingly considered in clinical trials, notably in cancer and AIDS. In critically ill patients admitted to an intensive care unit (ICU), such models also appear to be of interest in the investigation of the effect of treatment on severity scores due to the likely association between the longitudinal score and the dropout process, either caused by deaths or live discharges from the ICU. However, in this competing risk setting, only cause-specific hazard sub-models for the multiple failure types data have been used.
Methods
We propose a joint model that consists of a linear mixed effects submodel for the longitudinal outcome, and a proportional subdistribution hazards submodel for the competing risks survival data, linked together by latent random effects. We use Markov chain Monte Carlo technique of Gibbs sampling to estimate the joint posterior distribution of the unknown parameters of the model. The proposed method is studied and compared to joint model with cause-specific hazards submodel in simulations and applied to a data set that consisted of repeated measurements of severity score and time of discharge and death for 1,401 ICU patients.
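Gibbs sampling, as used above for the joint posterior, alternates draws from each parameter's full conditional distribution. A minimal sketch on a bivariate normal (where both full conditionals are available in closed form) illustrates the scheme; the correlation value and names here are illustrative, not taken from the paper.

```python
import math
import random

def gibbs_bivariate_normal(rho=0.8, n_iter=5000, seed=11):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each full conditional is itself normal, so the sampler alternates
    exact conditional draws -- the same scheme used, at larger scale,
    for the joint model's posterior.
    """
    random.seed(seed)
    x, y = 0.0, 0.0
    sd = math.sqrt(1.0 - rho * rho)
    draws = []
    for _ in range(n_iter):
        x = random.gauss(rho * y, sd)   # x | y ~ N(rho*y, 1 - rho^2)
        y = random.gauss(rho * x, sd)   # y | x ~ N(rho*x, 1 - rho^2)
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal()
mean_x = sum(d[0] for d in draws) / len(draws)
```

The long-run averages of the draws approximate the marginal posterior means, here both zero.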
Results
Time by treatment interaction was observed on the evolution of the mean SOFA score when ignoring potentially informative dropouts due to ICU deaths and live discharges from the ICU. In contrast, this was no longer significant when modeling the cause-specific hazards of informative dropouts. Such a time by treatment interaction persisted together with an evidence of treatment effect on the hazard of death when modeling dropout processes through the use of the Fine and Gray model for sub-distribution hazards.
Conclusions
In the joint modeling of competing risks with longitudinal response, differences in the handling of competing risk outcomes appear to translate into the estimated difference in treatment effect on the longitudinal outcome. Such a modeling strategy should be carefully defined prior to analysis.
doi:10.1186/1471-2288-10-69
PMCID: PMC2923158  PMID: 20670425
14.  Model Discrimination in Dynamic Molecular Systems: Application to Parotid De-differentiation Network 
Journal of Computational Biology  2013;20(7):524-539.
Abstract
In modern systems biology the modeling of longitudinal data, such as changes in mRNA concentrations, is often of interest. Fully parametric, ordinary differential equations (ODE)-based models are typically developed for the purpose, but their lack of fit in some examples indicates that more flexible Bayesian models may be beneficial, particularly when there are relatively few data points available. However, under such sparse data scenarios it is often difficult to identify the most suitable model. The process of falsifying inappropriate candidate models is called model discrimination. We propose here a formal method of discrimination between competing Bayesian mixture-type longitudinal models that is both sensitive and sufficiently flexible to account for the complex variability of the longitudinal molecular data. The ideas from the field of Bayesian analysis of computer model validation are applied, along with modern Markov Chain Monte Carlo (MCMC) algorithms, in order to derive an appropriate Bayes discriminant rule. We restrict attention to the two-model comparison problem and present the application of the proposed rule to the mRNA data in the de-differentiation network of three mRNA concentrations in mammalian salivary glands as well as to a large synthetic dataset derived from the model used in the recent DREAM6 competition.
doi:10.1089/cmb.2011.0222
PMCID: PMC3704053  PMID: 23829652
parotid dedifferentiation; ODE model; parameter estimation; Bayesian factor
15.  Fast joint detection-estimation of evoked brain activity in event-related FMRI using a variational approach 
In standard within-subject analyses of event-related fMRI data, two steps are usually performed separately: detection of brain activity and estimation of the hemodynamic response. Because these two steps are inherently linked, we adopt the so-called region-based Joint Detection-Estimation (JDE) framework that addresses this joint issue using a multivariate inference for detection and estimation. JDE is built by making use of a regional bilinear generative model of the BOLD response and constraining the parameter estimation by physiological priors using temporal and spatial information in a Markovian model. In contrast to previous works that use Markov Chain Monte Carlo (MCMC) techniques to sample the resulting intractable posterior distribution, we recast the JDE into a missing data framework and derive a Variational Expectation-Maximization (VEM) algorithm for its inference. A variational approximation is used to approximate the Markovian model in the unsupervised spatially adaptive JDE inference, which allows automatic fine-tuning of spatial regularization parameters. It provides a new algorithm that exhibits interesting properties in terms of estimation error and computational cost compared to the previously used MCMC-based approach. Experiments on artificial and real data show that VEM-JDE is robust to model mis-specification and provides computational gain while maintaining good performance in terms of activation detection and hemodynamic shape recovery.
doi:10.1109/TMI.2012.2225636
PMCID: PMC4020803  PMID: 23096056
Biomedical signal detection-estimation; functional MRI; brain imaging; Joint Detection-Estimation; Markov random field; EM algorithm; Variational approximation; fMRI; VEM; Mean-field
16.  Simplex Factor Models for Multivariate Unordered Categorical Data 
Gaussian latent factor models are routinely used for modeling of dependence in continuous, binary, and ordered categorical data. For unordered categorical variables, Gaussian latent factor models lead to challenging computation and complex modeling structures. As an alternative, we propose a novel class of simplex factor models. In the single-factor case, the model treats the different categorical outcomes as independent with unknown marginals. The model can characterize flexible dependence structures parsimoniously with few factors, and as factors are added, any multivariate categorical data distribution can be accurately approximated. Using a Bayesian approach for computation and inferences, a Markov chain Monte Carlo (MCMC) algorithm is proposed that scales well with increasing dimension, with the number of factors treated as unknown. We develop an efficient proposal for updating the base probability vector in hierarchical Dirichlet models. Theoretical properties are described, and we evaluate the approach through simulation examples. Applications are described for modeling dependence in nucleotide sequences and prediction from high-dimensional categorical features.
doi:10.1080/01621459.2011.646934
PMCID: PMC3728016  PMID: 23908561
Classification; Contingency table; Factor analysis; Latent variable; Nonparametric Bayes; Nonnegative tensor factorization; Mutual information; Polytomous regression
17.  Hierarchical Spatial Process Models for Multiple Traits in Large Genetic Trials 
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets of multiple traits of interest. Direct application of such multivariate models to large spatial datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several thousand iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects we discuss a multivariate predictive process that reduces the computational burden by projecting the original process onto a subspace generated by realizations of the original process at a specified set of locations (or knots). We illustrate the proposed methods using a synthetic dataset with multivariate additive and dominant genetic effects and anisotropic spatial residuals, and a large dataset from a scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
doi:10.1198/jasa.2009.ap09068
PMCID: PMC2911798  PMID: 20676229
Bayesian inference; Cross-covariance functions; Genetic trait models; Heredity; Hierarchical spatial models; Markov chain Monte Carlo; Multivariate spatial process; Spatial predictive process
18.  Hierarchical Spatial Modeling of Additive and Dominance Genetic Variance for Large Spatial Trial Datasets 
Biometrics  2009;65(2):441-451.
Summary
This article expands upon recent interest in Bayesian hierarchical models in quantitative genetics by developing spatial process models for inference on additive and dominance genetic variance within the context of large spatially referenced trial datasets. Direct application of such models to large spatial datasets is, however, computationally infeasible because of cubic-order matrix algorithms involved in estimation. The situation is even worse in Markov chain Monte Carlo (MCMC) contexts where such computations are performed for several iterations. Here, we discuss approaches that help obviate these hurdles without sacrificing the richness in modeling. For genetic effects, we demonstrate how an initial spectral decomposition of the relationship matrices negates the expensive matrix inversions required in previously proposed MCMC methods. For spatial effects, we outline two approaches for circumventing the prohibitively expensive matrix decompositions: the first leverages analytical results from Ornstein–Uhlenbeck processes that yield computationally efficient tridiagonal structures, whereas the second derives a modified predictive process model from the original model by projecting its realizations to a lower-dimensional subspace, thereby reducing the computational burden. We illustrate the proposed methods using a synthetic dataset with additive and dominance genetic effects and anisotropic spatial residuals, and a large dataset from a Scots pine (Pinus sylvestris L.) progeny study conducted in northern Sweden. Our approaches enable us to provide a comprehensive analysis of this large trial, which amply demonstrates that, in addition to violating basic assumptions of the linear model, ignoring spatial effects can result in downwardly biased measures of heritability.
doi:10.1111/j.1541-0420.2008.01115.x
PMCID: PMC2775095  PMID: 18759829
Bayesian inference; Genetic variance; Markov chain Monte Carlo; Ornstein-Uhlenbeck process; Spatial predictive process; Spatial process
19.  Hamiltonian Monte Carlo methods for efficient parameter estimation in steady state dynamical systems 
BMC Bioinformatics  2014;15(1):253.
Background
Parameter estimation for differential equation models of intracellular processes is a highly relevant but challenging task. The available experimental data do not usually contain enough information to identify all parameters uniquely, resulting in ill-posed estimation problems with often highly correlated parameters. Sampling-based Bayesian statistical approaches are appropriate for tackling this problem. The samples are typically generated via Markov chain Monte Carlo, however such methods are computationally expensive and their convergence may be slow, especially if there are strong correlations between parameters. Monte Carlo methods based on Euclidean or Riemannian Hamiltonian dynamics have been shown to outperform other samplers by making proposal moves that take the local sensitivities of the system’s states into account and accepting these moves with high probability. However, the high computational cost involved with calculating the Hamiltonian trajectories prevents their widespread use for all but the smallest differential equation models. The further development of efficient sampling algorithms is therefore an important step towards improving the statistical analysis of predictive models of intracellular processes.
Results
We show how state of the art Hamiltonian Monte Carlo methods may be significantly improved for steady state dynamical models. We present a novel approach for efficiently calculating the required geometric quantities by tracking steady states across the Hamiltonian trajectories using a Newton-Raphson method and employing local sensitivity information. Using our approach, we compare both Euclidean and Riemannian versions of Hamiltonian Monte Carlo on three models for intracellular processes with real data and demonstrate at least an order of magnitude improvement in the effective sampling speed. We further demonstrate the wider applicability of our approach to other gradient based MCMC methods, such as those based on Langevin diffusions.
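The Euclidean Hamiltonian Monte Carlo that the results above build on simulates Hamiltonian dynamics with a leapfrog integrator, then accepts or rejects on the change in total energy. A minimal sketch on a one-dimensional standard normal target follows; the step size, trajectory length, and target are illustrative assumptions, and the paper's Newton-Raphson steady-state tracking is not shown.

```python
import math
import random

def neg_log_density(q):
    return 0.5 * q * q          # standard normal target, up to a constant

def grad(q):
    return q                    # gradient of the negative log density

def hmc_step(q, step=0.2, n_leapfrog=20):
    """One Hamiltonian Monte Carlo transition via leapfrog integration."""
    p = random.gauss(0.0, 1.0)               # resample auxiliary momentum
    q_new, p_new = q, p
    p_new -= 0.5 * step * grad(q_new)        # initial half step in momentum
    for _ in range(n_leapfrog - 1):
        q_new += step * p_new                # full position step
        p_new -= step * grad(q_new)          # full momentum step
    q_new += step * p_new
    p_new -= 0.5 * step * grad(q_new)        # final half step in momentum
    # Accept/reject on the change in the Hamiltonian (total energy).
    h_old = neg_log_density(q) + 0.5 * p * p
    h_new = neg_log_density(q_new) + 0.5 * p_new * p_new
    if math.log(random.random()) < h_old - h_new:
        return q_new
    return q

random.seed(3)
q, draws = 0.0, []
for _ in range(5000):
    q = hmc_step(q)
    draws.append(q)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
```

Because leapfrog nearly conserves the Hamiltonian, acceptance rates stay high even for long trajectories, which is what makes the per-step gradient (and, in the ODE setting, sensitivity) computations the dominant cost.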
Conclusion
Our approach is strictly beneficial in all test cases. The Matlab sources implementing our MCMC methodology are available from https://github.com/a-kramer/ode_rmhmc.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-253) contains supplementary material, which is available to authorized users.
doi:10.1186/1471-2105-15-253
PMCID: PMC4262080  PMID: 25066046
MCMC methods; Parameter estimation; Hybrid monte carlo; Steady state data; Systems biology
20.  On the stability of the Bayenv method in assessing human SNP-environment associations 
Human Genomics  2014;8(1):1.
Background
Phenotypic variation along environmental gradients has been documented among and within many species, and in some cases, genetic variation has been shown to be associated with these gradients. Bayenv is a relatively new method developed to detect patterns of polymorphisms associated with environmental gradients. Using a Bayesian Markov Chain Monte Carlo (MCMC) approach, Bayenv evaluates whether a linear model relating population allele frequencies to environmental variables is more probable than a null model based on observed frequencies of neutral markers. Although this method has been used to detect environmental adaptation in a number of species, including humans, plants, fish, and mosquitoes, stability between independent runs of this MCMC algorithm has not been characterized. In this paper, we explore the variability of results between runs and the factors contributing to it.
Results
Independent runs of the Bayenv program were carried out using genome-wide single-nucleotide polymorphism (SNP) data from samples from 60 worldwide human populations following previous applications of the Bayenv method. To assess factors contributing to the method's stability, we used varying numbers of MCMC iterations and also analyzed a second modified data set that excluded two Siberian populations with extreme climate variables. Between any two runs, correlations between Bayes factors and the overlap of SNPs in the empirical p value tails were surprisingly low. Enrichments of genic versus non-genic SNPs in the empirical tails were more robust than the empirical p values; however, the significance of the enrichments for some environmental variables still varied among runs, contradicting previously published conclusions. Runs with a greater number of MCMC iterations slightly reduced run-to-run variability, and excluding the Siberian populations did not have a large effect on the stability of the runs.
Conclusions
Because of high run-to-run variability, we advise against making conclusions about genome-wide patterns of adaptation based on only one run of the Bayenv algorithm and recommend caution in interpreting previous studies that have used only one run. Moving forward, we suggest carrying out multiple independent runs of Bayenv and averaging Bayes factors between runs to produce more stable and reliable results. With these modifications, future discoveries of environmental adaptation within species using the Bayenv method will be more accurate, interpretable, and easily compared between studies.
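The recommendation above, averaging Bayes factors over independent runs, can be sketched in a few lines. This is a hypothetical illustration (arithmetic mean over per-SNP Bayes factors, with made-up SNP ids and values), not Bayenv's own code.

```python
import statistics

def combine_bayes_factors(runs):
    """Average per-SNP Bayes factors across independent MCMC runs.

    `runs` is a list of dicts mapping SNP id -> Bayes factor from one run.
    Returns a dict of averaged Bayes factors, damping run-to-run
    Monte Carlo noise before any ranking or tail-enrichment analysis.
    """
    snps = runs[0].keys()
    return {snp: statistics.mean(run[snp] for run in runs) for snp in snps}

# Toy example: three runs disagree on individual SNPs, but the average
# gives a more stable ranking than any single run.
run_a = {"rs1": 12.0, "rs2": 3.0, "rs3": 0.5}
run_b = {"rs1": 8.0,  "rs2": 5.0, "rs3": 0.7}
run_c = {"rs1": 10.0, "rs2": 4.0, "rs3": 0.6}
averaged = combine_bayes_factors([run_a, run_b, run_c])
```

Downstream steps (empirical p value tails, genic enrichment tests) would then be computed once, on the averaged values, rather than separately per run.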
doi:10.1186/1479-7364-8-1
PMCID: PMC3896655  PMID: 24405978
Environmental adaptation; Positive selection; Genome-wide scans; Human adaptation; Markov chain monte carlo; Natural selection
21.  Markov Chain Monte Carlo: an introduction for epidemiologists 
Markov Chain Monte Carlo (MCMC) methods are increasingly popular among epidemiologists. The reason for this may in part be that MCMC offers an appealing approach to handling some difficult types of analyses. Additionally, MCMC methods are those most commonly used for Bayesian analysis. However, epidemiologists are still largely unfamiliar with MCMC. They may lack familiarity either with the implementation of MCMC or with interpretation of the resultant output. As with tutorials outlining the calculus behind maximum likelihood in previous decades, a simple description of the machinery of MCMC is needed. We provide an introduction to conducting analyses with MCMC, and show that, given the same data and under certain model specifications, the results of an MCMC simulation match those of methods based on standard maximum-likelihood estimation (MLE). In addition, we highlight examples of instances in which MCMC approaches to data analysis provide a clear advantage over MLE. We hope that this brief tutorial will encourage epidemiologists to consider MCMC approaches as part of their analytic tool-kit.
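The MCMC-matches-MLE point made above can be demonstrated on the simplest epidemiologic quantity, a binomial proportion. The sketch below is a generic random-walk Metropolis sampler under an assumed flat prior, with illustrative counts (30 events in 100 subjects, so the MLE is 0.30); it is not the tutorial's own worked example.

```python
import math
import random

def log_posterior(p, successes, trials):
    # Binomial log-likelihood with a flat prior on p in (0, 1).
    if p <= 0.0 or p >= 1.0:
        return float("-inf")
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def metropolis(successes, trials, n_iter=20000, step=0.05, seed=1):
    random.seed(seed)
    p = 0.5                       # starting value for the chain
    samples = []
    for _ in range(n_iter):
        proposal = p + random.gauss(0.0, step)   # symmetric random-walk move
        log_ratio = (log_posterior(proposal, successes, trials)
                     - log_posterior(p, successes, trials))
        if math.log(random.random()) < log_ratio:
            p = proposal           # accept the proposed value
        samples.append(p)
    burn = n_iter // 4             # discard burn-in draws
    kept = samples[burn:]
    return sum(kept) / len(kept)

# With 30 events in 100 subjects, the posterior mean from the sampler
# should land close to the MLE of 0.30.
post_mean = metropolis(successes=30, trials=100)
```

With a flat prior and a reasonable sample size, the posterior mean sits next to the MLE, which is exactly the agreement the tutorial illustrates before moving to settings where MCMC pulls ahead.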
doi:10.1093/ije/dyt043
PMCID: PMC3619958  PMID: 23569196
22.  Model fitting and inference under Latent Equilibrium Processes 
Statistics and computing  2007;17(2):193-208.
This paper presents a methodology for model fitting and inference in the context of Bayesian models of the type f(Y | X, θ)f(X | θ)f(θ), where Y is the (set of) observed data, θ is a set of model parameters and X is an unobserved (latent) stationary stochastic process induced by the first order transition model f(X(t+1) | X(t), θ), where X(t) denotes the state of the process at time (or generation) t. The crucial feature of the above type of model is that, given θ, the transition model f(X(t+1) | X(t), θ) is known but the distribution of the stochastic process in equilibrium, that is f(X | θ), is, except in very special cases, intractable, hence unknown. A further point to note is that the data Y has been assumed to be observed when the underlying process is in equilibrium. In other words, the data is not collected dynamically over time.
We refer to such specification as a latent equilibrium process (LEP) model. It is motivated by problems in population genetics (though other applications are discussed), where it is of interest to learn about parameters such as mutation and migration rates and population sizes, given a sample of allele frequencies at one or more loci. In such problems it is natural to assume that the distribution of the observed allele frequencies depends on the true (unobserved) population allele frequencies, whereas the distribution of the true allele frequencies is only indirectly specified through a transition model.
As a hierarchical specification, it is natural to fit the LEP within a Bayesian framework. Fitting such models is usually done via Markov chain Monte Carlo (MCMC). However, we demonstrate that, in the case of LEP models, implementation of MCMC is far from straightforward. The main contribution of this paper is to provide a methodology to implement MCMC for LEP models. We demonstrate our approach in population genetics problems with both simulated and real data sets. The resultant model fitting is computationally intensive and thus, we also discuss parallel implementation of the procedure in special cases.
doi:10.1007/s11222-006-9015-6
PMCID: PMC2557441  PMID: 18836571
Allele; Migration; Mutation; Bayesian hierarchical model; MCMC
23.  Estimating Fetal and Maternal Genetic Contributions to Premature Birth From Multiparous Pregnancy Histories of Twins Using MCMC and Maximum-Likelihood Approaches 
The analysis of genetic and environmental contributions to preterm birth is not straightforward in family studies, as etiology could involve both maternal and fetal genes. Markov Chain Monte Carlo (MCMC) methods are presented as a flexible approach for defining user-specified covariance structures to handle multiple random effects and hierarchical dependencies inherent in children of twin (COT) studies of pregnancy outcomes. The proposed method is easily modified to allow for the study of gestational age as a continuous trait and as a binary outcome reflecting the presence or absence of preterm birth. Estimation of fetal and maternal genetic factors and the effect of the environment are demonstrated using MCMC methods implemented in WinBUGS and maximum likelihood methods in a Virginia COT sample comprising 7,061 births. In summary, although the contribution of maternal and fetal genetic factors was supported using both outcomes, additional births and/or extended relationships are required to precisely estimate both genetic effects simultaneously. We anticipate the flexibility of MCMC methods to handle increasingly complex models to be of particular relevance for the study of birth outcomes.
doi:10.1375/twin.12.4.333
PMCID: PMC2913409  PMID: 19653833
preterm birth; fetal; maternal; genetic; environment; MCMC; ML
24.  Real-time individual predictions of prostate cancer recurrence using joint models 
Biometrics  2013;69(1):206-213.
Summary
Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient.
doi:10.1111/j.1541-0420.2012.01823.x
PMCID: PMC3622120  PMID: 23379600
Joint longitudinal-survival model; Online calculator; Predicted probability; Prostate cancer; PSA
25.  Bayesian Inference of Sampled Ancestor Trees for Epidemiology and Fossil Calibration 
PLoS Computational Biology  2014;10(12):e1003919.
Phylogenetic analyses which include fossils or molecular sequences that are sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling, in particular we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).
Author Summary
A central goal of phylogenetic analysis is to estimate evolutionary relationships and the dynamical parameters underlying the evolutionary branching process (e.g. macroevolutionary or epidemiological parameters) from molecular data. The statistical methods used in these analyses require that the underlying tree branching process is specified. Standard models for the branching process which were originally designed to describe the evolutionary past of present day species do not allow one sampled taxon to be the ancestor of another. However the probability of sampling a direct ancestor is not negligible for many types of data. For example, when fossil and living species are analysed together to infer species divergence times, fossil species may or may not be direct ancestors of living species. In epidemiology, a sampled individual (a host from which a pathogen sequence was obtained) can infect other individuals after sampling, which then go on to be sampled themselves. The models that account for direct ancestors produce phylogenetic trees with a different structure from classic phylogenetic trees and so using these models in inference requires new computational methods. Here we developed a method for phylogenetic analysis that accounts for the possibility of direct ancestors.
doi:10.1371/journal.pcbi.1003919
PMCID: PMC4263412  PMID: 25474353
