In the past decade, several principal stratification–based statistical methods have been developed for testing and estimation of a treatment effect on an outcome measured after a postrandomization event. Two examples are the evaluation of the effect of a cancer treatment on quality of life in subjects who remain alive and the evaluation of the effect of an HIV vaccine on viral load in subjects who acquire HIV infection. However, in general the developed methods have not addressed the issue of missing outcome data, and hence their validity relies on a missing completely at random (MCAR) assumption. Because in many applications the MCAR assumption is untenable, while a missing at random (MAR) assumption is defensible, we extend the semiparametric likelihood sensitivity analysis approach of Gilbert and others (2003) and Jemiai and Rotnitzky (2005) to allow the outcome to be MAR. We combine these methods with the robust likelihood-based method of Little and An (2004) for handling MAR data to provide semiparametric estimation of the average causal effect of treatment on the outcome. The new method, which does not require a monotonicity assumption, is evaluated in a simulation study and is applied to data from the first HIV vaccine efficacy trial.
Causal inference; HIV vaccine trial; Missing at random; Posttreatment selection bias; Principal stratification; Sensitivity analysis
The receiver operating characteristic (ROC) curve is the most commonly used statistical tool for describing the discriminatory accuracy of a diagnostic test. Classical estimation of the ROC curve relies on data from a simple random sample from the target population. In practice, estimation is often complicated because not all subjects undergo a definitive assessment of disease status (verification). Estimation of the ROC curve based on data only from subjects with verified disease status may be badly biased. In this work we investigate the properties of the doubly robust (DR) method for estimating the ROC curve under verification bias, originally developed by Rotnitzky et al. (2006) for estimating the area under the ROC curve. The DR method can be applied to continuous-scale tests and allows for a nonignorable process of selection to verification. We derive the estimator's asymptotic distribution and examine its finite-sample properties via a simulation study. We illustrate the DR procedure for estimating ROC curves with data collected on patients undergoing electron beam computed tomography, a diagnostic test for calcification of the arteries.
Diagnostic test; Nonignorable; Semiparametric model; Sensitivity analysis; Sensitivity; Specificity
Policy decisions often require synthesis of evidence from multiple sources, and the source studies typically vary in rigour and in relevance to the target question. We present simple methods of allowing for differences in rigour (or lack of internal bias) and relevance (or lack of external bias) in evidence synthesis. The methods are developed in the context of reanalysing a UK National Institute for Clinical Excellence technology appraisal in antenatal care, which includes eight comparative studies. Many of the studies were historically controlled, only one was a randomized trial, and doses, populations and outcomes varied between studies and differed from the target UK setting. Using elicited opinion, we construct prior distributions to represent the biases in each study and perform a bias-adjusted meta-analysis. Adjustment shifted the combined estimate away from the null by approximately 10% and almost tripled the variance of the combined estimate. Our generic bias modelling approach allows decisions to be based on all available evidence, with less rigorous or less relevant studies downweighted by using computationally simple methods.
Bias; Elicitation; Evidence synthesis; Heterogeneity; Meta-analysis
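The bias-adjusted pooling described above can be sketched in a few lines: each study estimate is shifted by its elicited mean bias, and its variance is inflated by the elicited bias variance, before inverse-variance pooling. This fixed-effect sketch simplifies the Bayesian elicitation-based analysis in the abstract, and all numbers below are hypothetical.

```python
# Hedged sketch of a bias-adjusted inverse-variance meta-analysis.
# Study estimates, variances, and elicited bias means/variances are
# hypothetical, not taken from the antenatal-care appraisal.

def pooled(estimates, variances):
    """Fixed-effect inverse-variance pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    est = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return est, 1.0 / sum(weights)

def bias_adjusted(estimates, variances, bias_means, bias_vars):
    """Shift each study by its elicited mean bias and inflate its
    variance by the elicited bias variance before pooling."""
    adj_est = [e - b for e, b in zip(estimates, bias_means)]
    adj_var = [v + w for v, w in zip(variances, bias_vars)]
    return pooled(adj_est, adj_var)

log_or = [-0.5, -0.3, -0.8]    # hypothetical study log odds ratios
var = [0.04, 0.09, 0.16]       # hypothetical within-study variances
bias_mean = [0.1, 0.0, 0.2]    # elicited mean internal/external bias
bias_var = [0.05, 0.02, 0.10]  # elicited bias variance

naive = pooled(log_or, var)
adjusted = bias_adjusted(log_or, var, bias_mean, bias_var)
```

As in the reanalysis above, adjustment both moves the combined estimate and inflates its variance, so less rigorous studies carry less weight.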
Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified.
Diagnostic test; Model misspecification; Propensity score; Sensitivity; Specificity
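A minimal sketch of propensity-score-stratified correction for verification bias, in the spirit of the method described above (not the authors' exact estimator): verified subjects are grouped into strata with homogeneous propensity scores, the disease rate in each stratum is estimated from the verified subjects, and that rate is scaled up to the whole stratum. The data are artificial.

```python
# Illustrative sketch, not the authors' exact estimator: correct
# verification bias by stratifying on (test result, propensity of
# verification), estimating the disease rate per stratum from the
# verified subjects, and scaling up to all subjects in the stratum.
from collections import defaultdict

def corrected_sensitivity(subjects):
    # subjects: (test_result, disease_status_or_None, propensity);
    # disease_status is None when the subject was not verified.
    strata = defaultdict(list)
    for t, d, p in subjects:
        strata[(t, round(p, 2))].append(d)
    tp = diseased = 0.0
    for (t, _), group in strata.items():
        verified = [d for d in group if d is not None]
        if not verified:
            continue  # a stratum with no verified subjects is skipped
        rate = sum(verified) / len(verified)  # est. P(D=1 | stratum)
        n_dis = rate * len(group)             # expected diseased in stratum
        diseased += n_dis
        if t == 1:
            tp += n_dis
    return tp / diseased  # est. P(T=1 | D=1), i.e. sensitivity

# Toy data: test-positives always verified, test-negatives verified
# with probability 0.5, so a verified-only analysis over-counts
# test-positives among the diseased.
subjects = [(1, 1, 1.0)] * 4 + [(0, 1, 0.5), (0, None, 0.5)]
print(corrected_sensitivity(subjects))  # 2/3
```

Here the verified-only ("complete case") estimate is 4/5 = 0.8, while the stratum-corrected estimate is 2/3, illustrating how verification that depends on the test result inflates naive sensitivity.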
Misclassification of binary outcome variables is a known source of potentially serious bias when estimating adjusted odds ratios. Although researchers have described frequentist and Bayesian methods for dealing with the problem, these methods have seldom fully bridged the gap between statistical research and epidemiologic practice. In particular, there have been few real-world applications of readily grasped and computationally accessible methods that make direct use of internal validation data to adjust for differential outcome misclassification in logistic regression. In this paper, we illustrate likelihood-based methods for this purpose that can be implemented using standard statistical software. Using main study and internal validation data from the HIV Epidemiology Research Study, we demonstrate how misclassification rates can depend on the values of subject-specific covariates, and illustrate the importance of accounting for this dependence. Simulation studies confirm the effectiveness of the maximum likelihood approach. We emphasize clear exposition of the likelihood function itself, to permit the reader to easily assimilate appended computer code that facilitates sensitivity analyses as well as the efficient handling of main/external and main/internal validation-study data. These methods are readily applicable under random cross-sectional sampling, and we discuss the extent to which the main/internal analysis remains appropriate under outcome-dependent (case-control) sampling.
Vaccination remains the primary preventive strategy in the elderly against Streptococcus pneumoniae and influenza infections. The effectiveness of this strategy in preventing pneumonia has been in doubt despite the increase in vaccination coverage among older adults. Randomized controlled trials (RCTs) and observational studies aimed at determining clinical outcomes and immune response following pneumococcal vaccination have yielded conflicting results. The protective efficacy of pneumococcal vaccination against pneumonia in older adults has not been firmly established due to a lack of RCTs specifically examining patients ≥ 65 years of age. Similarly, the reported benefits of influenza vaccination have been derived from observational data. The assessment of clinical benefit from influenza vaccination in the elderly population is complicated by varying cohorts, virulence of the influenza strain, and matching of vaccine and circulating viral strains. The presence of selection bias and use of nonspecific end points in these studies make the current evidence inconclusive in terms of overall benefit. The development of more immunogenic vaccines through new formulations or addition of adjuvants holds the promise of revolutionizing delivery and improving efficacy. Dismantling existing barriers through education, providing technology assistance predominantly to developing countries, and establishing clear regulatory guidance on pathways for approval are necessary to ensure timely production and equitable distribution.
pneumococcal vaccine; influenza vaccine; vaccine effectiveness; pneumonia; older adults
Poisson regression modeling has been widely used to estimate influenza-associated disease burden, as it has the advantage of adjusting for multiple seasonal confounders. However, few studies have discussed how to judge the adequacy of confounding adjustment. This study aims to compare the performance of commonly adopted model selection criteria in terms of providing a reliable and valid estimate for the health impact of influenza.
We assessed four model selection criteria: quasi Akaike information criterion (QAIC), quasi Bayesian information criterion (QBIC), partial autocorrelation functions of residuals (PACF), and generalized cross-validation (GCV), by separately applying them to select the Poisson model best fitted to mortality datasets simulated under different assumptions of seasonal confounding. The performance of these criteria was evaluated by the bias and root-mean-square error (RMSE) of the estimates relative to the pre-determined coefficients of the influenza proxy variable. The four criteria were subsequently applied to an empirical hospitalization dataset to confirm the findings of the simulation study.
GCV consistently provided smaller biases and RMSEs for the influenza coefficient estimates than QAIC, QBIC and PACF, under the different simulation scenarios. Sensitivity analysis of different pre-determined influenza coefficients, study periods and lag weeks showed that GCV consistently outperformed the other criteria. Similar results were found in applying these selection criteria to estimate influenza-associated hospitalization.
The GCV criterion is recommended for selecting Poisson models to estimate influenza-associated mortality and morbidity burden with proper adjustment for confounding. These findings should help standardize the Poisson modeling approach for influenza disease burden studies.
Investigations of the effect of placebo are often challenging to conduct and interpret. The history of placebo shows that assessment of its clinical significance has a real potential to be biased. We analyse and discuss typical types of bias in studies on placebo.
STUDY DESIGN AND SETTING
A methodological analysis and discussion.
The inherently nonblinded comparison between placebo and no treatment is the best research design we have for estimating the effects of placebo, in both clinical and experimental settings, but the difference between placebo and no treatment remains an approximate and fairly crude reflection of the true effect of placebo interventions. A main problem is response bias in trials with outcomes based on patients' reports. Other biases involve differential co-intervention and patient drop-out, publication bias, and outcome reporting bias. Furthermore, extrapolation of results to clinical settings is challenging because of the lack of clear identification of the causal factors in many clinical trials, and because of the non-clinical setting and short duration of most laboratory experiments.
Creative experimental efforts are needed to assess rigorously the clinical significance of placebo interventions and investigate the component elements that may contribute to therapeutic benefit.
Placebo; placebo effect; response bias; bias; randomised trial; experiment
The effects of vaccine on postinfection outcomes, such as disease, death, and secondary transmission to others, are important scientific and public health aspects of prophylactic vaccination. As a result, evaluations of many vaccine effects condition on being infected. Conditioning on an event that occurs posttreatment (in our case, infection subsequent to assignment to vaccine or control) can result in selection bias. Moreover, because the set of individuals who would become infected if vaccinated is likely not identical to the set of those who would become infected if given control, comparisons that condition on infection do not have a causal interpretation. In this article we consider identifiability and estimation of causal vaccine effects on binary postinfection outcomes. Using the principal stratification framework, we define a postinfection causal vaccine efficacy estimand in individuals who would be infected regardless of treatment assignment. The estimand is shown not to be identifiable under the standard assumptions of the stable unit treatment value, monotonicity, and independence of treatment assignment. Thus selection models are proposed that identify the causal estimand. Closed-form maximum likelihood estimators (MLEs) are then derived under these models, including those assuming maximum possible levels of positive and negative selection bias. These results show the relations between the MLE of the causal estimand and two commonly used estimators for vaccine effects on postinfection outcomes. For example, the usual intent-to-treat estimator is shown to be an upper bound on the postinfection causal vaccine effect provided that the magnitude of protection against infection is not too large. The methods are used to evaluate postinfection vaccine effects in a clinical trial of a rotavirus vaccine candidate and in a field study of a pertussis vaccine. Our results show that pertussis vaccination has a significant causal effect in reducing disease severity.
Causal inference; Infectious disease; Maximum likelihood; Principal stratification; Sensitivity analysis
When analysing microarray and other small sample size biological datasets, care is needed to avoid various biases. We analyse a form of bias, stratification bias, that can substantially affect analyses using sample-reuse validation techniques and lead to inaccurate results. This bias is due to imperfect stratification of samples in the training and test sets and the dependency between these stratification errors, i.e. the variations in class proportions in the training and test sets are negatively correlated.
We show that when estimating the performance of classifiers on low signal datasets (i.e. those which are difficult to classify), which are typical of many prognostic microarray studies, commonly used performance measures can suffer from a substantial negative bias. For error rate this bias is only severe in quite restricted situations, but can be much larger and more frequent when using ranking measures such as the receiver operating characteristic (ROC) curve and area under the ROC (AUC). Substantial biases are shown in simulations and on the van 't Veer breast cancer dataset. The classification error rate can have large negative biases for balanced datasets, whereas the AUC shows substantial pessimistic biases even for imbalanced datasets. In simulation studies using 10-fold cross-validation, AUC values of less than 0.3 can be observed on random datasets rather than the expected 0.5. Further experiments on the van 't Veer breast cancer dataset show these biases exist in practice.
Stratification bias can substantially affect several performance measures. In computing the AUC, the strategy of pooling the test samples from the various folds of cross-validation can lead to large biases; computing it as the average of per-fold estimates avoids this bias and is thus the recommended approach. As a more general solution applicable to other performance measures, we show that stratified repeated holdout and modified versions of k-fold and leave-one-out cross-validation, namely balanced, stratified cross-validation and balanced leave-one-out cross-validation, avoid the bias. Therefore, for model selection and evaluation of microarray and other small biological datasets, these methods should be used and unstratified versions avoided. In particular, the commonly used (unbalanced) leave-one-out cross-validation should not be used to estimate AUC for small datasets.
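The pooling pitfall for AUC described above is easy to reproduce: when per-fold score scales differ (as they can when each fold's classifier is trained on different data), pooling test scores before ranking mixes incomparable scores. A toy sketch with artificial data:

```python
# Minimal illustration of the AUC pooling pitfall: pooling test scores
# across cross-validation folds can report a much lower AUC than
# averaging per-fold AUCs when score scales differ across folds.
# The folds below are artificial two-sample examples.

def auc(scores, labels):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered
    correctly, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two folds whose scores are perfectly ranked within each fold but sit
# on different scales, as can happen when each fold's model is fit on
# different training data.
fold_a = ([0.9, 0.1], [1, 0])
fold_b = ([0.05, 0.02], [1, 0])

per_fold = (auc(*fold_a) + auc(*fold_b)) / 2                # 1.0
pooled = auc(fold_a[0] + fold_b[0], fold_a[1] + fold_b[1])  # 0.75
```

Both folds rank their own test cases perfectly, yet the pooled estimate drops to 0.75 because fold B's positive scores fall below fold A's negative scores.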
In 2003, an internet-based monitoring system of influenza-like illness (ILI), the Great Influenza Survey (GIS), was initiated in Belgium. For the Flemish part of Belgium, we investigate the representativeness of the GIS population and assess the validity of the survey in terms of ILI incidence during eight influenza seasons (from 2003 through 2011). The validity is investigated by comparing estimated ILI incidences from the GIS with recorded incidences from two other monitoring systems, (i) the Belgian Sentinel Network and (ii) Google Flu Trends, and by performing a risk factor analysis to investigate whether the risk factors for acquiring ILI in the GIS population are consistent with those reported in the literature. A first-order random walk model is used to estimate ILI incidence trends from the GIS. Good to excellent correspondence is observed between the estimated ILI trends in the GIS and the recorded trends in the Sentinel Network and Google Flu Trends. The results of the risk factor analysis are in line with the literature. In conclusion, the GIS is a useful additional surveillance network for ILI monitoring in Flanders. Its advantages are the speed at which information becomes available and the fact that data are gathered directly in the community at an individual level.
Kenya introduced a pentavalent vaccine including the DTP, Haemophilus influenzae type b and hepatitis B virus antigens in Nov 2001 and strengthened immunization services. We estimated immunization coverage before and after introduction, timeliness of vaccination and risk factors for failure to immunize in Kilifi District, Kenya.
In Nov 2002 we performed WHO cluster-sample surveys of >200 children scheduled for vaccination before or after introduction of pentavalent vaccine. In Mar 2004 we conducted a simple random sample (SRS) survey of 204 children aged 9–23 months. Coverage was estimated by inverse Kaplan-Meier survival analysis of vaccine-card and mothers' recall data and corroborated by reviewing administrative records from national and provincial vaccine stores. The contributions of distance from the clinic, seasonal rainfall, mother's age, and family size to timely immunization were estimated with a proportional hazards model.
Immunization coverage for three doses was 100% (DTP) before and 91% (pentavalent) after the introduction of pentavalent vaccine. By SRS survey, coverage was 88% for three pentavalent doses. The median ages at the first, second and third vaccine doses were 8, 13 and 18 weeks, respectively. Vials dispatched to Kilifi District during 2001–2003 would provide three immunizations for 92% of the birth cohort. Immunization rate ratios were reduced with every kilometre of distance from home to vaccine clinic (HR 0.95, 95% CI 0.91–1.00), during rainy seasons (HR 0.73, 95% CI 0.61–0.89) and with increasing family size, progressively up to 4 children (HR 0.55, 95% CI 0.41–0.73).
Vaccine coverage was high before and after introduction of pentavalent vaccine, but most doses were given late. Coverage is limited by seasonal factors and family size.
The development of codon bias indices (CBIs) remains an active field of research due to their myriad applications in computational biology. Recently, the relative codon usage bias (RCBS) was introduced as a novel CBI able to estimate codon bias without using a reference set. The results of this new index when applied to Escherichia coli and Saccharomyces cerevisiae led the authors of the original publications to conclude that natural selection favours higher expression and enhanced codon usage optimization in short genes. Here, we show that this conclusion was flawed, resting on the systematic oversight of an intrinsic bias for short sequences in the RCBS index and on biases in the small data sets used for validation in E. coli. Furthermore, we show how the RCBS can be corrected to produce useful results and how its underlying principle, which we here term relative codon adaptation (RCA), can be made into a powerful reference-set-based index that directly takes into account the genomic base composition. Finally, we show that RCA outperforms the codon adaptation index (CAI) as a predictor of gene expression when operating on the CAI reference set and that this improvement is significantly larger when analysing genomes with high mutational bias.
codon bias index; gene expression; codon usage; Escherichia coli; highly expressed genes
Case-control studies are particularly susceptible to differential exposure misclassification when exposure status is determined following incident case status. Probabilistic bias analysis methods have been developed as ways to adjust standard effect estimates based on the sensitivity and specificity of exposure classification. The iterative sampling method advocated in probabilistic bias analysis bears a distinct resemblance to a Bayesian adjustment; however, it is not identical. Furthermore, without a formal theoretical framework (Bayesian or frequentist), the results of a probabilistic bias analysis remain somewhat difficult to interpret. We describe, both theoretically and empirically, the extent to which probabilistic bias analysis can be viewed as approximately Bayesian. While the differences between probabilistic bias analysis and Bayesian approaches to misclassification can be substantial, such situations often involve unrealistic prior specifications and are relatively easy to detect. Outside of these special cases, probabilistic bias analysis and Bayesian approaches to exposure misclassification in case-control studies appear to perform equally well.
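As an illustration of the iterative sampling method discussed above, a minimal probabilistic bias analysis for non-differential exposure misclassification can be sketched as follows: repeatedly draw sensitivity and specificity from prior distributions, back-correct the observed 2x2 counts, and collect the adjusted odds ratios. The counts and the uniform priors are hypothetical.

```python
# Hedged sketch of a simple probabilistic bias analysis for
# non-differential exposure misclassification in a case-control study.
# The 2x2 counts and the priors on sensitivity/specificity are
# hypothetical, chosen only for illustration.
import random

def correct_count(observed_exposed, n, se, sp):
    """Back-correct an observed exposed count given sensitivity se
    and specificity sp of exposure classification."""
    return (observed_exposed - (1 - sp) * n) / (se + sp - 1)

def adjusted_or(a, b, c, d, se, sp):
    # a/b: exposed/unexposed cases; c/d: exposed/unexposed controls
    a_true = correct_count(a, a + b, se, sp)
    c_true = correct_count(c, c + d, se, sp)
    b_true, d_true = (a + b) - a_true, (c + d) - c_true
    return (a_true * d_true) / (b_true * c_true)

random.seed(1)
draws = []
for _ in range(5000):
    se = random.uniform(0.80, 0.95)  # prior on sensitivity (assumed)
    sp = random.uniform(0.90, 0.99)  # prior on specificity (assumed)
    or_adj = adjusted_or(120, 80, 90, 110, se, sp)
    if or_adj > 0:  # guard against draws implying negative counts
        draws.append(or_adj)
draws.sort()
median_or = draws[len(draws) // 2]
```

The crude odds ratio here is about 1.83; every draw yields a larger adjusted odds ratio, consistent with non-differential misclassification biasing toward the null.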
Oncology is a highly researched therapeutic area with an ever-expanding armamentarium of drugs entering the market. It is unique in how the heterogeneity of tumor, patient and treatment factors is critical in determining outcomes of interventions. When it comes to decision making in the clinic, the practicing physician often seeks answers for populations with obvious deviations from the ideal selected populations included in the pivotal phase III randomized controlled trials (RCTs). While the randomized nature of RCTs ensures high internal validity by removing bias, their 'controlled' nature casts doubt on their generalizability to the real-world population. It is for this reason that trials done in a naturalistic setting after the marketing authorization of a drug are increasingly required. This article discusses the importance of non-interventional drug studies in oncology as a tool for testing the external validity of controlled trial results and their value in the generation of new hypotheses. It also discusses the limitations of such studies while outlining the steps in their effective conduct.
Good clinical practice; non interventional studies; standard operating procedures; study plan
We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: a roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on the linear coefficients to achieve model sparsity. Compared to existing estimating-equation-based approaches, our procedure provides valid inference for data that are missing at random, and is more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For model inference, we derive both frequentist and Bayesian variance estimates for the estimated parametric and nonparametric components. Simulations are used to evaluate and compare the performance of our method with existing ones. We then apply the new method to a real data set from a lactation study.
Correlated data; Gaussian stochastic process; Linear mixed models; Smoothly clipped absolute deviation; Smoothing splines
Many have documented the difficulty of using the current paradigm of Randomized Controlled Trials (RCTs) to test and validate the effectiveness of alternative medical systems such as Ayurveda. This paper critiques the applicability of RCTs to all clinical knowledge-seeking endeavors, of which Ayurveda research is a part. This is done by examining statistical hypothesis testing, the underlying foundation of RCTs, from a practical and philosophical perspective. In the philosophical critique, the two main worldviews of probability are the Bayesian and the frequentist. The frequentist worldview is a special case of the Bayesian worldview that requires the unrealistic assumptions of knowing nothing about the universe and believing that all observations are unrelated to each other. Many have claimed that the first belief is necessary for science, and this claim is debunked by comparing variations in learning with different prior beliefs. Moving beyond the Bayesian and frequentist worldviews, the notion of hypothesis testing itself is challenged on the grounds that a hypothesis is an unclear distinction, and assigning a probability to an unclear distinction is an exercise that does not lead to clarity of action. This critique is of the theory itself and not of any particular application of statistical hypothesis testing. A decision-making frame is proposed as a way of both addressing this critique and transcending ideological debates on probability. An example of a Bayesian decision-making approach is shown as an alternative to statistical hypothesis testing, using data from a past clinical trial that studied the effect of aspirin on heart attacks in a sample population of doctors. Because a major reason for the prevalence of RCTs in academia is legislation requiring their use, the ethics of legislating the use of statistical methods for clinical research is also examined.
Bayesian; decision analysis; statistical hypothesis testing
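The aspirin example mentioned above can be sketched with a conjugate Beta-binomial model. The event counts below are approximate figures from the Physicians' Health Study (about 139 myocardial infarctions among roughly 11,037 doctors on aspirin versus about 239 among roughly 11,034 on placebo), and the Beta(1, 1) priors are an assumption of this sketch.

```python
# Hedged sketch of a Bayesian comparison of two event rates.
# Counts are approximate figures from the Physicians' Health Study;
# the flat Beta(1, 1) priors are an assumption of this illustration.
import random

random.seed(0)

def posterior_prob_lower(events_a, n_a, events_b, n_b, draws=20000):
    """Monte Carlo estimate of P(rate_a < rate_b | data) under
    independent Beta(1, 1) priors, using conjugate Beta posteriors."""
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + events_a, 1 + n_a - events_a)
        rate_b = random.betavariate(1 + events_b, 1 + n_b - events_b)
        if rate_a < rate_b:
            wins += 1
    return wins / draws

# Posterior probability that the aspirin-arm event rate is lower.
p = posterior_prob_lower(139, 11037, 239, 11034)
```

Rather than a p-value on a point null, the output is a direct posterior probability that one rate is lower than the other, which feeds naturally into the decision-making frame the paper proposes.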
If a vaccine does not protect individuals completely against infection, it could still reduce infectiousness of infected vaccinated individuals to others. Typically, vaccine efficacy for infectiousness is estimated based on contrasts between the transmission risk to susceptible individuals from infected vaccinated individuals compared with that from infected unvaccinated individuals. Such estimates are problematic, however, because they are subject to selection bias and do not have a causal interpretation. Here, we develop causal estimands for vaccine efficacy for infectiousness for four different scenarios of populations of transmission units of size two. These causal estimands incorporate both principal stratification, based on the joint potential infection outcomes under vaccine and control, and interference between individuals within transmission units. In the most general scenario, both individuals can be exposed to infection outside the transmission unit and both can be assigned either vaccine or control. The three other scenarios are special cases of the general scenario where only one individual is exposed outside the transmission unit or can be assigned vaccine. The causal estimands for vaccine efficacy for infectiousness are well defined only within certain principal strata and, in general, are identifiable only with strong unverifiable assumptions. Nonetheless, the observed data do provide some information, and we derive large sample bounds on the causal vaccine efficacy for infectiousness estimands. An example of the type of data observed in a study to estimate vaccine efficacy for infectiousness is analyzed in the causal inference framework we developed.
causal inference; principal stratification; interference; infectious disease; vaccine
The goal was to estimate the effectiveness of influenza vaccination against laboratory-confirmed influenza during the 2003–2004 and 2004–2005 influenza seasons in children aged 6–59 months.
We conducted a case-control study in children with a medically attended acute respiratory infection who received care in the inpatient, emergency department or outpatient clinic setting during two consecutive influenza seasons. All children resided in Monroe County, NY, Davidson County, TN or Hamilton County, OH, were prospectively enrolled at the time of acute illness, and had nasal/throat swabs tested for influenza by culture and/or polymerase chain reaction. Children with laboratory-confirmed influenza were cases and children who tested negative for influenza were controls. Child vaccination records from the parent and from the child's physician were used to determine and validate influenza vaccination status. Influenza vaccine effectiveness was calculated as (1 – adjusted odds ratio) × 100.
We enrolled 288 cases and 744 controls during the 2003–2004 season, and 197 cases and 1,305 controls during the 2004–2005 season. Six percent and 19% of all study children were fully vaccinated according to immunization guidelines in the respective seasons. Full vaccination was associated with significantly fewer influenza-related inpatient, emergency department, or outpatient clinic visits in 2004–2005 [vaccine effectiveness = 57% (95% CI: 28% to 74%)], but not in 2003–2004 [vaccine effectiveness = 44% (95% CI: −42% to 78%)]. Partial vaccination was not effective in either season.
Receipt of all recommended doses of influenza vaccine was associated with a halving of laboratory-confirmed influenza-related medical visits among children aged 6–59 months in one of two study years, despite suboptimal matches between the vaccine and circulating influenza strains in both years.
Children; Vaccine Effectiveness; Laboratory-confirmed; Influenza
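The effectiveness formula used above, VE = (1 − adjusted odds ratio) × 100, can be illustrated with a crude (unadjusted) odds ratio from hypothetical counts; the study itself used an odds ratio adjusted through a regression model.

```python
# Worked toy example of the effectiveness formula
# VE = (1 - odds ratio) x 100, using a crude odds ratio from
# hypothetical counts (the study used an adjusted odds ratio).

def vaccine_effectiveness(vacc_cases, unvacc_cases,
                          vacc_controls, unvacc_controls):
    # Odds ratio of vaccination comparing cases with controls.
    odds_ratio = (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)
    return (1 - odds_ratio) * 100

# Hypothetical counts: vaccination is rarer among influenza-positive
# cases than among test-negative controls.
ve = vaccine_effectiveness(15, 85, 150, 350)
print(round(ve, 1))  # 58.8
```

An odds ratio below 1 (vaccination less common among cases) yields a positive effectiveness; an odds ratio of 1 yields 0%.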
Network meta-analysis (NMA), a generalization of conventional meta-analysis (MA), allows for assessing the relative effectiveness of multiple interventions. Reporting bias is a major threat to the validity of MA and NMA. Numerous methods are available to assess the robustness of MA results to reporting bias. We aimed to extend such methods to NMA.
We introduced 2 adjustment models for Bayesian NMA. First, we extended a meta-regression model that allows the effect size to depend on its standard error. Second, we used a selection model that estimates the propensity of trial results being published and in which trials with lower propensity are weighted up in the NMA model. Both models rely on the assumption that biases are exchangeable across the network. We applied the models to 2 networks of placebo-controlled trials of 12 antidepressants, with 74 trials in the US Food and Drug Administration (FDA) database but only 51 with published results. NMA and adjustment models were used to estimate the effects of the 12 drugs relative to placebo, the 66 effect sizes for all possible pair-wise comparisons between drugs, the probabilities of being the best drug, and the ranking of drugs. We compared the results from the 2 adjustment models applied to the published data with those from NMAs of the published data and of the FDA data, the latter considered as representing the totality of the data.
Both adjustment models showed reduced estimated effects for the 12 drugs relative to the placebo as compared with NMA of published data. Pair-wise effect sizes between drugs, probabilities of being the best drug and ranking of drugs were modified. Estimated drug effects relative to the placebo from both adjustment models were corrected (i.e., similar to those from NMA of FDA data) for some drugs but not others, which resulted in differences in pair-wise effect sizes between drugs and ranking.
In this case study, adjustment models showed that NMA of published data was not robust to reporting bias and provided estimates closer to those of NMA of FDA data, although not optimal. The validity of such methods depends on the number of trials in the network and the assumption that conventional MAs in the network share a common mean bias mechanism.
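The first adjustment idea, allowing the effect size to depend on its standard error, can be illustrated outside the Bayesian network setting with a simple frequentist meta-regression (an Egger-type model). This is a minimal sketch on simulated trial-level data; the function name and all numbers are hypothetical, and the paper's actual models are Bayesian and network-wide:

```python
import numpy as np

def egger_adjusted_effect(effects, ses):
    """Regress effect sizes on their standard errors (weights 1/se^2).

    The fitted intercept is the effect size extrapolated to se = 0,
    i.e. an estimate adjusted for small-study / reporting bias.
    """
    effects = np.asarray(effects, float)
    ses = np.asarray(ses, float)
    w = 1.0 / ses**2
    X = np.column_stack([np.ones_like(ses), ses])   # intercept + slope on se
    XtW = X.T * w                                   # X'W
    beta = np.linalg.solve(XtW @ X, XtW @ effects)  # weighted least squares
    return beta[0], beta[1]                         # adjusted effect, slope

# Hypothetical data: true effect 0.30, bias that grows in smaller trials
rng = np.random.default_rng(0)
ses = rng.uniform(0.05, 0.40, size=30)
effects = 0.30 + 0.8 * ses + rng.normal(0, ses)
adj, slope = egger_adjusted_effect(effects, ses)
naive = np.average(effects, weights=1 / ses**2)
print(f"naive pooled effect {naive:.2f}, se-adjusted intercept {adj:.2f}")
```

In the network version, one such bias term is assumed exchangeable across all comparisons rather than fitted per pair-wise meta-analysis.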
Network meta-analysis; Publication bias; Small-study effect
Objectives To evaluate the risk of bias tool, introduced by the Cochrane Collaboration for assessing the internal validity of randomised trials, for inter-rater agreement, concurrent validity compared with the Jadad scale and Schulz approach to allocation concealment, and the relation between risk of bias and effect estimates.
Design Cross sectional study.
Study sample 163 trials in children.
Main outcome measures Inter-rater agreement between reviewers assessing trials using the risk of bias tool (weighted κ), time to apply the risk of bias tool compared with other approaches to quality assessment (paired t test), degree of correlation for overall risk compared with overall quality scores (Kendall’s τ statistic), and magnitude of effect estimates for studies classified as being at high, unclear, or low risk of bias (metaregression).
Results Inter-rater agreement on individual domains of the risk of bias tool ranged from slight (κ=0.13) to substantial (κ=0.74). The mean time to complete the risk of bias tool was significantly longer than for the Jadad scale and Schulz approach, individually or combined (8.8 minutes (SD 2.2) per study v 2.0 (SD 0.8), P<0.001). There was low correlation between overall risk of bias and the Jadad scores (P=0.395) and the Schulz approach (P=0.064). Effect sizes differed between studies assessed as being at high or unclear risk of bias (0.52) and those at low risk (0.23).
Conclusions Inter-rater agreement varied across domains of the risk of bias tool. Generally, agreement was poorer for those items that required more judgment. There was low correlation between assessments of overall risk of bias and two common approaches to quality assessment: the Jadad scale and Schulz approach to allocation concealment. Overall risk of bias as assessed by the risk of bias tool differentiated effect estimates, with more conservative estimates for studies at low risk.
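The weighted κ used above for inter-rater agreement can be computed directly from the two reviewers' ratings. A minimal sketch with linear disagreement weights; the ratings below are hypothetical, not the study's data:

```python
import numpy as np

def weighted_kappa(r1, r2, k):
    """Linearly weighted Cohen's kappa for two raters on an ordinal k-level scale.

    Ratings are integers 0..k-1. Disagreement weights grow with the distance
    between categories, so near-misses are penalised less than far misses.
    """
    r1, r2 = np.asarray(r1), np.asarray(r2)
    O = np.zeros((k, k))
    for a, b in zip(r1, r2):
        O[a, b] += 1
    O /= O.sum()                                  # observed proportions
    E = np.outer(O.sum(axis=1), O.sum(axis=0))    # expected under independence
    i, j = np.indices((k, k))
    V = np.abs(i - j) / (k - 1)                   # linear disagreement weights
    return 1.0 - (V * O).sum() / (V * E).sum()

# Hypothetical risk-of-bias ratings: 0 = low, 1 = unclear, 2 = high
rater1 = [0, 0, 1, 2, 2, 1, 0, 2, 1, 1]
rater2 = [0, 1, 1, 2, 2, 1, 0, 1, 1, 2]
print(f"weighted kappa = {weighted_kappa(rater1, rater2, 3):.2f}")
```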
The associations of pesticide exposure with disease outcomes are estimated without the benefit of a randomized design. For this reason and others, these studies are susceptible to systematic errors. I analyzed studies of the associations between alachlor and glyphosate exposure and cancer incidence, both derived from the Agricultural Health Study cohort, to quantify the bias and uncertainty potentially attributable to systematic error.
For each study, I identified the prominent result and important sources of systematic error that might affect it. I assigned probability distributions to the bias parameters that allow quantification of the bias, drew a value at random from each assigned distribution, and calculated the estimate of effect adjusted for the biases. By repeating the draw and adjustment process over multiple iterations, I generated a frequency distribution of adjusted results, from which I obtained a point estimate and simulation interval. These methods were applied without access to the primary record-level dataset.
The conventional estimates of effect associating alachlor and glyphosate exposure with cancer incidence were likely biased away from the null and understated the uncertainty by quantifying only random error. For example, the conventional p-value for a test of trend in the alachlor study equaled 0.02, whereas fewer than 20% of the bias analysis iterations yielded a p-value of 0.02 or lower. Similarly, the conventional fully-adjusted result associating glyphosate exposure with multiple myeloma equaled 2.6 with 95% confidence interval of 0.7 to 9.4. The frequency distribution generated by the bias analysis yielded a median hazard ratio equal to 1.5 with 95% simulation interval of 0.4 to 8.9, which was 66% wider than the conventional interval.
Bias analysis provides a more complete picture of true uncertainty than conventional frequentist statistical analysis accompanied by a qualitative description of study limitations. The latter approach is likely to lead to overconfidence regarding the potential for causal associations, whereas the former safeguards against such overinterpretations. Furthermore, such analyses, once programmed, allow rapid implementation of alternative assignments of probability distributions to the bias parameters, and so elevate the plane of discussion regarding study bias from characterizing studies as "valid" or "invalid" to a critical and quantitative discussion of sources of uncertainty.
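The draw-and-adjust loop described above can be sketched as a Monte Carlo bias analysis for a single bias source, unmeasured confounding, taking the published conventional glyphosate result (hazard ratio 2.6, 95% CI 0.7 to 9.4) as input. The bias-parameter distributions below are hypothetical placeholders, not those assigned in the study:

```python
import numpy as np

rng = np.random.default_rng(1)

# Conventional result used as input: HR 2.6, 95% CI 0.7 to 9.4
hr, lo, hi = 2.6, 0.7, 9.4
se_log = (np.log(hi) - np.log(lo)) / (2 * 1.96)  # se of log HR from the CI

n_iter = 50_000
adjusted = np.empty(n_iter)
for it in range(n_iter):
    # Draw bias parameters for an unmeasured binary confounder
    rr_cd = rng.triangular(1.5, 2.5, 4.0)        # confounder-disease RR
    p1 = rng.uniform(0.4, 0.7)                   # prevalence among exposed
    p0 = rng.uniform(0.1, 0.4)                   # prevalence among unexposed
    bias = (p1 * (rr_cd - 1) + 1) / (p0 * (rr_cd - 1) + 1)
    # Remove the simulated bias, then add back conventional random error
    adjusted[it] = np.exp(np.log(hr / bias) + rng.normal(0, se_log))

med, s_lo, s_hi = np.percentile(adjusted, [50, 2.5, 97.5])
print(f"median HR {med:.1f}, 95% simulation interval {s_lo:.1f} to {s_hi:.1f}")
```

The frequency distribution of `adjusted` plays the role of the simulation intervals reported in the abstract: its median is the bias-adjusted point estimate and its 2.5th and 97.5th percentiles the simulation interval.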
To estimate an overall treatment difference with data from a randomized comparative clinical study, baseline covariates are often utilized to increase the estimation precision. Using the standard analysis of covariance technique for making inferences about such an average treatment difference may not be appropriate, especially when the fitted model is nonlinear. On the other hand, the novel augmentation procedure recently studied, for example, by Zhang and others (2008. Improving efficiency of inferences in randomized clinical trials using auxiliary covariates. Biometrics 64, 707–715) is quite flexible. However, in general, it is not clear how to select covariates for augmentation effectively. An overly adjusted estimator may inflate the variance and in some cases be biased. Furthermore, the standard inference procedure, which ignores the sampling variation from the variable selection process, may not be valid. In this paper, we first propose an estimation procedure, which augments the simple treatment contrast estimator directly with covariates. The new proposal is asymptotically equivalent to the aforementioned augmentation method. To select covariates, we utilize the standard lasso procedure. Furthermore, to make valid inference from the resulting lasso-type estimator, a cross-validation method is used. The validity of the new proposal is justified theoretically and empirically. We illustrate the procedure extensively with a well-known primary biliary cirrhosis clinical trial data set.
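The idea of augmenting the simple treatment contrast directly with covariates can be sketched numerically. This toy version uses ordinary least squares in place of the lasso selection step and simulated data; all names and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n, true_delta = 500, 2.0

# Simulated randomized trial: treatment T, baseline covariates X, outcome Y
T = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3))
Y = true_delta * T + X @ np.array([3.0, -1.0, 0.5]) + rng.normal(size=n)

# Simple treatment contrast (difference in arm means)
delta_simple = Y[T == 1].mean() - Y[T == 0].mean()

# Augmentation: remove the chance imbalance in covariates between arms,
# scaled by the OLS-estimated covariate-outcome associations.  The paper
# selects the covariates with the lasso; plain OLS stands in for that here.
Z = np.column_stack([np.ones(n), T, X])
gamma = np.linalg.lstsq(Z, Y, rcond=None)[0][2:]       # covariate coefficients
imbalance = X[T == 1].mean(axis=0) - X[T == 0].mean(axis=0)
delta_aug = delta_simple - imbalance @ gamma

print(f"simple contrast {delta_simple:.2f}, augmented {delta_aug:.2f}")
```

Because randomization makes the covariate imbalance mean zero, the correction leaves the estimator consistent while removing the covariates' contribution to its variance; selecting the augmentation covariates with the lasso, as in the paper, guards against the over-adjustment noted above.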
ANCOVA; Cross validation; Efficiency augmentation; Mayo PBC data; Semi-parametric efficiency
A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex, not easily implemented, and subject to several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP.
We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score).
The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial.
The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
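The discriminatory behaviour of the mean logarithmic score can be illustrated outside the INLA machinery: score two candidate count models, a Poisson and a negative binomial, on held-out overdispersed data. This is a hypothetical moment-based sketch, not the paper's GLMM analysis:

```python
import numpy as np
from math import lgamma, log

def pois_logpmf(k, lam):
    return k * log(lam) - lam - lgamma(k + 1)

def nbinom_logpmf(k, r, p):
    return lgamma(k + r) - lgamma(r) - lgamma(k + 1) + r * log(p) + k * log(1 - p)

rng = np.random.default_rng(3)
# Overdispersed counts: gamma-mixed Poisson, i.e. negative binomial in truth
train = rng.poisson(rng.gamma(2.0, 2.0, size=400))
test = rng.poisson(rng.gamma(2.0, 2.0, size=400))

# Fit both candidates by moments on the training half
m, v = train.mean(), train.var()
r, p = m**2 / (v - m), m / v          # negative binomial size and probability

# Mean logarithmic score on held-out data (higher is better)
ls_pois = np.mean([pois_logpmf(k, m) for k in test])
ls_nb = np.mean([nbinom_logpmf(k, r, p) for k in test])
print(f"mean log score: Poisson {ls_pois:.3f}, neg. binomial {ls_nb:.3f}")
```

On data like these the negative binomial attains the higher mean log score, mirroring the abstract's point that a conventional Poisson random effects model is often inappropriate for real-life count data.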
Statistical analysis plan; Sensitivity analysis; Longitudinal count data; Bayesian generalized linear mixed models; INLA; Predictive performance; Bayesian model evaluation; Informed model choice
We consider nonparametric regression of a scalar outcome on a covariate when the outcome is missing at random (MAR) given the covariate and other observed auxiliary variables. We propose a class of augmented inverse probability weighted (AIPW) kernel estimating equations for nonparametric regression under MAR. We show that AIPW kernel estimators are consistent when the probability that the outcome is observed, that is, the selection probability, is either known by design or estimated under a correctly specified model. In addition, we show that a specific AIPW kernel estimator in our class that employs the fitted values from a model for the conditional mean of the outcome given covariates and auxiliaries is double-robust, that is, it remains consistent if this model is correctly specified even if the selection probabilities are modeled or specified incorrectly. Furthermore, when both models happen to be right, this double-robust estimator attains the smallest possible asymptotic variance of all AIPW kernel estimators and maximally extracts the information in the auxiliary variables. We also describe a simple correction to the AIPW kernel estimating equations that, while preserving double-robustness, ensures an efficiency improvement over nonaugmented IPW estimation when the selection model is correctly specified, regardless of the validity of the second model used in the augmentation term. We perform simulations to evaluate the finite sample performance of the proposed estimators, and apply the methods to the analysis of the AIDS Costs and Services Utilization Survey data. Technical proofs are available online.
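A minimal sketch of the AIPW idea in the kernel setting: a Nadaraya-Watson estimator applied to an AIPW pseudo-outcome, with selection probabilities known by design and a simple linear working model for the outcome mean. The data, bandwidth, and working model below are hypothetical illustrations, not the estimators studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000

# Simulated data: covariate X, auxiliary A, outcome Y missing at random
X = rng.uniform(-1, 1, n)
A = X + rng.normal(0, 0.5, n)
Y = np.sin(np.pi * X) + 0.5 * A + rng.normal(0, 0.3, n)
pi = 1 / (1 + np.exp(-(0.5 + A)))          # known selection probabilities
R = rng.binomial(1, pi)                    # R = 1: outcome observed
Yobs = np.where(R == 1, Y, 0.0)

# Working outcome model m(X, A): OLS on the complete cases (may be wrong;
# with pi known, the AIPW estimator stays consistent anyway)
Z = np.column_stack([np.ones(n), X, A])
beta = np.linalg.lstsq(Z[R == 1], Y[R == 1], rcond=None)[0]
mhat = Z @ beta

def aipw_kernel(x0, h=0.15):
    """AIPW Nadaraya-Watson estimate of E[Y | X = x0]."""
    K = np.exp(-0.5 * ((X - x0) / h) ** 2)            # Gaussian kernel weights
    pseudo = R * Yobs / pi - (R - pi) / pi * mhat     # AIPW pseudo-outcome
    return (K * pseudo).sum() / K.sum()

print(f"estimate of E[Y | X = 0]: {aipw_kernel(0.0):.2f}")
```

The pseudo-outcome has conditional mean E[Y | X] whenever either the selection probabilities or the outcome model are correct, which is the double-robustness property described above; the augmentation term also recycles the information in the auxiliary A that a plain IPW kernel estimator would discard.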
Asymptotics; Augmented kernel estimating equations; Double robustness; Efficiency; Inverse probability weighted kernel estimating equations; Kernel smoothing