The number needed to treat (NNT) is a popular effect measure to present study results in biomedical research. NNTs were originally proposed to describe the absolute effect of a new treatment compared with a standard treatment or placebo in randomized controlled trials (RCTs) with a binary outcome. The concept of the NNT measure has since been applied to a number of other research areas, which has involved the development of related measures and of more sophisticated techniques to calculate and interpret NNT measures in biomedical research. In epidemiology and public health research, an adequate adjustment for covariates is usually required, leading to the application of adjusted NNT measures. An overview of the recent developments regarding adjustment of NNT measures is given. The use and interpretation of adjusted NNT measures are illustrated by means of examples from dentistry research.
Number needed to treat; evidence-based medicine; confounding; adjustment for covariates; regression analysis.
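As a starting point for the adjusted measures discussed above, the unadjusted NNT is the reciprocal of the absolute risk reduction (ARR). The following is a minimal sketch, assuming a two-arm study with a binary outcome and a Wald-type interval for the ARR; the function name and the example counts are hypothetical and are not taken from the abstract.

```python
import math

def nnt_from_risks(events_ctrl, n_ctrl, events_trt, n_trt, z=1.96):
    """Return (ARR, NNT, NNT confidence interval) for a two-arm binary outcome."""
    risk_ctrl = events_ctrl / n_ctrl
    risk_trt = events_trt / n_trt
    arr = risk_ctrl - risk_trt  # absolute risk reduction
    se = math.sqrt(risk_ctrl * (1 - risk_ctrl) / n_ctrl
                   + risk_trt * (1 - risk_trt) / n_trt)
    arr_lo, arr_hi = arr - z * se, arr + z * se
    nnt = 1 / arr if arr != 0 else math.inf
    # Inverting the ARR limits gives a simple interval only when both limits
    # are positive; otherwise the NNT interval passes through infinity.
    nnt_ci = (1 / arr_hi, 1 / arr_lo) if arr_lo > 0 else None
    return arr, nnt, nnt_ci

# Hypothetical counts: 300/1000 events under control, 200/1000 under treatment.
arr, nnt, nnt_ci = nnt_from_risks(300, 1000, 200, 1000)
```

An adjusted NNT would replace the two crude risks with covariate-adjusted risks, for example from a regression model.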
We examined the effect of hospital type and medical coverage on the risk of 1-year mortality of very low birth weight (VLBW) infants while adjusting for possible selection bias.
The study population was limited to singleton live birth infants having birth weight between 500 and 1,500 grams with no congenital anomalies who were born in Arkansas hospitals between 2001 and 2007. Propensity score (PS) matching and PS covariate adjustment were used to mitigate selection bias. Additionally, a conventional multivariable logistic regression model was used for comparison purposes.
Generally, all three analytical approaches provided consistent results in terms of the estimated relative risk, absolute risk reduction, and the number needed to treat (NNT). Using the PS matching method, VLBW infants delivered at a hospital with a neonatal intensive care unit (NICU) were associated with a 35% relative decrease (95% bootstrap CI: 18.5% – 48.9%) in the risk of 1-year mortality as compared to those infants delivered at non-NICU hospitals. Furthermore, our results showed that on average, 16 VLBW infants (95% bootstrap CI: 11 – 32), would need to be delivered at a hospital with an NICU to prevent one additional death at one year. However, there was not a difference in the risk of 1-year mortality between VLBW infants born to Medicaid-insured versus non-Medicaid-insured women.
Estimated relative risk of infant mortality was significantly lower for births that occurred in hospitals with an NICU; therefore, greater efforts should be made to deliver VLBW neonates in an NICU hospital.
Propensity score methods are increasingly being used to estimate the effects of treatments on health outcomes using observational data. There are four methods for using the propensity score to estimate treatment effects: covariate adjustment using the propensity score, stratification on the propensity score, propensity-score matching, and inverse probability of treatment weighting (IPTW) using the propensity score. When outcomes are binary, the effect of treatment on the outcome can be described using odds ratios, relative risks, risk differences, or the number needed to treat. Several clinical commentators suggested that risk differences and numbers needed to treat are more meaningful for clinical decision making than are odds ratios or relative risks. However, there is a paucity of information about the relative performance of the different propensity-score methods for estimating risk differences. We conducted a series of Monte Carlo simulations to examine this issue. We examined bias, variance estimation, coverage of confidence intervals, mean-squared error (MSE), and type I error rates. A doubly robust version of IPTW had superior performance compared with the other propensity-score methods. It resulted in unbiased estimation of risk differences, treatment effects with the lowest standard errors, confidence intervals with the correct coverage rates, and correct type I error rates. Stratification, matching on the propensity score, and covariate adjustment using the propensity score resulted in minor to modest bias in estimating risk differences. Estimators based on IPTW had lower MSE compared with other propensity-score methods. Differences between IPTW and propensity-score matching may reflect that these two methods estimate the average treatment effect and the average treatment effect for the treated, respectively. Copyright © 2010 John Wiley & Sons, Ltd.
propensity score; observational study; binary data; risk difference; number needed to treat; matching; IPTW; inverse probability of treatment weighting; propensity-score matching
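The weighting idea behind IPTW can be sketched as follows, assuming a single categorical confounder so that the propensity score is simply the treated fraction within each stratum. This is the plain weighted estimator with normalized weights, not the doubly robust version evaluated in the abstract; the function names and the data are hypothetical.

```python
from collections import defaultdict

def iptw_risk_difference(records):
    """records: list of (stratum, treated, outcome) with 0/1 treated and outcome.
    Each stratum must contain both treated and untreated subjects."""
    n = defaultdict(int)
    n_treated = defaultdict(int)
    for stratum, treated, _ in records:
        n[stratum] += 1
        n_treated[stratum] += treated
    # Estimated propensity score: treated fraction within each stratum.
    ps = {s: n_treated[s] / n[s] for s in n}

    sums = {1: [0.0, 0.0], 0: [0.0, 0.0]}  # arm -> [weighted events, total weight]
    for stratum, treated, outcome in records:
        e = ps[stratum]
        w = 1 / e if treated else 1 / (1 - e)  # weights targeting the whole population
        sums[treated][0] += w * outcome
        sums[treated][1] += w
    return sums[1][0] / sums[1][1] - sums[0][0] / sums[0][1]

def expand(stratum, treated, n, events):
    """Build n records for one stratum-by-arm cell, `events` of them with outcome 1."""
    return [(stratum, treated, 1)] * events + [(stratum, treated, 0)] * (n - events)

# Hypothetical data: treatment is more common in stratum A than in stratum B.
records = (expand("A", 1, 8, 2) + expand("A", 0, 2, 1)
           + expand("B", 1, 2, 1) + expand("B", 0, 8, 4))
rd_weighted = iptw_risk_difference(records)
```

In this toy data set the crude risk difference is -0.2, whereas the weighted estimate equals the risk difference standardized to the whole sample.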
The number needed to treat (NNT) is a well-known effect measure for reporting the results of clinical trials. In the case of time-to-event outcomes, the calculation of NNTs is more difficult than in the case of binary data. The frequency of using NNTs to report results of randomised controlled trials (RCT) investigating time-to-event outcomes and the adequacy of the applied calculation methods are unknown.
We searched in PubMed for RCTs with parallel group design and individual randomisation, published in four frequently cited journals between 2003 and 2005. We evaluated the type of outcome, the frequency of reporting NNTs with corresponding confidence intervals, and assessed the adequacy of the methods used to calculate NNTs in the case of time-to-event outcomes.
The search resulted in 734 eligible RCTs. Of these, 373 RCTs investigated time-to-event outcomes and 361 analyzed binary data. In total, 62 articles reported NNTs (34 articles with time-to-event outcomes, 28 articles with binary outcomes). Of the 34 articles reporting NNTs derived from time-to-event outcomes, only 17 applied an appropriate calculation method. Of the 62 articles reporting NNTs, only 21 articles presented corresponding confidence intervals.
The NNT is used as an effect measure to present the results from RCTs with binary and time-to-event outcomes in the current medical literature. In the case of time-to-event data, incorrect methods were frequently applied. Confidence intervals for NNTs were given in only one third of the articles reporting NNTs. In summary, there is much room for improvement in the application of NNTs to present results of RCTs, especially where the outcome is time to an event.
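The abstract does not state which calculation methods were judged appropriate. One commonly cited approach for time-to-event outcomes bases the NNT on the difference in survival probabilities at a fixed time point, optionally deriving the treated-arm survival from a hazard ratio under proportional hazards. The sketch below illustrates that approach under those assumptions; the function names and input values are hypothetical.

```python
def nnt_from_survival(surv_ctrl, surv_trt):
    """NNT at a fixed time point from the two survival probabilities."""
    return 1 / (surv_trt - surv_ctrl)

def nnt_from_hazard_ratio(surv_ctrl, hazard_ratio):
    """NNT at a fixed time point from control-arm survival and a hazard ratio.
    Assumes proportional hazards, so S_treated(t) = S_control(t) ** HR."""
    surv_trt = surv_ctrl ** hazard_ratio
    return 1 / (surv_trt - surv_ctrl)

# Hypothetical inputs: 60% vs 70% survival at the chosen time point.
nnt_direct = nnt_from_survival(0.60, 0.70)
# Hypothetical inputs: 50% control survival and a hazard ratio of 0.5.
nnt_hr = nnt_from_hazard_ratio(0.50, 0.5)
```

Because survival probabilities change over follow-up, an NNT obtained this way is specific to the chosen time point and should be reported together with it.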
Estimates of additive interaction from case-control data are often obtained by logistic regression; such models can also be used to adjust for covariates. This approach to estimating additive interaction has come under some criticism because of possible misspecification of the logistic model: If the underlying model is linear, the logistic model will be misspecified. The authors propose an inverse probability of treatment weighting approach to causal effects and additive interaction in case-control studies. Under the assumption of no unmeasured confounding, the approach amounts to fitting a marginal structural linear odds model. The approach allows for the estimation of measures of additive interaction between dichotomous exposures, such as the relative excess risk due to interaction, using case-control data without having to rely on modeling assumptions for the outcome conditional on the exposures and covariates. Rather than using conditional models for the outcome, models are instead specified for the exposures conditional on the covariates. The approach is illustrated by assessing additive interaction between genetic and environmental factors using data from a case-control study.
case-control studies; interaction; linear model; structural model; synergism; weighting
In a study comparing the effects of two treatments, the propensity score is the probability of assignment to one treatment conditional on a subject's measured baseline covariates. Propensity-score matching is increasingly being used to estimate the effects of exposures using observational data. In the most common implementation of propensity-score matching, pairs of treated and untreated subjects are formed whose propensity scores differ by at most a pre-specified amount (the caliper width). There has been little research into the optimal caliper width. We conducted an extensive series of Monte Carlo simulations to determine the optimal caliper width for estimating differences in means (for continuous outcomes) and risk differences (for binary outcomes). When estimating differences in means or risk differences, we recommend that researchers match on the logit of the propensity score using calipers of width equal to 0.2 of the standard deviation of the logit of the propensity score. When at least some of the covariates were continuous, then either this value, or one close to it, minimized the mean square error of the resultant estimated treatment effect. It also eliminated at least 98% of the bias in the crude estimator, and it resulted in confidence intervals with approximately the correct coverage rates. Furthermore, the empirical type I error rate was approximately correct. When all of the covariates were binary, then the choice of caliper width had a much smaller impact on the performance of estimation of risk differences and differences in means. Copyright © 2010 John Wiley & Sons, Ltd.
propensity score; observational study; binary data; risk difference; propensity-score matching; Monte Carlo simulations; bias; matching
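The recommended caliper can be illustrated with a small sketch of greedy 1:1 nearest-neighbour matching on the logit of the propensity score. The assumptions are that the propensity scores have already been estimated, that the standard deviation is pooled across the two groups, and that treated subjects are processed in input order; the abstract fixes none of these details, and the function names and scores are hypothetical.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

def caliper_match(ps_treated, ps_control, caliper_sd=0.2):
    """Greedy 1:1 matching without replacement within a caliper on the logit scale.
    Returns a list of (treated_index, control_index) pairs."""
    lt = [logit(p) for p in ps_treated]
    lc = [logit(p) for p in ps_control]
    # Assumption: pooled standard deviation of the logit across both groups.
    pooled_sd = math.sqrt((statistics.variance(lt) + statistics.variance(lc)) / 2)
    caliper = caliper_sd * pooled_sd
    available = set(range(len(lc)))
    pairs = []
    for i, x in enumerate(lt):
        if not available:
            break
        # Nearest remaining control; sorted() makes tie-breaking deterministic.
        j = min(sorted(available), key=lambda k: abs(lc[k] - x))
        if abs(lc[j] - x) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical estimated propensity scores.
pairs = caliper_match([0.30, 0.60, 0.90], [0.31, 0.58, 0.20, 0.50])
```

In this example the third treated subject stays unmatched because no control lies within the caliper, which is how caliper matching trades sample size for closer matches.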
Regression adjustment for the propensity score is a statistical method that reduces confounding from measured variables in observational data. A Bayesian propensity score analysis extends this idea by using simultaneous estimation of the propensity scores and the treatment effect. In this article, we conduct an empirical investigation of the performance of Bayesian propensity scores in the context of an observational study of the effectiveness of beta-blocker therapy in heart failure patients. We study the balancing properties of the estimated propensity scores. Traditional Frequentist propensity scores focus attention on balancing covariates that are strongly associated with treatment. In contrast, we demonstrate that Bayesian propensity scores can be used to balance the association between covariates and the outcome. This balancing property has the effect of reducing confounding bias because it reduces the degree to which covariates are outcome risk factors.
Covariate adjustment using linear models for continuous outcomes in randomized trials has been shown to increase efficiency and power over the unadjusted method in estimating the marginal effect of treatment. However, for binary outcomes, investigators generally rely on the unadjusted estimate as the literature indicates that covariate-adjusted estimates based on the logistic regression models are less efficient. The crucial step that has been missing when adjusting for covariates is that one must integrate/average the adjusted estimate over those covariates in order to obtain the marginal effect. We apply the method of targeted maximum likelihood estimation (tMLE) to obtain estimators for the marginal effect using covariate adjustment for binary outcomes. We show that the covariate adjustment in randomized trials using the logistic regression models can be mapped, by averaging over the covariate(s), to obtain a fully robust and efficient estimator of the marginal effect, which equals a targeted maximum likelihood estimator. This tMLE is obtained by simply adding a clever covariate to a fixed initial regression. We present simulation studies that demonstrate that this tMLE increases efficiency and power over the unadjusted method, particularly for smaller sample sizes, even when the regression model is mis-specified.
clinical trials; efficiency; covariate adjustment; variable selection
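The averaging step described above, in which covariate-adjusted predictions are averaged over the covariate distribution to obtain a marginal effect, can be sketched as follows. This shows only the standardization step for an already fitted logistic model and not the targeted update with the clever covariate; the coefficients and covariate values are hypothetical.

```python
import math

def expit(x):
    return 1 / (1 + math.exp(-x))

def marginal_risk_difference(covariates, intercept, beta_treat, beta_cov):
    """Average the model-predicted risks under treatment and under control
    over the observed covariate values, then take the difference."""
    n = len(covariates)
    risk_treated = sum(expit(intercept + beta_treat + beta_cov * w) for w in covariates) / n
    risk_control = sum(expit(intercept + beta_cov * w) for w in covariates) / n
    return risk_treated - risk_control

# Hypothetical fitted coefficients (conditional odds ratio of 3 for treatment)
# and a binary baseline covariate observed in four subjects.
rd_marginal = marginal_risk_difference(
    covariates=[0, 0, 1, 1],
    intercept=0.0,
    beta_treat=math.log(3),
    beta_cov=-math.log(3),
)
```

The conditional coefficient `beta_treat` is a log odds ratio given the covariate, whereas the value returned here is a marginal risk difference for the whole sample.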
Background and Purpose
To make informed treatment decisions, patients and physicians need to be aware of the benefits and risks of a proposed treatment. The number needed to treat (NNT) for benefit and harm are intuitive and statistically valid measures to describe a treatment effect. The aim of this study is to calculate treatment time-specific NNT estimates based on shifts over the entire spectrum of clinically relevant functional outcomes.
The pooled data set of the first 6 major randomized acute stroke trials of intravenous tissue plasminogen activator was used for this study. The data were stratified by 90-minute treatment time windows. NNT for benefit and NNT for harm estimates were determined based on expert generation of joint outcome distribution tables. NNT for benefit estimates were also calculated based on joint outcome distribution tables generated by a computer model.
NNT for benefit estimates based on the expert panel were 3.6 for patients treated between 0 and 90 minutes, 4.3 with treatment between 91 and 180 minutes, 5.9 with treatment between 181 and 270 minutes, and 19.3 with treatment between 271 and 360 minutes. The computer simulation yielded very similar results. The NNT for harm estimates for the corresponding time intervals are 65, 38, 30, and 14.
Up to 4½ hours after symptom onset, tissue plasminogen activator therapy is associated with more benefit than harm, whereas there is no evidence of a net benefit in the 4½- to 6-hour time window. The NNT estimates for each 90-minute epoch provide useful and intuitive information based on which patients may be able to make better informed treatment decisions.
biostatistics; number needed to treat; stroke; thrombolysis
Methods for estimating average treatment effects, under the assumption of no unmeasured confounders, include regression models; propensity score adjustments using stratification, weighting, or matching; and doubly robust estimators (a combination of both). Researchers continue to debate the best estimator for outcomes such as health care cost data, as they are usually characterized by an asymmetric distribution and heterogeneous treatment effects. Challenges in finding the right specifications for regression models are well documented in the literature. Propensity score estimators are proposed as alternatives to overcoming these challenges. Using simulations, we find that in moderate size samples (n = 5000), balancing on propensity scores that are estimated from saturated specifications can balance the covariate means across treatment arms but fails to balance higher-order moments and covariances amongst covariates. Therefore, unlike regression models, even if a formal model for outcomes is not required, propensity score estimators can be inefficient at best and biased at worst for health care cost data. Our simulation study, designed to take a ‘proof by contradiction’ approach, proves that no one estimator can be considered the best under all data generating processes for outcomes such as costs. The inverse-propensity weighted estimator is most likely to be unbiased under alternate data generating processes but is prone to bias under misspecification of the propensity score model and is inefficient compared to an unbiased regression estimator. Our results show that there are no ‘magic bullets’ when it comes to estimating treatment effects in health care costs. Care should be taken before naively applying any one estimator to estimate average treatment effects in these data. We illustrate the performance of alternative methods in a cost dataset on breast cancer treatment.
Propensity score; non-linear regression; average treatment effect; health care costs
Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating the effects of treatments when using observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used for assessing the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects using propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to examine the effect on inferences about risk differences (or absolute risk reductions) when statistical methods for independent samples are used compared with when statistical methods for paired samples are used in propensity-score matched samples. We found that compared with using methods for independent samples, the use of methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. Differences between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger compared with when treatment-selection process was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples for making inferences on the effect of treatment on the reduction in the probability of an event occurring. Copyright © 2011 John Wiley & Sons, Ltd.
propensity score; propensity-score matching; risk difference; absolute risk reduction; Monte Carlo simulations; statistical inference; hypothesis testing; type I error rate; categorical data analysis
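The contrast between independent-sample and paired-sample inference can be illustrated with standard textbook standard errors for a difference in proportions. The abstract does not name the exact estimators that were compared, so the following is a sketch under that assumption, with hypothetical pair counts.

```python
import math

def risk_difference_se(both, treated_only, control_only, neither):
    """Pair counts from a matched sample: event in both members, in the treated
    member only, in the control member only, or in neither member."""
    n = both + treated_only + control_only + neither
    p_treated = (both + treated_only) / n
    p_control = (both + control_only) / n
    rd = p_treated - p_control
    # Treats the two matched groups as independent samples.
    se_independent = math.sqrt(p_treated * (1 - p_treated) / n
                               + p_control * (1 - p_control) / n)
    # Accounts for pairing; only discordant pairs contribute to the difference.
    se_paired = math.sqrt((treated_only + control_only)
                          - (treated_only - control_only) ** 2 / n) / n
    return rd, se_independent, se_paired

# Hypothetical matched sample of 100 pairs.
rd, se_ind, se_pair = risk_difference_se(both=30, treated_only=10,
                                         control_only=20, neither=40)
```

With these counts the paired standard error is smaller than the independent-sample one, because outcomes within matched pairs are positively correlated.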
In the literature we find many indices of size of treatment effect (effect size: ES). The preferred index of treatment effect in evidence-based medicine is the number needed to treat (NNT), while the most common one in the medical literature is Cohen's d when the outcome is continuous. There is confusion about how to convert Cohen's d into NNT.
We conducted meta-analyses of individual patient data from 10 randomized controlled trials of second generation antipsychotics for schizophrenia (n = 4278) to produce Cohen's d and NNTs for various definitions of response, using cutoffs of 10% through 90% reduction on the symptom severity scale. These actual NNTs were compared with NNTs calculated from Cohen's d according to two proposed methods in the literature (Kraemer, et al., Biological Psychiatry, 2006; Furukawa, Lancet, 1999).
NNTs from Kraemer's method overlapped with the actual NNTs in 56%, while those based on Furukawa's method fell within the observed ranges of NNTs in 97% of the examined instances. For various definitions of response corresponding with 10% through 70% symptom reduction where we observed a non-small number of responders, the degree of agreement for the former method was at a chance level (ANOVA ICC of 0.12, p = 0.22) but that for the latter method was ANOVA ICC of 0.86 (95%CI: 0.55 to 0.95, p<0.01).
Furukawa's method allows more accurate prediction of NNTs from Cohen's d. Kraemer's method gives a wrong impression that NNT is constant for a given d even when the event rate differs.
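The difference between the two conversions can be made concrete with the forms in which they are commonly cited: Kraemer's as NNT = 1 / (2Φ(d/√2) − 1) and Furukawa's as NNT = 1 / (Φ(d + Φ⁻¹(CER)) − CER), where Φ is the standard normal distribution function and CER is the control event rate. These formulas are stated here as an assumption, since the abstract does not reproduce them; the function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def nnt_kraemer(d):
    """Conversion that depends on Cohen's d only."""
    return 1 / (2 * STD_NORMAL.cdf(d / math.sqrt(2)) - 1)

def nnt_furukawa(d, control_event_rate):
    """Conversion that also uses the control event rate (CER)."""
    treated_rate = STD_NORMAL.cdf(d + STD_NORMAL.inv_cdf(control_event_rate))
    return 1 / (treated_rate - control_event_rate)

# Hypothetical effect size d = 0.5 with two different control response rates.
k = nnt_kraemer(0.5)
f_low = nnt_furukawa(0.5, 0.20)
f_mid = nnt_furukawa(0.5, 0.50)
```

For the same d, the first conversion returns a single value, whereas the second varies with the control event rate, which is the behaviour the conclusion above refers to.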
Lack of randomization of nursing intervention in outcome effectiveness studies may lead to imbalanced covariates. Consequently, estimation of nursing intervention effect can be biased as in other observational studies. Propensity score analysis is an effective statistical method to reduce such bias and further derive causal effects in observational studies.
To illustrate the use of propensity score analysis in quantitative nursing research through an example of pain management effect on length of hospital stay.
Propensity scores are generated through a regression model treating the nursing intervention as the dependent variable and all confounding covariates as predictor variables. Then propensity scores are used to adjust for this nonrandomized assignment of nursing intervention through three approaches: regression covariance adjustment, stratification, and matching in the predictive outcome model for nursing intervention.
Propensity score analysis reduces the confounding covariates into a single variable of propensity score. After stratification and matching on propensity scores, observed covariates between nursing intervention groups are more balanced within each stratum or in the matched samples. The likelihood of receiving pain management is accounted for in the outcome model through the propensity scores. Both regression covariance adjustment and matching methods report a significant pain management effect on length of hospital stay in this example. The pain management effect can be regarded as causal when the strongly ignorable treatment assignment assumption holds.
Propensity score analysis provides an alternative statistical approach to the classical multivariate regression, stratification and matching techniques for examining the effects of nursing intervention with a large number of confounding covariates in the background. It can be used to derive causal effects of nursing intervention in observational studies under certain circumstances.
matching; nursing effectiveness research; nursing interventions; propensity score
There is considerable debate regarding whether and how covariate adjusted analyses should be used in the comparison of treatments in randomized clinical trials. Substantial baseline covariate information is routinely collected in such trials, and one goal of adjustment is to exploit covariates associated with outcome to increase precision of estimation of the treatment effect. However, concerns are routinely raised over the potential for bias when the covariates used are selected post hoc; and the potential for adjustment based on a model of the relationship between outcome, covariates, and treatment to invite a “fishing expedition” for that leading to the most dramatic effect estimate. By appealing to the theory of semiparametrics, we are led naturally to a characterization of all treatment effect estimators and to principled, practically-feasible methods for covariate adjustment that yield the desired gains in efficiency and that allow covariate relationships to be identified and exploited while circumventing the usual concerns. The methods and strategies for their implementation in practice are presented. Simulation studies and an application to data from an HIV clinical trial demonstrate the performance of the techniques relative to existing methods.
baseline variables; clinical trials; covariate adjustment; efficiency; semiparametric theory; variable selection
The quality of propensity scores is traditionally measured by assessing how well they make the distributions of covariates in the treatment and control groups match, which we refer to as “good balance”. Good balance guarantees less biased estimates of the treatment effect. However, the cost of achieving good balance is that the variance of the estimates increases due to a reduction in effective sample size, either through the introduction of propensity score weights or dropping cases when propensity score matching. In this paper, we investigate whether it is best to optimize the balance or to settle for a less than optimal balance and use double robust estimation to adjust for remaining differences. We compare treatment effect estimates from regression, propensity score weighting, and double robust estimation with varying levels of effort expended to achieve balance using data from a study about the differences in outcomes by HIV status in heterosexually active homeless men residing in Los Angeles. Because of how costly data collection efforts are for this population, it is important to find an alternative estimation method that does not reduce effective sample size as much as methods that aggressively aim to optimize balance. Results from a simulation study suggest that there are instances in which we can obtain more precise treatment effect estimates without increasing bias too much by using a combination of regression and propensity score weights that achieve a less than optimal balance. There is a bias-variance tradeoff at work in propensity score estimation; every step toward better balance usually means an increase in variance and at some point a marginal decrease in bias may not be worth the associated increase in variance.
Propensity score; Double robust estimation; HIV status; Homeless men
An instrumental variable (IV) is an unconfounded proxy for a study exposure that can be used to estimate a causal effect in the presence of unmeasured confounding. To provide reliably consistent estimates of effect, IVs should be both valid and reasonably strong. Physician prescribing preference (PPP) is an IV that uses variation in doctors' prescribing to predict drug treatment. As reduction in covariate imbalance may suggest increased IV validity, we sought to examine the covariate balance and instrument strength in 25 formulations of the PPP IV in two cohort studies.
Study Design and Setting
We applied the PPP IV to assess antipsychotic medication (APM) use and subsequent death among two cohorts of elderly patients. We varied the measurement of PPP and performed cohort restriction and stratification. We modeled risk differences with two-stage least squares regression. First-stage partial r² values characterized the strength of the instrument. The Mahalanobis distance summarized balance across multiple covariates.
Partial r² ranged from 0.028 to 0.099. PPP generally alleviated imbalances in nonpsychiatry-related patient characteristics, and the overall imbalance was reduced by an average of 36% (±40%) over the two cohorts.
In our study setting, most of the 25 formulations of the PPP IV were strong IVs and resulted in a strong reduction of imbalance in many variations. The association between strength and imbalance was mixed.
Pharmacoepidemiology; Antipsychotic agents; Instrumental variable; Mahalanobis distance; Partial R-squared; Confounding factor (epidemiology); Physician prescribing preference
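With a single binary instrument and no covariates, two-stage least squares reduces to the Wald ratio, and the first-stage partial r² reduces to the squared correlation between instrument and exposure. The sketch below illustrates that simplified case only; the study itself used covariates and several formulations of the instrument, and the function names and data here are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def wald_iv_risk_difference(rows):
    """rows: list of (z, x, y) with binary instrument z, exposure x, outcome y."""
    y1 = mean(y for z, _, y in rows if z == 1)
    y0 = mean(y for z, _, y in rows if z == 0)
    x1 = mean(x for z, x, _ in rows if z == 1)
    x0 = mean(x for z, x, _ in rows if z == 0)
    # Effect of the instrument on the outcome, scaled by its effect on exposure.
    return (y1 - y0) / (x1 - x0)

def first_stage_r2(rows):
    """Squared correlation between instrument and exposure (no covariates)."""
    zs = [z for z, _, _ in rows]
    xs = [x for _, x, _ in rows]
    mz, mx = mean(zs), mean(xs)
    cov = mean((z - mz) * (x - mx) for z, x in zip(zs, xs))
    var_z = mean((z - mz) ** 2 for z in zs)
    var_x = mean((x - mx) ** 2 for x in xs)
    return cov ** 2 / (var_z * var_x)

# Hypothetical data: 10 patients per instrument level.
rows = ([(1, 1, 1)] * 2 + [(1, 1, 0)] * 6 + [(1, 0, 1)] * 1 + [(1, 0, 0)] * 1
        + [(0, 1, 1)] * 1 + [(0, 1, 0)] * 1 + [(0, 0, 1)] * 5 + [(0, 0, 0)] * 3)
iv_rd = wald_iv_risk_difference(rows)
r2 = first_stage_r2(rows)
```

The toy instrument is far stronger than the partial r² values reported in the abstract; it is chosen only so that the arithmetic is easy to follow.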
Commentators have suggested that patients may understand quantitative information about treatment benefits better when they are presented as numbers needed to treat (NNT) rather than as absolute or relative risk reductions.
To determine whether NNT helps patients interpret treatment benefits better than absolute risk reduction (ARR), relative risk reduction (RRR), or a combination of all three of these risk reduction presentations (COMBO).
Randomized cross-sectional survey.
University internal medicine clinic.
Three hundred fifty-seven men and women, ages 50 to 80, who presented for health care.
Subjects were given written information about the baseline risk of a hypothetical “disease Y” and were asked (1) to compare the benefits of two drug treatments for disease Y, stating which provided more benefit; and (2) to calculate the effect of one of those drug treatments on a given baseline risk of disease. Risk information was presented to each subject in one of four randomly allocated risk formats: NNT, ARR, RRR, or COMBO.
When asked to state which of two treatments provided more benefit, subjects who received the RRR format responded correctly most often (60% correct vs 43% for COMBO, 42% for ARR, and 30% for NNT, P = .001). Most subjects were unable to calculate the effect of drug treatment on the given baseline risk of disease, although subjects receiving the RRR and ARR formats responded correctly more often (21% and 17% compared to 7% for COMBO and 6% for NNT, P = .004).
Patients are best able to interpret the benefits of treatment when they are presented in an RRR format with a given baseline risk of disease. ARR also is easily interpreted. NNT is often misinterpreted by patients and should not be used alone to communicate risk to patients.
data interpretation (statistical); decision making; numeracy; patient participation (statistics and numerical data)
We propose a structural mean modeling approach to obtain compliance-adjusted estimates for treatment effects in a randomized-controlled trial comparing 2 active treatments. The model relates an individual's observed outcome to his or her counterfactual untreated outcome through the observed receipt of active treatments. Our proposed estimation procedure exploits baseline covariates that predict compliance levels on each arm. We give a closed-form estimator which allows for differential and unexplained selectivity (i.e. noncausal compliance-outcome association due to unobserved confounding) as well as a nonparametric error distribution. In a simple linear model for a 2-arm trial, we show that the distinct causal parameters are identified unless covariate-specific expected compliance levels are proportional on both treatment arms. In the latter case, only a linear contrast between the 2 treatment effects is estimable and may well be of key interest. We demonstrate the method in a clinical trial comparing 2 antidepressants.
Causal inference; Randomized-controlled trials; Structural mean models
Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions when using observational or non-randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, in biomedical research, time-to-event outcomes occur frequently. There is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on time-to-event outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population-average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score to estimate marginal hazard ratios. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score result in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes. Copyright © 2012 John Wiley & Sons, Ltd.
propensity score; survival analysis; inverse probability of treatment weighting (IPTW); Monte Carlo simulations; observational study; time-to-event outcomes
To determine whether the way in which information on benefits and harms of long-term hormone replacement therapy (HRT) is presented influences family physicians' intentions to prescribe this treatment.
Family physicians were randomized to receive information on treatment outcomes expressed in relative terms, or as the number needing to be treated (NNT) with HRT to prevent or cause an event. A control group received no information.
Family physicians practicing in the Hunter Valley, New South Wales, Australia.
Estimates of the impact of long-term HRT on risk of coronary events, hip fractures, and breast cancer were summarized as relative (proportional) decreases or increases in risk, or as NNT.
Measurements and Main Results
Intention to prescribe HRT for seven hypothetical patients was measured on Likert scales. Of 389 family physicians working in the Hunter Valley, 243 completed the baseline survey and 215 participated in the randomized trial. Baseline intention to prescribe varied across patients—it was highest in the presence of risk factors for hip fracture, but coexisting risk factors for breast cancer had a strong negative influence. Overall, a larger proportion of subjects receiving information expressed as NNT had reduced intentions, and a smaller proportion had increased intentions to prescribe HRT than those receiving the information expressed in relative terms, or the control group. However, the differences were small and only reached statistical significance for three hypothetical patients. Framing effects were minimal when the hypothetical patient had coexisting risk factors for breast cancer.
Information framing had some effect on family physicians' intentions to prescribe HRT, but the effects were smaller than those previously reported, and they were modified by the presence of serious potential adverse treatment effects.
information framing; medical decision making; relative risk; absolute risk; randomized controlled trial
Both propensity score (PS) matching and inverse probability of treatment weighting (IPTW) allow causal contrasts, albeit different ones. In the presence of effect-measure modification, different analytic approaches produce different summary estimates.
We present a spreadsheet example that assumes a dichotomous exposure, covariate, and outcome. The covariate can be a confounder or not and a modifier of the relative risk (RR) or not. Based on expected cell counts, we calculate RR estimates using five summary estimators: Mantel-Haenszel (MH), maximum likelihood (ML), the standardized mortality ratio (SMR), PS matching, and a common implementation of IPTW.
Without effect-measure modification, all approaches produce identical results. In the presence of effect-measure modification and regardless of the presence of confounding, results from the SMR and PS are identical, but IPTW can produce strikingly different results (e.g. RR=0.83 vs. RR=1.50). In such settings, MH and ML do not estimate a population parameter and results for those measures fall between PS and IPTW.
Discrepancies between PS and IPTW reflect different weighting of stratum-specific effect estimates. SMR and PS matching assign weights according to the distribution of the effect-measure modifier in the exposed subpopulation, whereas IPTW assigns weights according to its distribution in the entire study population. In pharmacoepidemiology, contraindications to treatment that also modify the effect might be prevalent in the population, but would be rare among the exposed. In such settings, estimating the effect of exposure in the exposed rather than the whole population is preferable.
epidemiologic methods; confounding factors (epidemiology); bias (epidemiology); effect measure modification; interaction; propensity score; inverse probability of treatment weighting; standardized mortality ratio; Mantel-Haenszel; maximum likelihood
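The difference in weighting can be sketched with a small calculation on expected cell counts. All counts and risks below are hypothetical (they are not the values from the spreadsheet example), and `standardized_rr` is an illustrative helper. Standardizing to the exposed corresponds to the SMR/PS-matching target, and standardizing to the total population corresponds to the target of the common IPTW implementation.

```python
# Hypothetical strata of a binary covariate that modifies the relative risk:
# stratum RR = 0.5 in the first stratum and 2.0 in the second.
strata = [
    {"n_exposed": 800, "n_unexposed": 200, "risk_exposed": 0.10, "risk_unexposed": 0.20},
    {"n_exposed": 200, "n_unexposed": 800, "risk_exposed": 0.40, "risk_unexposed": 0.20},
]


def standardized_rr(strata, target):
    """Relative risk standardized to the covariate distribution of `target`.

    target = "exposed": weights are the exposed counts (SMR-type estimate)
    target = "total"  : weights are the total stratum counts (IPTW-type estimate)
    """
    expected_if_exposed = 0.0
    expected_if_unexposed = 0.0
    for s in strata:
        if target == "exposed":
            weight = s["n_exposed"]
        elif target == "total":
            weight = s["n_exposed"] + s["n_unexposed"]
        else:
            raise ValueError("target must be 'exposed' or 'total'")
        expected_if_exposed += weight * s["risk_exposed"]
        expected_if_unexposed += weight * s["risk_unexposed"]
    return expected_if_exposed / expected_if_unexposed


rr_exposed = standardized_rr(strata, "exposed")
rr_total = standardized_rr(strata, "total")
print(round(rr_exposed, 2), round(rr_total, 2))
```

With these made-up inputs the two summaries fall on opposite sides of the null, because most exposed subjects sit in the stratum where the exposure is protective while the total population is split evenly across strata.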
There is debate concerning methods for calculating numbers needed to treat (NNT) from results of systematic reviews.
We investigate the susceptibility to bias for alternative methods for calculating NNTs through illustrative examples and mathematical theory.
Two competing methods have been recommended: one involves calculating the NNT from meta-analytical estimates, the other treats the data as if they all arose from a single trial. The 'treat-as-one-trial' method was found to be susceptible to bias when there were imbalances between groups within one or more trials in the meta-analysis (Simpson's paradox). Calculation of NNTs from meta-analytical estimates is not prone to the same bias. The method of calculating the NNT from a meta-analysis depends on the treatment effect measure used. When relative measures of treatment effect are used, the estimates of NNTs can be tailored to the level of baseline risk.
The treat-as-one-trial method of calculating numbers needed to treat should not be used as it is prone to bias. Analysts should always report the method they use to compute estimates to enable readers to judge whether it is appropriate.
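The bias of the treat-as-one-trial approach can be sketched with two hypothetical trials that have unequal allocation. The trial counts are invented for illustration, and the size-weighted average of risk differences used here is a simplified stand-in for a proper meta-analytical estimate, not the method of the paper. The last function shows how an NNT can be tailored to baseline risk when a relative risk is the pooled measure, using NNT = 1 / (baseline risk × (1 − RR)).

```python
# Hypothetical trials: (events_treated, n_treated, events_control, n_control).
trials = [
    (20, 200, 10, 50),    # risks 0.10 vs 0.20, more patients on treatment
    (20, 50, 100, 200),   # risks 0.40 vs 0.50, more patients on control
]


def nnt_treat_as_one_trial(trials):
    """Pool all patients as if from a single trial, then invert the risk difference."""
    events_t = sum(t[0] for t in trials)
    n_t = sum(t[1] for t in trials)
    events_c = sum(t[2] for t in trials)
    n_c = sum(t[3] for t in trials)
    return 1.0 / (events_c / n_c - events_t / n_t)


def nnt_from_within_trial_differences(trials):
    """Average the within-trial risk differences, weighted by trial size (simplified)."""
    total_n = sum(t[1] + t[3] for t in trials)
    pooled_rd = sum(
        (t[1] + t[3]) / total_n * (t[2] / t[3] - t[0] / t[1]) for t in trials
    )
    return 1.0 / pooled_rd


def nnt_from_relative_risk(baseline_risk, relative_risk):
    """NNT for a given baseline (control) risk and a pooled relative risk below 1."""
    return 1.0 / (baseline_risk * (1.0 - relative_risk))


nnt_pooled = nnt_treat_as_one_trial(trials)
nnt_within = nnt_from_within_trial_differences(trials)
nnt_high_risk = nnt_from_relative_risk(0.20, 0.5)
nnt_low_risk = nnt_from_relative_risk(0.05, 0.5)
print(round(nnt_pooled, 2), round(nnt_within, 2), nnt_high_risk, nnt_low_risk)
```

Both hypothetical trials have the same absolute risk reduction, yet pooling the raw counts gives a much smaller NNT because the treated patients come mostly from the low-risk trial.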
To evaluate the comparative effectiveness of antiviral drugs in adults with chronic hepatitis B monoinfection for evidence-based decision-making.
A systematic review of randomized controlled clinical trials (RCTs) published in English. Results after interferon and nucleos(t)ide analog therapies were synthesized with random-effects meta-analyses and expressed as numbers needed to treat (NNT).
Despite sustained improvements in selected biomarkers, no one drug regimen improved all intermediate outcomes. In 16 underpowered RCTs, drug treatments did not reduce mortality, liver cancer, or cirrhosis. Sustained HBV DNA clearance was achieved in one patient when two were treated with adefovir (NNT from 1 RCT = 2, 95% CI 1 to 2) or interferon alfa-2b (NNT from 2 RCTs = 2, 95% CI 2 to 4), 13 with lamivudine (NNT from 1 RCT = 13, 95% CI 7 to 1000), and 11 with peginterferon alfa-2a vs. lamivudine (NNT from 1 RCT = 11, 95% CI 7 to 25). Sustained HBeAg seroconversion was achieved in one patient when eight were treated with interferon alfa-2b (NNT from 2 RCTs = 8, 95% CI 5 to 33) or 10 with peginterferon alfa-2b vs. interferon alfa-2b (NNT from 1 RCT = 10, 95% CI 5 to 1000). Greater benefits and safety after entecavir vs. lamivudine or pegylated interferon alfa-2b vs. interferon alfa-2b require future investigation of clinical outcomes. Adverse events were common and more frequent after interferon. Treatment utilization for adverse effects is unknown.
Individual clinical decisions should rely on comparative effectiveness and absolute rates of intermediate outcomes and adverse events. Future research should clarify the relationship of intermediate and clinical outcomes and cost-effectiveness of drugs for evidence-based policy and clinical decisions.
Electronic supplementary material
The online version of this article (doi:10.1007/s11606-010-1569-5) contains supplementary material, which is available to authorized users.
antiviral agents/adverse effects; antiviral agents/therapeutic use; hepatitis B/therapy; treatment outcome; cost-benefit analysis; decision trees
The primary goal of a randomized clinical trial is to make comparisons among two or more treatments. For example, in a two-arm trial with continuous response, the focus may be on the difference in treatment means; with more than two treatments, the comparison may be based on pairwise differences. With binary outcomes, pairwise odds-ratios or log-odds ratios may be used. In general, comparisons may be based on meaningful parameters in a relevant statistical model. Standard analyses for estimation and testing in this context typically are based on the data collected on response and treatment assignment only. In many trials, auxiliary baseline covariate information may also be available, and it is of interest to exploit these data to improve the efficiency of inferences. Taking a semiparametric theory perspective, we propose a broadly-applicable approach to adjustment for auxiliary covariates to achieve more efficient estimators and tests for treatment parameters in the analysis of randomized clinical trials. Simulations and applications demonstrate the performance of the methods.
Covariate adjustment; Hypothesis test; k-arm trial; Kruskal-Wallis test; Log-odds ratio; Longitudinal data; Semiparametric theory
Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual-outcome is calculated from a regression of the outcome variable on covariates and then the relationship between the adjusted-outcome and the SNP is evaluated by a simple linear regression of the adjusted-outcome on the SNP. In this paper, we examine the performance of this 2-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the 2-stage approach results in a biased genotypic effect estimate and a loss of power. Bias is always toward the null and increases with the squared correlation between the SNP and the covariate ($\rho_{SC}^2$). For example, for $\rho_{SC}^2 = 0.0$, $0.1$ and $0.5$, 2-stage analysis results in, respectively, 0%, 10% and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a 2-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under $\rho_{SC}^2 = 0.0$, the 2-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided.
confounding; conditional analysis; covariate; 2-stage regression; adjusted-outcome; adjusted-genotype
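The attenuation described above can be checked with a small simulation. This is a sketch under stated assumptions, not the paper's simulation design: the SNP is replaced by a continuous, standardized stand-in variable, the effect sizes `beta` and `gamma` are arbitrary, and the function name `simulate` is hypothetical. Under this setup the 2-stage slope should be close to `beta * (1 - rho_sq)`, while the MLR coefficient should be close to `beta`.

```python
import math
import random


def cov(x, y):
    """Sample covariance of two equally long lists."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def simulate(rho_sq, beta=1.0, gamma=1.0, n=20000, seed=1):
    """Return (two_stage_slope, mlr_slope) for the SNP stand-in variable."""
    rng = random.Random(seed)
    rho = math.sqrt(rho_sq)
    # Covariate c and SNP stand-in s with correlation rho (both unit variance).
    c = [rng.gauss(0, 1) for _ in range(n)]
    s = [rho * ci + math.sqrt(1 - rho_sq) * rng.gauss(0, 1) for ci in c]
    y = [beta * si + gamma * ci + rng.gauss(0, 1) for si, ci in zip(s, c)]

    # Stage 1: regress y on the covariate only and keep the residuals.
    # (The intercept is omitted; it does not affect the stage-2 slope.)
    slope_c = cov(y, c) / cov(c, c)
    resid = [yi - slope_c * ci for yi, ci in zip(y, c)]
    # Stage 2: simple regression of the adjusted outcome on the SNP stand-in.
    two_stage = cov(resid, s) / cov(s, s)

    # MLR: closed-form OLS coefficient of s in a two-predictor model.
    s_ss, s_cc, s_sc = cov(s, s), cov(c, c), cov(s, c)
    mlr = (s_cc * cov(s, y) - s_sc * cov(c, y)) / (s_ss * s_cc - s_sc ** 2)
    return two_stage, mlr


for rho_sq in (0.0, 0.1, 0.5):
    two_stage, mlr = simulate(rho_sq)
    print(rho_sq, round(two_stage, 3), round(mlr, 3))
```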