Observational (nonexperimental) studies of the association of infant feeding and subsequent child or adult behavior are prone to residual confounding by subtle differences in psychological attributes and interactional styles of mothers who breastfeed vs those who formula-feed. We followed up 13,889 6.5-year-old Belarusian children who participated in a large cluster-randomized trial of a breastfeeding promotion intervention. Behavior was evaluated using the Strengths and Difficulties Questionnaire (SDQ), completed independently by the children’s parents and teachers. We compared the results of experimental (intention-to-treat, ITT) and observational analyses (based on feeding actually received), both adjusted for clustering. Observational analyses were additionally adjusted for geographic region and urban vs rural residence; child sex, age at follow-up, and birth weight; and maternal and paternal education. No differences between the randomized experimental vs control groups were observed in ITT analyses. In contrast, small but statistically significant associations with weaning prior to 3 months were observed for parent and teacher SDQ scores on total difficulties, conduct problems, and hyperactivity, even after multivariate adjustment. The absence of associations based on ITT analyses, in contrast with the significant associations based on observed BF duration, strongly suggests that the latter are biased by residual confounding.
Motivated by a previously published study of HIV treatment, we simulated data subject to time-varying confounding affected by prior treatment to examine some finite-sample properties of marginal structural Cox proportional hazards models. We compared (a) unadjusted, (b) regression-adjusted, (c) unstabilized and (d) stabilized marginal structural (inverse probability-of-treatment [IPT] weighted) model estimators of effect in terms of bias, standard error, root mean squared error (MSE) and 95% confidence limit coverage over a range of research scenarios, including relatively small sample sizes and ten study assessments. In the base-case scenario resembling the motivating example, where the true hazard ratio was 0.5, both IPT-weighted analyses were unbiased while crude and adjusted analyses showed substantial bias towards and across the null. Stabilized IPT-weighted analyses remained unbiased across a range of scenarios, including relatively small sample size; however, the standard error was generally smaller in crude and adjusted models. In many cases, unstabilized weighted analysis showed a substantial increase in standard error compared to other approaches. Root MSE was smallest in the IPT-weighted analyses for the base-case scenario. In situations where time-varying confounding affected by prior treatment was absent, IPT-weighted analyses were less precise and therefore had greater root MSE compared with adjusted analyses. The 95% confidence limit coverage was close to nominal for all stabilized IPT-weighted but poor in crude, adjusted, and unstabilized IPT-weighted analysis. Under realistic scenarios, marginal structural Cox proportional hazards models performed according to expectations based on large-sample theory and provided accurate estimates of the hazard ratio.
Bias; Causal inference; Marginal structural models; Monte Carlo study
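As a rough illustration of the difference between unstabilized and stabilized IPT weights, the following Python sketch uses a single time point and one binary confounder, which is a deliberate simplification of the time-varying setting studied here. The data-generating values and the function names (`simulate`, `ipt_weights`) are assumptions made for the example and are not taken from the study.

```python
import random


def simulate(n=20000, seed=1):
    """Simulate (confounder L, treatment A) pairs; all probabilities are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.4 else 0      # binary confounder
        p_treat = 0.8 if l else 0.2             # treatment depends on the confounder
        a = 1 if rng.random() < p_treat else 0
        rows.append((l, a))
    return rows


def ipt_weights(rows):
    """Return (unstabilized, stabilized) weights, one per subject.

    Treatment probabilities are estimated nonparametrically from the data.
    """
    n = len(rows)
    p_a1 = sum(a for _, a in rows) / n          # marginal P(A = 1)
    p_a1_given_l = {}
    for level in (0, 1):
        treated = [a for l, a in rows if l == level]
        p_a1_given_l[level] = sum(treated) / len(treated)   # P(A = 1 | L)

    unstabilized, stabilized = [], []
    for l, a in rows:
        denom = p_a1_given_l[l] if a else 1 - p_a1_given_l[l]
        numer = p_a1 if a else 1 - p_a1
        unstabilized.append(1 / denom)          # 1 / P(A = a | L)
        stabilized.append(numer / denom)        # P(A = a) / P(A = a | L)
    return unstabilized, stabilized


if __name__ == "__main__":
    unstab, stab = ipt_weights(simulate())
    print("mean unstabilized weight:", sum(unstab) / len(unstab))
    print("mean stabilized weight:  ", sum(stab) / len(stab))
    print("largest weights:", max(unstab), max(stab))
```

In this toy setting the stabilized weights average 1 and have a smaller maximum than the unstabilized weights, which is one intuition for why unstabilized weighting can inflate the standard error.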
To synthesise estimates of the prevalence of cessation attempts among adolescent smokers generally, and according to age and level of cigarette consumption.
PubMed, ERIC, and PsychInfo databases and Internet searches of central data collection agencies.
National population‐based studies published in English between 1990 and 2005 reporting the prevalence, frequency and/or duration of cessation attempts among smokers aged ⩾10 to <20 years.
Five reviewers determined inclusion criteria for full‐text reports. One reviewer extracted data on the design, population characteristics and results from the reports.
In total, 52 studies conformed to the inclusion criteria. The marked heterogeneity that characterised the study populations and survey questions precluded a meta‐analysis. Among adolescent current smokers, the median 6‐month, 12‐month and lifetime cessation attempt prevalence was 58% (range: 22–73%), 68% (range 43–92%) and 71% (range 28–84%), respectively. More than half had made multiple attempts. Among smokers who had attempted cessation, the median prevalence of relapse was 34, 56, 89 and 92% within 1 week, 1 month, 6 months, and 1 year, respectively, following the longest attempt. Younger (age<16 years) and non‐daily smokers experienced a similar or higher prevalence of cessation attempts compared with older (age ⩾16 years) or daily smokers. Moreover, the prevalence of relapse by 6 months following the longest cessation attempt was similar across age and smoking frequency.
The high prevalence of cessation attempts and relapse among adolescent smokers extends to young adolescents and non‐daily smokers. Cessation surveillance, research and program development should be more inclusive of these subgroups.
Although administrative health care databases have long been used to evaluate adverse drug effects, responses to drug safety signals have been slow and uncoordinated. We describe the establishment of the Canadian Network for Observational Drug Effect Studies (CNODES), a collaborating centre of the Drug Safety and Effectiveness Network (DSEN). CNODES is a distributed network of investigators and linked databases in British Columbia, Alberta, Saskatchewan, Manitoba, Ontario, Quebec and Nova Scotia. Principles of operation are as follows: (1) research questions are prioritized by the coordinating office of DSEN; (2) the linked data stay within the provinces; (3) for each question, a study team formulates a detailed protocol enabling consistent analyses in each province; (4) analyses are “blind” to results obtained elsewhere; (5) protocol deviations are permitted for technical reasons only; (6) analyses using multivariable methods are lodged centrally with a methods team, which is responsible for combining the results to provide a summary estimate of effect. These procedures are designed to achieve high internal validity of risk estimates and to eliminate the possibility of selective reporting of analyses or outcomes. The value of a coordinated multi-provincial approach is illustrated by projects studying acute renal injury with high-potency statins, community-acquired pneumonia with proton pump inhibitors, and hyperglycemic emergencies with antipsychotic drugs. CNODES is an academically based distributed network of Canadian researchers and data centres with a commitment to rapid and sophisticated analysis of emerging drug safety signals in study populations totalling over 40 million.
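The abstract does not state how the methods team combines the province-specific results. One common choice, shown here only as an assumed sketch, is fixed-effect inverse-variance pooling of log ratio estimates. The function name `pool_fixed_effect` and the input numbers are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def pool_fixed_effect(estimates):
    """Pool ratio estimates given as (ratio, ci_low, ci_high) with 95% limits.

    Standard errors are recovered from the confidence limits on the log scale,
    and each site is weighted by the inverse of its variance.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for ratio, ci_low, ci_high in estimates:
        se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
        weight = 1 / se ** 2
        weighted_sum += weight * math.log(ratio)
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - 1.96 * pooled_se),
        math.exp(pooled_log + 1.96 * pooled_se),
    )


if __name__ == "__main__":
    # Made-up site-level estimates, not results from any CNODES project.
    sites = [(1.5, 1.0, 2.25), (1.2, 0.9, 1.6), (1.4, 0.8, 2.45)]
    print(pool_fixed_effect(sites))
```

Only the site-level summary estimates enter the calculation, which is consistent with the principle that the linked data stay within the provinces.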
According to the authors, time-modified confounding occurs when the causal relation between a time-fixed or time-varying confounder and the treatment or outcome changes over time. A key difference between previously described time-varying confounding and the proposed time-modified confounding is that, in the former, the values of the confounding variable change over time while, in the latter, the effects of the confounder change over time. Using marginal structural models, the authors propose an approach to account for time-modified confounding when the relation between the confounder and treatment is modified over time. An illustrative example and simulation show that, when time-modified confounding is present, a marginal structural model with inverse probability-of-treatment weights specified to account for time-modified confounding remains approximately unbiased with appropriate confidence limit coverage, while models that do not account for time-modified confounding are biased. Correct specification of the treatment model, including accounting for potential variation over time in confounding, is an important assumption of marginal structural models. When the effect of confounders on either the treatment or outcome changes over time, time-modified confounding should be considered.
bias (epidemiology); confounding factors (epidemiology); structural model
We present a model for longitudinal measures of fetal weight as a function of gestational age. We use a linear mixed model, with a Box-Cox transformation of fetal weight values, and restricted cubic splines, in order to flexibly but parsimoniously model median fetal weight. We systematically compare our model to other proposed approaches. All proposed methods are shown to yield similar median estimates, as evidenced by overlapping pointwise confidence bands, except after 40 completed weeks, where our method seems to produce estimates more consistent with observed data. Sex-based stratification affects the estimates of the random effects variance-covariance structure, without significantly changing sex-specific fitted median values. We illustrate the benefits of including sex-gestational age interaction terms in the model over stratification. The comparison leads to the conclusion that the selection of a model for fetal weight for gestational age can be based on the specific goals and configuration of a given study without affecting the precision or value of median estimates for most gestational ages of interest.
multi-level models; fetal growth; small for gestational age
The authors tested whether the relation between gestational weight gain (GWG) and 5 adverse pregnancy outcomes (small-for-gestational-age (SGA) birth, large-for-gestational-age (LGA) birth, spontaneous preterm birth, indicated preterm birth, and unplanned cesarean delivery) differed according to maternal race/ethnicity, smoking, parity, age, and/or height. They also evaluated whether GWG guidelines should be modified for special populations by studying GWG and risk of at least 1 adverse outcome within different subgroups. Data came from a cohort of 23,362 normal-weight mothers who delivered singletons at Magee-Womens Hospital in Pittsburgh, Pennsylvania (2003–2008). Adequacy of GWG was defined as observed GWG divided by recommended GWG. The synergy analysis found that the combination of smoking, black race/ethnicity, primiparity, or short height with poor GWG was associated with an excess risk of SGA birth, while high GWG combined with each of these characteristics diminished risk of LGA birth in comparison with the same GWG among the women's counterparts. Nevertheless, there were no significant or meaningful differences in the risk of at least 1 adverse outcome between the GWG recommended by the Institute of Medicine in 2009 and the GWG that minimized risk of the composite outcome. These findings do not support the tailoring of GWG guidelines on the basis of a mother's smoking status, race/ethnicity, parity, age, or height among normal-weight women.
ethnic groups; gestational age; parity; practice guidelines as topic; pregnancy; smoking; weight gain
Infants who receive prolonged and exclusive breastfeeding grow more slowly during the first year of life than those who do not. However, infant feeding and growth are dynamic processes in which feeding may affect growth, and prior growth and size may also influence subsequent feeding decisions. The authors carried out an observational analysis of 17,046 Belarusian infants who were recruited between June 1996 and December 1997 and who participated in a cluster-randomized trial of a breastfeeding promotion intervention. To assess the effects of infant size on subsequent feeding, the authors restricted the analysis to infants breastfed (or exclusively breastfed) at the beginning of each follow-up interval and examined associations between weight or length at the beginning of the interval and weaning or discontinuation of exclusive breastfeeding by the end of the interval. Smaller size (especially weight for age) was strongly and statistically significantly associated with increased risks of subsequent weaning and of discontinuing exclusive breastfeeding (adjusted odds ratios = 1.2–1.6), especially between 2 and 6 months, even after adjustment for potential confounding factors and clustered measurement. The authors speculate that similar dynamic processes involving infant crying, other signs of hunger, and supplementation/weaning undermine causal inferences about the “effect” of prolonged and exclusive breastfeeding on slower infant growth.
body size; breast feeding; causal inference; evidence; infant
Background: Health plans must prioritize disease management efforts to reduce hospitalization and mortality rates in heart failure patients.
Methods and Results: We developed a risk model to predict the 5-year risk of mortality or hospitalization for heart failure among patients at a large health maintenance organization. We identified 4696 patients who had an echocardiogram and a heart failure diagnosis from 1999 to 2004.
We observed a 56% five-year risk of hospitalization for heart failure or death (95% confidence interval, 54% to 58%). The hazard ratios for echocardiogram data contributed statistically significantly to the model, but echocardiogram findings did not improve our ability to predict risk accurately once we had accounted for demographic characteristics and clinical findings. A more complex model demonstrated a modest capacity to accurately predict risk. Our risk model discriminated the highest- and lowest-risk patients with limited success: the observed risk was 3 times higher in the highest-risk quintile than in the lowest-risk quintile.
Conclusions: Using data available from electronic health records, we developed a series of risk-prediction models for poor outcomes in patients with heart failure. We found that a relatively simple model is as effective as a more complex model, but that all the models predict with only modest accuracy. Until better prediction variables are available for heart failure patients, our prediction model may be valuable for prioritizing centralized disease management program efforts by stratifying patients according to their absolute risk of poor outcomes.
That conditioning on a common effect of exposure and outcome may cause selection, or collider-stratification, bias is not intuitive. We provide two hypothetical examples to convey concepts underlying bias due to conditioning on a collider. In the first example, fever is a common effect of influenza and consumption of a tainted egg-salad sandwich. In the second example, case-status is a common effect of a genotype and an environmental factor. In both examples, conditioning on the common effect imparts an association between two otherwise independent variables; we call this selection bias.
Bias; selection; methods; epidemiologic
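The first hypothetical example can be mimicked with a small simulation. The sketch below is illustrative only: the prevalences, the background fever rate, and the names `simulate_fever` and `risk_difference` are assumptions made for the example rather than values from the paper.

```python
import random


def simulate_fever(n=100000, seed=7):
    """Influenza and the tainted sandwich are generated independently;
    fever is a common effect of both (plus a small background rate)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        flu = rng.random() < 0.10
        egg = rng.random() < 0.10
        background = rng.random() < 0.02
        fever = flu or egg or background
        people.append((flu, egg, fever))
    return people


def risk_difference(people):
    """P(ate sandwich | influenza) - P(ate sandwich | no influenza)."""
    with_flu = [egg for flu, egg, _ in people if flu]
    without_flu = [egg for flu, egg, _ in people if not flu]
    return sum(with_flu) / len(with_flu) - sum(without_flu) / len(without_flu)


if __name__ == "__main__":
    people = simulate_fever()
    febrile = [p for p in people if p[2]]       # condition on the collider
    print("everyone:        ", round(risk_difference(people), 3))
    print("fever cases only:", round(risk_difference(febrile), 3))
```

In the full population the two exposures are essentially unassociated, whereas among people with fever they become strongly negatively associated: a febrile person without influenza is much more likely to have eaten the sandwich.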
The authors investigated variations in cognitive ability by gestational age among 13,824 children at age 6.5 years who were born at term with normal weight, using data from a prospective cohort recruited in 1996–1997 in Belarus. The mean differences in the Wechsler Abbreviated Scales of Intelligence were examined by gestational age in completed weeks and by fetal growth after controlling for maternal and family characteristics. Compared with the score for those born at 39–41 weeks, the full-scale intelligence quotient (IQ) score was 1.7 points (95% confidence interval (CI): −2.7, −0.7) lower in children born at 37 weeks and 0.4 points (95% CI: −1.1, 0.02) lower at 38 weeks after controlling for confounders. There was also a graded relation in postterm children: a 0.5-point (95% CI: −2.6, 1.6) lower score at 42 weeks and a 6.0-point (95% CI: −15.1, 3.1) lower score at 43 weeks. Compared with children born large for gestational age (>90th percentile), children born small for gestational age (<10th percentile) had the lowest IQ, followed by those at the 10th–50th percentile and those at the >50th–90th percentile. These findings suggest that, even among healthy children born at term, cognitive ability at age 6.5 years is lower in those born at 37 or 38 weeks and those with suboptimal fetal growth.
birth weight; cognition; gestational age; term birth
Our objective was to examine the association between HIV and HCV discordant infection status and the sharing of drug equipment by injection drug users (IDUs). IDUs were recruited from syringe exchange and methadone treatment programmes in Montreal, Canada. Characteristics of participants and their injecting partners were elicited using a structured questionnaire. Among 159 participants and 245 injecting partners, sharing of syringes and drug preparation equipment did not differ between concordant or discordant partners, although HIV-positive subjects did not share with HIV-negative injectors. Sharing of syringes was positively associated with discordant HIV status (OR = 1.85) and negatively with discordant HCV status (OR = 0.65), but neither result was statistically significant. Sharing of drug preparation equipment was positively associated with both discordant HIV (OR = 1.61) and HCV (OR = 1.18) status, but again neither result was statistically significant. Factors such as large injecting networks, frequent mutual injections, younger age, and male gender were stronger predictors of equipment sharing. In conclusion, IDUs do not appear to choose their equipment-sharing partners on the basis of infection status, at least with respect to HCV. The results warrant greater screening to raise awareness of infection status, post-test counselling to promote status disclosure among partners, and skill-building to avoid equipment sharing between discordant partners.
PMID: 19172434 CAMSID: cams1471
Overadjustment is defined inconsistently. This term is meant to describe control (eg, by regression adjustment, stratification, or restriction) for a variable that either increases net bias or decreases precision without affecting bias. We define overadjustment bias as control for an intermediate variable (or a descending proxy for an intermediate variable) on a causal path from exposure to outcome. We define unnecessary adjustment as control for a variable that does not affect bias of the causal relation between exposure and outcome but may affect its precision. We use causal diagrams and an empirical example (the effect of maternal smoking on neonatal mortality) to illustrate and clarify the definition of overadjustment bias, and to distinguish overadjustment bias from unnecessary adjustment. Using simulations, we quantify the amount of bias associated with overadjustment. Moreover, we show that this bias is based on a different causal structure from confounding or selection biases. Overadjustment bias is not a finite sample bias, while inefficiencies due to control for unnecessary variables are a function of sample size.
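The definition of overadjustment bias can be made concrete with a toy simulation in which the exposure affects the outcome only through an intermediate variable. This is a minimal sketch under assumed, made-up probabilities (it is not the authors' simulation), and the function names are hypothetical.

```python
import random


def simulate(n=200000, seed=11):
    """Exposure E -> intermediate M -> outcome Y, with no direct E -> Y effect."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        e = rng.random() < 0.5
        m = rng.random() < (0.3 if e else 0.1)      # exposure raises risk of M
        y = rng.random() < (0.11 if m else 0.01)    # outcome depends on M only
        data.append((e, m, y))
    return data


def risk(rows):
    return sum(y for _, _, y in rows) / len(rows)


def crude_rd(data):
    """Risk difference for exposed vs unexposed, ignoring M (total effect)."""
    exposed = [r for r in data if r[0]]
    unexposed = [r for r in data if not r[0]]
    return risk(exposed) - risk(unexposed)


def adjusted_rd(data):
    """Risk difference stratified on M, averaged with stratum-size weights."""
    total = 0.0
    for m_level in (False, True):
        stratum = [r for r in data if r[1] == m_level]
        total += len(stratum) / len(data) * crude_rd(stratum)
    return total


if __name__ == "__main__":
    data = simulate()
    print("unadjusted risk difference:   ", round(crude_rd(data), 4))
    print("adjusted for the intermediate:", round(adjusted_rd(data), 4))
```

Under these assumptions the true total effect is a risk difference of (0.3 − 0.1) × (0.11 − 0.01) = 0.02. The unadjusted analysis recovers it, while stratifying on the intermediate drives the estimate toward zero regardless of sample size.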
The ‘birthweight paradox’ describes the phenomenon whereby birthweight-specific mortality curves cross when stratified on other exposures, most notably cigarette smoking. The paradox has been noted widely in the literature and numerous explanations and corrections have been suggested. Recently, causal diagrams have been used to illustrate the possibility for collider-stratification bias in models adjusting for birthweight. When two variables share a common effect, stratification on the variable representing that effect induces a statistical relation between otherwise independent factors. This bias has been proposed to explain the birthweight paradox.
Causal diagrams may illustrate sources of bias, but are limited to describing qualitative effects. In this paper, we provide causal diagrams that illustrate the birthweight paradox and use a simulation study to quantify the collider-stratification bias under a range of circumstances. Considered circumstances include exposures with and without direct effects on neonatal mortality, as well as with and without indirect effects acting through birthweight on neonatal mortality. The results of these simulations illustrate that when the birthweight-mortality relation is subject to substantial uncontrolled confounding, the bias on estimates of effect adjusted for birthweight may be sufficient to yield opposite causal conclusions, i.e. a factor that poses increased risk appears protective. Effects on stratum-specific birthweight-mortality curves were considered to illustrate the connection between collider-stratification bias and the crossing of the curves. The simulations demonstrate the conditions necessary to give rise to empirical evidence of the paradox.
collider-stratification bias; birthweight; directed acyclic graphs; neonatal mortality
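The reversal described above can be reproduced in a few lines. The following Python sketch is an assumed, simplified version of such a simulation: smoking is given a small harmful direct effect on mortality, and an unmeasured factor `u` affects both low birthweight and mortality. All probabilities and names are illustrative choices, not the parameters used in the paper.

```python
import random


def simulate_births(n=300000, seed=3):
    """Return (smoke, low_birthweight, death) tuples under an assumed causal structure."""
    rng = random.Random(seed)
    births = []
    for _ in range(n):
        smoke = rng.random() < 0.30
        u = rng.random() < 0.05                     # unmeasured common cause
        p_lbw = 0.05 + 0.15 * smoke + 0.60 * u      # both causes raise LBW risk
        lbw = rng.random() < p_lbw
        p_death = 0.005 + 0.005 * smoke + 0.20 * u  # smoking is harmful here
        death = rng.random() < p_death
        births.append((smoke, lbw, death))
    return births


def risk_ratio(births):
    """Mortality risk ratio, smokers vs non-smokers."""
    exposed = [d for s, _, d in births if s]
    unexposed = [d for s, _, d in births if not s]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))


if __name__ == "__main__":
    births = simulate_births()
    low = [b for b in births if b[1]]
    normal = [b for b in births if not b[1]]
    print("all births:        ", round(risk_ratio(births), 2))
    print("normal birthweight:", round(risk_ratio(normal), 2))
    print("low birthweight:   ", round(risk_ratio(low), 2))
```

Although smoking increases mortality overall and among normal-birthweight infants in this setup, it appears protective among low-birthweight infants, because low-birthweight infants of non-smokers are more likely to carry the unmeasured risk factor.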
Investigators have long puzzled over the observation that low-birthweight babies of smokers tend to fare better than low-birthweight babies of non-smokers. Similar observations have been made with regard to factors other than smoking status, including socio-economic status, race and parity. Use of standardised birthweights, or birthweight z-scores, has been proposed as an approach to resolve the crossing of the curves that is the hallmark of the so-called birthweight paradox. In this paper, we utilise directed acyclic graphs, analytical proofs and an extensive simulation study to consider the use of z-scores of birthweight and their effect on statistical analysis. We illustrate the causal questions implied by inclusion of birthweight in statistical models, and illustrate the utility of models that include birthweight or z-scores to address those questions.
Both analytically and through a simulation study we show that neither birthweight nor z-score adjustment may be used for effect decomposition. The z-score approach yields an unbiased estimate of the total effect, even when collider-stratification would adversely impact estimates from birthweight-adjusted models; however, the total effect could have been estimated more directly with an unadjusted model. The use of z-scores does not add additional information beyond the use of unadjusted models. Thus, the ability of z-scores to successfully resolve the paradoxical crossing of mortality curves is due to an alteration in the causal parameter being estimated (total effect), rather than adjustment for confounding or effect decomposition or other factors.
birthweight; birthweight paradox; simulation; directed acyclic graphs; z-scores
Michael S Kramer and colleagues suggest that double clustering might explain the negative results of some cluster randomised trials and describe some strategies for avoiding the problem.
Contemporary fetal growth standards are created by using theoretical properties (percentiles) of birth weight (for gestational age) distributions. The authors used a clinically relevant, outcome-based methodology to determine if separate fetal growth standards are required for singletons and twins. All singleton and twin livebirths between 36 and 42 weeks’ gestation in the United States (1995–2002) were included, after exclusions for missing information and other factors (n = 17,811,922). A birth weight range was identified, at each gestational age, over which serious neonatal morbidity and neonatal mortality rates were lowest. Among singleton males at 40 weeks, serious neonatal morbidity/mortality rates were lowest between 3,012 g (95% confidence interval (CI): 3,008, 3,018) and 3,978 g (95% CI: 3,976, 3,980). The low end of this optimal birth weight range for females was 37 g (95% CI: 21, 53) less. The low optimal birth weight was 152 g (95% CI: 121, 183) less for twins compared with singletons. No differences were observed in low optimal birth weight by period (1999–2002 vs. 1995–1998), but small differences were observed for maternal education, race, parity, age, and smoking status. Patterns of birth weight-specific serious neonatal morbidity/neonatal mortality support the need for plurality-specific fetal growth standards.
birth weight; fetal development; gestational age; infant mortality; morbidity
Secondary syringe exchange (SSE) refers to the exchange of sterile syringes between injection drug users (IDUs). To date there has been limited examination of SSE in relation to the social networks of IDUs. This study aimed to identify characteristics of drug injecting networks associated with the receipt of syringes through SSE. Active IDUs were recruited from syringe exchange and methadone treatment programs in Montreal, Canada, between April 2004 and January 2005. Information on each participant and on their drug-injecting networks was elicited using a structured, interviewer-administered questionnaire. Subjects’ network characteristics were examined in relation to SSE using regression models with generalized estimating equations. Of 218 participants, 126 were SSE recipients with 186 IDUs in their injecting networks. The 92 non-recipients reported 188 network IDUs. Networks of SSE recipients and non-recipients were similar with regard to network size and demographics of network members. In multivariate analyses adjusted for age and gender, SSE recipients were more likely than non-recipients to self-report being HIV-positive (OR = 3.56 [1.54–8.23]); require or provide help with injecting (OR = 3.74 [2.01–6.95]); have a social network member who is a sexual partner (OR = 1.90 [1.11–3.24]), who currently attends a syringe exchange or methadone program (OR = 2.33 [1.16–4.70]), injects daily (OR = 1.77 [1.11–2.84]), and shares syringes with the subject (OR = 2.24 [1.13–4.46]). SSE is associated with several injection-related risk factors that could be used to help focus public health interventions for risk reduction. Since SSE offers an opportunity for the dissemination of important prevention messages, SSE-based networks should be used to improve public health interventions. This approach can optimize the benefits of SSE while minimizing the potential risks associated with the practice of secondary exchange.
HIV; Hepatitis C; Injection drug use; Social network; Secondary syringe exchange; Syringe sharing
The objective of most biomedical research is to determine an unbiased estimate of effect for an exposure on an outcome, i.e. to make causal inferences about the exposure. Recent developments in epidemiology have shown that traditional methods of identifying confounding and adjusting for confounding may be inadequate.
The traditional methods of adjusting for "potential confounders" may introduce conditional associations and bias rather than minimize it. Although previously published articles have discussed the role of the causal directed acyclic graph (DAG) approach with respect to confounding, many clinical problems require complicated DAGs, and investigators may therefore continue to use traditional practices because they lack the tools necessary to use the DAG approach properly. The purpose of this manuscript is to demonstrate a simple 6-step approach to the use of DAGs, and also to explain why the method works from a conceptual point of view.
Using the simple 6-step DAG approach to confounding and selection bias discussed is likely to reduce the degree of bias for the effect estimate in the chosen statistical model.
Discrepancies between the conclusions of different meta-analyses (quantitative syntheses of systematic reviews) are often ascribed to methodological differences. The objective of this study was to determine the discordance in interpretations when meta-analysts are presented with identical data.
We searched the literature for all randomized clinical trials (RCT) and review articles on the efficacy of intravenous magnesium in the early post-myocardial infarction period. We organized the articles chronologically and grouped them in packages. The first package included the first RCT, and a summary of the review articles published prior to the first RCT. The second package contained the second and third RCT, a meta-analysis based on the data, and a summary of all review articles published prior to the third RCT. Similar packages were created for the 5th RCT, 10th RCT, 20th RCT and 23rd RCT (all articles). We presented the packages one at a time to eight different reviewers and asked them to answer three clinical questions after each package based solely on the information provided. The clinical questions included whether 1) they believed magnesium is now proven beneficial, 2) they believed magnesium will eventually be proven to be beneficial, and 3) they would recommend its use at this time.
There was considerable disagreement among the reviewers for each package, and for each question. The discrepancies increased when the heterogeneity of the data increased. In addition, some reviewers became more sceptical of the effectiveness of magnesium over time, and some reviewers became less sceptical.
The interpretation of the results of systematic reviews with meta-analyses includes a subjective component that can lead to discordant conclusions that are independent of the methodology used to obtain or analyse the data.
Interpregnancy interval (IPI), marital status, and neighborhood are independently associated with birth outcomes. The joint contribution of these exposures has not been evaluated. We tested for effect modification between IPI and marriage, controlling for neighborhood.
We analyzed a cohort of 98,330 live births in Montréal, Canada from 1997–2001 to assess IPI and marital status in relation to small for gestational age (SGA) birth. Births were categorized as subsequent-born with short (<12 months), intermediate (12–35 months), or long (36+ months) IPI, or as firstborn. The data had a 2-level hierarchical structure, with births nested in 49 neighborhoods. We used multilevel logistic regression to obtain adjusted effect estimates.
Marital status modified the association between IPI and SGA birth. Being unmarried relative to married was associated with SGA birth for all IPI categories, particularly for subsequent births with short (odds ratio [OR] 1.60, 95% confidence interval [CI] 1.31–1.95) and intermediate (OR 1.48, 95% CI 1.26–1.74) IPIs. Subsequent births had a lower likelihood of SGA birth than firstborns. Intermediate IPIs were more protective for married (OR 0.50, 95% CI 0.47–0.54) than unmarried mothers (OR 0.65, 95% CI 0.56–0.76).
Being unmarried increases the likelihood of SGA birth as the IPI shortens, and the protective effect of intermediate IPIs is reduced in unmarried mothers. Marital status should be considered in recommending particular IPIs as an intervention to improve birth outcomes.
After publication it was brought to our attention that the information for one of the variables in Table 1 was incorrect (Weiss, O'Loughlin et al. International Journal of Behavioral Nutrition and Physical Activity 2007, 4:2). The variable in question is "Use of a neighborhood facility for activity". In the first column, the first row should read "yes", and the second row, "no". In the second column, the first row should read 25.8 (41) and the second row, 41.3 (152).
Unadjusted and adjusted odds ratios for potential predictors of becoming inactive.
a Body Mass Index
* Not included in final model
Obesity in North America is now endemic, and increased understanding of the determinants of physical inactivity is critical. This analysis identified predictors of declines in physical activity over 5 years among adults in low-income, inner-city neighbourhoods.
Data on leisure time physical activity were collected in telephone interviews in 1992 and 1997 from 765 adults (47% of baseline respondents), as part of the evaluation of a community-based cardiovascular disease risk reduction program.
One-third of the 527 participants who were physically active at baseline were inactive in 1997. Predictors of becoming inactive included female sex (OR = 1.63, 95% CI (1.09, 2.43)), older age (1.02 (1.01, 1.04)), higher BMI (1.57 (1.03, 2.40)), poor self-rated health (1.39 (1.05, 1.84)), lower self-efficacy for physical activity (1.46 (1.00, 2.14)), and not using a neighborhood facility for physical activity (1.61 (1.02, 2.14)).
These results highlight the fact that a variety of variables play a role in determining activity level, from demographic variables such as age and sex, to psychosocial and environmental variables. In addition, these results highlight the important role that other health-related variables may play in predicting physical activity level, in particular the observed association between baseline BMI and the increased risk of becoming inactive over time. Lastly, these results demonstrate the need for multi-component interventions in low-income communities, which target a range of issues, from psychosocial factors, to features of the physical environment.
Search filters or hedges play an important role in evidence-based medicine but their development depends on the availability of a "gold standard" – a reference standard against which to establish the performance of the filter. We demonstrate the feasibility of using relative recall of included studies from multiple systematic reviews to validate methodological search filters as an alternative to validation against a gold standard formed through hand searching.
We identified 105 Cochrane reviews that used the Highly Sensitive Search Strategy (HSSS), included randomized or quasi-randomized controlled trials, and reported their included studies. We measured the ability of two published and one novel variant of the HSSS to retrieve the MEDLINE-indexed studies included in these reviews.
The systematic reviews were comprehensive in their searches. Of the included primary studies, 72% were indexed in MEDLINE. Relative recall of the three strategies ranged from 0.98 to 0.91 across all reviews, and more comprehensive strategies showed higher recall.
An approach using relative recall instead of a hand searching gold standard proved feasible and produced recall figures that were congruent with previously published figures for the HSSS. This technique would permit validation of a methodological filter using a collection of approximately 100 studies of the chosen design drawn from the included studies of multiple systematic reviews that used comprehensive search strategies.
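The relative recall measure described above can be sketched as follows: the included studies of each review that are indexed in the database form the reference set, and relative recall is the fraction of that set retrieved by the filter. This is a minimal illustration under that assumption, not the authors' code; the function names and record identifiers are hypothetical.

```python
def relative_recall(retrieved_ids, included_ids):
    """Fraction of a review's included (indexed) studies that the filter retrieved."""
    included = set(included_ids)
    if not included:
        raise ValueError("the reference set of included studies is empty")
    return len(included & set(retrieved_ids)) / len(included)

def pooled_relative_recall(reviews):
    """Relative recall pooled over several reviews.

    reviews: iterable of (retrieved_ids, included_ids) pairs, one per review.
    Hits and reference-set sizes are summed before dividing, so larger
    reviews carry more weight than in a simple mean of per-review values.
    """
    hits = 0
    total = 0
    for retrieved_ids, included_ids in reviews:
        included = set(included_ids)
        hits += len(included & set(retrieved_ids))
        total += len(included)
    if total == 0:
        raise ValueError("no included studies in any review")
    return hits / total

# Hypothetical record identifiers for two small reviews.
reviews = [
    (["1", "2", "3"], ["1", "2"]),  # both included studies retrieved
    (["7"], ["7", "8"]),            # one of two included studies retrieved
]
print(pooled_relative_recall(reviews))
```

Pooling across reviews, rather than averaging per-review values, is one design choice among several; the abstract does not state which aggregation was used.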
Most electronic search efforts directed at identifying primary studies for inclusion in systematic reviews rely on the optimal Boolean search features of search interfaces such as DIALOG® and Ovid™. Our objective is to test the ability of an Ultraseek® search engine to rank MEDLINE® records of the included studies of Cochrane reviews within the top half of all the records retrieved by the Boolean MEDLINE search used by the reviewers.
Collections were created using the MEDLINE bibliographic records of included and excluded studies listed in the review and all records retrieved by the MEDLINE search. Records were converted to individual HTML files. Collections of records were indexed and searched through a statistical search engine, Ultraseek, using review-specific search terms. Our data sources, systematic reviews published in the Cochrane library, were included if they reported using at least one phase of the Cochrane Highly Sensitive Search Strategy (HSSS), provided citations for both included and excluded studies and conducted a meta-analysis using a binary outcome measure. Reviews were selected if they yielded between 1000 and 6000 records when the MEDLINE search strategy was replicated.
Nine Cochrane reviews were included. Included studies within the Cochrane reviews were found within the first 500 retrieved studies more often than would be expected by chance. Across all reviews, recall of included studies into the top 500 was 0.70. There was no statistically significant difference in ranking when comparing included studies with just the subset of excluded studies listed as excluded in the published review.
The relevance ranking provided by the search engine was better than expected by chance and shows promise for the preliminary evaluation of large results from Boolean searches. A statistical search engine does not appear to be able to make fine discriminations concerning the relevance of bibliographic records that have been pre-screened by systematic reviewers.
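The comparison of ranked retrieval against chance described above can be illustrated with a short sketch. It assumes that, if the ranking carried no information, each included study would be equally likely to fall at any rank, so the expected recall in the top k of N records is k/N. The function names and identifiers are hypothetical, and this is not the authors' analysis code.

```python
def recall_at_k(ranked_ids, included_ids, k):
    """Fraction of included studies that appear among the top k ranked records."""
    included = set(included_ids)
    if not included:
        raise ValueError("the set of included studies is empty")
    top_k = set(ranked_ids[:k])
    return len(included & top_k) / len(included)

def chance_recall_at_k(n_records, k):
    """Expected recall in the top k if the n_records were ranked at random."""
    if n_records <= 0:
        raise ValueError("n_records must be positive")
    return min(k, n_records) / n_records

# Hypothetical ranking of ten records, three of four included studies ranked early.
ranked = [f"r{i}" for i in range(1, 11)]
included = ["r1", "r2", "r3", "r9"]
print(recall_at_k(ranked, included, 5), chance_recall_at_k(len(ranked), 5))
```

For a retrieval set of between 1000 and 6000 records, the chance expectation for the top 500 would lie between roughly 0.5 and 0.08, which is the baseline against which an observed recall such as 0.70 would be read.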