Abbreviation: DDD, difference-in-difference-in-differences.
Evidence-based policymaking is becoming the norm, but how do we acquire the evidence to inform policies? In their article in the present issue of the Journal, Basu et al. (Am J Epidemiol. 2016;183(6):531–538) used difference-in-difference-in-differences models and a synthetic control approach to examine the effects of the 1996 welfare reforms on health outcomes among single mothers. In the present commentary, we discuss the limitations of observational studies for policy evaluation. Difference-in-differences models, from the field of economics, offer a rigorous approach to cope with those limitations.
Evidence-based policymaking is becoming the norm (1) and is currently being discussed federally (2). How do we acquire the evidence to inform policies? Ideally, the scientific community would conduct a randomized controlled trial and randomly allocate policies to states; it might then take years to gather enough evidence to make an informed decision. Not only is this scientifically infeasible, because policies often cannot be and are not implemented randomly, but the time frame is also untenable. Consequently, observational studies are often used to evaluate policies, and as we learn in Epidemiology 101, association does not imply causation.
There are 2 problems with using observational studies to evaluate policies. First, reverse causation cannot be ruled out. Tobacco control efforts provide a good example. States with historically lower smoking rates continue to strengthen policies, whereas in tobacco-growing states, lobbying has succeeded in thwarting these efforts (3, 4). Thus, we cannot rule out the possibility that lower smoking rates caused the implementation of stronger policies rather than the stronger policies causing the lower smoking rates. Second, we cannot account for background changes that have occurred over time. Trends in health behaviors, such as decreases in adult smoking rates (5), might create the appearance of an association. Multiple years of data in 1 state before and after a policy change may show a decrease in smoking over time. However, if smoking decreased in all states over this same time period, it cannot be concluded that the stronger policies caused the decrease. In sum, alternative methodologies are needed, and epidemiologists should look outside the discipline.
Difference-in-differences models, which come from the field of economics, partially address these limitations by using repeated cross-sectional data linked to state policies (6). In its simplest form, the model compares changes in the outcome in states that have implemented a policy with changes in the outcome in states with no policy. This approach removes trends over time in both intervention and control states. It is then possible to conclude that significant changes in the outcome are associated with the new policy. Regression models allow for the inclusion of other covariates, including sociodemographic characteristics, and of state and time fixed effects, which control for time-invariant characteristics at the state level and for nationwide time trends, respectively. Carpenter and Cook (7) nicely demonstrated the differences between results from an observational approach with no state or year fixed effects and those from a difference-in-differences approach with both controls. The authors illustrated how effect sizes are often smaller with the latter method because cross-sectional analyses do not account for unobserved state characteristics, which are often correlated with both the policy and the outcome measures. Despite this rigorous methodology, evaluations often focus only on the overall impact of policies, without considering whether they may differentially affect disadvantaged groups and, ultimately, health disparities.
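In its simplest two-group, two-period form, the difference-in-differences logic described above reduces to subtracting the control states' change from the intervention states' change. The following is a minimal sketch of that arithmetic using made-up smoking rates for illustration; it is not the regression specification or data of any study discussed here.

```python
# Minimal sketch of the simplest difference-in-differences comparison.
# All numbers are hypothetical, for illustration only.

def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in intervention states minus change in control states."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean adult smoking rates (%) before/after a policy change:
effect = diff_in_diff(treat_pre=24.0, treat_post=19.0,
                      ctrl_pre=25.0, ctrl_post=22.0)
print(effect)  # -2.0: the policy-associated change net of the shared secular decline
```

Note that the shared 3-percentage-point decline in the control states is netted out, which is precisely why a single-state before-after comparison would overstate the policy's association with the outcome.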
In the present issue of the Journal, Basu et al. (8) examined the effects of the 1996 welfare reforms on health outcomes among single mothers by using repeated cross-sections of the Behavior Risk Factor Surveillance System data from 1993 to 2012. Although the welfare reforms were effective at bringing single, poor women into the workforce (9), the unintended health consequences have been a neglected area of research. Basu et al. used difference-in-difference-in-differences (DDD) models to compare the associations between welfare reforms and health outcomes among single mothers to those among married mothers, single nonmothers, and married nonmothers. The authors noted that this approach controlled for 3 types of unmeasured confounders: factors that are different between groups and are constant over time, factors that vary over time and affect all groups, and factors that vary over time and affect groups differently. They found that welfare reforms were associated with an increase in binge drinking and a decrease in being able to afford medical care. The authors rightly concluded that policymakers need to balance the intended consequences with any potential detrimental effects on health.
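The triple-difference design of Basu et al. can be sketched as a difference of two difference-in-differences: the pre/post change for single mothers relative to married mothers, net of the corresponding change for single nonmothers relative to married nonmothers. The sketch below uses invented group means solely to show the arithmetic; it is not the authors' regression model, which also includes covariates and fixed effects.

```python
# Illustrative triple-difference (DDD) arithmetic. Group means are made up.

def ddd(sm, mm, sn, mn):
    """Each argument is a (pre, post) pair of outcome means:
    sm = single mothers, mm = married mothers,
    sn = single nonmothers, mn = married nonmothers."""
    did_mothers = (sm[1] - sm[0]) - (mm[1] - mm[0])       # single vs. married mothers
    did_nonmothers = (sn[1] - sn[0]) - (mn[1] - mn[0])    # single vs. married nonmothers
    return did_mothers - did_nonmothers

# Hypothetical binge-drinking prevalence (%) before/after reform:
effect = ddd(sm=(10.0, 15.0), mm=(9.0, 11.0), sn=(8.0, 10.0), mn=(7.0, 9.0))
print(effect)  # 3.0
```

The second difference (nonmothers) removes shocks common to all women of a given marital status, which is how the design absorbs the third class of unmeasured confounders the authors describe.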
Basu et al. reported that they included women who were 18–64 years of age in 1997 and who were most affected by welfare reforms, and this cohort was tracked over time. By 2012, these women were 33–79 years of age. First, the age range is very wide, and single mothers in their late teens or early twenties are very different from older single mothers because their reasons for being single parents and their social and economic resources differ between the groups. Second, the ages of their children also varied from 0 to 18 years, and children's ages might be associated with the outcome measures. Parents who have young children and cannot find childcare may be less likely to attend clinic appointments than parents of older children who can stay home on their own. Parental health behaviors, such as smoking, also vary by the ages of the children in the household (10). Adjusting only for whether a child is present, rather than for the age of the child, might not adequately account for potential confounding factors. The authors found that the health statuses of unemployed single mothers declined relative to those of their employed counterparts, but they did not control for the presence of multiple preschool-aged children, which may render employment infeasible.
Although Basu et al. utilized methods that have the potential to yield meaningful results from observational data, a number of methodological issues arise. The authors' coding of their policy variable, Reform, is provided in Web Table 2 of their article (8). They used the earlier date (if waivers were granted) for each state as the effective date of reform. A tabulation shows that reforms are dated 1992 in 2 states, 1993 in 4 states, 1994 in 4 states, and 1995 in 8 states, all of which were before the passage of the reform legislation in 1996. Nineteen states had an effective reform date of 1996, with the remaining 13 states (including Washington, DC) having a date of 1997. Because their sample from the Behavior Risk Factor Surveillance System starts with the 1993 wave, very few years of data (or no data) before reform were available for several states. Although analysis through the 2012 wave allowed for precise measurement after reform, the paucity of pre-reform data is somewhat troubling. The authors mentioned that DDD “is limited by … misestimation of secular trends when time series have few pre-intervention time periods” (8, p. 534). Their data included very few pre-intervention control periods, but by using time fixed effects, they were not estimating secular trends. The time fixed effects represent the net effects of all nationwide factors, such as business cycles, that may influence the outcome variables.
The authors also stated that DDD is limited “by potential bias in the estimation of parameters that can result from serial autocorrelation” and that the synthetic control method “can be less biased even if the data are autocorrelated” (8, p. 534). They referenced Bertrand et al. (11), who studied the effect of serially correlated outcomes on the consistency of the standard errors of point estimates. However, Bertrand et al. did not consider bias in the point estimates (or effect sizes) themselves, focusing only on potential bias in measures of precision. Thus, it is difficult to understand the critique from Basu et al. that the DDD yields biased point estimates.
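The distinction matters because the ordinary least squares coefficient depends only on the data through (X'X)⁻¹X'y; assumptions about serial correlation enter only through the standard-error formula. The following is a stripped-down illustration of that point on simulated data with within-state AR(1) errors, comparing a conventional standard error with the cluster-by-state (sandwich) standard error that Bertrand et al. recommend. It is a toy specification (a single treatment dummy, no fixed effects), not the authors' model.

```python
# Toy illustration: serial correlation changes standard errors, not the
# OLS point estimate. Simulated data; not from any study discussed here.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_years = 10, 8
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)
treated = ((state < 5) & (year >= 4)).astype(float)  # policy in half the states

# Build AR(1) errors within each state (serially correlated outcomes):
e = np.zeros(n_states * n_years)
for s in range(n_states):
    shocks = rng.normal(size=n_years)
    for t in range(1, n_years):
        shocks[t] += 0.8 * shocks[t - 1]
    e[state == s] = shocks
y = 2.0 * treated + e  # true effect of 2.0 plus autocorrelated noise

X = np.column_stack([np.ones_like(y), treated])
beta = np.linalg.solve(X.T @ X, X.T @ y)  # identical under any SE assumption
resid = y - X @ beta
XtX_inv = np.linalg.inv(X.T @ X)

# Conventional (iid) standard error for the treatment coefficient:
se_iid = np.sqrt(resid @ resid / (len(y) - 2) * XtX_inv[1, 1])

# Cluster-robust (by state) sandwich standard error:
meat = sum(np.outer(X[state == s].T @ resid[state == s],
                    X[state == s].T @ resid[state == s])
           for s in range(n_states))
se_cluster = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])
print(beta[1], se_iid, se_cluster)
```

The coefficient `beta[1]` is unchanged whichever covariance estimator is used; only the reported precision differs, which is exactly the scope of the Bertrand et al. critique.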
Basu et al. argued that a limitation of difference-in-differences models is that there is not an ideal control group (8). Their design overview stated that the data are repeated cross-sections with complex survey weights. The DDD method is an ordinary least squares regression, which can readily handle complex survey weights. However, the application of the synthetic control method (the user-written “synth” command in Stata (StataCorp LP, College Station, Texas)) is not clearly explained. That command, in its current form (12), can only be applied to balanced panel data. The published applications of the synthetic control method by its authors relate to aggregate panel data on states or countries (13, 14). It is not at all clear how Basu et al. used the synthetic control method on the Behavior Risk Factor Surveillance System data, which are individual-level pooled cross-sections. For instance, are the individual observations for each state-year collapsed into a single observation for each of the 4 groups (single mothers, married mothers, single nonmothers, and married nonmothers) in order to construct a balanced state-year panel? Although their rationale for applying this methodology in terms of constructing a better control group is clear, the reader is not made aware of how that methodology might actually be used in the context of pooled cross-sections from a complex survey design. In that context, the much larger effect sizes derived from this method might lack credence or, at a minimum, lack direct comparability with the DDD estimates derived from the microdata.
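One plausible reading of that aggregation step, framed purely as a hypothesis about what the authors might have done, is a weighted collapse of individual survey records into one cell per state-year-group, yielding the balanced panel the "synth" command requires. The sketch below illustrates that collapse; the records, field names, and weights are invented and are not drawn from the Behavior Risk Factor Surveillance System.

```python
# Hypothetical sketch of collapsing individual-level survey records into a
# state-year-group panel of weighted means. All records are invented.
from collections import defaultdict

records = [
    # (state, year, group, outcome, survey_weight)
    ("MA", 1995, "single_mother", 1.0, 2.0),
    ("MA", 1995, "single_mother", 0.0, 1.0),
    ("MA", 1995, "married_mother", 1.0, 3.0),
]

sums = defaultdict(lambda: [0.0, 0.0])  # cell -> [weighted sum, total weight]
for state, year, group, outcome, weight in records:
    cell = sums[(state, year, group)]
    cell[0] += weight * outcome
    cell[1] += weight

# One weighted-mean observation per state-year-group cell:
panel = {cell: wsum / wtot for cell, (wsum, wtot) in sums.items()}
print(panel[("MA", 1995, "single_mother")])  # 2.0 / 3.0
```

If something like this was done, effect sizes estimated on the collapsed panel would sit at a different level of aggregation than the DDD estimates from the microdata, which is the comparability concern raised above.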
Basu et al. (8) make use of 2 empirical methodologies from the social science literature. However, the strengths and weaknesses of those techniques deserve greater mention in their work, and the applicability of the synthetic control method to the context of their data should be clarified. If the synthetic control method is being applied to aggregate data in order to form a balanced panel, the authors should discuss the comparability of effect-size estimates for the 2 methods as applied to data at different levels of aggregation. Because their critiques of the DDD method are not well founded and the synthetic control method appears to rely on the aggregation of microdata, the extension of the study to the aggregate data may be of questionable merit.
“Interdisciplinary collaboration” and “big data” should be considered not just buzzwords but the way forward for policy evaluation. Researchers need to build networks across disciplines and harvest state-representative data to rigorously test nascent and established policies to determine for whom and in what context policies are working (or not). In particular, the unintended consequences and downstream effects of policies must be examined and balanced against the intended health, social, or economic benefits. Working collaboratively across disciplines, we have used difference-in-differences models to show that higher state cigarette taxes not only reduced disparities in rates of maternal smoking during pregnancy (15) but also improved birth outcomes (16). These are welcome by-products of state-level tobacco control efforts.
In summary, we want to challenge economists to think more like epidemiologists and consider not just the overall effects of policies but also whom they may be affecting and whether they are increasing disparities. Equally, epidemiologists need to think more like economists and use rigorous methods for policy evaluation. Just like Basu et al., we have found that working across the aisle creates better research than working alone. We only wish our policymakers and funding agencies would take notice.
Author affiliations: School of Social Work, Boston College, Chestnut Hill, Massachusetts (Summer Sherburne Hawkins, Christopher F. Baum); Department of Economics, Boston College, Chestnut Hill, Massachusetts (Christopher F. Baum); and Deutsches Institut für Wirtschaftsforschung, Berlin, Germany (Christopher F. Baum).
Conflict of interest: none declared.