Given the billions of dollars invested in conservation initiatives and research in the past two decades, one may wonder why careful empirical studies and compelling data are lacking (see […], however, for some recent examples). We do not claim to have conducted a formal study on this topic, but our experience in the field leads us to several conclusions.
Box 3. Examples of Ongoing Evaluations (Unpublished) of Conservation Initiatives
Can cash incentives encourage upland farmers to forgo clear-cutting of forests?
A Vietnamese professor designed a quasi-experiment in which forest owners were offered cash to adopt sustainable forest management. Full randomization was considered difficult and costly. Without randomization, however, selection bias is a concern: analysis of initial surveys showed that factors that affect forest use (e.g., availability of family labor, distance to roads) also affect decisions to participate. A control group of 50 households from a different upland community was therefore chosen such that the characteristics of interest were balanced (i.e., showed no statistically significant differences) between the “cash treatment” and control groups. The control group received the forest management training but not the cash payment. The ongoing experiment, while small and far from perfect, shows how an understanding of proper evaluation techniques can avert potential analytical pitfalls at the design stage (T. Bui Dung, unpublished data).
Does listing and funding under the U.S. Endangered Species Act affect species recovery?
The evidence marshaled to date for and against the effectiveness of the U.S. Endangered Species Act suffers from a problem common in analyses of biodiversity protection measures: the absence of data on what would have happened without the act. Statistical matching methods can be used to select control groups of species and thereby estimate how species listed and funded under the act would have fared had they not been listed or funded. The control groups must be similar in characteristics that can plausibly affect both listing/funding and recovery (e.g., level of endangerment, biological characteristics, political influences, scientific knowledge, and advocacy). The analysis offers new insights and a methodology to guide evaluation of the effectiveness of non-randomized regulatory approaches to biodiversity protection (P. J. Ferraro, C. McIntosh, and M. Ospina, unpublished data).
Do protected areas improve health and income of local people?
Most answers to this question are based either on ex ante predictions from historical use patterns and strong assumptions, or on ex post analyses that often show only that the poor live near protected areas. Because national parks are not randomly sited, we can expect selection bias in interpreting the impact of parks on local people. An ongoing evaluation tracks health and livelihood outcomes of 1,000 households that traditionally have used resources around four new national parks in Gabon and 1,000 households that live outside the influence of the same parks. The simple selection of control households will go a long way toward making a meaningful contribution to the debate over the effects of protected areas on local people […].
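The matching and balance logic behind the first two examples can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method used in any of the studies above: the data are hypothetical toy values, each treated unit is matched to its nearest control on standardized covariates (with replacement), standardized mean differences serve as a descriptive balance diagnostic, and the average matched outcome difference estimates the effect on the treated. All function names are illustrative.

```python
import math
import statistics

# Hypothetical toy data: each unit is (covariates, outcome). The two covariates
# stand in for characteristics such as family labor and distance to road.
treated = [((2, 5.0), 10.0), ((3, 2.0), 12.0), ((1, 8.0), 7.0)]
controls = [((2, 5.5), 8.0), ((3, 2.5), 9.0), ((1, 7.5), 5.0),
            ((5, 1.0), 20.0), ((0, 9.0), 1.0)]


def covariate_scales(units):
    """Mean and population SD of each covariate across all units."""
    cols = list(zip(*(x for x, _ in units)))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero SD
    return means, sds


def match(treated, controls):
    """Pair each treated unit with its nearest control (Euclidean distance on
    standardized covariates, matching with replacement)."""
    means, sds = covariate_scales(treated + controls)

    def z(x):
        return [(xi - m) / s for xi, m, s in zip(x, means, sds)]

    return [(t, min(controls, key=lambda c: math.dist(z(t[0]), z(c[0]))))
            for t in treated]


def smd(a, b):
    """Standardized mean difference: a descriptive balance diagnostic."""
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled if pooled else 0.0


def balance(pairs):
    """SMD for each covariate between treated units and their matched controls."""
    k = len(pairs[0][0][0])
    return [smd([t[0][j] for t, _ in pairs], [c[0][j] for _, c in pairs])
            for j in range(k)]


def att(pairs):
    """Average outcome difference across matched pairs (effect on the treated)."""
    return statistics.mean(t[1] - c[1] for t, c in pairs)


pairs = match(treated, controls)
print(balance(pairs))  # SMDs near zero indicate well-balanced covariates
print(att(pairs))
```

In this toy data the naive difference between the mean outcome of all treated units and that of all controls is about 1.07, whereas the matched estimate is about 2.33: two of the controls have covariate values unlike any treated unit, and matching keeps them out of the comparison.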
First, one usually needs a remarkable combination of political will, a strong commitment to transparency, and a strong ethic of accountability to conduct a well-designed evaluation. Second, the diversity of donors and practitioners often leads to a plethora of objectives (e.g., scientific, aesthetic, humanitarian). Encouraging participants, including local actors, to agree on a set of explicit objectives to evaluate may be difficult in many conservation contexts.
Third, many conservation researchers are unaware of state-of-the-art empirical program evaluation techniques and of the biases in current analyses. Donors and government agencies that fund conservation projects typically know little about program evaluation methods, and the practitioners who implement the projects typically lack incentives for careful analysis and falsification of hypotheses. Thus there is neither funding nor a demand for funding to conduct more careful analysis of interventions.
Fourth, many believe that rigorous evaluations of effectiveness are expensive and thus would divert scarce conservation funds toward “non-essential” investments. In contrast, researchers and practitioners in other policy fields have demonstrated that randomized experimental methods can be implemented in the context of small pilot programs or policies that are phased in over time. The difference between what one can learn from a pilot initiative that uses an experimental (or quasi-experimental) design and from one that does not is enormous.
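A phased rollout can double as an experiment if the order in which units receive the intervention is randomized, so that units scheduled for later waves serve as comparisons for earlier ones. The Python sketch below is a minimal illustration under assumed settings (hypothetical village identifiers, equal-sized waves, and wave w treated from period w onward); it is not drawn from any specific program.

```python
import random
import statistics


def phase_in_schedule(units, n_waves, seed=0):
    """Randomly assign units to rollout waves of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {u: i % n_waves + 1 for i, u in enumerate(shuffled)}


def treated_in_period(schedule, period):
    """Units whose wave has started by the given period (wave w starts in period w)."""
    return {u for u, wave in schedule.items() if wave <= period}


def early_vs_late(schedule, outcomes, period):
    """Difference in mean outcomes between units already treated and units not
    yet treated; under random assignment of waves, the not-yet-treated units
    provide the comparison group for that period."""
    treated = treated_in_period(schedule, period)
    t = [y for u, y in outcomes.items() if u in treated]
    c = [y for u, y in outcomes.items() if u not in treated]
    return statistics.mean(t) - statistics.mean(c)


villages = [f"village_{i:02d}" for i in range(1, 13)]  # hypothetical units
schedule = phase_in_schedule(villages, n_waves=3)
print(sorted(treated_in_period(schedule, 1)))
```

Because every unit is eventually treated, this design avoids withholding the intervention permanently, which is often what makes randomization acceptable in a pilot.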
Fifth, the nature of biodiversity conservation can make evaluations more difficult than in other fields. Where outcomes are local, strong and complex spillover effects can occur. Enforcement and cheating can be difficult to verify. Property rights are often unclear in low-income nations, and so the effects of interventions are complex both cross-sectionally and over time. Biological outcomes (e.g., wildlife stocks) often respond slowly to interventions, and for many problems only time-series identification is possible.
Sixth, many conservation interventions are short-term projects. The benefits of a careful evaluation, however, will largely be realized after the project ends and will accrue to the global conservation community. Field personnel are thus better off investing their time and resources in actions that will yield benefits to them rather than to the larger conservation community.
Seventh, program evaluation methods require data. In other fields of policy analysis, researchers have longstanding national surveys and historical relationships with government agencies and field practitioners that generate substantial datasets for research. Most conservation interventions, particularly in low-income nations, are framed as independent projects that “test” an idea in one or several locations. Data collection in these locations is often poor or non-existent, with little or no planning for data collection in control “non-project” locations. Furthermore, we can comprehensively link programs to changes in behaviors and conservation success only when we combine data on ecological, geographic, socio-economic, demographic, and institutional measures. Given the disciplinary biases about appropriate scale and methods for data collection, we rarely find such transdisciplinary efforts.
Finally, on a related point, credible estimates of conservation success depend on the ability to vary (or isolate) policy interventions in simple ways across space and time. We are well aware that within the same ecosystem, heterogeneity in institutions, income opportunities, access to markets, and other socio-economic characteristics can lead to different reactions to a given intervention. However, if every village or household is exposed to a different intervention (one gets direct payments, one gets fish farms, one gets agricultural assistance, etc.), we are left with few observations for each intervention and thus cannot make any inferences about effectiveness.
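The cost of spreading a fixed sample across many different interventions can be made concrete with the standard error of a difference in means. The Python sketch below is a rough illustration that assumes equal-sized arms, one shared control group, and a common outcome standard deviation `sigma` (simplifications introduced here, not taken from the text). With `n` units per arm, the standard error is `sqrt(2 * sigma**2 / n)`, and the minimum detectable effect for a 5% two-sided test with 80% power is roughly 2.8 times that.

```python
import math
from statistics import NormalDist


def se_difference(sigma, n_per_arm):
    """Standard error of a difference in means between two arms of equal size,
    assuming the same outcome standard deviation sigma in both arms."""
    return math.sqrt(2 * sigma**2 / n_per_arm)


def se_by_design(n_units, n_interventions, sigma=1.0):
    """Split n_units evenly across the interventions plus one control group.
    Arms with fewer than two units are treated as uninformative because their
    outcome variance cannot be estimated."""
    n_per_arm = n_units // (n_interventions + 1)
    if n_per_arm < 2:
        return math.inf
    return se_difference(sigma, n_per_arm)


def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Approximate smallest true effect that a two-sided test at level alpha
    detects with the given power (normal approximation)."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


for k in (1, 5, 59):  # 60 hypothetical villages, k distinct interventions
    se = se_by_design(60, k)
    print(k, f"{se:.3f}", f"{minimum_detectable_effect(se):.3f}")
```

With 60 villages, a single intervention compared against a single control group leaves 30 units per arm (standard error about 0.26 sigma); five interventions leave 10 per arm (about 0.45 sigma); giving nearly every village its own intervention leaves no arm large enough to support any inference.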
We are not proposing that all policy interventions be uniformly applied across space and time, but we are arguing that some policy interventions should be conducted in this manner to allow practitioners and decision makers to make inferences about their effectiveness. An evaluation may not be able to address the full range of questions, but addressing a tractable subset of questions may be far more productive, particularly given that reliable knowledge obtained from narrow studies may ultimately inform broader policy questions. Where it is impossible to use experiments, analysts must creatively use quasi-experimental methods to characterize the counterfactual and attribute cause to outcomes. At the very least, we must use the principles of evaluation to assess the potential for bias in making inferences about program effectiveness.
In the field of program evaluation, one lesson is paramount: you cannot overcome poor quality with greater quantity.