Evaluating the success of major funding programs from the National Institutes of Health (NIH) remains a vexing challenge. We propose a set of criteria to evaluate epidemiological studies that fit within the discovery, development, and delivery paradigm introduced by the NIH. We apply these criteria to the Nurses’ Health Study (NHS), a large epidemiological cohort study initiated in the 1970s to evaluate the associations between oral contraceptives and risk of breast cancer and between diet and other lifestyle factors and risk of cancer overall. Our evaluation suggests that the NHS has led to important changes in health practice, and it underscores the need to develop metrics that are suitable for the evaluation of large epidemiological cohort studies.
In its annual plan and budget proposal for 2004, the National Cancer Institute (NCI) identified this challenge: to “accelerate the engine of discovery; translate knowledge gained about the genetic, molecular, and cellular basis of cancer into the development of interventions to detect, diagnose, treat, and prevent cancer; and ensure that these interventions are delivered to all who need them” (1). But how is success in meeting this challenge to be measured? Research productivity has increasingly come to be measured by the publication of scientific articles in peer-reviewed journals of varying impact factors and by the extent to which other scientists cite these articles in their work. The National Institutes of Health (NIH), including the NCI, has used this metric in assessing new grant proposals and the success of the grants it has funded. However, other metrics, ones that are sensitive indicators of the development and delivery of improvements in health, are needed.
One innovative means of evaluating success in fostering scientific discovery was implemented by the NCI’s Division of Cancer Control and Population Sciences (DCCPS). It focuses on determining whether its initiatives soliciting grants on a thematic topic induce collaborative science in ways that the individual grants do not. This approach has been applied to recent DCCPS initiatives including the Transdisciplinary Tobacco Use Research Centers (TTURCs) (2). Four metrics have been used in the evaluation of TTURCs. These are: the pace at which new collaborative transdisciplinary projects are developed (intellectual integration); the speed of implementing the Centers; cumulative changes in the collaborative behaviors and values of participants in the Centers; and the extent to which there is a move to collaborative publications (2). This example serves as a model for applying metrics to evaluation of initiatives.
The extramural cancer epidemiology program in DCCPS and its extramural academic partners have been exploring how to assess NCI’s cancer epidemiology program according to the classical uses, or purposes, of epidemiology. Some of these uses correspond to phases of the discovery, development, and delivery paradigm (3). For example, a purpose of epidemiology that corresponds to the discovery challenge is to explain the etiology of diseases and health conditions (4). Assessing peer-reviewed articles on etiology against defined criteria may provide an excellent tool for measuring discovery, because scientific journals focus on direct scientific findings from research.
However, this approach cannot address the success of federal initiatives and investigator-initiated research in epidemiology in terms of development and delivery, and in this commentary we propose additional criteria in these areas. We apply these criteria to the Nurses’ Health Study (NHS), a large prospective cohort study initiated in 1976 to evaluate the associations between oral contraceptives and risk of breast cancer and between diet and other lifestyle factors and risk of cancer overall (5). The NHS has followed 121,700 women who were 30–55 years of age at baseline. Although the cohort is funded primarily by NCI, it receives focused support for the study of other endpoints from other pertinent NIH institutes. We discuss how our approach to evaluating the NHS may be generally applicable to epidemiology.
The purpose of epidemiology that corresponds to discovery is to explain the etiology of diseases and health conditions by providing information about the distribution of cancer in populations, testing (or helping to formulate) hypotheses about disease etiology, and identifying new risk factors. We propose that success of the NHS in meeting this aim can be reasonably assessed based on the number of its publications, most of which address risk factors for cancer, and the impact factors of the journals in which they appear. Therefore, our assessment of discovery was based on a database of NHS publications that is maintained at Harvard University.
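As a minimal sketch of how such a discovery tally could be computed, the Python below summarizes a publication list by journal and adds an impact-weighted count. The `Publication` record, the journal names, and the impact factor values are hypothetical placeholders rather than entries from the Harvard database, and summing impact factors is only one of several possible ways to combine the two measures.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """One entry from a (hypothetical) cohort publication database."""
    year: int
    journal: str


# Hypothetical placeholder values; real impact factors change from year to year.
IMPACT_FACTOR = {"Journal A": 10.0, "Journal B": 4.0}


def discovery_summary(pubs, impact_factor):
    """Return (total count, count per journal, impact-weighted count)."""
    per_journal = Counter(p.journal for p in pubs)
    # Journals missing from the lookup table contribute nothing to the weighted score.
    weighted = sum(impact_factor.get(p.journal, 0.0) for p in pubs)
    return len(pubs), per_journal, weighted


pubs = [
    Publication(1985, "Journal A"),
    Publication(1990, "Journal B"),
    Publication(1990, "Journal A"),
]
total, per_journal, weighted = discovery_summary(pubs, IMPACT_FACTOR)
print(total, dict(per_journal), weighted)  # 3 {'Journal A': 2, 'Journal B': 1} 24.0
```

Keeping the raw per-journal counts alongside the weighted score lets a reader see whether a high score reflects many modest papers or a few in high-impact journals.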
In the area of development, the aim of epidemiology is to provide the scientific basis for developing control measures and prevention strategies for groups and populations at risk and to develop needed public health measures and practices. The criteria we used in the area of development were: contributions of NHS publications to establishing factors that cause disease; development of risk models; development of clinical or prevention guidelines; and quantification of the preventable burden of disease (Table 1). We also sought to determine if NHS publications were used to justify the initiation of prevention and clinical trials and whether NHS findings were confirmed in these kinds of trials.
In terms of delivery, the goal of epidemiology can be considered to be implementation of the epidemiological findings by the public, clinicians, health practitioners, policy makers, industry, and others. Therefore, our delivery criteria were the use of NHS research findings by these end users in the areas of governmental policy, regulation, and industry applications (Table 1).
In assessing the NHS in terms of development and delivery, we selected NHS publications for which we could identify bibliographic or other authoritative sources of evidence that the NHS addressed one or more of the criteria. We sought to identify examples that addressed each of the criteria, but we did not attempt to evaluate all NHS publications. Publications that referenced NHS articles and addressed a particular criterion for success were identified through Web of Science searches; through review of references in published guidelines from major cancer organizations, professional societies, and other organizations; and through review of references (including those to guidelines and clinical trials) that supported conclusions about health effects in International Agency for Research on Cancer (IARC) (101) reports and reports from the US Surgeon General. Additional selection standards were applied to determine whether the references were evidence of NHS success. For example, if the reference was to a clinical guideline, the guideline had to use results from the NHS as supporting evidence. If the reference was to a trial, the NHS had to be cited in the introduction (ie, as part of the rationale for the trial). A reference was selected as establishing causality only if the NHS had been used to provide positive support for the ultimate conclusion about causality.
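The selection standards for guidelines, trials, and causal conclusions amount to a small set of rules. A hedged sketch in Python follows; it assumes a hypothetical `CitingReference` record in which a reviewer has already coded each citing document by hand, and it only encodes the decision step, not the literature searches.

```python
from dataclasses import dataclass


@dataclass
class CitingReference:
    """A citing document, hand-coded by a reviewer (hypothetical fields)."""
    kind: str  # "guideline", "trial", or "causality"
    uses_nhs_as_support: bool = False             # guideline relies on NHS results
    cites_nhs_in_introduction: bool = False       # NHS is part of the trial rationale
    nhs_supports_causal_conclusion: bool = False  # NHS gives positive causal support


def counts_as_evidence(ref: CitingReference) -> bool:
    """Apply the selection standard that matches the type of citing document."""
    if ref.kind == "guideline":
        return ref.uses_nhs_as_support
    if ref.kind == "trial":
        return ref.cites_nhs_in_introduction
    if ref.kind == "causality":
        return ref.nhs_supports_causal_conclusion
    # Any other kind of document is not covered by these standards.
    return False


print(counts_as_evidence(CitingReference("trial", cites_nhs_in_introduction=True)))  # True
print(counts_as_evidence(CitingReference("guideline")))  # False
```

Making each rule depend on a single coded field keeps the judgment calls with the human reviewer while making the acceptance criteria explicit and repeatable.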
The early years of the NHS (roughly the first 10 years) produced relatively few publications (Figure 1), reflecting the young age of the women and the limited number of endpoints available for study. However, as of February 2005, after 28 years of follow-up, NHS findings had appeared in numerous publications in a range of high-impact journals (Figure 2), including 36 in the New England Journal of Medicine, 41 in the Journal of the American Medical Association, and 41 in the Journal of the National Cancer Institute; the American Journal of Epidemiology published 56 methodological pieces as well as papers reporting major findings. Reflecting the cohort’s status as a unique resource for the study of cancer in women and the fact that its primary funding was from the NCI, most of the publications addressed risk factors for cancer (Figure 3). Risk factors discovered using blood samples that were stored by the NHS for prospective evaluation of cancer risk included insulin-like growth factor (IGF) (for premenopausal breast cancer) (6); premenopausal estrogen, prolactin (7), and testosterone (8) (for breast cancer in general); and endogenous hormone levels (for receptor-positive postmenopausal breast cancer) (8). Nurses’ Health Study investigators also reported that blood levels of folate, cysteine (9), and vitamin D were associated with the risk of breast cancer (10,11), and the NHS also documented that radial scars on benign breast biopsies were associated with risk of subsequent breast cancer independent of other markers of proliferative disease in benign biopsies (12).
NHS investigators have also developed statistical methods to address issues that arise in the analysis of prospective data with repeated measures (13). These include approaches to measurement error correction (14), validation studies with a variable number of observations (15), and polytomous regression with time-varying covariates (16).
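To illustrate the kind of problem these methods address, the sketch below implements regression calibration, one widely used approach to measurement error correction; we do not assume it reproduces the specific methods of reference 14. It assumes a main study in which exposure is measured with error (for example, by questionnaire) and a validation substudy in which both the questionnaire value and a better reference measure are available. The naive log relative risk is divided by the slope from regressing the reference measure on the questionnaire value. All names and numbers are hypothetical, and for logistic or proportional hazards models the correction is an approximation.

```python
from statistics import fmean


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration(beta_naive, surrogate, reference):
    """Correct a main-study coefficient for exposure measurement error.

    beta_naive: log relative risk per unit of the error-prone exposure
    surrogate:  error-prone exposure values in the validation substudy
    reference:  better-measured exposure values for the same participants
    Returns (corrected coefficient, calibration slope).
    """
    lam = ols_slope(surrogate, reference)  # slope of E[reference | surrogate]
    return beta_naive / lam, lam


# Hypothetical validation data chosen so that the calibration slope is exactly 0.5.
questionnaire = [1.0, 2.0, 3.0, 4.0, 5.0]
reference_measure = [0.5, 1.0, 1.5, 2.0, 2.5]
beta, lam = regression_calibration(0.2, questionnaire, reference_measure)
print(lam, beta)  # 0.5 0.4
```

Here a naive coefficient of 0.2 is corrected to 0.4. In practice, confidence intervals should also reflect the uncertainty in the calibration slope (for example, via the delta method or a bootstrap), which this sketch omits.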
Epidemiological findings are a critical component of established approaches to the assessment of disease causality (17). Several organizations, including the IARC and the Office of the US Surgeon General, impanel experts to evaluate the published scientific literature using a systematic series of criteria to reach conclusions about causality. Findings from the NHS on the associations of alcohol intake and exogenous hormone exposure with breast cancer risk (18) and the association of tubal ligation with ovarian cancer risk (19) have contributed substantially to conclusions about causality in reports from the IARC.
For the study of many lifestyle factors in epidemiological investigations (eg, consumption of trans-fatty acids, weight gain, tubal ligation), randomized controlled trials are neither ethical nor practical. Making the case for the development of control and prevention efforts therefore involves assembling and assessing the evidence from epidemiological studies and other sources (20). Thus, preventive service and community practice guidelines often cite epidemiological findings as supportive evidence in lieu of or in addition to findings from randomized controlled trials (21,22). In Table 2, we list findings from the NHS that pertain to the association of behavioral and dietary risk factors with risk of cancer and their impact in terms of guidelines that draw on this evidence. For example, NHS data (23), along with findings from American Cancer Society (ACS) cohort studies and other epidemiological studies (73), led to recommendations by the ACS that people reduce alcohol intake and red meat consumption and increase physical activity (74).
Findings from observational studies provide an essential rationale for conducting clinical trials. NHS findings that have been followed up in trials include the associations of endogenous estrogen levels (8,71), IGF (6), and combination hormone therapy (61) with increased risk of breast cancer. Development can also be measured by the extent to which epidemiological findings on protective or adverse effects are used to develop and test cancer control interventions for individuals or communities, or lead to the initiation of chemoprevention trials to reduce risk. The Selenium and Vitamin E Cancer Prevention Trial (75), which built on epidemiological findings that higher selenium levels were associated with lower cancer risk, is an example of this use of epidemiological findings. Other examples include the development of behavior change interventions in worksites (76) and for patients who have had a colon polyp removed (34).
Determination of the preventable burden of disease is important to understanding disease inequalities in human populations and provides the basis for deciding which populations should be targeted for disease control and prevention. The preventable burden of disease depends on the prevalence of the risk factor in the population and on the relative risk of disease among those exposed compared with those not exposed to the risk factor. The preventable burden of disease has been estimated from NHS for several diseases, including heart disease (72) and diabetes (77), and the results suggest that modifiable risk factors are substantial contributors to the population burden of those diseases. Data from the NHS have contributed to estimates of the population-attributable risk (ie, the percentage of disease that could be prevented if the risk factor could be eliminated) for obesity and inactivity in relation to cancer, as summarized by the IARC (36).
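The dependence on prevalence and relative risk can be made explicit with Levin's formula for the population-attributable risk, PAR = p(RR − 1) / [1 + p(RR − 1)], where p is the prevalence of the risk factor and RR is the relative risk. The sketch below assumes a single dichotomous risk factor, a causal association, and no confounding; published estimates, such as those summarized by the IARC, typically rely on adjusted and more elaborate methods. The inputs are hypothetical.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's population-attributable risk for one dichotomous risk factor.

    prevalence:    proportion of the population exposed (0 to 1)
    relative_risk: risk in the exposed divided by risk in the unexposed
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 25% of the population exposed, relative risk 3.0.
print(round(population_attributable_risk(0.25, 3.0), 3))  # 0.333
```

Under these assumptions, one-third of cases would be prevented if the exposure were eliminated; the same relative risk with a prevalence of 5% would yield a much smaller share, which is why prevalence matters for targeting prevention.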
The development of risk prediction models has drawn on the rich longitudinal data resources generated by cohort studies. For example, the Framingham Heart Study is the basis for the Framingham risk score for heart disease, which has been used extensively in clinical settings and validated in other populations (78). Risk prediction models, which permit the translation of data about risk factors into algorithms to identify high-risk subsets of the population, are a growing application of epidemiological data (79,80). Cancer-specific models have been developed from the prospective NHS data for breast cancer (81), ovarian cancer (82), and melanoma (83). Ongoing work will refine these models to include biomarkers and explore the usefulness of these models for predicting the development of disease. As Glasziou and Irwig (84) have pointed out, translating results from randomized controlled trials to therapy decisions for individual patients requires an assessment of underlying risk that can be obtained only from epidemiological studies.
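The point made by Glasziou and Irwig can be shown with simple arithmetic. Assuming that a treatment's relative risk, as estimated in a trial, applies unchanged at every level of baseline risk, the absolute risk reduction equals the baseline risk multiplied by (1 − RR), and the number needed to treat is its reciprocal. The baseline risks and relative risk below are hypothetical values of the kind a risk prediction model and a trial might supply.

```python
def absolute_benefit(baseline_risk, relative_risk):
    """Absolute risk reduction (ARR) and number needed to treat (NNT).

    Assumes the trial's relative risk applies unchanged at every baseline risk.
    """
    arr = baseline_risk * (1.0 - relative_risk)
    # With no absolute benefit, no finite number of treated patients prevents one case.
    nnt = 1.0 / arr if arr > 0 else float("inf")
    return arr, nnt


# Same hypothetical relative risk (0.7), two hypothetical baseline risks.
for risk in (0.02, 0.10):
    arr, nnt = absolute_benefit(risk, 0.7)
    print(f"baseline {risk:.2f}: ARR {arr:.3f}, NNT {nnt:.0f}")
```

With the same relative risk, a woman at 2% baseline risk gains an absolute risk reduction of 0.006 (about 167 treated per case prevented), whereas one at 10% gains 0.03 (about 33 treated). Only the baseline risk, which must come from epidemiological data, distinguishes the two.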
The extent to which delivery is successful can be assessed by the use of the cohort study’s findings by end users. The public at large is an end user, and awareness of the results of epidemiological research is an important prelude to behavior change. Much media attention has focused on individual findings from the NHS cohort studies as they were published, often followed by more detailed coverage in monthly magazines. To our knowledge, no database is available to place this coverage in context with that received by other NIH-funded research, and we did not apply this criterion to the NHS. Clinicians and public health practitioners are also end users. They tend to use guidelines developed from epidemiological studies, rather than the results of a single study, as the evidence base for their practices; therefore, individual NHS findings for these end users are not listed in Table 2. Finally, policy makers and industry are also end users. For example, findings on dietary trans-fat intake and increased risk of coronary heart disease (72) led the US Food and Drug Administration to change its regulations so that, as of January 2006, trans-fatty acids had to be listed on the Nutrition Facts panel of packaged foods. The NHS finding that higher vitamin A intake was associated with an increased risk of fractures (85,86) led manufacturers of multivitamins to reduce their vitamin A content.
Although discovery is often the primary goal of scientific investigations, success is frequently narrowly defined in terms of numbers of journal articles and the perceived quality of the journal. Cohort studies in cancer epidemiology—which require large numbers of study participants and long time frames and lead to relatively few publications in the initial years, when there are few case subjects with the outcomes of interest—are criticized for their relatively high costs (87). The value of these studies has also been criticized on the basis that discovery is rare and because the time frame for investigation is such that these studies usually only confirm findings previously reported from weaker study designs.
Our analysis of the publication record of the NHS shows that the number of publications increased rapidly beginning about 10 years after the cohort’s establishment and that hundreds of publications ultimately reported findings from the cohort. This is consistent with the experience of other long-standing cohorts. For example, the Framingham Heart Study lists more than 1200 papers. Although the early focus of the Framingham Study was predominantly on the identification of risk factors for cardiovascular disease, studies of other chronic diseases (eg, osteoarthritis) were added over time. The Atherosclerosis Risk in Communities (ARIC) Study lists some 500 research publications over the life of the cohort to date, almost all of them cardiovascular in focus. The Rotterdam Study has focused on the incidence of major diseases in the elderly, and from 1990 to 2004 it published some 500 papers addressing a broad range of endpoints and intermediate disease markers (88). In general, in long-established and well-conducted cohort studies the rate of publication tends to accelerate over time, and the success of cohorts should be evaluated accordingly. Our analysis of the NHS reveals that this cohort has been successful in terms of discovery, development, and delivery. Moreover, we suggest that the approach to assessment that we outline can be used to evaluate other cohorts and other epidemiological studies.
The addition of several criteria to traditional measures of productivity based on the number of articles in journals of a given impact factor is appropriate for epidemiology. We extended our evaluation of scientific discovery to include evidence that NHS and other cohort study data have been used to explain disparities among populations in cancer incidence and have contributed to establishing causality. Our evaluation suggests that NHS findings have been important in documenting the lack of association between total fat intake and breast cancer risk, the inverse association between physical activity and colon cancer risk, and the association between long-term intake of folate and methionine and reduced colon cancer risk. Our evaluation also suggests that NHS findings have helped to clarify the role of obesity and weight gain in increasing cancer risk among women and that they were critical in demonstrating that both alcohol consumption and adult weight gain are associated with increased risk of breast cancer.
Cohorts with a focus on multiple outcomes beyond cancer permit investigators to evaluate the risks and benefits of lifestyle factors and medications more comprehensively than cohorts focused on a single disease. Relating lifestyle factors and medications to the subsequent incidence of multiple diseases avoids potential biases induced by behavior change after the diagnosis of a chronic disease or by comorbidities and the different therapies administered after diagnosis. In addition, NHS investigators have evaluated potential risk factors such as alcohol consumption (89), obesity (45), physical activity (90), and oral contraceptive use (91) in relation to total and cause-specific mortality, as well as incidence. They have also evaluated a measure of diet quality in relation to the incidence of chronic diseases (92).
Although we have identified some criteria that can be used to examine the productivity of a cohort such as the NHS across all domains and have found some data and bibliographic sources that would be useful in developing quantitative measures of discovery, developing adequate quantitative measures of the impact of scientific research in the development and delivery domains is difficult. Apart from a few well-established searchable sources, such as the IARC monograph series and the Community Preventive Services Task Force, data sources for development and delivery applications are not available. In addition, certain criteria typically can be applied only after a substantial amount of time has passed. For example, one important criterion, the extent to which epidemiological findings are confirmed in clinical trials, can take years to assess because of the long time required to accumulate sufficient scientific support for a trial and then to mount, execute, and analyze it.
Another possible criterion with which to evaluate the success of any large initiative is the extent of collaborative activities. These activities may involve pooling data on individual study participants from multiple cohort studies to increase sample size for specialized epidemiological investigations. Data from the NHS have been contributed to the Oxford University–led effort to combine data on the associations of exogenous hormonal exposures (93) and endogenous hormones (94) with breast cancer risk, and NHS investigators are now participating in studies of gene–environment interactions that combine prospective data across several cohorts (95).
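When individual-level data from several cohorts are pooled, a common safeguard is to keep each cohort as its own stratum so that differences in baseline rates between cohorts do not distort the comparison. As a minimal sketch, the Mantel-Haenszel rate ratio below combines exposed and unexposed cases and person-years across cohorts. The counts are hypothetical, and the pooling projects cited above may have used other stratified or regression methods.

```python
from dataclasses import dataclass


@dataclass
class CohortStratum:
    """Case counts and person-years for one contributing cohort (hypothetical)."""
    exposed_cases: int
    exposed_person_years: float
    unexposed_cases: int
    unexposed_person_years: float


def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel rate ratio, treating each cohort as a separate stratum."""
    numerator = denominator = 0.0
    for s in strata:
        total = s.exposed_person_years + s.unexposed_person_years
        numerator += s.exposed_cases * s.unexposed_person_years / total
        denominator += s.unexposed_cases * s.exposed_person_years / total
    return numerator / denominator


cohorts = [
    CohortStratum(20, 1000.0, 10, 1000.0),  # rate ratio 2.0 within this cohort
    CohortStratum(40, 2000.0, 10, 1000.0),  # rate ratio 2.0 within this cohort
]
print(round(mantel_haenszel_rate_ratio(cohorts), 6))  # 2.0
```

Because each cohort contributes only within-cohort comparisons, the pooled estimate recovers the common rate ratio of 2.0 while drawing on the combined number of cases.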
In summary, as financial resources for health research and their allocation change, and investigators must consider commitment to studies that can easily take an entire career, appropriate and comprehensive evaluation becomes increasingly important. We hope that the criteria suggested here and our application of them to NHS will initiate greater discussion of the issues relevant to evaluation, promote further improvements in the evaluation of large epidemiological studies beyond the confines of bibliographic analysis, and foster the development of improved data sources for evaluations.
This work was conducted while Dr Colditz was principal investigator on the NHS (CA89769) and under an interagency professional agreement with the Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute.