In 2012, the National Cancer Institute (NCI) engaged the scientific community to provide a vision for cancer epidemiology in the 21st century. Eight overarching thematic recommendations, with proposed corresponding actions for consideration by funding agencies, professional societies, and the research community emerged from the collective intellectual discourse. The themes are (i) extending the reach of epidemiology beyond discovery and etiologic research to include multilevel analysis, intervention evaluation, implementation, and outcomes research; (ii) transforming the practice of epidemiology by moving towards more access and sharing of protocols, data, metadata, and specimens to foster collaboration, to ensure reproducibility and replication, and accelerate translation; (iii) expanding cohort studies to collect exposure, clinical and other information across the life course and examining multiple health-related endpoints; (iv) developing and validating reliable methods and technologies to quantify exposures and outcomes on a massive scale, and to assess concomitantly the role of multiple factors in complex diseases; (v) integrating “big data” science into the practice of epidemiology; (vi) expanding knowledge integration to drive research, policy and practice; (vii) transforming training of 21st century epidemiologists to address interdisciplinary and translational research; and (viii) optimizing the use of resources and infrastructure for epidemiologic studies. These recommendations can transform cancer epidemiology and the field of epidemiology in general, by enhancing transparency, interdisciplinary collaboration, and strategic applications of new technologies. They should lay a strong scientific foundation for accelerated translation of scientific discoveries into individual and population health benefits.
For decades, epidemiology has provided a scientific foundation for public health and disease prevention (1). Epidemiology has contributed to major scientific discoveries such as the relationship between cigarette smoking and common diseases (2). Yet, the observational nature of much of epidemiologic research has attracted criticism including “excess expense, repudiated findings, studies that offer small incremental knowledge, inability to innovate at reasonable cost, and failure to identify research questions with the greatest merit” (3).
In the past few years, translational research (4) has sought to accelerate the movement of scientific discoveries into practice and improved health outcomes. However, the main focus of translational research remains, by and large, on basic science to clinical applications (bench to bedside). Epidemiology and other population sciences can be integrated into a full translational framework that spans scientific discoveries through improved population health (4). Within this framework, Lam et al. have identified four drivers that are increasingly shaping the field of epidemiology: interdisciplinary collaboration, multilevel analysis, emergence of innovative technologies, and knowledge integration from basic, clinical and population sciences (5). Epidemiology can be a key translational discipline for addressing questions of current great societal importance such as the economics of health services, the aging of our population, the growing burden of common chronic diseases, the persistence of health disparities, and global health. The translational impact of epidemiology similarly must be achieved in an era of greater consumer awareness, open access to health and other types of information, and enhanced communications, via the web, mobile technologies and social media.
In 2012, the NCI initiated a conversation aiming to shape the future of cancer epidemiology and to establish priorities for action (6). Web-based blog posts, several commentaries (5, 6, 7, 8–11), online dialogue using social media (@NCIEpi #trendsinepi on Twitter), and an interdisciplinary workshop (12) informed the proposals presented herein. Table 1 outlines eight broad recommendations with proposed actions targeted to funding agencies, professional societies, and the research community. Many of these actions already feature prominently in epidemiologic research but a more systematic approach will be needed to increase the impact of epidemiology in the 21st century. While the recommendations presented here are focused on cancer epidemiology, we believe they apply to the whole field of epidemiology.
The imperatives of the 21st century require epidemiology to extend its reach beyond the historical perspective on etiology to embrace the continuum of early detection, treatment, prognosis through survivorship, and to become more effective in translating scientific discoveries into individual and population health impact (13). Epidemiology in academic institutions has traditionally focused on advancing discoveries, while epidemiology in public health and healthcare settings focuses on disease control and program implementation and evaluation. In cancer, cohort studies increasingly try to assess factors that impact natural history, response to interventions, and long-term survivorship (14). Along the full translational continuum (4), most epidemiologic research, however, still focuses on etiology and replication/characterization of findings (4). Funding agencies and research institutions need a more balanced epidemiologic portfolio including evaluation of interventions to develop evidence-based policies and guidelines, implementation strategies of applications in healthcare decisions and population health policy, and evaluation of impact, including benefits and harms of interventions in the “real world”. For example, as epidemiology has uncovered strong associations between tobacco and mortality from various diseases (15), it should increasingly focus on developing, implementing and evaluating pharmacologic, behavioral, policy and environmental interventions.
Moving from observation and discovery to the development and evaluation of interventions will require a better integration of clinical and community trials with large scale epidemiologic studies (3). As randomized clinical trials face increasing challenges due to expense, complexity and non-representativeness, it would be cost-effective and efficient to embed trials into preexisting epidemiologic registries such as large scale cohort studies. These trials can relatively easily enroll large numbers of subjects at relatively low cost. In Scandinavia, there are examples of trials that have already been successfully integrated into preexisting registries or administrative databases, often at low marginal cost (3). Moreover, there will be an increasing need to integrate observational epidemiologic studies into the NCI clinical trials infrastructure. Lastly, epidemiologic cohort studies can be cultivated for translational evaluation research, especially in the development and validation of biomarkers (10).
To extend the impact of epidemiology on translational efforts, epidemiologists need to become even more effective in team science (16) and translational research collaboration (17), as well as address multilevel determinants of diseases ranging from social and environmental determinants to biologic and molecular pathways and their interactions (18). Critical to this success is an enhanced effort by funding agencies and the research community to reward interdisciplinary and translational research. As such, the real value of epidemiology resides in informing both discovery research and translational research and embodying a broad perspective on the multilevel origins of disease and an appreciation for the need to apply incremental knowledge to advance population health (19).
Epidemiology has traditionally involved single teams with proprietary control of their data and specimens, which they use effectively to publish and garner additional funding. The inner workings of protocols and analyses are typically invisible to outsiders, and raw data rarely become available. This practice can adversely impact reproducibility, accountability, and efficiency (20). Peer review usually depends on limited information communicated in a short scientific paper. Fragmentation of information and selective reporting are prominent, and published information is difficult to integrate with other studies after the fact. These practices have led to the kind of criticisms mentioned earlier, including repudiated or inconsistent findings and studies that offer small incremental knowledge gains (3). The advent of genome-wide association studies has shown not only that reproducible results can be achieved with large enough sample sizes, but also that new models of collaboration and data sharing can be developed (21). The time is right to ensure greater credibility of all epidemiologic studies by adopting a reproducibility culture through greater sharing of data, protocols, and analyses (22–24). Funding agencies can catalyze this transformation, since they are responsible for shaping the incentive system for science. One possibility is that funding can be based in part on the extent to which investigators adopt sharing of data and specimens (25). Scientific journals can contribute to this transformation by making availability of protocols, raw data, and analyses a prerequisite to publication (26). Concurrently, the scientific community can assist by adopting a culture of data sharing and collaboration. Such a culture shift can acquire value in the academic coinage for appointments, promotion, and awards, and is required to propel the field forward into a more consistent realm of scientific credibility.
This transformation has to address potential obstacles, such as legal, ethical, or pragmatic limitations that may not allow full transparency and availability of information in public view. Issues of informed consent restraints, privacy of participants, and the extra effort and resources needed to make data, protocols, and analyses available widely in sufficiently high quality and accessibility should be anticipated (22, 27). These issues are more prominent for studies that were designed in the past and continue data collection and/or analyses, but should be more straightforward to tackle in new studies. Nevertheless, other considerations must still be addressed including potential impact on participation rates and on the quality and types of data participants will be willing to provide.
One can consider multiple levels of transparency in access to information and decide what would be maximally attainable for each study (as suggested in Table 2). At a minimum, registration of datasets should be achievable for all epidemiologic studies, past and future (28). Funding agencies can support pilot studies and expert panels to assess the feasibility, advantages and disadvantages, and ways to optimize reporting. Some efforts would require creating and expanding existing repositories for information, and there is already substantial experience from some scientific fields, e.g. microarray experiments. Making data and protocols more accessible will accelerate harmonization of existing datasets, as in the case of collaborative efforts involving consortia, cohort studies and biobanks (29, 30). Expansion of open access repositories of data and biological specimens will require partnership among funding agencies, academic institutions, and scientific journals to create more incentives for data sharing, reproducibility and replication.
Case-control studies, the traditional workhorse of epidemiology, will continue to make strong contributions to the field in the next decade. In particular, these studies can contribute to in-depth examinations of patients with specific (and especially rare) cancers. Nevertheless, with increasing interest in early antecedents of disease and pre-diagnostic risk factors and biomarkers, large scale prospective cohort studies for disease etiology and outcomes will become increasingly important (and will undoubtedly include nested case-control components). Such studies should be conducted in informative populations, apply validated methods to measure genetic and environmental influences, and include pre-diagnostic data and biological samples. In cancer risk cohort studies, organ-specific incidence remains a main outcome of interest for discovering etiology, but other outcomes can be studied as well. First, with advances in molecular tumor classification, we are distinguishing among cancer subtypes by means other than histopathology. Second, the expanding list of recognized precursors (e.g., colon polyps, Barrett’s esophagus) can provide insight because they occur years or decades before the development of cancer and progression to cancer is highly variable. Third, many etiological studies are expanding to include treatment and outcome information to allow the evaluation of response to interventions and long-term survival. These efforts complement new and ongoing cancer patient cohorts designed to collect epidemiologic, clinical, genomic and detailed treatment information after a cancer diagnosis (14).
Ideally, the cohort study should collect information using a life course approach with documented medical histories and exposure information and appropriate biological tissue collection. Assembling a cohort with these key features is expensive and difficult within the United States health care system (31). In response, NCI and other research organizations have created approximations to a singular cohort by developing a consortium of multiple cohorts of over a million people followed for many years (32). In addition, efforts are underway to build cohorts within existing medical care delivery systems by linking epidemiologic data with electronic health records. Cohort studies can be conducted as consortia at multiple sites, as combinations of existing ongoing studies, within a single large site system, or through a centralized approach such as the one used by the United Kingdom Biobank, which completed recruitment of more than half a million participants between 2007 and 2010 (33). Given the existence of many ongoing cohort studies, serious consideration needs to be given to mapping and registering all existing prospective cohorts worldwide, harmonizing efforts in data collection and analyses, and expanding current disease-specific studies to include multiple outcomes and to incorporate early life exposures and pre-diagnostic information. Critical issues for success include collaboration and sharing, modern recruitment structures that facilitate outcome determination, comprehensive and flexible information technology, automated biological specimen processing, and broad stakeholder engagement (31). Better coordination and collaboration in funding by disease-specific research agencies will be needed.
Cancer epidemiology is unusual because of the opportunity to work with two genomes: the germline genome, which can be used to understand susceptibility to specific cancers, and the somatic genome of the cancers, which can sometimes be used to understand the exposures that gave rise to the cancer (through mutational fingerprints of exposures), the mutational determinants of tumor progression and recurrence, and drug sensitivity and resistance. Flagship projects such as the Cancer Genome Atlas (TCGA) have been mostly conducted on anonymized tumor samples (34, 35). Completing the life course approach by using tumor samples from cases of cancer that arise within cohort studies offers the opportunity to study the pre-diagnostic predictors of both cancer incidence and survival.
New technologies and platforms of biomarker measurement continuously become available for incorporation into epidemiologic studies. Examples include genomic, proteomic, metabolomic, non-coding RNA, epigenomic markers, mitochondrial DNA, telomerase platforms, infectious agent markers and microbiota, and immune marker profiles. Similarly, a wide array of environmental measurements using increasingly sophisticated sensor technologies may be measured in blood or other tissues, as well as incorporated into portable devices and mobile phones (9, 36, 37). Exploring the potential of the “exposome” may provide a way for assessing the impact of multiple exposures on key internal metabolic processes also using new lab-based technologies (38). It is premature to predict how these approaches will evolve in practice, but techniques for inexpensively sampling a wide array of exposures offer great conceptual appeal. Likewise, we cannot anticipate what new platforms will be available and ready for prime time even in a few years from now, but measurement capacity is likely to continue expanding at a rapid pace. What should be anticipated, however, is the need for careful attention to the proper collection, sampling, processing and storage of biological specimens to be interrogated with these evolving technologies, and the development of principles for their optimal use in epidemiological studies of all types. This need is particularly acute for cohort studies that collect biological samples today, but may assay these samples many years in the future using measurement platforms that were unknown at the time of sample collection.
Analytical methods for these platforms need to evolve and may need to account for platform-specific peculiarities as well as study design issues. An even greater challenge is how to integrate multiple platforms within the same analysis. These platforms are likely to offer complementary information, but may also have redundancies that need to be avoided. A series of carefully designed studies can move from proof-of-concept to wide-scale validation and successful application of these new technologies. As the possibilities for false leads and dead ends increase exponentially with each new measurement platform, methodological work is essential in evaluating any technology’s analytic performance, reproducibility, replication, disease associations, ethical and legal issues, and clinical utility (26).
The unquestionable reality of 21st century epidemiology is the tsunami of data spanning the spectrum of genomic, molecular, clinical, epidemiologic, environmental, and digital information. The amalgamation of data from these disparate sources has the potential to alter medical and public health decision making. Nevertheless, we currently do not have a firm grasp on how to systematically and efficiently tackle the data deluge. In 2012, the US government unveiled the “Big Data” Initiative with $200 million committed to research across several agencies (39). Epidemiologists have traditionally been involved in the collection and analysis of large data sets, and therefore should play a central role in directing the use of financial resources and institutional/organizational investment to build infrastructures for the storage and analysis of massive datasets. Critical to the implementation of big data science is the need for high-quality biomedical informatics, bioinformatics, and mathematics and biostatistics expertise.
The development of systematic approaches to robustly manage, integrate, analyze, and interpret large complex data sets is crucial. Overcoming the challenges of developing the architectural framework for data storage and management may benefit from the lessons learned and the knowledge gained from other disciplines (40). Adaptation of technological advancements like cloud-computing platforms, already in use by private industries (e.g. Amazon Cloud Drive and Apple iCloud), can further facilitate this virtual infrastructure and transform biomedical research and health care (41). The challenges of integrating multiscale data to promote progress in research lie more in the realm of bioinformatics and in the unwieldy and politically charged details related to data sharing (e.g. data sovereignty, buy-ins from stakeholders, see Recommendation #2) and to the adoption of standards and metrics that can cross studies and disciplines. As we write this commentary, the National Institute of Standards and Technology (NIST) is sponsoring the “Cloud Computing and Big Data Workshop” precisely to deliberate on some of these pressing challenges (42). For data acquired from disparate sources, harmonization of definitions can be a challenge. The epidemiology community and funding agencies can integrate the insights gained from this NIST workshop towards better integration of big data science in future epidemiologic studies.
With data-intensive 21st century epidemiology, there is a need for a systematic approach to manage and synthesize large amounts of information (43). Knowledge integration is the process of combining information or data from many sources (and disciplines) in a systematic way to accelerate translation of discoveries into population health benefits. Knowledge integration also seeks to achieve the effective incorporation of new knowledge in the decisions, practices and policies of organizations and systems (13). As illustrated by Ioannidis et al in this issue, knowledge integration involves three interconnected components (8). First, knowledge management is a continuous process of identifying, selecting, storing, curating, and tracking relevant information across disciplines. Second, knowledge synthesis is a process of applying tools and methods for systematic review of published and unpublished data using a priori rules of evidence, including systematic reviews and meta-analysis. In addition, decision analysis and modeling can provide valuable additional synthesis tools to guide policy actions and clinical practice, even with disparate observational and RCT data (8). Third, knowledge translation utilizes synthesized information in stakeholder engagement and in influencing policy, guideline development, practice, and research. Moreover, performing meta-research (or research on research) analyses can aid in understanding evidence across research fields and can reveal patterns of study design, reporting, and biases (20).
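As a concrete illustration of the knowledge synthesis component, the following minimal sketch pools study-level estimates with a fixed-effect, inverse-variance meta-analysis. It is offered only as an example of one common synthesis technique, not as a method prescribed in this commentary; the function name `pool_fixed_effect` and the three log relative risks with their standard errors are hypothetical values chosen for illustration.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance fixed-effect pooling of study-level estimates.

    estimates  : effect estimates on an additive scale (e.g. log relative risks)
    std_errors : standard errors of those estimates, in the same order
    Returns the pooled estimate and its standard error.
    """
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log relative risks and standard errors from three studies.
log_rr = [0.2, 0.4, 0.3]
se = [0.1, 0.2, 0.1]

pooled, pooled_se = pool_fixed_effect(log_rr, se)

# 95% confidence interval, back-transformed to the relative risk scale.
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"Pooled RR = {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A fixed-effect model assumes that all studies estimate one common effect; where heterogeneity between studies is expected, a random-effects model would usually be the more appropriate choice.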
A current limitation of knowledge integration is that researchers rely heavily on published literature, which tends to overly report positive associations due to selective reporting and other biases (44). Furthermore, raw data are rarely available to incorporate with the existing published results to uncover true associations. Ioannidis et al (8) outline future suggestions for knowledge integration that may diminish these biases. In knowledge management, there is a need for improved methods for mining published and unpublished data; registration of studies, datasets and protocols; availability of raw data and analysis codes; and facilitation of repeatability and reproducibility checks. With regard to knowledge synthesis, consortia that run analyses prospectively should optimize collaboration and communication. Prospective stakeholder engagement at the outset of a study is essential for knowledge translation (8).
Funding agencies and journals can also help knowledge integration efforts. They can facilitate the development and use of online tools and databases to capture published and unpublished data, datasets, studies, and protocols from funded epidemiologic studies. Journals can promote the publication of relevant “null results” to minimize publication bias, as Cancer Epidemiology, Biomarkers & Prevention already does. The NIH and other funding agencies can also capitalize on the process of knowledge integration to systematically track existing research and resources to identify gaps and redundancies to guide future funding.
Academic training in modern epidemiology requires a problem-solving, action-oriented approach. Traditionally, epidemiologic investigations tend to end with the discovery of risk factors, and leave the translation of that research to others (45). There is a need to shift from epidemiologic research that is etiologic to that which is applied with a focus on innovation and translation (46). Ness (47) has further outlined a toolbox of evidence-based creativity programs to be incorporated into every epidemiology curriculum.
Core training of the next generation of epidemiologists should offer skills in integrating biology and epidemiology into studies of etiology and outcomes, mastering sufficient quantitative skills, understanding new quantitative methods, and integrating rapidly evolving measurement platforms (48). The epidemiologist of the 21st century will need deeper immersion in informatics and emerging technologies, as such skills are critical to appropriately leverage and interpret increasingly dense biological, clinical and environmental data across multiple sources and platforms.
At the same time, there is a need to reorient the training of practicing epidemiologists towards implementation and dissemination research. The training curriculum must be modified to adopt an interdisciplinary approach to graduate and post-doctoral education by equipping future epidemiologists with practical skills to meet the needs of modern epidemiologic research in collaboration, translation, and multi-level analysis (17). Training must incorporate concepts of knowledge integration to promote the most effective use of information from many sources to further accelerate translation of scientific discoveries into clinical and public health applications. Likewise, there is a need for integration of epidemiologic concepts into training curricula for clinical and public health practitioners to meet the increasing challenge of translating scientific discoveries into population health benefits (4). Medical schools and schools of public health are beginning to work more closely to create a climate of collaboration and shared knowledge across disciplines that nurtures and rewards team efforts. This could include more encouragement for medical students and clinicians to get training in public health (e.g. MPH) and for epidemiology students and practitioners to get more exposure to basic and clinical sciences.
In an environment of funding limitations and rapid technology advances, funding agencies and the epidemiology community need to optimize their strategies for the most efficient use of data, biosamples, and other research resources. First, we should practice the art of bricolage, a critical attribute of resourcefulness, which refers to the novel use of available resources to construct new forms or ideas: creativity under constraints. Second, there needs to be a fair and transparent process to critically examine the criteria needed to discontinue, extend, or expand existing studies and to permit the funding of new cutting-edge studies. Some benefits can be achieved by extending existing cohorts to integrate data on multiple health-related endpoints. Optimization of resources includes leveraging biospecimens from existing biobanks, harnessing data gathered from various sources (e.g. health maintenance organizations (HMO), Medicare/Medicaid, and cancer registries), and linking and mining information from electronic health records, randomized clinical trial networks, and other databases (e.g. census bureau) to perform research, test novel hypotheses, and discover novel exposures. For example, to characterize the natural history of HPV-associated carcinogenesis, molecular epidemiologists can capitalize on the samples stored in cervical cytology biobanks (49). Patient-provided data and health information can be collected and delivered, respectively, within an existing health care system (3). The Moffitt Cancer Center’s MyMoffitt Patient portal, for example, represents one archetype of this future approach (50). Current collaborations with the HMO Research Network can be encouraged, enhanced, and incentivized to conduct population-based research on a multitude of health-related outcomes (51). Investigators may expand their interest across the boundaries of different disease-specific endpoints and diverse biologic/genomic exposures (e.g. to include stress and social variables), while keeping in mind the translational value of the research question (4, 5). As outlined in Recommendation #6, a robust knowledge integration process can be used to determine how best to allocate resources.
Optimizing resources for epidemiologic research will require the direct involvement of funding agencies to serve as active liaisons with researchers to improve efficiency in the research process, communication, and management. The overarching push for epidemiology to move toward more collaborative, interdisciplinary, and translational research also requires novel funding mechanisms and enlightened study review teams. Alternative avenues need to be explored to provide investigators with the incentives to abandon non-yielding research courses without causing disruption to their academic career and funding situation.
The eight broad recommendations and corresponding proposed actions presented here are intended to transform cancer epidemiology by enhancing transparency, multidisciplinary collaboration, and strategic applications of new technologies. The recommendations apply more broadly to the field of epidemiology, and should lay a strong scientific foundation for accelerated translation of scientific discoveries into individual and population health benefits. Clearly, more details are needed to address the opportunities and challenges that permeate each of these recommendations, and these will require further deliberation by the scientific and consumer communities. We invite ongoing conversation on how to strengthen the future of epidemiology using our Cancer Epidemiology Matters blog (7).
The authors are grateful for the comments received on the NCI’s “Cancer Epidemiology Matters Blog” (http://blog-epi.grants.cancer.gov/) and to Martin L. Brown for contributing to the NCI’s Trends in 21st Century Cancer Epidemiology workshop (http://epi.grants.cancer.gov/workshops/century-trends/).
Conflict of Interest Statement: None of the authors listed above have any relationships with commercial entities that may have a bearing on the relevant subject matter.