Comparative effectiveness research (CER) can efficiently and rapidly generate new scientific evidence and address knowledge gaps, reduce clinical uncertainty, and guide health care choices. Much of the potential in CER is driven by the application of novel methods to analyze existing data. Despite its potential, several challenges must be identified and overcome so that CER may be improved, accelerated, and expeditiously implemented into the broad spectrum of cancer care and clinical practice.
To identify and characterize the challenges to cancer CER, we reviewed the literature and conducted semi-structured interviews with 41 cancer CER researchers at the Agency for Healthcare Research and Quality (AHRQ)'s Developing Evidence to Inform Decisions about Effectiveness (DEcIDE) Cancer CER Consortium.
A number of datasets for cancer CER were identified and differentiated into an ontology of eight categories, and characterized in terms of strengths, weaknesses, and utility. Several themes emerged during development of this ontology and discussions with CER researchers. Dominant among them was accelerating cancer CER and promoting the acceptance of findings, which will necessitate transcending disciplinary silos to incorporate diverse perspectives and expertise. Multidisciplinary collaboration is required, drawing on expertise in non-experimental data, outcomes research, clinical trials, epidemiology, generalist and specialty medicine, survivorship, informatics, data, and methods, among others.
Recommendations highlight the systematic, collaborative identification of critical measures; application of more rigorous study design and sampling methods; policy-level resolution of issues in data ownership, governance, access, and cost; and development and application of consistent standards for data security, privacy, and confidentiality.
Rapid advances in cancer care continue through an accelerated pace of scientific discovery and technology development. Timely integration of developments into clinical practice is increasingly challenging, and there is an imperative need for more immediate, generalizable, and evidence-based information. Randomized controlled trials (RCTs) remain the gold standard for developing such information; however, this research design is not always feasible, practical, or sufficiently timely. Additionally, RCT designs limit generalizability of findings to heterogeneous patient populations or specific subgroups seen in clinical practice.1-7
Cancer comparative effectiveness research (CER) holds great promise for meeting many shortcomings of RCTs. Though CER takes many forms, for this discussion, we focus on the Institute of Medicine's (IOM) definition of CER:
Comparative effectiveness research is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.6, 8
The foundation of CER is understanding effectiveness in the context of large, heterogeneous populations. Propitiously, large population-based data are becoming increasingly available through advances in information technology and research methods in the form of secondary data collected for non-research purposes. By increasing our understanding of these data, CER stands to benefit immeasurably from these ever-growing repositories.
For cancer CER, these data originate from many different sources including electronic health records, registries, administrative data, observational studies, clinical trials, and others. Not all existing or secondary data are adequate, and each data source comes with its own unique challenges. Because secondary data originate from many different sources, they may be missing critical variables or have significant and systematic differences in how variables are measured. These differences impede the ability to confidently characterize important care processes and outcomes across data. An additional challenge is the lack of randomization, which makes controlling for relevant confounders critical. As a result, cancer care stakeholders are frequently uncomfortable acting on CER findings generated from these data sources.
A better understanding of data is necessary to improve data collection and methods development, to overcome the challenges facing cancer CER. To further this understanding, and help guide federal data and research partners, we reviewed the literature and met with over 40 cancer outcomes researchers and clinicians. Our goals were to: 1) develop a conceptual model for examining observational data in cancer CER; 2) characterize the strengths and limitations of current cancer CER data resources; 3) identify barriers in the conduct of cancer CER; and 4) formulate recommendations and guiding principles. While our focus was on secondary, observational data (i.e., non-randomized, retrospective), the findings we present are also applicable to any prospective data collection.
Literature was reviewed regarding current cancer care, cancer research data, and cancer comparative effectiveness research. This information helped inform the development of a conceptual model of data needs for cancer CER,9 and frame discussions with a convenience sample of cancer outcomes researchers associated with the Can-DEcIDE Consortium.a Participants were from multiple disciplines and included clinicians, clinical trials experts, epidemiologists, pharmacoepidemiologists, health services researchers, biostatisticians, clinical data managers, state public health workers, and informaticians. The majority of participants relied on federal or academic funding; individuals who relied on funding from industry or non-government third party payers were not targeted in the initial sampling frame. Using snowball sampling, we asked participants to identify other researchers who might provide additional insight; together, these individuals comprised the study sample of 41 discussants.
Discussions were conducted individually and tailored according to each researcher's area of expertise. Guided by findings in the literature, discussions centered on the following: 1) identification of specific datasets for cancer CER; 2) utility of measures; 3) data access or logistical challenges; 4) population/target and sampling; 5) data linking capabilities; 6) longitudinal follow-up in datasets; 7) temporality of data/measures; 8) data completeness; 9) data standardization, formatting, and documentation; and 10) data processing and required expertise.
The primary study team comprised an epidemiologist, pharmacoepidemiologist, biostatistician, health services researcher, and three cancer-focused physician researchers, all of whom conduct federally funded, patient-centered cancer outcomes research. The study team met multiple times to summarize key informant interviews, integrate them with information from the literature, and organize the findings into categories and themes. These meetings were audiotaped to ensure capture of the entire discussion. Recommendations were collaboratively developed by the study team and reflect broad themes observed in the literature, results from the interviews, and specific issues or examples specified by multiple participants. Draft findings and recommendations were subsequently reviewed by select participants and other outcomes researchers to assure their accuracy and face validity. Lastly, the entire manuscript was reviewed by external experts participating in the AHRQ DEcIDE network.
We identified 46 relevant datasets from our study sample of cancer outcomes researchers. Participants themselves expressed different opinions with regard to which data were important, adequate, or weak for cancer CER. This variability highlighted the lack of standardized nomenclature associated with these data. Therefore, our first priority was to identify patterns with which we could organize existing datasets and broad themes.10-29 Inspection of the datasets and their purposes revealed that a consistent nomenclature was needed before they could be easily organized to support CER. In response, we developed an ontology that divided the data into eight categories, including six “existing or fixed data” categories and two “hybrid” categories (Table 1).
Definitive empirical definitions for each category were difficult to establish because the categories are not mutually exclusive. This was complicated by the fact that participants from different clinical and methodological specialties prioritized different characteristics of the datasets. Despite this, the study team reached consensus and unanimously agreed on the final ontology, which had face validity to internal and external reviewers and provides a useful classification and characterization of the datasets. Table 2 presents an illustrative sampling of what were perceived to be key datasets, assigned categories, and a summary of their strengths, limitations, and applicability for cancer CER, as informed by the discussants and study team.
Several consistent themes emerged through the discussions and analysis: (1) There is a need for systematically identified, standardized measures to fill gaps and enhance data linkage and transferability. (2) Improvements in study design and population sampling are critical for CER studies to be meaningful. (3) Substantial issues exist regarding data ownership, access, governance, and cost. (4) Data security, privacy, and confidentiality remain paramount. (5) Broad multidisciplinary representation is needed to effectively address these CER data needs. These themes were consistent throughout the analysis and resonated with key informants and the study team.
We have developed a novel framework for organizing and characterizing cancer CER data together with relevant research needs. Based on the literature and key informant interviews, we propose a practical ontology regarding data resources and availability. The structure of this ontology was defined through a retrospective lens, by asking participants to nominate secondary sources of data that could be immediately leveraged or developed.
The retrospective lens provides a starting point from which a rational ontology can be developed. It allows us to define and characterize available data resources ready for cancer CER. Lastly, it provides a delineation point for the transition from retrospective to prospective CER data models. Moving forward, we anticipate a transition to more frequent prospective and real-time data collection activities (electronic health records, continuously aggregating registries, rapid learning data systems). The ontology proposed from our study provides a foundational nomenclature from which to build future data resources. It is increasingly clear throughout the fields of science and engineering that data must be organized and systematically structured so that information can be maximally extracted to assist in prescribing the right treatment at the right time for a specific patient. Moreover, we need agreement and collaboration from the respective stakeholders of the multiple diverse systems for collecting data. This study highlights these realities and helps to point a practical way forward.
Our approach includes several limitations. Development of this ontology was challenged by a lack of mutual exclusivity among datasets and the diverse perspectives of participants. Our federally-funded study team was focused on describing datasets and CER opportunities with a government perspective. Our sampling was purposeful but not exhaustive; additional cancer CER datasets have likely been missed, and the relative impact on the ontology is not clear.
Despite these challenges, this work provides a practical ontology that is adaptive and can be upgraded over time. It provides a template for understanding the strengths and limitations of current CER data resources, and formulating recommendations and guiding principles to advance cancer CER.
We present recommendations corresponding to the major themes identified in this study, with a goal of informing the evolution of the CER data framework, resolving data gaps, and ultimately establishing a national data infrastructure for cancer CER. Our focus was on existing secondary, observational data, though findings we present are also applicable to prospective data collection and future data resources.
Inconsistent, incomplete measures and a lack of data standardization pose a substantial threat to improving public health through CER. Stakeholders (e.g., researchers, providers, payers) collect clinical, population, and health services data in numerous ways. Even within the research community, there are substantial differences of opinion on essential variables. This lack of consensus inhibits comparability across and within health datasets.
Intervention selection, exposure assignment, and outcome measures must be systematically identified and characterized. As a starting point, the study authors have recommended a framework for identifying measures across the cancer care continuum.9 Standards for how measures are defined, collected, and coded must be developed and broadly applied, even for very basic measures such as race and ethnicity. This issue extends to algorithms for defining meaningful measures and cohorts, or deriving complex treatments or outcomes. Lack of global standardization inhibits data pooling, comparability among multiple sources, and generalizability of findings in the context of population heterogeneity.30
A multidisciplinary panel of CER researchers, stakeholders, and their partners is required to address this diversity of measures and lack of data standardization. A goal of such an effort should be identification of a minimum basic set of essential measures in all new data collection initiatives, including standardized data definitions.
Additional measures must be identified which incorporate advances in medicine and health sciences. A key example is the enhancement of our national cancer registries’ collection of data on genetic markers. These tests, like the KRAS test, are increasingly able to provide predictive insight into intervention effectiveness for individual patients.31-33 Because of the potentially rapid and inconsistent adoption of these markers, multi-concept coding systems are necessary to capture (1) if the test was used, (2) test results, and (3) test characteristics. In addition to genetic markers, federal and other payers could consider standardization of clinical markers such as stage, grade, and performance status. The current utilization of ICD-9 and Healthcare Common Procedure Coding System (HCPCS) codes is insufficient in this cancer-specific context. Furthermore, investment in measurement and methods research could facilitate the development of ‘crosswalks’ between existing measures and instruments.34, 35 This will enable the comparison of constructs between datasets and offer potential mechanisms for combining existing data, or supplementing missing information.36, 37
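The three concepts named above (whether a test was used, its result, and its characteristics) can be illustrated with a minimal sketch of a registry record. This is an illustration only, not an existing registry schema: the class names, field names, and the example values ("mutated", "PCR") are hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MarkerTestStatus(Enum):
    """Concept 1: was the marker test used? (hypothetical value set)"""
    PERFORMED = "performed"
    NOT_PERFORMED = "not_performed"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class MarkerTestRecord:
    """One genetic-marker test entry for a registry case (hypothetical schema)."""
    marker: str                          # e.g. "KRAS"
    status: MarkerTestStatus             # concept 1: was the test used?
    result: Optional[str] = None         # concept 2: test result, if performed
    assay: Optional[str] = None          # concept 3: test characteristic (method)
    specimen_date: Optional[str] = None  # concept 3: ISO date, if known

    def __post_init__(self):
        # A result without a performed test is internally inconsistent.
        if self.status is not MarkerTestStatus.PERFORMED and self.result is not None:
            raise ValueError("result recorded for a test that was not performed")


def is_evaluable(record: MarkerTestRecord) -> bool:
    """True when the test was performed and a result was captured."""
    return record.status is MarkerTestStatus.PERFORMED and record.result is not None


# Illustrative records; the values are made up for this sketch.
kras = MarkerTestRecord("KRAS", MarkerTestStatus.PERFORMED, result="mutated", assay="PCR")
untested = MarkerTestRecord("KRAS", MarkerTestStatus.NOT_PERFORMED)
```

Keeping "not performed" distinct from "unknown" is the point of the first concept: when adoption of a marker is uneven, a missing result cannot be interpreted without knowing whether the test was ever ordered.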
Increasingly relevant for cancer CER are intermediate outcomes, including patient reported outcomes.38-42 Historically, clinical research has focused on mortality, but through advances in cancer detection and treatment, patients are living longer and may not die from cancer. To enable better comparisons between treatments, new measures are needed that go beyond life expectancy and better quantify side-effects, costs, and other trade-offs such as the probability of continuing to work or attending to family needs.43, 44 Patient treatment decisions are increasingly likely to be informed by factors such as these. Systems to capture these measures must be better integrated into clinical care data, and embedded in future datasets.30
To date, health care reform has focused on standards for patient care, transferability (Health Information Exchange [HIE]), and quality of care evaluations; however, CER also needs to be included as a priority component for improving health care. “Meaningful use” regulations offer significant incentives to standardize clinical data for transferability and interoperability, though these efforts are still nascent. Accordingly, CER stakeholder involvement is critical in the discussions between the Centers for Medicare & Medicaid Services (CMS) and the Office of the National Coordinator for Health Information Technology (ONC), and must extend beyond meaningful use requirements for health information technology (HIT) development. NCI's cancer Biomedical Informatics Grid (caBIG) and Cancer Data Standards Registry and Repository (caDSR) have already developed an interoperable information technology (IT) infrastructure that offers standard rules, unified architecture, and common language to develop and use cancer research data.30 It is vital that open-source, open-access tools such as these remain at the forefront of integrating with health care data coding systems such as Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT) and Logical Observation Identifiers Names and Codes (LOINC), and with data interoperability standards such as HL7.
Many of the problems with existing data sources cannot be solved through data standardization or sophisticated statistical methods. For example, a greater quantity of data will not necessarily make CER studies more generalizable or reproducible; rather, CER study design issues need to be better understood and overcome, resulting in better quality data.
While the focus of this work is not statistical methods or study design, data and study methodology are inextricably connected. Recognizing this, future studies need to prospectively apply more advanced data collection, better study designs, and sampling frameworks. At the same time, investments need to be made in ways to reduce bias through advanced statistical methods.33 By funding research on study design issues in existing CER studies, we can develop better methods to apply toward future studies and data collection. It is also important to recognize that the advancement of complex methods requires the consistency of measures and data interoperability described in the first recommendation.
The majority of data currently used for CER is collected for non-research purposes and is non-experimental with regard to most CER questions. Consequently, several significant sources of bias exist, some of which are correctable through advanced methods. Other sources of error are quantifiable, but cannot be adequately addressed. The development and application of better analytic methods can help overcome the design limitations of existing data. Propensity score matching and instrumental variable analysis are two important examples of statistical approaches that can capitalize on important data elements and advanced methods.
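As a minimal sketch of the matching step in propensity score matching, the following pairs each treated patient with the closest unmatched control. It assumes the propensity scores have already been estimated (for example, by a logistic regression of treatment on measured confounders) and uses greedy 1:1 nearest-neighbor matching with a caliper, which is only one of several matching strategies. The function name, the patient identifiers, and the caliper value are hypothetical.

```python
from typing import Dict, List, Tuple


def match_on_propensity(
    treated: Dict[str, float],
    controls: Dict[str, float],
    caliper: float = 0.05,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `treated` and `controls` map hypothetical patient IDs to propensity
    scores that are assumed to have been estimated beforehand.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    pairs: List[Tuple[str, str]] = []
    # Process treated patients in ID order so results are reproducible.
    for t_id, t_score in sorted(treated.items()):
        if not available:
            break
        # Closest remaining control by absolute score difference.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


# Illustrative scores only; not data from this study.
pairs = match_on_propensity(
    {"t1": 0.30, "t2": 0.80},
    {"c1": 0.32, "c2": 0.55, "c3": 0.79},
)
```

Treated patients with no control inside the caliper are left unmatched, so a matched analysis describes only the region of overlap between the two groups. Matching of this kind balances measured confounders only; unmeasured confounding remains a limitation.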
Many biases or data uncertainties can also be examined using specifically collected data or hybrid data sources. For example, linking administrative data to epidemiologic or clinical data (e.g., SEER-Medicare linked data45) creates powerful research resources that serve as models for other such efforts.46 Other approaches include ancillary or validation studies collecting new data on a subgroup of the main population, or an external population, to supplement missing information or to extrapolate the distribution of an important variable into the study population.47-50
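The idea of linking administrative data to registry data can be sketched as a deterministic join on a shared identifier. This is a simplified illustration under the assumption that both sources carry the same patient key; it is not a description of how SEER-Medicare or any other real linkage is performed, and the field names (`patient_key`, `stage`, `hcpcs`) and records are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def link_registry_to_claims(
    registry: Iterable[Dict[str, Any]],
    claims: Iterable[Dict[str, Any]],
    key: str = "patient_key",
) -> List[Dict[str, Any]]:
    """Attach each registry case's claims using an exact match on `key`."""
    claims_by_key: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for claim in claims:
        claims_by_key[claim[key]].append(claim)

    linked = []
    for case in registry:
        merged = dict(case)  # copy so the source record is left unchanged
        # Cases without any claims keep an empty list rather than being dropped.
        merged["claims"] = claims_by_key.get(case[key], [])
        linked.append(merged)
    return linked


# Illustrative records only.
registry = [
    {"patient_key": "A1", "stage": "II"},
    {"patient_key": "B2", "stage": "IV"},
]
claims = [
    {"patient_key": "A1", "hcpcs": "J9035"},
    {"patient_key": "A1", "hcpcs": "96413"},
]
linked = link_registry_to_claims(registry, claims)
```

Retaining unlinked cases makes the linkage rate visible, which matters because patients who fail to link may differ systematically from those who do.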
Pragmatic trials can overcome many limitations of randomized clinical trials (namely, limited sample sizes and restrictive inclusion criteria). Pragmatic trials employ randomization but aim to make eligibility criteria and treatment decisions representative of “real world” settings.4 They also collect information on a broader number of risks, determinants, health outcomes, and events, either directly or through the novel and efficient use of other data sources (e.g., claims/administrative data). As such, they can yield more generalizable findings. In addition to these benefits, increased funding for pragmatic trials could also help spur methods development on sampling and design issues commonly seen in traditional CER studies.51
There are many large data resources and innumerable small datasets relevant to cancer CER. However, there are significant barriers limiting their use, including political obstacles, costs, and administrative burden associated with data access.25 Important and timely data are often closely controlled by those who collect the data. Even data from federally funded studies may languish as the investigative team exhausts its “first right of publication.” The potential benefits from additional data linkages are prevented by lack of access, cost, or tightly constrained data use agreements. For example, developing resources analogous to SEER-Medicare for the under-65 population is eminently feasible by linking registry data to private payer data. However, efforts to do so have commonly met with reluctance on the part of the payers and even registries. For these groups, research is not a primary priority, and the risks or “unknowns” are perceived to outweigh the prospective benefits.
There are practical solutions to identify CER-relevant datasets and facilitate their acquisition.52 These include development of codified relationships among federal agencies, their contractors, and many data-holders.25 For example, the individual SEER or NPCR registries could approve a single data acquisition process to be followed for all federally-contracted CER studies, which may relieve administrative burden. The National Cancer Institute's Central Institutional Review Board (IRB) may serve as a useful analog, as it was designed to relieve the work of the multitude of institutional IRBs.53 However, it provides a cautionary tale, as the centralized IRB has been criticized for replacing rather than relieving the work needed to open a study.54 Other examples include the broad data use agreements (DUAs) between Medicare and important epidemiologic cohorts such as the Women's Health Initiative (WHI) study. Similar agreements could be developed for important cancer studies, making them more accessible to the research community.
Standardized relationships between state and federal agencies would help data-holders be reassured that their data will be used appropriately, while distilling data acquisition logistics to a formulaic process. These relationships would also help facilitate the timeliness of data for research and enable quick turn-around on important questions. Regarding access to costly or proprietary datasets, government stakeholders (e.g., AHRQ, NCI) may consider directly lending their weight to developing special agreements for select restricted or tightly-held datasets.
There may be utility in centrally-brokered and managed data subscriptions based on standing data use agreements. For example, states such as Maine and Oregon have implemented requirements that payers deposit “shadow claims” with public health agencies for purposes of quality improvement and informing policy decisions.55 Formal mechanisms could be established to facilitate regular updating of, and access to, such data for CER.
While access to data must be improved, data security, privacy, and confidentiality are critical, and remain top concerns.56 Moreover, there are multiple laws and regulations governing the maintenance, release, and use of many datasets, such as Medicare or Medicaid claims.
Two short-term practical opportunities warrant further exploration. First, at the state level, health information exchange (HIE) is focusing on standardization of electronic health records and rules governing data transfer and use. It is prudent that the federal CER agenda be represented as new processes and regulations continue to be defined.57-60 Second, developing a CER data security and utilization “accreditation” system may help assure compliance with regulations. This would ensure a baseline level of IT sophistication that facilitates data use while assuring data vendors that accredited research sites are top-tier, “safe” data custodians. Examining the Centers for Medicare & Medicaid Services (CMS) requirements of their quality improvement organizations (QIOs)61 may be a first step to developing such accreditation systems.
The recommendations (from methods to policy) presented by this study's discussants are a microcosm of the larger CER discussion and highlight many differences in the cultures, values, terminology, measures, approaches, and priorities relevant for cancer CER.
Multidisciplinary representation is necessary to adequately capture important differences in the cultures, values, and terminology surrounding perceptions of cancer CER and data issues.26 Accordingly, a critical step will be identifying individuals who can represent their disciplines (and industries) to optimally advance the cancer CER discussion. To be successful, these individuals must not only be technical experts, they must also be mavens and translators who can bridge technical and disciplinary gaps to identify and achieve solutions.62 Supporting the identification and ongoing communication of such a group will be important to drive CER data needs forward.
Beyond informing cancer CER and its data needs, it is important that a multidisciplinary advisory group be well represented in the context of health care policy reform. It will be vital to engage these diverse groups and address these issues in a timely and consistent manner; the recently established Patient-Centered Outcomes Research Institute (PCORI) is the obvious choice to lead such an effort.63 PCORI can identify members of the research community and partner with federal agencies. Both groups represent research needs and interests and can help define the future of CER in the context of health reform. Additionally, PCORI is well positioned to address other needs, such as maintaining a cancer CER data inventory, and perhaps similar registries for protocols, including those with null results. The Registry of Patient Registries project is a promising start toward addressing this need.64, 65
By leveraging secondary data we can fill gaps and provide timely, valid, scientific knowledge to systematically conduct CER and improve cancer care and outcomes. However, substantial engagement is required from many organizations in order to address the issues outlined here. Multidisciplinary individuals within these organizations need to be identified who can help facilitate solutions in order for CER to reach its full potential.
The data ontology and recommendations we present provide guidance for critical discussions between multidisciplinary teams of cancer researchers, methods experts, and other stakeholders. They align with previous calls for infrastructure development to support cancer research and CER.13, 15 Together they provide a template for systematically addressing cancer CER data needs. By understanding and overcoming weaknesses in current data, we can accelerate the pace of cancer CER, and ultimately enhance the adoption of CER findings to improve patient-centered care and outcomes.
We thank Timothy S. Carey, MD, MPH, and Janet K. Freburger, PhD, for their review and feedback, which informed and strengthened this manuscript. We thank the anonymous reviewers from the Agency for Healthcare Research and Quality (AHRQ)'s Effective Healthcare Program manuscript review system for their constructive suggestions and comments. This work was supported by funding from AHRQ through the Cancer DEcIDE Comparative Effectiveness Research Consortium, contract HHSA290-205-0040-I-TO4-WA5 – Data Committee for the DEcIDE Cancer Consortium.
Dr. Abernethy has research funding from the US National Institutes of Health, US Agency for Healthcare Research and Quality, Robert Wood Johnson Foundation, Pfizer, Eli Lilly, Bristol-Myers Squibb, Helsinn Therapeutics, Amgen, Kanglaite, Alexion, Biovex, DARA Therapeutics, Novartis, and Mi-Co; these funds are all distributed to Duke University Medical Center to support research. In the last 2 years she has had nominal consulting agreements (<$10,000) with Helsinn Therapeutics, Amgen, and Novartis.
*The authors comprise the Data Committee of the Agency for Healthcare Research and Quality's Cancer DEcIDE Comparative Effectiveness Research Consortium: http://www.effectivehealthcare.ahrq.gov/index.cfm/who-is-involved-in-the-effective-health-care-program1/about-the-decide-network/
The views expressed in this article are those of the authors, and no official endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services is intended or should be inferred.
aAssociated with UNC Can-DEcIDE from: the University of North Carolina at Chapel Hill, Duke University, the Centers for Disease Control and Prevention, the Brigham and Women's Hospital, the University of Virginia, the Epidemiologic Research and Information Center at the Durham Veterans Affairs Medical Center, the NC Central Cancer Registry, Blue Cross and Blue Shield of NC, the Agency for Healthcare Research and Quality, and the National Cancer Institute.