We have developed a novel framework for organizing and characterizing cancer CER data together with relevant research needs. Based on the literature and key informant interviews, we propose a practical ontology regarding data resources and availability. The structure of this ontology was defined through a retrospective lens, by asking participants to nominate secondary sources of data that could be immediately leveraged or developed.
The retrospective lens provides a starting point from which a rational ontology can be developed. It allows us to define and characterize the data resources currently available for cancer CER. Lastly, it provides a clear delineation point for the transition from retrospective to prospective CER data models. Moving forward, we anticipate a transition to more frequent prospective and real-time data collection activities (electronic health records, continuously aggregating registries, rapid learning data systems). The ontology proposed from our study provides a foundational nomenclature from which to build future data resources. It is increasingly clear throughout the fields of science and engineering that data must be organized and systematically structured so that information can be maximally extracted to assist in prescribing the right treatment at the right time for a specific patient. Moreover, we need agreement and collaboration among the stakeholders of the many diverse systems for collecting data. This study highlights these realities and helps to point a practical way forward.
Our approach has several limitations. Development of this ontology was challenged by a lack of mutual exclusivity among datasets and the diverse perspectives of participants. Our federally funded study team focused on describing datasets and CER opportunities from a government perspective. Our sampling was purposeful but not exhaustive; additional cancer CER datasets have likely been missed, and the resulting impact on the ontology is not clear.
Despite these challenges, this work provides a practical ontology that is adaptive and can be upgraded over time. It provides a template for understanding the strengths and limitations of current CER data resources, and formulating recommendations and guiding principles to advance cancer CER.
We present recommendations corresponding to the major themes identified in this study, with a goal of informing the evolution of the CER data framework, resolving data gaps, and ultimately establishing a national data infrastructure for cancer CER. Our focus was on existing secondary, observational data, though findings we present are also applicable to prospective data collection and future data resources.
1. There is a need for systematically identified, standardized measures to fill data gaps and enhance linkages and transferability
Inconsistent, incomplete measures and a lack of data standardization pose a substantial threat to improving public health through CER. Stakeholders (e.g., researchers, providers, payers) collect clinical, population, and health services data in numerous ways. Even within the research community, there are substantial differences of opinion on essential variables. This lack of consensus inhibits comparability across and within health datasets.
Recommendation 1a: Systematically identify necessary measures including uniform definitions and standardization of collection and coding
Intervention selection, exposure assignment, and outcome measures must be systematically identified and characterized. As a starting point, the study authors have recommended a framework for identifying measures across the cancer care continuum.9
Standards for how measures are defined, collected, and coded must be developed and broadly applied, even for very basic measures such as race and ethnicity. This issue extends to algorithms for defining meaningful measures and cohorts, or deriving complex treatments or outcomes. Lack of global standardization inhibits data pooling, comparability among multiple sources, and generalizability of findings in the context of population heterogeneity.30
A multidisciplinary panel of CER researchers, stakeholders, and their partners is required to address this diversity of measures and lack of data standardization. A goal of such an effort should be identification of a minimum basic set of essential measures in all new data collection initiatives, including standardized data definitions.
Recommendation 1b: Develop and incorporate new measures and dataset crosswalks to address gaps among current data resources
Additional measures must be identified which incorporate advances in medicine and health sciences. A key example is the enhancement of our national cancer registries’ collection of data on genetic markers. Tests for these markers, such as the KRAS test, are increasingly able to provide predictive insight into intervention effectiveness for individual patients.31-33
Because of the potentially rapid and inconsistent adoption of these markers, multi-concept coding systems are necessary to capture (1) if the test was used, (2) test results, and (3) test characteristics. In addition to genetic markers, federal and other payers could consider standardization of clinical markers such as stage, grade, and performance status. The current utilization of ICD-9 and Healthcare Common Procedure Coding System (HCPCS) codes is insufficient in this cancer-specific context. Furthermore, investment in measurement and methods research could facilitate the development of ‘crosswalks’ between existing measures and instruments.34, 35
This will enable the comparison of constructs between datasets and offer potential mechanisms for combining existing data, or supplementing missing information.36, 37
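As an illustration only, the multi-concept coding and crosswalk ideas described above might be sketched as follows. The dataset names (`registry_a`, `cohort_b`), the raw result codes, and the shared vocabulary are all hypothetical; they are not drawn from any existing registry or coding standard.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical crosswalk: dataset-specific result codes -> a shared vocabulary.
# None of these codes come from a real coding system.
RESULT_CROSSWALK = {
    "registry_a": {"0": "wild_type", "1": "mutated", "9": "unknown"},
    "cohort_b": {"NEG": "wild_type", "POS": "mutated", "UNK": "unknown"},
}


@dataclass(frozen=True)
class MarkerRecord:
    """One marker observation, kept as three separate concepts."""
    marker: str               # e.g. "KRAS"
    test_performed: bool      # (1) was the test used?
    result: Optional[str]     # (2) harmonized test result, if tested
    assay: Optional[str]      # (3) test characteristic, e.g. assay type


def harmonize(source: str, marker: str, raw_result: Optional[str],
              assay: Optional[str] = None) -> MarkerRecord:
    """Map a dataset-specific result code onto the shared vocabulary."""
    if raw_result is None:
        # No result recorded: treat the test as not performed.
        return MarkerRecord(marker, False, None, None)
    # Codes missing from the crosswalk fall back to "unknown".
    result = RESULT_CROSSWALK[source].get(raw_result, "unknown")
    return MarkerRecord(marker, True, result, assay)


if __name__ == "__main__":
    a = harmonize("registry_a", "KRAS", "1", assay="PCR")
    b = harmonize("cohort_b", "KRAS", "POS", assay="PCR")
    print(a == b)  # records from both sources become directly comparable
```

Keeping test use, result, and test characteristics as separate fields means a missing result is not conflated with a test that was never ordered, which is one of the distinctions a single-code system tends to lose.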
Increasingly relevant for cancer CER are intermediate outcomes, including patient reported outcomes.38-42
Historically, clinical research has focused on mortality, but through advances in cancer detection and treatment, patients are living longer and may not die from cancer. To enable better comparisons between treatments, new measures are needed that go beyond life expectancy and better quantify side effects, costs, and other trade-offs such as the probability of continuing to work or attending to family needs.43, 44
Patient treatment decisions are increasingly likely to be informed by factors such as these. Systems to capture these measures must be better integrated into clinical care data, and embedded in future datasets.30
Recommendation 1c: Establish data architecture and systems standards for collecting and communicating these measures among health care delivery and financing organizations and researchers
To date, health care reform has focused on standards for patient care, transferability (Health Information Exchange [HIE]), and quality of care evaluations; however, CER also needs to be included as a priority component for improving health care. “Meaningful use” regulations offer significant incentives to standardize clinical data for transferability and interoperability, though these efforts are still nascent. Accordingly, CER stakeholder involvement is critical in the discussions between the Centers for Medicare & Medicaid Services (CMS) and the Office of the National Coordinator for Health Information Technology (ONC), and must extend beyond meaningful use requirements for HIT development. NCI's cancer Biomedical Informatics Grid (caBIG) and Cancer Data Standards Registry and Repository (caDSR) have already developed an interoperable information technology (IT) infrastructure that offers standard rules, unified architecture, and common language to develop and use cancer research data.30
It is vital that open-source, open-access tools such as these remain at the forefront of integrating with health care data coding such as Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), Logical Observation Identifiers Names and Codes (LOINC), or data interoperability such as HL7.
2. Improvements in study design and population sampling are critical for CER studies to be meaningful
Many of the problems with existing data sources cannot be solved through data standardization or sophisticated statistical methods. For example, a greater quantity of data will not necessarily make CER studies more generalizable or reproducible; rather, CER study design issues need to be better understood and overcome, resulting in better quality data.
While the focus of this work is not statistical methods or study design, data and study methodology are inextricably connected. Recognizing this, future studies need to prospectively apply more advanced data collection, better study designs, and sampling frameworks. At the same time, investments need to be made in ways to reduce bias through advanced statistical methods.33
By funding research on study design issues in existing CER studies, we can develop better methods to apply toward future studies and data collection. It is also important to recognize that the advancement of complex methods requires consistency of measures and data interoperability described in the first recommendation.
Recommendation 2a: Develop methods to leverage existing data, overcome data limitations, and reduce bias
The majority of data currently used for CER is collected for non-research purposes and is non-experimental with regard to most CER questions. Consequently, several significant sources of bias exist, some of which are correctable through advanced methods. Other sources of error are quantifiable, but cannot be adequately addressed. The development and application of better analytic methods can help overcome the design limitations of existing data. Propensity score matching and instrumental variable analysis are two important examples of statistical approaches that can capitalize on important data elements and advanced methods.
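To make the matching step concrete, the following is a minimal sketch of greedy 1:1 nearest-neighbor matching on propensity scores. It assumes the scores have already been estimated elsewhere (for example, by a logistic regression of treatment on covariates); the function name, the caliper value, and the example scores are illustrative assumptions, not methods or data from this study.

```python
from typing import Dict, List, Tuple


def match_on_propensity(treated: Dict[str, float],
                        controls: Dict[str, float],
                        caliper: float = 0.05) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `treated` and `controls` map a patient identifier to a propensity
    score estimated beforehand. A treated patient is matched only if the
    closest remaining control lies within `caliper` of their score.
    """
    available = dict(controls)  # copy so the caller's data is untouched
    pairs: List[Tuple[str, str]] = []
    # Iterate in identifier order so the result is deterministic.
    for t_id in sorted(treated):
        if not available:
            break
        score = treated[t_id]
        # Closest remaining control; ties are broken by identifier.
        c_id = min(available, key=lambda c: (abs(available[c] - score), c))
        if abs(available[c_id] - score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


if __name__ == "__main__":
    treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
    controls = {"c1": 0.28, "c2": 0.33, "c3": 0.60, "c4": 0.10}
    print(match_on_propensity(treated, controls))
```

In this toy example the third treated patient has no control within the caliper and is left unmatched. Greedy matching is only one of several possible strategies, and the resulting matched sample still depends on the measured covariates that went into the scores.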
Many biases or data uncertainties can also be examined using specifically collected data or hybrid data sources. For example, linking administrative data to epidemiologic or clinical data (e.g., SEER-Medicare linked data45) creates powerful research resources that serve as models for other such efforts.46
Other approaches include ancillary or validation studies collecting new data on a subgroup of the main population, or an external population, to supplement missing information or to extrapolate the distribution of an important variable into the study population.47-50
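The idea of linking administrative data to registry data, mentioned above, can be sketched in a few lines. This is a simplified deterministic linkage on a shared key; the field name `patient_key` and the example records are hypothetical, and real linkages typically involve more elaborate matching and governance steps than shown here.

```python
from typing import Dict, List


def link_records(registry: List[dict], claims: List[dict],
                 key: str = "patient_key") -> List[dict]:
    """Attach each registry case's claims rows, matched on a shared key."""
    claims_by_key: Dict[str, List[dict]] = {}
    for row in claims:
        claims_by_key.setdefault(row[key], []).append(row)
    # Every registry case is kept; unmatched cases get an empty claims list.
    return [{**case, "claims": claims_by_key.get(case[key], [])}
            for case in registry]


def match_rate(linked: List[dict]) -> float:
    """Share of registry cases with at least one linked claim."""
    if not linked:
        return 0.0
    return sum(1 for case in linked if case["claims"]) / len(linked)


if __name__ == "__main__":
    registry = [
        {"patient_key": "A1", "site": "colon", "stage": "II"},
        {"patient_key": "B2", "site": "breast", "stage": "I"},
    ]
    claims = [
        {"patient_key": "A1", "procedure": "colectomy"},
        {"patient_key": "A1", "procedure": "chemotherapy"},
    ]
    linked = link_records(registry, claims)
    print(match_rate(linked))
```

Reporting the match rate alongside the linked file is a simple way to flag how much of the registry population is actually covered by the administrative source.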
Recommendation 2b: Facilitate the conduct and completion of pragmatic trials for CER
Pragmatic trials can overcome many limitations of randomized clinical trials (namely, limited sample sizes and restrictive inclusion criteria). Pragmatic trials employ randomization but aim to make eligibility criteria and treatment decisions representative of “real world” settings.4
They also collect information on a broader range of risks, determinants, health outcomes, and events, either directly or through the novel and efficient use of other data sources (e.g., claims/administrative data). As such, they can yield more generalizable findings. In addition to these benefits, increased funding for pragmatic trials could also help spur methods development on sampling and design issues commonly seen in traditional CER studies.51
3. Issues of data ownership, access, governance, and cost are substantial
There are many large data resources and innumerable small datasets relevant to cancer CER. However, there are significant barriers limiting their use, including political obstacles, costs, and the administrative burden associated with data access.25
Important and timely data are often closely controlled by those who collect the data. Even data from federally funded studies may languish as the investigative team exhausts its “first right of publication.” The potential benefits from additional data linkages are prevented by lack of access, cost, or tightly constrained data use agreements. For example, developing resources analogous to SEER-Medicare for the under-65 population is eminently feasible by linking registry data to private payer data. However, efforts to do so have commonly met with reluctance on the part of the payers and even registries. For these groups, research is not a primary priority, and the risks or “unknowns” are perceived to outweigh the prospective benefits.
Recommendation 3: Develop systems to facilitate timely data sharing for research supporting the public good
There are practical solutions to identify CER-relevant datasets and facilitate their acquisition.52
This includes development of codified relationships among federal agencies, their contractors, and many data-holders.25
For example, the individual SEER or NPCR registries could approve a single data acquisition process to be followed for all federally-contracted CER studies, which may relieve administrative burden. The National Cancer Institute's Central Institutional Review Board (IRB) may serve as a useful analog, as it was designed to relieve the work of the multitude of institutional IRBs.53
However, it provides a cautionary tale, as the centralized IRB has been criticized for replacing rather than relieving the work needed to open a study.54
Other examples include the broad DUAs between Medicare and important epidemiologic cohorts such as the Women's Health Initiative (WHI) study. Similar agreements could be developed for important cancer studies, making them more accessible to the research community.
Standardized relationships between state and federal agencies would help data-holders be reassured that their data will be used appropriately, while distilling data acquisition logistics to a formulaic process. These relationships would also help facilitate the timeliness of data for research and enable quick turn-around on important questions. Regarding access to costly or proprietary datasets, government stakeholders (e.g., AHRQ, NCI) may consider directly lending their weight to developing special agreements for select restricted or tightly-held datasets.
There may be utility in centrally-brokered and managed data subscriptions based on standing data use agreements. For example, states such as Maine and Oregon have implemented requirements that payers deposit “shadow claims” to public health agencies for purposes of quality improvement and informing policy decisions.55
Formal mechanisms could be established to facilitate regular updating of, and access to, such data for CER.
4. Data security, privacy, and confidentiality remain paramount
While access to data must be improved, data security, privacy, and confidentiality are critical, and remain top concerns.56
Moreover, there are multiple laws and regulations governing the maintenance, release, and use of many datasets, such as Medicare or Medicaid claims.
Recommendation 4: Develop systems to assure data security, privacy, and confidentiality with any enhanced access to data for CER
Two short-term practical opportunities warrant further exploration. First, at the state level, health information exchange (HIE) is focusing on standardization of electronic health records and rules governing data transfer and use. It is prudent that the federal CER agenda be represented as new processes and regulations continue to be defined.57-60
Second, developing a CER data security and utilization “accreditation” system may help assure compliance with regulations. This would ensure a baseline level of IT sophistication that facilitates data use while assuring data vendors that accredited research sites are top-tier, “safe” data custodians. Examining the Centers for Medicare & Medicaid Services (CMS) requirements of their quality improvement organizations (QIOs)61 may be a first step to developing such accreditation systems.
5. Broad multidisciplinary representation is necessary to effectively address these CER data needs
The recommendations (from methods to policy) presented by this study's discussants are a microcosm of the larger CER discussion and highlight many differences in the cultures, values, terminology, measures, approaches, and priorities relevant for cancer CER.
Recommendation 5a: A collaborative, multidisciplinary approach must be emphasized to successfully address data needs for CER
Multidisciplinary representation is necessary to adequately capture important differences in the cultures, values, and terminology surrounding perceptions of cancer CER and data issues.26
Accordingly, a critical step will be identifying individuals who can represent their disciplines (and industries) to optimally advance the cancer CER discussion. To be successful, these individuals must not only be technical experts, they must also be mavens and translators who can bridge technical and disciplinary gaps to identify and achieve solutions.62
Supporting the identification and ongoing communication of such a group will be important to drive CER data needs forward.
Recommendation 5b: CER stakeholders must be engaged and coordinated in the development of rules and standards to inform health reform
Beyond informing cancer CER and its data needs, it is important that a multidisciplinary advisory group be well represented in the context of health care policy reform. It will be vital to engage these diverse groups and address these issues in a timely and consistent manner – the recently established Patient-Centered Outcomes Research Institute (PCORI) is the obvious choice to lead such an effort.63
PCORI can identify members of the research community and partner with federal agencies. Both groups represent research needs and interests and help define the future of CER in the context of health reform. Additionally, PCORI is well positioned to address other needs, such as maintaining a cancer CER data inventory, and perhaps similar registries for protocols, including those with null results. The Registry of Patient Registries project is a promising start toward addressing this need.64, 65