Recent advancements in consumer‐directed personal computing technology have led to the generation of biomedically relevant data streams with potential health applications. This has catalyzed international interest in Patient Generated Health Data (PGHD), defined as “health‐related data – including health history, symptoms, biometric data, treatment history, lifestyle choices, and other information – created, recorded, gathered, or inferred by or from patients or their designees (i.e. care partners or those who assist them) to help address a health concern” (Shapiro et al., 2012). PGHD offers several opportunities to improve the efficiency and output of clinical trials, particularly within oncology. These range from using PGHD to understand mechanisms of action of therapeutic strategies, to understanding and predicting treatment‐related toxicity, to designing interventions to improve adherence and clinical outcomes. To facilitate the optimal use of PGHD, methodological research around considerations related to feasibility, validation, measure selection, and modeling of PGHD streams is needed. With successful integration, PGHD can catalyze the application of “big data” to cancer clinical research, creating both “n of 1” and population‐level observations, and generating new insights into the nature of health and disease.
In recent years, technological advancements have enabled consumers to interact with personal computing devices in ways that produce large amounts of consumer‐specific data. As personal devices have grown more portable and powerful, consumer‐directed applications have proliferated and have greatly increased the breadth and depth of these data streams. Accelerometers, geolocators, and physiological sensors are now embedded in many personal computing devices. Some devices continue to exist in standalone, multipurpose computing form (e.g. smartphones, tablets, laptops, and desktops), others in uni‐ or oligo‐purpose “wearable” form (e.g. wristbands, belt clips, skin patches), and still others as a hybrid of the two models (e.g. “smartwatches”). With varying amounts of active or passive consumer data entry, these devices can provide day‐to‐day or even hour‐to‐hour information about a person's location, diet, movement, symptoms, blood pressure, and heart rate.
Concurrently with these trends, the potential for “big data” to reveal insights about the external environment has gripped the public consciousness. Integrating multiple longitudinal data sources to predictively model complex events has long been a mainstay of activities as diverse as forecasting weather, choosing stocks, or assembling professional sports teams (Lewis, 2003). Entities in the for‐profit, non‐profit and academic spheres have recognized the ability of newer consumer‐specific data streams to predict human behavior and outcomes. For example, large retailers like Target use data on consumer habits to identify and engage specific consumers for marketing purposes (Duhigg, 2012). The increasing amounts of data from personal devices promise to further improve these capabilities.
In clinical care, we recognize that our patients' pathophysiological trends and events outside of clinic are at least as relevant to their health and disease as the brief snapshots of pathophysiology that are provided at the time of clinic visits. In the “big data” era, we can imagine using this information to predictively model disease states and to inform health‐promoting interventions. Indeed, many of the newer consumer‐specific data streams produce information that is biomedically relevant and that could inform research and clinical care. In this regard, an international dialog has emerged around health‐related data that come specifically from patients, outside of the more general consumer context. These data are termed “patient‐generated health data” (PGHD) and defined as “health‐related data – including health history, symptoms, biometric data, treatment history, lifestyle choices, and other information – created, recorded, gathered, or inferred by or from patients or their designees (i.e. care partners or those who assist them) to help address a health concern” (Shapiro et al., 2012).
As interest in PGHD has increased, we are now seeing a convergence of consumer‐directed personal technology and health‐related applications. Samsung and Apple have recently announced major digital health initiatives, with Apple's features integrated into its new operating system (iOS 8) as “HealthKit” and partnerships announced with the Mayo Clinic and the Epic electronic health record (Weise, 2014; Munro, 2014).
From a research standpoint, some of the device‐generated PGHD of greatest interest include vital signs, stress levels, mood, physical activity, weight, diet, blood levels, medications, sleep patterns, tobacco and alcohol use, and environmental exposures (California Institute for Telecommunications and Information Technology, 2014). Under the more expansive PGHD definition, patient‐curated histories, diaries, risk assessments, and reports of health and functional status are also likely to contribute valuable information within the research context. Additionally, other types of data that are not specifically health‐related could be co‐opted to generate health related insights, such as geolocation, social, and financial information. Examples of PGHD with potential relevance to clinical research are provided in Table 1. In general, key features of PGHD are that: patients, not providers, capture and record these data; PGHD is obtainable outside of clinical encounters; PGHD is longitudinal, with the potential for repeated measures over time; and PGHD can be collected at high frequency intervals, enabling nearly continuous data streams over extended periods of observation, depending on the metric of interest.
Table 1. Patient‐generated health data with potential usefulness for clinical research.
As a separate issue, it is increasingly clear that there is a major need to improve the design and conduct of clinical trials in biomedical research. In the current era, clinical trials are expensive, inefficient, and time‐consuming. These problems have been widely discussed (Institute of Medicine, 2010) and have had tangible consequences, including increasing political pressure on large clinical trial cooperative groups, and internal mandates among drug and device manufacturers to lower research and development costs. Perhaps most importantly, in many of the most significant areas of human suffering and disease such as oncology, the underlying scientific understanding of disease states is moving faster than the development and execution of clinical trials to address management considerations. This mismatch of science and practice leads to trial results that become quickly outdated, and lost opportunities to improve patient outcomes.
Against this background, PGHD may provide opportunities to address some of the current shortcomings of cancer clinical trials. In the observational context, PGHD can generate information that may inform hypotheses and design considerations related to future clinical trials. Within clinical trials, PGHD may increase the value of each patient contribution on a clinical trial by improving the characterization of previously unmeasured confounders, thus maximizing the information gained from each trial and decreasing required sample sizes for future studies. PGHD offers the potential to increase the number of clinical observations and data points per patient, leading to new scientific insights about the positive and negative effects of cancer treatments upon patient outcomes.
In the remainder of this review, we will discuss potential considerations related to the integration of PGHD into future clinical trials. We will illustrate how PGHD can contribute to findings that are generated by trials. We will also offer an agenda for methodological research that we believe is critical to informing future PGHD integration into studies, addressing some of the current barriers and limitations to the use of PGHD in clinical trials. Last, we will conclude with a vision for what fully integrated PGHD may mean within the clinical research environment of the future.
There are multiple potential ways in which PGHD can inform and improve the design, conduct, and output of clinical trials. We summarize a few of these ways here:
Monitoring of participants on clinical trials varies widely by context and by type of trial. Though intensive monitoring through frequent clinic visits and biological correlates is more common in earlier phase studies, the principle of associating treatment with biological effect retains relevance in all clinical trial settings. Insights gained from earlier phase studies may benefit from validation in advanced phase trials; further, inclusion of participants with varied underlying host phenotypes may require re‐affirmation of biological treatment effects in order to generalize study results to a larger population. In this context, PGHD may provide important longitudinal physiological data to further elucidate the effects of trial interventions upon host biology.
Example: Patient‐reported symptoms (a form of Patient‐Reported Outcomes, or PROs) can be reported electronically or by phone, inside or outside of clinic. Multiple prior studies have demonstrated that patient‐reported symptoms are more reliable and informative than clinician report (Basch, 2014). Further, patient‐reported symptoms can be obtained outside of clinic and therefore more frequently than clinician report (Judson et al., 2013). Other forms of PGHD, such as home heart rate or blood pressure monitoring, may soon complement patient‐reported symptoms to provide a more complete picture of day to day physiology. In oncology, several targeted therapeutics have on‐target effects associated with disease response, such as hypertension with VEGF/VEGFR inhibitors, or rash with EGF/EGFR inhibitors (Liu and Kurzrock, 2014). A more complete data stream of patient‐reported symptom and vital sign data for participants on clinical trials may allow for a clearer elucidation of on‐target physiological effects of therapeutic interventions.
Treatment‐related toxicity is a common and significant concern related to anti‐cancer therapeutics. In some instances, treatment itself causes substantial physiological perturbation, putting even “fit” individuals at increased risk of morbidity and mortality (Deeg and Sandmaier, 2010; Wood et al., 2013b). In other instances, a treatment may be tolerated well by most, but physiologically “vulnerable” individuals are at increased risk of treatment‐related harm. In some cases, there may be more than one acceptable treatment alternative, but with different levels of therapeutic intensity. PGHD may provide information to help distinguish who is most at risk for treatment‐related toxicity.
Example: A Comprehensive Geriatric Assessment (CGA) is a multi‐domain instrument of PGHD, components of which can be self‐administered by patients inside or outside of clinic. Common domains include functional status, nutritional status, psychological health and social support. The CGA can predict toxicities of medical or surgical anti‐cancer therapies, and distinguishes vulnerability even among individuals who have been assessed by clinicians to have “acceptable” performance status (Kim and Hurria, 2013). CGAs can also unmask deficits in individuals who are actively undergoing cancer treatment. Further, although physical performance items in CGAs such as gait speed are commonly physician‐performed, PGHD offers the ability to obtain physical performance‐based measurements outside of the clinic setting, such as daily physical activity and falls. CGAs are already being used to guide clinical trial participation and treatment recommendations for older adults with lymphoma (Vitolo et al., 2014) and other diseases. Looking to the future, it is likely that an entirely patient‐reported version of a CGA, complemented by novel PGHD streams, could be developed for individuals of all ages in order to predict and measure vulnerability before and during treatment on cancer clinical trials.
A patient‐reported symptom inventory was used to support the approval of Jakafi (ruxolitinib), a kinase inhibitor used in the management of myelofibrosis. PROs are frequently used to support labeling claims of novel therapeutics. Regulatory standards have been established regarding the design of PRO endpoints intended to support label claims (FDA, 2009); the key issues are reliability of the PRO measure, conceptual equivalence between the PRO measure and the endpoint definition, and relevance of the endpoint to the study population. These standards reflect methodological rigor that could guide the development of other types of measurement‐based endpoints, such as those based on PGHD and particularly those intended to reflect the patient's experience. For the data to be interpretable, such endpoints require a clear conceptual definition in addition to adequate measurement reliability.
Understanding the long‐term effects of treatment on patients' lives
In some clinical contexts, long‐term health‐related quality of life (HRQOL) impairments are found in a significant minority of patients, despite “cure” of the underlying disease. The mechanism for this finding is not always clear, and may in part relate to symptom burden, among other contributors. PGHD may help to facilitate the longitudinal measurement of HRQOL and functional status outside of clinic, and may identify factors associated with long‐term HRQOL or functional status impairment.
Example: Patient‐reported symptom profiles can help provide understanding about the nature and severity of the proximal and distal symptom burden following anti‐cancer treatment. Recent work has demonstrated the feasibility of frequent, longitudinal symptom profile reporting even among very ill patient populations, such as individuals undergoing autologous or allogeneic stem cell transplantation (Wood et al., 2013a). Other research has demonstrated the prevalence of high symptom burden in stem cell transplant survivors who report impaired HRQOL (Bevans et al., 2014). Given the prevalence of long‐term HRQOL and functional status impairment in a quantifiable percentage of stem cell transplant recipients (Pidala et al., 2009), integrating longitudinal symptom profiling (Wood et al., 2012) and other PGHD streams such as sleep patterns (Jim et al., 2014), physical activity, diet, and social support into clinical trials may help to provide insight into why some individuals experience long term morbidity.
Unfortunately, it is not always clear why some participants on clinical trials are nonadherent to therapy or withdraw from studies despite the absence of disease progression or death. Presumably, toxicity may play a role in this, but other reasons are likely to contribute. PGHD has the potential to improve documentation of medication adherence, and to facilitate data streams that may provide insight into reasons for study discontinuation. As PGHD technology evolves, there may even be ways to monitor therapeutic drug levels or metabolites from home.
Example: Chronic myeloid leukemia is a disease in which adherence to a daily oral medication is critical to optimize long term outcomes (Ibrahim et al., 2011). Potential reasons for nonadherence range from medication side effects, to psychosocial factors, to financial strain (Dusetzina et al., 2014). PGHD has enabled the use of smartphone‐based technology that can track daily medication adherence (NCT01490983, 2014) as well as patient‐reported symptoms (Johnston et al., 2013). Similar strategies could be used in cancer clinical trials, potentially combined with other PGHD streams such as financial, social, or geolocation data, to understand reasons for treatment nonadherence or discontinuation in other contexts.
Predicting and understanding therapeutic outcomes
So far, we have discussed the integration of PGHD into cancer clinical trials as a way to understand intermediate or ancillary study endpoints. However, previous work has identified the ability of patient‐reported symptoms or HRQOL to predict outcomes of patients on clinical trials. We believe that baseline or early PGHD could be included in multivariate models as a potential variable to predict progression or survival following cancer treatment. Where baseline PGHD is informative regarding trial outcomes, it may have value as a stratification factor for randomization.
Example: In a recent study of 11 different cancer sites from 30 European Organization for Research and Treatment of Cancer (EORTC) randomized clinical trial data sets, investigators found that at least 1 HRQOL domain provided independent prognostic information (beyond clinical and sociodemographic variables) for each cancer site (Quinten et al., 2014). In another study, a symptom‐based lung score predicted overall survival and non‐relapse mortality in stem cell transplant recipients with chronic graft versus host disease (Palmer et al., 2014). Based on the prognostic signal seen to date in diverse cancer settings from infrequently collected HRQOL and symptom data, it is conceivable that PGHD streams of frequent, longitudinal symptom, HRQOL, functional status, and physical activity data may provide important prognostic and perhaps predictive information at multiple time points for patients on therapeutic clinical trials.
Developing interventions to improve adherence and outcomes on clinical trials
In addition to the role of PGHD in an analytical or predictive capacity, PGHD can also be used to facilitate supportive interventions on clinical trials. Some types of PGHD are amenable to targeting through behavioral approaches, with subsequent effects that can be measured with PGHD streams such as physical activity, diet, and sleep patterns. Other types of PGHD can be used to create alerts or to triage subsequent interventions, such as notifying health care providers about study participants who meet pre‐specified patient‐reported symptom or vital sign thresholds. PGHD could also be used to monitor adherence and to alert or remind participants about taking study drugs or following protocol procedures.
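The alerting idea described above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the `Reading` record, the metric names, and the numeric bounds in `THRESHOLDS` are hypothetical placeholders, not values from any protocol or clinical guideline.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    participant_id: str
    metric: str   # hypothetical stream name, e.g. "systolic_bp"
    value: float


# Hypothetical pre-specified thresholds: metric -> (lower bound, upper bound).
# None means no bound on that side. Values are illustrative only.
THRESHOLDS = {
    "systolic_bp": (90.0, 160.0),
    "pain_score": (None, 7.0),
}


def flag_readings(readings, thresholds=THRESHOLDS):
    """Return the readings that fall outside the pre-specified range for their metric."""
    flagged = []
    for r in readings:
        if r.metric not in thresholds:
            continue  # no threshold defined for this stream
        low, high = thresholds[r.metric]
        too_low = low is not None and r.value < low
        too_high = high is not None and r.value > high
        if too_low or too_high:
            flagged.append(r)
    return flagged


readings = [
    Reading("P01", "systolic_bp", 172.0),
    Reading("P01", "pain_score", 3.0),
    Reading("P02", "pain_score", 8.0),
    Reading("P02", "steps", 4200.0),
]

for r in flag_readings(readings):
    print(f"ALERT {r.participant_id}: {r.metric}={r.value}")
```

In a real study the thresholds would come from the protocol, and a flagged reading would be routed to the study team rather than printed.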
Example: In some instances, different PGHD streams can be brought together to develop and track the effects of interventions upon both PGHD and clinical endpoints. For example, a home‐based, unsupervised exercise intervention could be developed to improve the ability of participants to tolerate anti‐cancer therapies, with a goal of improving long‐term functional status and health‐related quality of life. PGHD (examples in parentheses) could be used to track adherence to the exercise intervention (accelerometry), the ability of participants to achieve target heart rates (vital signs), the effects of the intervention upon short term physiology (post‐exercise vital signs, sleep patterns, patient‐reported symptoms), and the effects of the intervention upon long term patient‐centered outcomes (patient‐reported functional status, health‐related quality of life). In a randomized fashion, those receiving the intervention could be compared to controls in order to determine the effect of the intervention upon the achievement of intermediate and long‐term clinical outcomes, with the breadth of collected PGHD used to analyze and interpret the results. A social component could be integrated into this design so that participants could monitor and support one another, and to the extent that these PGHD data are collected in a decentralized way, virtual web‐based recruitment and enrollment strategies could be considered. These features might be attractive to technology‐savvy individuals who are traditionally difficult to enroll onto clinical trials, such as adolescents and young adults (Wood and Lee, 2011).
Though there are many potential applications of PGHD to clinical research, we are in a very early stage of learning how to obtain and use these data effectively. Here, we summarize several of the issues that we think will be critical to address from a methodological standpoint moving forward:
Several PGHD streams are owned by companies with proprietary interests in the devices used to generate the data and the algorithms behind the data generation. What are the costs associated with acquiring sufficient devices per patient to generate the required data streams for a given study? What issues are involved in accessing and uploading raw data from different vendors? How much complexity is required to map and combine different PGHD streams into an analyzable dataset?
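To give a sense of the mapping work these questions imply, here is a minimal Python sketch that combines two vendor exports into one record per participant‐day. The export formats, field names, and units are assumptions made up for illustration; real vendor exports differ and typically need far more cleaning.

```python
from collections import defaultdict

# Hypothetical exports from two vendors; field names and units are assumptions.
vendor_a = [  # daily step counts
    {"user": "P01", "date": "2014-06-01", "steps": 5200},
    {"user": "P01", "date": "2014-06-02", "steps": 4100},
]
vendor_b = [  # nightly sleep in minutes, with different key names
    {"pid": "P01", "day": "2014-06-01", "sleep_min": 420},
    {"pid": "P01", "day": "2014-06-03", "sleep_min": 390},
]


def harmonize(vendor_a_rows, vendor_b_rows):
    """Merge both streams into one record per (participant, day).

    Streams with no data for a given day are left as None so that
    missingness stays visible in the combined dataset.
    """
    merged = defaultdict(lambda: {"steps": None, "sleep_hours": None})
    for row in vendor_a_rows:
        merged[(row["user"], row["date"])]["steps"] = row["steps"]
    for row in vendor_b_rows:
        # Convert minutes to hours so the combined dataset uses one unit.
        merged[(row["pid"], row["day"])]["sleep_hours"] = row["sleep_min"] / 60
    return dict(sorted(merged.items()))


for key, record in harmonize(vendor_a, vendor_b).items():
    print(key, record)
```

Keeping unmatched days as explicit `None` values, rather than dropping them, is a design choice that makes gaps in each stream easy to audit later.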
From a patient standpoint, will patients reliably wear devices that require proximity to the patient to generate data? Will patients reliably input data for PGHD streams that require patient data entry? Are there population subgroups who are uncomfortable with these types of data collection, and can these concerns be addressed? How will data missingness be handled? As new technologies are developed (e.g. replaceable skin adhesive patches to replace wristbands or belt clips), as PGHD streams take different forms, and as these technologies are applied to different patient populations, these exercises will need to be repeated.
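One simple way to start quantifying the wear and data‐entry questions above is to compute, for each participant, the fraction of study days with any data at all. The Python sketch below assumes a day counts as observed if it has at least one record; that definition, and the function name `completeness`, are illustrative choices rather than an established standard.

```python
from datetime import date, timedelta


def completeness(observed_days, start, end):
    """Return (fraction of days in [start, end] with data, list of missing days)."""
    n_days = (end - start).days + 1
    if n_days <= 0:
        raise ValueError("end must not be before start")
    expected = [start + timedelta(days=i) for i in range(n_days)]
    observed = set(observed_days)
    missing = [d for d in expected if d not in observed]
    return 1 - len(missing) / n_days, missing


# Hypothetical participant who wore a device on three of five study days.
observed = [date(2014, 6, 1), date(2014, 6, 2), date(2014, 6, 4)]
fraction, missing = completeness(observed, date(2014, 6, 1), date(2014, 6, 5))
print(f"{fraction:.0%} of days observed; missing: {[d.isoformat() for d in missing]}")
```

A summary like this describes how much is missing but not why; whether missing days can be ignored, imputed, or treated as informative remains a study‐specific analytic decision.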
Many PGHD streams have been developed as consumer devices rather than as research‐grade data sources. For example, whether a Fitbit, Jawbone UP, Nike+ FuelBand, or ActiGraph GT3X+ all measure activity and/or sleep in the same way, and whether the results from these devices are interchangeable, is unclear. Other emerging PGHD streams are also relatively untested in comparison to gold standard assessments, such as popular dietary/nutrition trackers in relation to calorimetry. Outside the context of academic research, there have not been strong incentives to perform validation studies of these data sources, but such studies will be required in order to understand data quality. Clinical trialists will need to decide what level of measurement agreement between these devices and their “gold standard”, if one exists, is acceptable for each research study, as there may be tradeoffs to consider between feasibility, cost, and data quality.
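One common way to examine agreement between a consumer device and a reference measurement is a Bland‐Altman analysis, which reports the mean difference (bias) and 95% limits of agreement. The article does not prescribe this method; the Python sketch below names it as one possible approach, and the paired step counts are invented for illustration.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of (device - reference); the limits of agreement
    are bias +/- 1.96 times the sample standard deviation of the differences.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired daily step counts: consumer tracker vs. reference device.
tracker = [5100, 7900, 3050, 10200]
reference = [5000, 8000, 3000, 10000]

bias, lower, upper = bland_altman(tracker, reference)
print(f"bias={bias:.1f} steps, limits of agreement=({lower:.1f}, {upper:.1f})")
```

Whether limits of a given width are acceptable is exactly the judgment the paragraph above leaves to trialists; the statistic only makes the tradeoff explicit.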
We do not yet know which PGHD sources will be best suited to which research context. For example, which PGHD are most relevant to men with advanced prostate cancer – perhaps pedometry/accelerometry and patient‐reported symptoms including fatigue and pain? What is the potential role of sleep tracking in the adjuvant vs metastatic cancer setting, and in younger vs older adults? Are dietary PGHD obtained pre‐operatively useful in predicting outcomes after cancer surgery? If a comprehensive PGHD functional assessment can be developed that is analogous to the comprehensive geriatric assessment, in which settings (e.g. stem cell transplantation?) is this most likely to be useful? It is likely that a number of exploratory studies using multiple PGHD streams will need to be conducted in multiple cancer settings in order to determine which data outputs are most relevant to which contexts. In addition to statistical analysis, qualitative methods such as interviews with patients and other stakeholders are extremely productive in identifying relevant domains and in providing context for the interpretation of data (e.g. what does “total daily steps” indicate for patients with advanced cancer?).
Some devices that provide PGHD qualify as medical devices (e.g. blood glucose monitors) and are subject to FDA review. Regulatory approval for a new medical device is sought by the manufacturer. However, the use of a device that substantially deviates from the intended or approved use may require additional review. In addition, clinical trial protocols including PGHD may raise new questions about risks and benefits to patients to be considered during IRB review. As we design clinical research protocols with more complex collection of PGHD (e.g. multiple streams of data or devices that require the upload of data to a web account), or new types of devices, we are paying particular attention to issues of data security and privacy, and other potential risks to the patient.
Perhaps most importantly, considerable effort will need to be devoted to making meaning out of large amounts of PGHD. Many PGHD streams produce continuous data across long periods of time. How these data streams interact with one another, and how they anchor against clinically relevant endpoints, will need to be investigated. For example, when we currently conceptualize “performance status” in clinical oncology, we think about two commonly used scales, the Karnofsky Performance Status (KPS) and the Eastern Cooperative Oncology Group Performance Status (ECOG PS), which are scored from 0 to 100 and 0–4, respectively, and which are used by clinicians to assess patient level of day‐to‐day functioning. Though the accuracy and discriminatory capacity of these scales are limited, they are used to make important cancer‐related decisions (e.g. prescription of chemotherapy) and to evaluate the effects of cancer or cancer treatment upon patient outcomes. Can we conceptualize a new methodology for patient‐reported performance status in clinical oncology? An obvious possibility, simply adapting one of the existing scales to a patient‐reported version, such as the patient‐reported ECOG PS, has been attempted with some success (Basch et al., 2005). Current sources of PGHD now offer the ability to do something similar using a richer series of data sets with the promise of more clinically meaningful outcomes. However, much work will need to be done in order to understand which data sets to combine and how this should be done. For example, pedometry/accelerometry, patient‐reported functional status, and patient‐reported symptoms are potential candidates for such a composite measure, but we will need to learn how these data should be represented and combined, over what period of time, and whether these are the right PGHD streams, in order to draw conclusions about the clinical utility of a patient‐generated performance status index in comparison to existing measures.
Other studies will need to be conducted to understand the association of PGHD with additional clinical endpoints. For example, given the expected cytokine dysregulation with T‐cell immunotherapy (Maude et al., 2014), is there a combination of patient‐reported symptoms, sleep patterns, activity levels, and blood pressures that is associated with early treatment effects or tumor responses to this treatment modality? Do certain constellations of PGHD predict long‐term clinical outcomes in these scenarios? In these analyses, how should these data be handled – as absolute values or as changes from baseline? How should data be compared to expected values in similar patient populations? A variety of modeling exercises will need to be conducted to begin to address these questions.
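The choice between absolute values and changes from baseline raised above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the baseline is the mean of the first few pre‐treatment observations; the function name, the baseline window, and the step counts are hypothetical.

```python
from statistics import mean


def change_from_baseline(values, n_baseline):
    """Subtract the mean of the first n_baseline values from each later value."""
    if not 0 < n_baseline < len(values):
        raise ValueError("n_baseline must leave at least one follow-up value")
    baseline = mean(values[:n_baseline])
    return [v - baseline for v in values[n_baseline:]]


# Hypothetical daily step counts: three pre-treatment days, then three on treatment.
daily_steps = [6000, 6400, 5600, 4000, 3500, 5000]

absolute = daily_steps[3:]
relative = change_from_baseline(daily_steps, n_baseline=3)
print("absolute:", absolute)
print("change from baseline:", relative)
```

The same on‐treatment value of 4000 steps reads very differently for a participant whose baseline was 6000 than for one whose baseline was 4000, which is why the representation chosen can change what a model learns.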
The current environment represents an interesting and exciting opportunity for the incorporation of these new forms of PGHD into cancer clinical trials. With recent announcements by major consumer technology companies, and increasing interest in PGHD throughout academic institutions and government entities, there is an unprecedented convergence of resources and interest in this area. However, we are living in the earliest days of this new era – though we have identified several possible opportunities in this review, much work needs to be done to identify, acquire, validate, combine, and model relevant PGHD streams so that these data can be useful in the research context and ultimately in clinical care.
What does the long‐term future of PGHD look like? In the current big data era, we are increasingly recognizing that new computational strategies and systems biological approaches will be necessary to mine and make sense of the genomic, proteomic, and phenotypic data that we are now generating in clinical care and research. PGHD represents an important new form of data to be added to this new way of looking at health and disease. From a broader perspective, a new paradigm is emerging: each individual has the ability to generate an analyzable “personal data cloud of billions of data points” that will ultimately help to catalog the transitions between, and predictors of, health and disease (Hood and Price, 2014; Chen et al., 2012). Though these data can be aggregated to drive population insights, the amount and complexity of data allows each person to serve as his or her own control over time, creating a series of “n of 1” studies.
The emergence of PGHD offers an exciting opportunity to catalyze the application of “big data” to the context of clinical cancer research. In the future, leveraging the power of multiple continuous, personalized data streams will allow us, as a research and clinical community, to derive maximal insight from the participation of each patient in each trial. Such an approach, we believe, will optimize trial efficiency, generate biological and clinical insights into cancer behavior and treatment response, and, in the end, respect the profound commitment that every participant in a clinical trial makes to the advancement of cancer research.
Wood William A., Bennett Antonia V., Basch Ethan, (2015), Emerging uses of patient generated health data in clinical research, Molecular Oncology, 9, doi: 10.1016/j.molonc.2014.08.006.