Modeling Disease Severity in Multiple Sclerosis Using Electronic Health Records
1 Department of Neurology, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
2 Harvard Medical School, Boston, Massachusetts, United States of America
3 Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts, United States of America
4 Department of Biostatistics, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
5 Research Computing and Informatics Service, Partners HealthCare, Charlestown, Massachusetts, United States of America
6 Department of Pediatrics, Boston Children’s Hospital, Boston, Massachusetts, United States of America
7 Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, United States of America
8 Center for System Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
9 Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts, United States of America
10 Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
11 Laboratory of Computer Science, Massachusetts General Hospital, Charlestown, Massachusetts, United States of America
12 Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
13 i2b2/National Center for Biomedical Computing, Partners HealthCare, Boston, Massachusetts, United States of America
Wang Zhan, Editor
University of Maryland, College Park, United States of America
Received May 21, 2013; Accepted September 17, 2013.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Using a medical informatics framework and rigorous statistical methodology, our study showcases an approach that begins to harness routine EHR data for accurate identification of patients with a complex neurologic disease and for deriving a highly relevant clinical outcome heretofore only available in research studies. Specifically, our study leverages the EHR data of a large cohort of MS patients to derive the Multiple Sclerosis Severity Score (MSSS), an important indicator of disease severity that is not part of routine medical records. Although the derived MSSS measure is not yet robust enough for research use, this approach provides the first steps towards harnessing existing EHR data for patient-oriented research in neurological diseases. As EHR systems become widely adopted across the health care landscape, it will enable exploration of the many unique features of EHR data.
Our approach embraces the rich complexity of the EHR data. The incorporation of sophisticated codified data and NLP-extracted narrative data improved the performance of the EHR algorithm to identify MS patients when compared to the approach relying only on ICD-9 codes. With this approach, we established a cohort of 5,495 MS patients, including a subset that is part of a patient cohort based at the MS Center. This unique “virtual cohort” enables analyses that integrate the new EHR-derived variables with traditional clinical research data. Further, we demonstrated that NLP-extracted narrative data are necessary for generating an informative estimate for MSSS. As a demonstration of its clinical relevance, EHR-derived MSSS captures the difference between the two main subgroups of MS patients: relapsing-remitting patients, who generally recover neurological function after a relapse, and progressive patients, who experience a decline in function. With future improvements in EHR data and informatics methods, we will enhance the MSSS algorithm (to reach an R of at least 0.8) so that this surrogate measure may potentially be integrated into the EHR system to allow better monitoring of patient outcomes and to support research.
EHR data did not contribute meaningfully to the performance of the BPF algorithm, which can be almost entirely explained by variables obtained from the clinical cohort database: age at first symptom and disease duration. This illustrates a limitation: the EHR variables considered here are not sufficient to inform every pertinent outcome measure. To provide a surrogate of brain volume, critical information to supplement EHR data can be obtained using questionnaires to ascertain age at symptom onset and disease duration. Thus, integration of EHR data and data from clinical research tools such as questionnaires provides a path for future investigations that leverage the strengths of both approaches. Brain volume is not routinely measured in clinical care, but it is correlated with disease course and is an important research measure in MS. Surrogate measures of brain volume derived from these combined approaches could enable the exploration of hypotheses that cannot be effectively investigated at smaller sample sizes, despite the use of more accurate measures. In the future, we plan to enhance the algorithm development for whole brain volume by applying automated feature selection methods to the entire narrative text, based on medical ontology systems such as SNOMED-CT, instead of using only expert-selected EHR variables. Further, disease duration may be derived if the date of the first neurological symptom can be captured by more sophisticated NLP capabilities.
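The observation that EHR variables add little beyond age at first symptom and disease duration can be illustrated by comparing nested regression models. The following is a minimal sketch, not the study's actual analysis: it assumes an ordinary least-squares fit, uses synthetic stand-in data (no study data), and the variable `ehr_feature` and the helper `r_squared` are hypothetical names introduced only for illustration.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins (assumptions for illustration only, NOT study data).
age_onset = rng.normal(32, 9, n)     # age at first symptom, in years
duration = rng.gamma(2.0, 6.0, n)    # disease duration, in years
ehr_feature = rng.normal(0, 1, n)    # hypothetical EHR variable unrelated to the outcome

# Synthetic outcome that depends only on the two cohort variables plus noise.
bpf = 0.90 - 0.0008 * age_onset - 0.0015 * duration + rng.normal(0, 0.01, n)

# Nested models: cohort variables alone, then cohort variables plus the EHR variable.
r2_base = r_squared(np.column_stack([age_onset, duration]), bpf)
r2_full = r_squared(np.column_stack([age_onset, duration, ehr_feature]), bpf)
gain = r2_full - r2_base

print(f"R^2 (cohort variables only): {r2_base:.3f}")
print(f"R^2 (plus EHR variable):     {r2_full:.3f}")
print(f"Gain from EHR variable:      {gain:.4f}")
```

Because the models are nested, the in-sample R² of the larger model can never be lower than that of the smaller one; a gain close to zero is what "did not contribute meaningfully" would look like in this simplified setting. A real analysis would assess the gain on held-out data rather than in-sample.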
Our study has two other limitations. First, our algorithms for MS were developed and tested within a single EHR system that links two major tertiary care hospitals and affiliates. We have not yet tested the portability of our algorithms. This is an important next step, and we will seek replication of the EHR algorithms for classifying MS and deriving MS disease outcomes in the EHR systems of other healthcare institutions. If proven portable, this approach promises efficient and cost-effective development of multi-center cohorts to address research questions highly relevant to neurological patients. It is reassuring that our group has developed a similar EHR algorithm for classifying rheumatoid arthritis and demonstrated its portability at two other academic medical centers with limited retraining of the algorithm.
The second limitation involves our current inability to finely dissect the temporal relationship between the EHR data and indicators of MS disease severity. Specifically, the EHR data used for algorithm development represent aggregate information as of the time of the MS data mart creation, and the latest available measures of BPF and MSSS from the MS Center clinical cohort do not necessarily occur after the aggregate information has been collected. Thus, our study demonstrated cross-sectional associations, and the algorithms should not be construed as predictive, as this would imply that the EHR data preceded the BPF or MSSS measures. As medical informatics technologies continue to improve the parsing of temporal relationships, truly predictive algorithms for brain volume and disease severity will emerge and be translated into the clinical arena to guide patient management.
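The temporal requirement described above (EHR data must precede the outcome measure for an algorithm to be called predictive) can be made concrete with a small filtering step. This is a hedged sketch under stated assumptions, not part of the study's pipeline: the record layout, the example codes, the dates, and the function name `features_before_outcome` are all hypothetical.

```python
from datetime import date

def features_before_outcome(records, outcome_date):
    """Keep only EHR records dated strictly before the outcome measurement."""
    return [r for r in records if r["date"] < outcome_date]

# Hypothetical EHR records for one patient (illustrative values only).
records = [
    {"date": date(2009, 3, 1), "code": "ICD9:340"},
    {"date": date(2010, 6, 15), "code": "NLP:wheelchair"},
    {"date": date(2012, 1, 10), "code": "RX:interferon"},
]

# Hypothetical date of the patient's latest MSSS measurement.
outcome_date = date(2011, 1, 1)

usable = features_before_outcome(records, outcome_date)
for r in usable:
    print(r["date"].isoformat(), r["code"])
```

Aggregating all records as of data mart creation, as in the present study, corresponds to skipping this filter, which is why the resulting associations are cross-sectional rather than predictive.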
In the age of personalized medicine, EHR data provide another complementary layer of biomedical data. The challenge is to integrate EHR data with other data to improve patient care. Our study in MS showcases an informatics approach that harnesses routine EHR data to derive MSSS, a well-accepted and clinically meaningful disease measure heretofore available only in research studies. If replicated, our novel informatics approach will enable the development of multi-center cohorts and facilitate testing of a variety of new hypotheses leveraging the unique features of the EHR data to address MS disease activity, comorbidities, treatment response and presymptomatic disease. These efforts also hold the promise of establishing automated monitors of an individual patient’s disease trajectory using EHR data and of aiding clinicians in delivering more individualized patient management. Finally, while MS was used as a proof of principle in this study, our approach has the potential to be applied to other complex neurological diseases.