The increasing use of electronic medical record (EMR) systems for documenting clinical medical data has led to EMR data being increasingly accessed for clinical trials. In this study, a database of patients who were prescribed statins for the first time was developed using EMR data. A clinical data mart (CDM) was developed for cohort study researchers.
Seoul St. Mary's Hospital implemented a clinical data warehouse (CDW) of data for ~2.8 million patients, 47 million prescription events, and laboratory results for 150 million cases. We developed a research database from a subset of the data on the basis of a study protocol. Data for patients who were prescribed a statin for the first time between January 1, 2009 and December 31, 2015, including personal data, laboratory data, diagnoses, and medications, were extracted.
We extracted initial clinical statin data from a CDW that was established to support clinical studies; the data were refined through a data quality management process. Data for 21,368 patients who were prescribed statins for the first time were extracted. We extracted data every 3 months for a period of 1 year. A total of 17 different statins were extracted. It was found that statins were first prescribed by the endocrinology department in most cases (69.6%, 14,865/21,368).
Study researchers can use our CDM for statins. Our EMR data for statins is useful for investigating the effectiveness of treatments and exploring new information on statins. Using EMR is advantageous for compiling an adequate study cohort in a short period.
Recently, there has been a rapid increase in the use of electronic medical records (EMRs). These computerized database (DB) systems eliminate the need to handwrite medical information, such as symptoms, diagnoses, and clinical results, on paper. Concerns exist about the use of EMRs for clinical research owing to a lack of random sampling and the resulting limits on generalizability. However, several recent reports have presented the advantages of EMRs [3,4,5,6]. EMR data are systematically managed and easily accessed, which makes it possible to collect information about a patient's current medical, past medical, family, and therapeutic histories. EMR data also include interdisciplinary clinical treatments and prescriptions from other departments within the same clinical center. With the increasing use of EMR systems for documenting clinical medical data, clinical trials have also increasingly accessed EMR data [7,8,9].
Randomized controlled trials (RCTs), in which medical interventions are conducted in a targeted patient population, are preferred in clinical trials when testing the efficacy and effectiveness of a drug or treatment, or when documenting the progression of a disease. Strict standards are generally applied to participant selection for RCTs to ensure the validity of the results. However, RCTs are expensive and time-intensive; consequently, study enrollment is often limited. EMR-based studies are similar in structure to a cohort study; however, the use of EMR data in studies allows for quick and simple extraction of large amounts of data that were collected over long periods of time. This capability will become a greater advantage as more EMR data are accumulated over time [10,11,12]. Therefore, EMR-based clinical research requires a standardized clinical data mart (CDM) that various researchers can employ to readily extract necessary data depending on their diverse research objectives.
In this study, 3-hydroxy-3-methylglutaryl-coenzyme A reductase inhibitors (statins) were selected. Statins are the predominantly prescribed drug class for preventing cardiovascular disorders. They are used to prevent cardiovascular disease by lowering low density lipoprotein cholesterol (LDL-C) and triglyceride (TG) levels and increasing high density lipoprotein cholesterol (HDL-C) levels [14,15]. Several types of statins exist on the market with differing efficacies, and numerous clinical trials have compared their respective effects [14,15]. The primary purpose of this study was to develop a clinical statin data mart to address various purposes, such as assessments of drug efficacy and safety. We intended to aggregate a large amount of data on patients who were prescribed a statin for the first time to enable researchers to conduct their relevant studies. By establishing a CDM that includes patient personal data, medical history, medication history, and other patient information relating to statins, we strove to facilitate rapid and effective access to diverse patient information and data. We furthermore strove to open the CDM to authorized researchers and other users so that study results can be shared and enhanced through open technology.
Directly extracting research data via a query from the EMR system can degrade the performance of the system for routine hospital use. A clinical data warehouse (CDW) benefits researchers by providing quick and efficient access to patient information and links to multiple operational data sources, which supports quality data collection and decision making. Furthermore, by combining different data sources and validating the consistency of information, it can be used to discover disease relationships and drugs and to reposition drugs. Therefore, we established the study dataset to include diagnosis and laboratory data based on a CDW system. The CDW system of Seoul St. Mary's Hospital currently comprises 30 tables that include all clinical, prescription, laboratory, radiology, pathology, and other information of all patients at the hospital since 1997. Currently, data exist in the system for approximately 2.8 million patients, 47 million medication prescription events, and laboratory results for 150 million cases. From the 2.2 billion total records, we developed a research DB from a subset of the data based on a study protocol. The CDW additionally provides comprehensive views of clinical data for specific purposes [17,18].
Data for patients who were prescribed a statin for the first time at Seoul St. Mary's Hospital between January 1, 2009 and December 31, 2015 were extracted from the CDW. Cases were identified as those who did not have a statin prescription for at least six months before a statin was initially prescribed. The date on which the initial statin prescription occurred was defined as visit 0 (index date, baseline) (Fig. 1). Visit 1 (an average of 3 months later) was defined as the next occurrence of a laboratory test and subsequent renewal of the statin prescription within 45 to 135 days of the baseline. Visit 2 was defined as the subsequent visit that occurred within 136 to 225 days after the baseline (an average of 6 months later). Visit 3 was defined as 226 to 315 days after the baseline (an average of 9 months later), and visit 4 was defined as 316 to 405 days after the baseline (an average of 1 year later). Cases were flagged when the prescription changed to a different statin type or the statin prescription was suspended during the study period. When a patient visited the hospital or had a blood test performed more than once within a single visit window, the test results from the dates closest to the 91st, 182nd, 273rd, and 365th day were retrieved.
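The visit windows and the closest-date rule described above can be sketched as follows. This is a minimal illustration, not the extraction code used in the study: the function names and the `VISIT_WINDOWS` table are hypothetical, and the sketch considers laboratory test dates only, without checking that the statin prescription was renewed.

```python
from datetime import date

# Visit windows in days after the index date (visit 0), as given in the text.
# The third value is the target day used to pick one test per window.
VISIT_WINDOWS = {
    1: (45, 135, 91),
    2: (136, 225, 182),
    3: (226, 315, 273),
    4: (316, 405, 365),
}


def assign_visit(index_date, test_date):
    """Return the visit number (1-4) for a test date, or None if outside all windows."""
    offset = (test_date - index_date).days
    for visit, (start, end, _target) in VISIT_WINDOWS.items():
        if start <= offset <= end:
            return visit
    return None


def select_visit_tests(index_date, test_dates):
    """For each visit window, keep the test date closest to that window's target day."""
    chosen = {}
    for d in test_dates:
        visit = assign_visit(index_date, d)
        if visit is None:
            continue  # test falls outside every follow-up window
        target = VISIT_WINDOWS[visit][2]
        distance = abs((d - index_date).days - target)
        # Keep the first date seen when two dates are equally close (an assumption).
        if visit not in chosen or distance < chosen[visit][0]:
            chosen[visit] = (distance, d)
    return {visit: d for visit, (_distance, d) in chosen.items()}
```

For a patient with an index date of January 1, 2010 and tests on March 1 and April 5, both tests fall in the visit 1 window, and the April 5 test (day 94) would be kept because it is closer to day 91.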
This study covered all statin types that are prescribed at Seoul St. Mary's Hospital. According to American College of Cardiology/American Heart Association guidelines, we classified statins based on intensity, type, and dose. The statin types and dosages are as follows: atorvastatin (10, 20, and 40 mg), fluvastatin (40 and 80 mg), pitavastatin (2 and 4 mg), pravastatin (10, 20, and 40 mg), rosuvastatin (5, 10, and 20 mg), simvastatin (20 and 40 mg), and simvastatin plus ezetimibe (10/10 and 20/10 mg). Combinations of a statin with a drug that has other effects (atorvastatin plus amlodipine or pravastatin plus fenofibrate) were excluded from the study.
Patient data were extracted, including date of birth, age (when first prescribed statins), sex, department in which the statins were first prescribed, first statin prescription date, days of prescription, etc. In addition to total cholesterol, TG, HDL-C, and LDL-C, which are indicators relevant to hyperlipidemia, other blood tests were covered in this study, such as blood urea nitrogen, creatinine, aspartate aminotransferase/alanine aminotransferase (AST/ALT), hemoglobin/hematocrit, glycated hemoglobin, alkaline phosphatase, high-sensitivity C-reactive protein, γ-glutamyl transpeptidase, thyroid function test (thyroid stimulating hormone, free thyroxine), and others. In modeling the DB, we defined the relations between the entities that comprise it, each with its own attributes (Fig. 2).
In addition to hyperlipidemia, various diseases, including cardiovascular disorders, diabetes, and others, have been recently reported as relevant to statins; thus, diagnoses of these diseases were included. Moreover, to study statin side effects, consideration was given to whether a patient used fenofibrate, omega-3 fatty acids, propranolol, thyroxine, warfarin, nicotinic acid, etc., which are known to affect hepatotoxicity.
This study was a retrospective cohort study using EMR data retained by one hospital. All files of extracted data were encoded to prevent personal identification of patients during the extraction process. Only one managing researcher was allowed to access the data; observers and analysts received the data with all personally identifying patient information deleted. Thus, they were unable to identify the actual patient numbers. Data utilized in this study did not include patients' personal information, and there was no risk of physical or psychological harm to the patients. Because the data were encoded and anonymized, and because the study was a retrospective cohort study, it did not affect the patients' rights and welfare. Therefore, informed consent was not required. This study was approved by the Institutional Review Board of the Catholic University of Korea.
We employed initial clinical statin data obtained from a CDW that was established to support clinical studies. Through a data quality management (DQM) process, the initial clinical data were refined. Because the data were extracted from the EMRs of patients, there were many duplicates and errors. For example, the most frequent error was the inclusion of letters or inequality signs in the extracted clinical laboratory result fields. We performed pre-processing work and DQM operations on these data successively. As part of the DQM, abnormal data (redundant data, out-of-range data, meaningless data, null values, etc.) were identified using clinical judgment as well as statistical methods. The abnormal data were re-confirmed by direct chart review. This enabled further extraction of patient data in accordance with the objective of each clinical study and the use of an optimized DB. In this study, clinical information of patients who were prescribed statins for the first time was extracted from the CDW. A data mart for analysis appropriate to the study objective was established, and relevant DB information was obtained. Various structured or encoded clinical information could be automatically incorporated into the DB. However, content recorded as unstructured free text, such as patient height and weight, was entered manually after direct chart review. To enhance the reliability of the data, problematic values were reviewed and compared with the original data. A data description table was developed (Fig. 3). Personal identification information (name, social security number, etc.) was not included when extracting the requisite clinical data. Patient numbers were collected anonymously during the data collection.
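A first-pass screen for the abnormal data categories named above (redundant records, null values, letters or inequality signs, out-of-range values) might look like the following sketch. The function names, the record layout, and the plausibility limits are all hypothetical; the study does not describe its DQM rules at this level of detail, and flagged values would still need chart review.

```python
import re

# Matches a plain non-negative decimal number such as "132" or "0.8".
NUMERIC = re.compile(r"^\d+(\.\d+)?$")


def check_lab_value(raw, low, high):
    """Classify one raw laboratory field.

    Returns (value, flag), where flag is "ok", "null", "non_numeric",
    or "out_of_range". low and high are caller-supplied plausibility
    limits (hypothetical; not taken from the study).
    """
    if raw is None or raw.strip() == "":
        return None, "null"
    text = raw.strip()
    if not NUMERIC.match(text):
        # Letters or inequality signs (e.g. "<10") are left for chart review.
        return None, "non_numeric"
    value = float(text)
    if not (low <= value <= high):
        return value, "out_of_range"
    return value, "ok"


def flag_records(records, low, high):
    """Drop exact duplicate (patient_id, test_date, raw) records, then flag the rest."""
    seen = set()
    result = []
    for patient_id, test_date, raw in records:
        key = (patient_id, test_date, raw)
        if key in seen:
            continue  # redundant record
        seen.add(key)
        value, flag = check_lab_value(raw, low, high)
        result.append((patient_id, test_date, value, flag))
    return result
```

Returning a flag rather than silently dropping a value keeps every questionable field available for the re-confirmation step by direct chart review.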
The data included patient personal information, such as height, weight, age, sex, etc. Patient laboratory results included glucose levels, AST/ALT values, and others, as shown in Table 1 for the baseline, visit 1 (an average of 3 months later), visit 2 (an average of 6 months later), visit 3 (an average of 9 months later), and visit 4 (an average of 12 months later) (Fig. 1). The presence of a diagnosis, such as hypertension, diabetes mellitus, etc., was extracted together with the date when the diagnosis was first made. Accordingly, it was possible to distinguish whether certain diseases occurred before or after the first statin prescription and how soon the diseases occurred after the first statin prescription. Various medications that were prescribed besides statins, or that could interact with statins, were also extracted.
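Because each diagnosis carries its first-recorded date, its timing relative to the first statin prescription reduces to a date difference. The sketch below illustrates this; the function name and the return format are hypothetical.

```python
from datetime import date


def diagnosis_timing(index_date, diagnosis_date):
    """Return (relation, days), where relation is "before", "on", or "after"
    the first statin prescription and days is the signed offset from it."""
    days = (diagnosis_date - index_date).days
    if days < 0:
        return "before", days
    if days == 0:
        return "on", 0
    return "after", days
```

For example, a diabetes diagnosis first recorded on May 31, 2012 for a patient whose index date is May 1, 2012 would be classified as occurring 30 days after the first statin prescription.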
Data were extracted for a total of 21,368 patients who were prescribed statins for the first time at the hospital over 7 years, from January 2009 to December 2015 (Fig. 4). The percentage of males was 44.2% (9,439/21,368); the percentage of females was 55.8% (11,929/21,368). The mean age was 63±12 years. A total of 17 different statin products (type and dose combinations) were extracted. Atorvastatin (10 mg; 21.0%, 4,490/21,368) and rosuvastatin (10 mg; 20.4%, 4,364/21,368) were the most commonly prescribed, followed by simvastatin (20 mg; 11.7%, 2,493/21,368) and pitavastatin (2 mg; 11.1%, 2,364/21,368). Statins were most often first prescribed by the endocrinology department (69.6%, 14,865/21,368), followed by cardiology (13.0%, 2,770/21,368) and neurology (3.9%, 832/21,368).
We selected statins that were initially prescribed when addressing hyperlipidemia. We established a clinical DB using the data of patients who were prescribed statins for the first time at Seoul St. Mary's Hospital. The subjects of EMR clinical data extraction for the statin study were defined in advance, and the resulting CDM includes patient personal information, diagnosis information, etc. Depending on their study purposes, clinical researchers can use these data to conduct diverse large-scale EMR-based retrospective cohort studies.
Through a year of laboratory follow-up testing, researchers can analyze effects of statins, such as LDL-C lowering effects, and they can conduct basic research about guidelines for each statin. By including various illnesses and diseases, a risk model of occurrences of cardiovascular disorders can be developed. Moreover, research on recent issues, such as correlations between statins and occurrences of diabetes or cancers, can be quickly and easily conducted. Researchers can perform assessments and analyses of economic efficiency regarding statins and comparison analyses of diverse adverse drug effects. It is expected that analysis of effects of statins and the causes of disease occurrences will be possible by using a Bayesian network. Cost effectiveness, which depends on patient age, risk type, and risk of side effects, can also be analyzed. Effects of statins that can be evaluated as outcomes include the rate of LDL-C lowering and the occurrence of cardiovascular disorders.
Various studies using the CDM have been conducted. Prescription rates of each department have been analyzed along with prescription patterns. In a previous study, cardiologists were determined to be the most frequent prescribers of statins. However, our study showed that the initial prescription of a statin was most commonly provided by an endocrinologist. This may be because endocrinologists prescribe statins to prevent cardiovascular disease. The validity of statin studies using EMR data has been demonstrated. Various studies can thus be conducted, such as those on side effects that depend on the type and dose of statin, actual statin prescription examples and problems, development of programs to predict clinical aspects, research on various cases that cannot be studied with RCTs, and rapid studies on rare side effects.
CDM development has become possible because of the dissemination of EMRs. Accordingly, clinical studies that use a CDM have various advantages [2,27,28]. EMR data can provide information that is not available with traditional paper medical records, including information about the various treatments for each patient. Extensive quantities of medical data can be easily and quickly extracted from EMRs. The use of EMRs addresses the disadvantages of both cohort studies (follow-up costs, long study periods, and maintenance of consistency during the study period) and RCTs (loss to follow-up, changes in treatment, long study periods, and expenses). The investigator has direct access to the EMR system and can quickly test a hypothesis by sampling the relevant variables from the system. As additional varied data are accumulated, studies regarding cause-and-effect relationships of rare factors become possible.
Furthermore, the use of EMR data has the additional advantage of increasing the generalizability of the results. In this study, we believed it was necessary to organize such diverse data by standardizing them. We thus strove to establish a CDM under a systematic plan from the beginning. Moreover, it is possible to develop a multi-center integrated data mart to enable researchers at other hospitals to add their statin data to our data mart. We therefore employed a standard protocol. Consequently, clinical research using our CDM has many benefits. Most importantly, it can save time and labor compared with conventional clinical studies, because it can provide a preview of results before conventional long-term studies are undertaken. A large amount of clinical data can be collected within a short period. Unlike conventional clinical studies, which often extrapolate results to the entire population on the basis of a small sample, EMR analysis can draw on a significantly larger amount of data, thereby offsetting other limitations.
Previous research on the use of EMR data focused primarily on its convenience and ease of access [2,27,28]. However, with the recent accumulation of extensive data in EMR systems, these data can be used in clinical studies. EMR data can be used to determine the efficacy of medications that are currently available. Owing to the ability to inexpensively extract large quantities of data in a short period, EMR systems will become more valuable for research in the future. Moreover, information that clinical researchers did not anticipate in the study planning process may come to light. Therefore, EMR systems may lead to new knowledge and theories.
Despite these strengths of the EMR system, EMR-based clinical research has certain limitations. First, some possible cofactors and confounders cannot be accessed from the DB, including compliance with medication prescription and the severity of potential diseases. However, an EMR-based study reflects actual practice relating to statin use, and a standardized study plan that is defined early can minimize confounding factors, which can have a significant impact on the results. Second, our data mart covers a short follow-up period of 12 months at a single center. With an increase in the amount and variety of clinical or biochemical data, additional biochemical variables may become available that would identify patients who respond well to a statin compared to those who do not. To accomplish this, additional data in the EMR and a longer study period are necessary.
As EMRs have been widely implemented in hospitals in Korea, we expect a drastic increase in the analysis of accumulated EMR data. A CDW can be an important tool for obtaining clinically significant data from the data held by large hospitals. Verifying the effects of each statin through various studies using the CDM can make standardization of statin prescription possible, and collecting domestic data will contribute to evidence-based prescription and to clinical and health research. Moreover, establishing data and developing algorithms can become foundational to Korean guidelines for dyslipidemia. Clearly, CDM research cannot replace RCTs. Nonetheless, because it allows rapid data extraction, CDM research conducted prior to an RCT can help establish the direction of that RCT. With advances in technology, we expect that the use of EMRs can lead to various types of clinical research.
This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HC15C1362).
CONFLICTS OF INTEREST: No potential conflict of interest relevant to this article was reported.