We have implemented an acquisition system that automatically extracts parameters for QC measures using natural language processing combined with postprocessing algorithms. We have also validated the accuracy of the system (UA/NSTEMI test set) and demonstrated its use in a related application (patients with CABG). Overall, the automated system identified information about reperfusion strategy and discharge medication in high agreement with that extracted by cardiologists. Information extraction for lipid management (LDL check-up and follow-up) was unsatisfactory because the doctors in our institution did not check and follow up lipid profiles as rigorously as the guidelines suggest. We also illustrated a particular QC pattern for UA/NSTEMI: a relatively high rate of early invasive approach, adequate use of antiplatelet agents and ACE-I/ARB, and poor performance in β-blocker usage and lipid management. The automated system retrieved the medication information and calculated the attainment rate accurately.
Using textual reports as the information source eliminates the need for complicated integration across databases. More importantly, this approach takes into account clinical conditions (eg, shock, hypotension, bradycardia, hyperkalemia, azotemia, etc) in which physicians would decline the use of certain medication. The attainment rate to QC standards might be underestimated if contraindicated cases were not excluded from analysis during a medication search. In addition, the acquisition system is able to find occurrences of various kinds of cardiovascular events that would otherwise be almost impossible to retrieve without a tremendous amount of human time and expertise.
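The effect of excluding contraindicated cases on the attainment rate can be illustrated with a minimal sketch. The `Case` record and the `attainment_rate` function are hypothetical names introduced here for illustration only; they are not part of the acquisition system, and the sketch assumes each case has already been reduced to two yes/no flags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Case:
    """Hypothetical per-patient record, for illustration only."""
    prescribed: bool       # medication found in the discharge note
    contraindicated: bool  # e.g. shock, hypotension or bradycardia documented


def attainment_rate(cases: List[Case],
                    exclude_contraindicated: bool = True) -> Optional[float]:
    """Fraction of eligible cases in which the medication was prescribed.

    Returns None when no case is eligible, so the caller does not divide by zero.
    """
    eligible = [
        c for c in cases
        if not (exclude_contraindicated and c.contraindicated)
    ]
    if not eligible:
        return None
    return sum(c.prescribed for c in eligible) / len(eligible)


# Made-up example data: one of four patients has a documented contraindication.
cases = [
    Case(prescribed=True, contraindicated=False),
    Case(prescribed=False, contraindicated=True),
    Case(prescribed=False, contraindicated=False),
    Case(prescribed=True, contraindicated=False),
]
rate_excluding = attainment_rate(cases)                               # 2 of 3 eligible
rate_including = attainment_rate(cases, exclude_contraindicated=False)  # 2 of 4
```

With these made-up flags, keeping the contraindicated patient in the denominator lowers the rate, which is the underestimation described above.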
One problem related to system evaluation was overtraining. The gold standards concerning the patients with STEMI had been set up before the design of the automated acquisition system began [3]. This kind of evaluation might therefore have tuned the system to perform well on the patients with STEMI, and a degradation of κ-values was expected when applying what was learnt to the patients with UA/NSTEMI. However, since the test set contained about three times as many cases, and the calculation of κ-values was sensitive to the case number, the degradation of performance due to overtraining was not seen.
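For reference, agreement between the automated system and the cardiologist on a single measure can be expressed as a κ-value along the following lines. This is a sketch that assumes the statistic is Cohen's κ for two raters; `cohen_kappa` is a hypothetical helper and the label lists are made-up examples, not study data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: proportion of cases given the same label by both raters.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch reports full agreement in that case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels: 1 = early invasive approach, 0 = early conservative approach.
system_labels = [1, 1, 0, 0]
expert_labels = [1, 0, 0, 0]
kappa = cohen_kappa(system_labels, expert_labels)
```

In this toy example the two raters agree on three of four cases, while chance agreement is one half, so κ is well below the raw agreement.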
The comparison between automated and manually derived attainment rates shows similar numbers except on Measure 1. In the test results (UA/NSTEMI), the distinction between early conservative approach and early invasive approach had the lowest κ-value. Unlike patients with STEMI, who usually presented to the hospital for the first time, the patients with UA/NSTEMI might have visited the hospital several times, and coronary angiography and coronary angioplasty might have been performed either on previous occasions or in the current admission. Thus, most false-positive and false-negative interpretations resulted from the inability of the automated system to find the fine details of temporal information. One possible solution is to apply a “temporal tag” to each event (timing of previous catheterization, present symptom onset and timing of the catheterization in this admission) and to calculate the time gap between them. However, the lack of temporal resolution in the reperfusion time was mostly attributable to insufficient documentation in electronic discharge notes rather than to MedLEE's errors. The cardiologists could reconstruct the sequence of events intuitively, but the automated acquisition system does not yet have such a function.
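The “temporal tag” idea can be sketched as follows. This is only an illustration under stated assumptions: `TaggedEvent`, `gap_hours` and `in_current_admission` are hypothetical names, the timestamps are invented, and an event whose time is not documented is represented by `None` so that the gap is reported as unknown rather than guessed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TaggedEvent:
    """Hypothetical event with a temporal tag; `when` is None if undocumented."""
    name: str
    when: Optional[datetime]


def gap_hours(start: TaggedEvent, end: TaggedEvent) -> Optional[float]:
    """Time gap in hours from `start` to `end`, or None if either time is missing."""
    if start.when is None or end.when is None:
        return None
    return (end.when - start.when).total_seconds() / 3600.0


def in_current_admission(event: TaggedEvent,
                         admission: TaggedEvent,
                         discharge: TaggedEvent) -> Optional[bool]:
    """True if the event falls within this admission; None if a time is missing."""
    if any(e.when is None for e in (event, admission, discharge)):
        return None
    return admission.when <= event.when <= discharge.when


# Invented timestamps for illustration.
admission = TaggedEvent("admission", datetime(2009, 1, 1, 6, 0))
onset = TaggedEvent("symptom onset", datetime(2009, 1, 1, 8, 0))
cath = TaggedEvent("catheterization", datetime(2009, 1, 2, 8, 0))
discharge = TaggedEvent("discharge", datetime(2009, 1, 5, 10, 0))
previous_cath = TaggedEvent("previous catheterization", datetime(2008, 6, 1, 9, 0))
undocumented = TaggedEvent("catheterization", None)

onset_to_cath = gap_hours(onset, cath)
```

Returning `None` for undocumented times reflects the limitation noted above: when the discharge note does not record the timing, no tagging scheme can recover it.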
In the medication search, there were more false-negative than false-positive cases. The system searched for medication information in the “Discharge” field, whereas the residents might have written the discharge prescription in the final paragraph of the “Course” field. Also, the disagreement in outcome extraction in the UA/NSTEMI set was related to misclassification between CABG and DISCHARGE. Some patients were discharged and scheduled for readmission for CABG, whereas others actually underwent CABG and were then discharged. Expanding the search fields of the automated system is expected to reduce the occurrences of misclassification. Occasionally, medication information could be retrieved from fields other than “Discharge.” We did not attempt to parse “Course” or other fields for fear of increasing the complexity and potentially increasing false negatives for medication search.
Errors by MedLEE or by the above mapping algorithms were relatively rare. Because the QC measures were similar in the development set, the test set and the application set, the accuracy is likely to be highly reproducible within this setting. On the other hand, it is uncertain whether the same accuracy could be maintained if the system were applied to other hospitals or to other medical subdomains.
The automated system depends on residents documenting the important facets of the hospitalization in a fairly comprehensive electronic discharge note. Such documentation may not be available in many parts of the world. Therefore, we remain cautious about generalizing the system to a wider scale.
Generalization of QC evaluation to other medical applications requires a flexible interface that enables clinical researchers to use the automated system to develop their own strategies for information retrieval and extraction. Currently, our system is designed to extract QC measures for cardiovascular diseases, and its application cannot yet be generalized to other medical domains.
Concerning system evaluation, the two cardiologists should ideally have reviewed the discharge notes independently to provide information about inter-rater reliability. However, it was difficult to perform an item-by-item consensus-forming process between two independent experts because of the multiple measures in hundreds of patients. Instead, the second expert, who was independent of the opinions of the first cardiologist and unaware of the system outputs, was involved whenever necessary. In addition, the simultaneous occurrence of interpretation errors by the first cardiologist and by the system might have slightly decreased the overall accuracy in estimating the attainment rates, but from reviewing the output, we believe errors of that type were relatively rare.