We were able to use an existing NLP system, without modification, to process a heterogeneous set of outpatient clinical notes into XML-tagged clinical data while achieving a moderate level of de-identification: 3.2% of all PHI was allowed into the output. Errors were mainly due to collisions between the names of people and places and medical concepts or general English terms. Errors also resulted from ages being misclassified as quantities or measurements (such as lab values).
Large numbers of clinical documents are likely to be used in the future for quality improvement, public health surveillance, and research; specific activities may include automated reporting from text, quantifying guideline adherence, implementing quality measures, and detecting adverse events. Using clinical data for so many purposes increases the likelihood that PHI will leave the protection of a health care facility and be transferred to a variety of institutions, such as health departments, research facilities, and other organizations focused on quality and safety. One of the best-performing de-identification systems that leaves the text intact failed to recognize approximately 0.18% of PHI [8]. In a practice that produces 2000 notes per month, averaging 250 words per note with 4% of words being PHI, this means approximately 430 pieces of PHI could be missed per year (2000 × 12 × 250 × 0.04 × 0.0018 ≈ 432) in that practice alone. Improvement is likely necessary, but continuing to refine these existing systems may not be the optimal solution; it may instead be necessary to approach de-identification with several complementary methods. If the ultimate goal is producing useful de-identified data from clinical text, then combining a traditional de-identification system with MedLEE may offer a solution.
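Multiplying the stated figures through gives the yearly estimate directly. The Python sketch below does that arithmetic; the function name and parameters are illustrative choices, and, like the estimate in the text, it assumes the 0.18% miss rate applies uniformly to every PHI word.

```python
def expected_missed_phi_per_year(notes_per_month: int, words_per_note: int,
                                 phi_fraction: float, miss_rate: float) -> float:
    """Expected number of PHI words a de-identifier fails to recognize in a year.

    Assumes (as a simplification) that the miss rate applies uniformly
    to every PHI word.
    """
    phi_per_year = notes_per_month * 12 * words_per_note * phi_fraction
    return phi_per_year * miss_rate


# Figures from the text: 2000 notes/month, 250 words/note, 4% PHI, 0.18% missed.
print(round(expected_missed_phi_per_year(2000, 250, 0.04, 0.0018)))  # 432
```

The same function can be reused with a practice's own volumes to see how quickly even a small miss rate accumulates.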
Pipelining two systems that use different strategies may produce better results than either achieves alone. For example, one could use a system that tags potential PHI, followed by MedLEE processing. This strategy would retain the advantage of transforming text into structured data, although how the two systems would interact is untested. Because each system processes PHI differently, each may catch identifiers that the other misses, so running them in series may remove more PHI. We did not estimate how misclassified PHI affects MedLEE's text-processing performance, but this strategy may also reduce that problem: if ages are already marked as such, MedLEE will not misinterpret them as laboratory values. MedLEE also has an advantage over other systems in its ability to convert text into computable data; it has proven useful in other contexts and is likely to perform similarly in the future.
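A toy sketch of the pipelining idea follows, assuming a simple rule-based tagger as the first stage. Neither stage is MedLEE or any published de-identification system: `tag_phi` is a hypothetical regex tagger that covers only ages and titled names, and `extract_measurements` is a hypothetical stand-in for a concept extractor that naively reads every bare number as a measurement. The example shows how tagging ages first keeps the downstream stage from reading an age as a lab value.

```python
import re
from typing import List

# Hypothetical stage 1: a minimal rule-based PHI tagger. It recognizes only
# ages written like "67-year-old" or "67 yr old" and names preceded by a title.
AGE_PATTERN = re.compile(r"\b\d{1,3}[- ](?:year|yr)s?[- ]old\b", re.IGNORECASE)
NAME_PATTERN = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b")


def tag_phi(text: str) -> str:
    """Replace candidate PHI with typed placeholders."""
    text = AGE_PATTERN.sub("[AGE]", text)
    return NAME_PATTERN.sub("[NAME]", text)


# Hypothetical stage 2: a stand-in for a concept extractor (not MedLEE).
# It naively treats every bare number as a measurement such as a lab value.
NUMBER_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\b")


def extract_measurements(text: str) -> List[str]:
    return NUMBER_PATTERN.findall(text)


note = "Mr. Smith is a 67-year-old man with potassium 4.1 today."

# Downstream stage alone: the age is misread as a measurement.
print(extract_measurements(note))           # ['67', '4.1']
# Tagger first, then the downstream stage: only the lab value remains.
print(tag_phi(note))                        # [NAME] is a [AGE] man with potassium 4.1 today.
print(extract_measurements(tag_phi(note)))  # ['4.1']
```

Real PHI taggers use far richer rules or statistical models; the point here is only the ordering of the stages. If the two systems' misses were statistically independent (an idealization, since both may fail on the same hard cases), the fraction of PHI passing through both would be the product of their individual miss rates.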
The rate of PHI allowed into output by MedLEE is higher than that of other systems, but the PHI that remains in the output is often transformed, with the original text changed to normalized medical terms. MedLEE's processing errors are of a different kind from those of a system that removes identifiers but leaves the text intact. The context of the PHI may matter: PHI that remains in its original text may carry a higher risk of identification than a piece of data that has been incorrectly tagged in structured output. It is also likely that not all identifiers are equal; allowing a lab test date into output is very different from allowing a patient's last name, although for ease of quantification we counted them the same. Regardless, we have learned an important lesson: MedLEE output is not necessarily de-identified and should not be treated as such.
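The point that identifiers differ in risk can be made concrete by comparing an unweighted leak rate, which counts every identifier equally as the study did, with a severity-weighted one. The weights in `SEVERITY` and the counts in the example are invented purely for illustration; they are not from this study or from any standard.

```python
from typing import Dict, List, Optional

# Hypothetical severity weights: a leaked last name is treated as far riskier
# than a leaked lab test date. These values are illustrative only.
SEVERITY: Dict[str, float] = {"NAME": 1.0, "DATE": 0.2}


def leak_rate(leaked: List[str], total: List[str],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Fraction of PHI that reached the output, optionally weighted by PHI type.

    With weights=None every identifier counts as 1, matching the
    equal-weight quantification described in the text.
    """
    def w(phi_type: str) -> float:
        return 1.0 if weights is None else weights[phi_type]
    return sum(w(t) for t in leaked) / sum(w(t) for t in total)


# Toy counts (not study data): 10 names and 10 dates in the input.
total = ["NAME"] * 10 + ["DATE"] * 10
dates_leaked = ["DATE"] * 2
names_leaked = ["NAME"] * 2

# Unweighted, both scenarios look identical (2 of 20 identifiers leaked).
print(leak_rate(dates_leaked, total), leak_rate(names_leaked, total))  # 0.1 0.1
# Weighted, leaking two names scores five times worse than leaking two dates.
print(round(leak_rate(dates_leaked, total, SEVERITY), 3),
      round(leak_rate(names_leaked, total, SEVERITY), 3))  # 0.033 0.167
```

Any real weighting scheme would need to be justified on privacy grounds; the sketch only shows that equal counting can hide large differences in risk.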
We have demonstrated that an existing NLP system can partially de-identify clinical notes while producing the same tagged, structured output that has demonstrated utility in other contexts. Combining de-identification of PHI with identification of medical concepts may be useful for research, quality improvement, public health, or any other task that requires large amounts of detailed clinical data. In the future, we would like to improve the system to reduce the types of errors that allow PHI into output, and to test the performance of MedLEE when used in conjunction with an existing de-identification system.