The Obesity Challenge, sponsored by Informatics for Integrating Biology and the Bedside (i2b2), a National Center for Biomedical Computing, asked participants to build software systems that could “read” a patient's clinical discharge summary and replicate the judgments of physicians in evaluating the patient's condition. To create training and evaluation data, physician judges with substantial clinical experience treating obese patients were asked to read 1,250 discharge reports and judge whether each patient had obesity and 15 of its comorbidities. The judgments were of two types: textual judgments, based solely on the literal content of the text, and intuitive judgments, based on the judges' “gestalt” interpretation of all information in the text. The 15 comorbidities were asthma, coronary artery disease, congestive heart failure, depression, diabetes, gastroesophageal reflux disease (GERD), gallstones, gout, high cholesterol, hypertension, hypertriglyceridemia, osteoarthritis, obstructive sleep apnea, peripheral vascular disease, and venous insufficiency. The possible textual judgments were “yes” (Y), “no” (N), “possibly” (Q), or “unmentioned” (U); the possible intuitive judgments were Y, N, and Q. Participating systems were evaluated on recall, precision, and the combined F-measure.
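To make the evaluation metrics concrete, the following minimal sketch computes precision, recall, and F-measure per judgment class for a single disease. It is an illustration only, not the challenge's official scoring script: the function names `per_class_scores` and `macro_f`, the balanced (equal-weight) form of the F-measure, the convention of scoring 0.0 when a denominator is zero, and the unweighted averaging across classes are all assumptions, and the sample labels are invented.

```python
from collections import Counter

LABELS = ("Y", "N", "Q", "U")  # textual judgment classes named in the text


def per_class_scores(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f_measure)} for paired label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # system agrees with the physician judgment
        else:
            fp[p] += 1          # system asserted label p incorrectly
            fn[g] += 1          # system missed the true label g
    scores = {}
    for label in labels:
        asserted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        # Assumption: a class with an empty denominator scores 0.0.
        precision = tp[label] / asserted if asserted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores


def macro_f(scores):
    """Unweighted mean of per-class F-measures (averaging scheme is an assumption)."""
    return sum(f for _, _, f in scores.values()) / len(scores)


# Invented judgments for six records and one disease.
gold = ["Y", "Y", "N", "U", "U", "Q"]
predicted = ["Y", "N", "N", "U", "U", "U"]
scores = per_class_scores(gold, predicted)
for label in LABELS:
    print(label, scores[label])
print("macro F:", macro_f(scores))
```

In this toy example the system never outputs Q, so the Q class scores zero on all three metrics and pulls the unweighted average down. This illustrates why a rare class such as “possibly” can weigh heavily when classes are averaged equally.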
Lockheed Martin and SAGE Analytica partnered to create a rule-based system guided by human medical expertise. Methods for processing clinical records, as with free text in other domains, include statistical systems,[1] rule-based systems, and hybrids. Arguably, all approaches require some manual labor, either in developing the rules or in preparing the training data. Much work has been published on the Natural Language Processing (NLP) challenges of 2006 (identifying patient smoking status)[2] and of 2007 (assigning ICD-9-CM tags to radiology reports).[3] In the former, the top-performing system (Clark et al.)[4] combined a rule-based extraction engine with machine learning algorithms. In the latter, the top system (Farkas and Szarvas)[5] was a hand-crafted rule-based system that combined its rules with statistical learning models. Because many rule-based systems, including the Lockheed Martin system, performed well in the 2007 challenge, we hypothesized that a rule-based approach integrating medical, epidemiological, and NLP expertise would be able to complete the 2008 task effectively.
An unabridged version of this manuscript is available as an online supplement at http://www.jamia.org