The ChartIndex system was originally developed to concept-index radiology reports. To handle pathology reports, the system was modified as follows: (1) added pathology report section headings and their canonical mappings to the document parsing module to improve report section segmentation; (2) identified a set of text patterns/fragments commonly found in Stanford pathology reports that the parser should ignore; (3) added a list of common abbreviations found in Stanford pathology reports to improve disambiguation; (4) added a new mode of negation detection to handle text fragments for which no parse tree is generated; (5) created indexing rules to handle the report sections typically found in pathology reports. These ChartIndex rules use information such as UMLS semantic types and the matching score assigned by the UMLS concept mapping module to optimize mapping precision and recall. One such indexing rule identifies anatomical site descriptors in the specimen section for concepts with a matching score of 870 or higher; if no anatomical site descriptor is found in that section, the rule then tries to identify site descriptors in the diagnosis section of the report. This rule helps improve precision because the short phrases in the specimen section usually contain anatomical sites together with the procedures used to acquire the specimen, whereas in the diagnosis section false positives may be introduced by comments not pertaining to the actual specimen.
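The specimen-section-first fallback logic of this indexing rule can be sketched as follows. This is an illustrative reconstruction, not ChartIndex's actual implementation: the dictionary layout, field names, and example concepts are assumptions, while the 870-score threshold and the specimen-then-diagnosis fallback follow the description above.

```python
# Hypothetical sketch of the anatomical-site indexing rule described above.
# Data structures and field names are illustrative assumptions; the 870
# threshold and section fallback order follow the text.

MATCH_SCORE_THRESHOLD = 870

def find_anatomic_sites(concepts_by_section):
    """Return anatomical-site concepts from the specimen section when any
    meet the score threshold; otherwise fall back to the diagnosis section."""
    def qualifying(section):
        return [c for c in concepts_by_section.get(section, [])
                if c["semantic_class"] == "Anatomical Structure"
                and c["score"] >= MATCH_SCORE_THRESHOLD]

    sites = qualifying("specimen")
    return sites if sites else qualifying("diagnosis")

# Example: the specimen section yields a qualifying site, so the
# diagnosis section is never consulted.
concepts = {
    "specimen": [{"name": "skin of forearm",
                  "semantic_class": "Anatomical Structure", "score": 901}],
    "diagnosis": [{"name": "basal cell carcinoma",
                   "semantic_class": "Neoplastic Process", "score": 1000}],
}
print([c["name"] for c in find_anatomic_sites(concepts)])  # -> ['skin of forearm']
```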
The document set for this study consisted of 500 de-identified single-specimen surgical pathology reports, selected at random from more than ten thousand consecutive non-cytology reports from Stanford University Medical Center. Cytology reports were excluded because, at our institution, more than 90% of them contain negative findings. Reports on slides and blocks from other institutions (“outside consults”) were also excluded. Each electronic pathology report consisted of a demographics section, an optional section transmitting the “clinical history” from the surgeon to the pathologist, a required section identifying the “specimen submitted”, a required section listing the “diagnosis”, optional sections commenting on the diagnosis, and optional sections describing the gross and microscopic features of the specimen.
To de-identify the reports, patient demographics were removed and the report accession number replaced by an MD5 hash file identifier. For parsing, we chose to examine only the “specimen submitted”, “diagnosis”, and “comment” sections from each report, as these sections contained the anatomic and diagnostic data of interest to this study. These report sections were then further de-identified, using regular expression matching, by replacing specific dates with an arbitrary date, replacing names with an arbitrary name and mapping ages into one of the following categories: newborn, infant, toddler, child, teenager, young adult, adult, mature adult, or elderly. The reports were then manually inspected to verify de-identification.
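The regular-expression de-identification step can be sketched as below. The date format, the exact patterns, and the age-category boundaries are assumptions for illustration; the paper specifies the category labels but not the cut-offs or patterns used.

```python
import re

# Illustrative sketch of the regex-based de-identification step.
# Patterns and age-category boundaries are assumed, not taken from the study.

AGE_CATEGORIES = [  # (upper bound in years, label) -- assumed boundaries
    (0, "newborn"), (1, "infant"), (3, "toddler"), (12, "child"),
    (19, "teenager"), (35, "young adult"), (55, "adult"),
    (75, "mature adult"),
]

def categorize_age(years):
    for upper, label in AGE_CATEGORIES:
        if years <= upper:
            return label
    return "elderly"

def deidentify(text):
    # Replace specific dates (e.g. 03/14/2004) with an arbitrary date.
    text = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "01/01/2000", text)
    # Map ages such as "67-year-old" into a category label.
    text = re.sub(r"\b(\d{1,3})[- ]year[- ]old\b",
                  lambda m: categorize_age(int(m.group(1))), text)
    return text

print(deidentify("67-year-old male seen 03/14/2004"))
# -> mature adult male seen 01/01/2000
```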
The de-identified document set was divided into a training set of 100 reports and a test set of 400 reports. The training set was used to train the ChartIndex parser on the typical document segmentation, sentence structure, sentence fragments, and abbreviations/acronyms used in pathology reports at Stanford (e.g., “NSA” for “no significant abnormality”). The 400 reports in the test set were parsed into the XML document format used by ChartIndex, a format based on the HL7 Clinical Document Architecture (CDA).15 ChartIndex parsed the reports, and each noun phrase identified was mapped to one or more UMLS concept descriptors using the National Library of Medicine’s MMTx software. The set of UMLS concept descriptors generated for each report was then filtered to remove all non-SNOMED descriptors. The resulting list of SNOMED descriptors associated with each report section was further filtered to retain only those with UMLS semantic types indicating anatomical sites or diagnoses. For example, SNOMED descriptors associated with the Specimen Submitted section of the reports (where the anatomic source of the specimen is described) were filtered to ensure that only SNOMED concepts with UMLS semantic types in the “Anatomical Structure” semantic class hierarchy (A1.2) were included.
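The two-stage filter (source vocabulary, then semantic class) can be expressed as a simple list comprehension. This is a hedged sketch: the descriptor fields and tree numbers other than A1.2 are invented for illustration, though the "Anatomical Structure" hierarchy prefix matches the example in the text.

```python
# Sketch of the post-mapping filter: keep only SNOMED descriptors whose
# UMLS semantic-type tree number falls under the required class, e.g. the
# "Anatomical Structure" hierarchy (A1.2) for the Specimen Submitted
# section. Field names are illustrative assumptions.

ANATOMY_TREE_PREFIX = "A1.2"  # Anatomical Structure hierarchy

def filter_descriptors(descriptors, required_prefix):
    """descriptors: list of dicts with 'source' and 'semantic_tree' keys."""
    return [d for d in descriptors
            if d["source"] == "SNOMED"
            and d["semantic_tree"].startswith(required_prefix)]

descriptors = [
    {"source": "SNOMED", "semantic_tree": "A1.2.3.1", "name": "colon"},
    {"source": "SNOMED", "semantic_tree": "B2.2.1.2", "name": "neoplasm"},  # wrong class
    {"source": "MSH",    "semantic_tree": "A1.2.3.1", "name": "liver"},     # wrong source
]
print([d["name"] for d in filter_descriptors(descriptors, ANATOMY_TREE_PREFIX)])
# -> ['colon']
```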
The de-identified SNOMED-indexed reports were then divided into four sets of 100, with each document reviewed independently by two experts from a panel of three pathologists and one internal medicine physician. An example of one of the simpler reports used in the study is shown in . The top portion of this sample contains three sections extracted from the original report. The bottom section contains the ChartIndex-generated SNOMED terms for Tissue/Site and Findings/Diagnosis. Most reports in the study were more complex than this sample, containing several SNOMED terms.
The expert reviewers were instructed to enter a ‘+’ before each SNOMED term that correctly represented a concept present in the report, or a ‘−’ if the SNOMED term did not represent a concept in the report. Only anatomic site and diagnosis concepts were listed.
Two different experts then independently scored each indexed report, and any differences in assessment were reconciled by two of the authors using a variant of the Delphi method.16 The Delphi method uses a consensus-building approach in which group communications are structured so that results from each individual expert can be shared and revised, enabling experts to reconsider their decisions after seeing conflicting results from others. All remaining disagreements were then arbitrated by a final expert judge, one of the physician authors (DPR or HJL).
We calculated the positive predictive value (precision) of the ChartIndex parser and the inter-observer agreement ratio for each set of 100 documents and for the entire test set. The inter-observer agreement ratio was calculated as the total number of agreed-upon concepts in the two experts’ initial responses (before the reconciliation process) divided by the total number of SNOMED concepts in the set. The positive predictive value was calculated as the number of true positives after inter-observer reconciliation divided by the total number of SNOMED CT concepts generated by ChartIndex.
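The two metrics are direct ratios and can be transcribed as follows; the example counts are invented purely for illustration and are not the study's results.

```python
# Direct transcription of the two evaluation metrics defined above.
# The numeric inputs in the example are invented, not the study's data.

def interobserver_agreement(agreed_concepts, total_concepts):
    """Agreed-upon concepts in both experts' initial responses
    divided by total SNOMED concepts in the set."""
    return agreed_concepts / total_concepts

def positive_predictive_value(true_positives, total_generated):
    """True positives after reconciliation divided by total
    SNOMED CT concepts generated by ChartIndex."""
    return true_positives / total_generated

print(interobserver_agreement(90, 100))    # -> 0.9
print(positive_predictive_value(85, 100))  # -> 0.85
```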