Managing biomedical image metadata is crucial to enabling researchers and clinicians to use the information in images for investigation and medical care. Although an institution’s PACS — or, for that matter, a collection of PACS on the Internet — could potentially support decision making by retrieving images containing lesions with a given diagnosis, or with appearance similar to those being reviewed by the radiologist, current PACS implementations do not allow queries for images containing particular types of lesions or diagnoses. Our work addresses these needs. The query functionality of BIMM could be useful when a radiologist is not confident about a diagnosis, and it could enable a researcher developing image-processing algorithms to locate particular images using the summary BIMM pages. In addition, research groups at geographically separated locations can combine their data using the globally accessible BIMM application. Searchable databases of image metadata could also be important for radiology research and education by enabling radiologists to find patient cohorts with particular image features.
There are several image databases available, and more under development. Most focus on warehousing images and generally contain little associated image metadata. For example, the National Biomedical Imaging Archive [NCBI] is a public database of biomedical images, but the only image metadata this resource manages (other than occasional supplementary data stored in separate files) is derived from the DICOM header. ARRS GoldMiner17 contains images from journal articles and their associated captions, but no other image metadata is available; image search is based only on terms in the associated (unstructured) captions. BIMM is unique in that it can manage a diversity of image metadata, as specified in the AIM standard.
Some existing systems address the problem of utilizing biomedical image metadata for several applications. The Yale Image Finder uses data mining to extract textual metadata that are present in the images themselves.18
ALPHA is a prototype system implementing scalable semantic retrieval and semantic knowledge extraction and representation using ontologies; it allows semantic and content-based queries as well as ontologic reasoning to facilitate query disambiguation and expansion.19
IML, an image markup language, provides a standard for describing image metadata and annotations and allows queries by an image client.20
The MedImGrid system allows semantic and content-based querying using a scalable grid architecture.5
The BIMM system is distinct from this prior work in that it interfaces with a PACS, supports the use of controlled terminology, uses a standard for sharing image metadata, retrieves similar images based on metadata describing image features, provides Web services for communicating with client applications, and allows sharing of de-identified images and their annotations through a globally accessible Web interface.
In addition to searching image metadata, BIMM provides image similarity search, identifying and ranking similar images using a scoring function based on the semantic features associated with images. As shown in Figure , our approach achieved high sensitivity and specificity for lesion diagnosis when using the IOCs of a query image to retrieve similar images. Our preliminary results appear promising, and they reveal interesting variability in the accuracy of the method depending on the diagnosis. Results were best for hemangiomas and cysts and poorer for metastases: the features describing hemangiomas and cysts tend to be specific and non-overlapping, whereas the features of metastases can overlap those of other lesions.
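The scoring approach described above — ranking candidate images by the frequency of IOCs they share with the query image — can be sketched as follows. This is a minimal illustration, not BIMM’s actual implementation; the function names and the example IOC terms are hypothetical (real annotations would use a controlled terminology such as RadLex).

```python
# Illustrative sketch of semantic similarity retrieval by matching
# image observation characteristics (IOCs). All names and terms here
# are hypothetical; BIMM's actual scoring function may differ.

def similarity_score(query_iocs, candidate_iocs):
    """Score a candidate image by the number of IOCs it shares with the query."""
    return len(set(query_iocs) & set(candidate_iocs))

def rank_similar_images(query_iocs, database):
    """Rank database images (image id -> IOC list) by descending match count."""
    scored = [(image_id, similarity_score(query_iocs, iocs))
              for image_id, iocs in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Example: a query lesion annotated with three IOCs (terms illustrative).
query = ["homogeneous enhancement", "sharp margin", "water density"]
db = {
    "img-001": ["homogeneous enhancement", "sharp margin", "water density"],
    "img-002": ["heterogeneous enhancement", "ill-defined margin"],
    "img-003": ["sharp margin", "water density"],
}
print(rank_similar_images(query, db))
# img-001 matches 3 IOCs, img-003 matches 2, img-002 matches 0
```

Such a count-based score treats every matching IOC as equally informative, which is consistent with the variability noted above: diagnoses whose IOCs overlap those of other lesions (e.g., metastases) are harder to rank correctly.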
One limitation of our approach is that it requires human annotation, currently performed using the iPAD application. At the same time, this is a key attribute of our system, as the human perceptual data captured in BIMM is uniquely informative. While it might be challenging to introduce the current implementation of the iPAD interface into the routine radiology workflow, improvements to the user interface could make this practical in the future. Incorporating voice recognition reporting and controlled terminologies such as RadLex, along with interface improvements, could streamline structured data collection from images. Another limitation is that our dataset of 79 liver lesions is not very large, contains multiple lesions from the same patients, and covers a limited set of diagnoses; larger databases with more varied image types and diagnoses may be more challenging. We are currently building a larger image database to further validate our results. An alternative to Web-based exchange of image metadata is a distributed, large-scale “grid” of computers spanning multiple administrative domains, e.g., the National Cancer Institute’s caGrid project.21 It provides a set of services, toolkits for building and deploying new services, and application programming interfaces for developing client applications.
Our algorithm for finding similar images could also be improved. Instead of the simple frequency of matching IOCs, we could use machine learning or other optimization techniques to learn a set of weights for our scoring function, so that each matching IOC contributes differentially to the total score. Such optimization would require a larger set of training data, which we will pursue as we expand our database of case material, with the goal of maximizing the system’s similar-image retrieval performance. In addition, our current approach uses only semantic features for finding similar images; pixel-level features (e.g., texture and lesion boundary22) are also informative. We will incorporate pixel features into BIMM in the future to improve similar image retrieval results.
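The weighted variant of the scoring function proposed above can be sketched as follows. The weight values shown are purely illustrative, not values from this work; in practice they would be learned from labeled training data (e.g., by optimizing retrieval performance against known diagnoses), and the function name is hypothetical.

```python
# Sketch of a weighted IOC-matching score: each shared IOC contributes a
# learned weight rather than a uniform count. Weights below are
# illustrative placeholders, not values reported in the paper.

def weighted_similarity_score(query_iocs, candidate_iocs, weights):
    """Sum the weights of IOCs shared by the query and candidate images."""
    shared = set(query_iocs) & set(candidate_iocs)
    return sum(weights.get(ioc, 1.0) for ioc in shared)  # default weight 1.0

# Hypothetical weights: specific, non-overlapping IOCs (as with cysts and
# hemangiomas) get higher weights; IOCs shared by many diagnoses get lower ones.
weights = {
    "water density": 2.0,
    "sharp margin": 1.5,
    "heterogeneous enhancement": 0.5,
}
score = weighted_similarity_score(
    ["water density", "sharp margin", "ill-defined margin"],
    ["water density", "sharp margin"],
    weights,
)
print(score)  # 2.0 + 1.5 = 3.5
```

With all weights set to 1.0, this reduces to the simple match count, so the weighted form is a strict generalization of the current scoring function.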