Diagnostic radiology requires accurate interpretation of complex signals in medical images. Content-based image retrieval (CBIR) techniques could be valuable to radiologists in assessing medical images by identifying similar images in large archives that could assist with decision support. Many advances have occurred in CBIR, and a variety of systems have appeared in nonmedical domains; however, permeation of these methods into radiology has been limited. Our goal in this review is to survey CBIR methods and systems from the perspective of application to radiology and to identify approaches developed in nonmedical applications that could be translated to radiology. Radiology images pose specific challenges compared with images in the consumer domain; they contain varied, rich, and often subtle features that need to be recognized in assessing image similarity. Radiology images also provide rich opportunities for CBIR: rich metadata about image semantics are provided by radiologists, and this information is not yet being used to its fullest advantage in CBIR systems. By integrating pixel-based and metadata-based image feature analysis, substantial advances of CBIR in medicine could ensue, with CBIR systems becoming an important tool in radiology practice.
Diagnostic radiologists are struggling to maintain high interpretation accuracy while maximizing efficiency in the face of increasing exam volumes and numbers of images per study.1 A promising approach to manage this image “explosion” is to integrate computer-based assistance into the image interpretation process. While substantial progress has been made in computer-aided diagnosis/detection (CAD) for lesions such as breast masses, lung nodules, and colonic polyps, current CAD methods target very specific image features, limiting their broader application to many other scenarios where assistance with image interpretation could be beneficial.
Medical image interpretation consists of three key tasks: (1) perception of image findings, (2) interpretation of those findings to render a diagnosis or differential diagnosis, and (3) recommendations for clinical management (biopsy, follow up, etc.) or further imaging if a firm diagnosis has not been established. The potential for assisted interpretation and decision making is motivated not only by time constraints on readers, but also by the recognition of variations between readers based on perceptual errors, lack of training, or fatigue. Significant inter-observer variation has been documented in numerous studies.2 For example, in mammographic interpretation, there is variation in sensitivity, specificity, and area under the receiver operating characteristic curve among radiologists.3 This variation results partly from the complexity of processing the vast amounts of knowledge needed to interpret imaging findings. Much of radiological practice is currently not based on quantitative image analysis, but on “heuristics” to guide physicians through rules-of-thumb.4 Such heuristics can fail in a variety of circumstances where combinations of features related to diagnosis do not fit expected patterns and practitioners do not recognize the impact of such circumstances. In addition, the heuristics and their use are subject to inter- and intra-reader variability.
While CAD systems are primarily designed to enhance the perceptual component of image interpretation and have been developed for several domains of radiology,5 decision support systems are much broader; they do not focus on detection, but rather on the reasoning process that radiologists go through after detecting an abnormality, often called a “finding.” Decision support integrates the imaging finding(s) with a formal model (or knowledge base) representing disease processes, ideally aiding arrival at an accurate diagnosis.6,7 Radiologists always utilize broader patient-specific or demographic knowledge, such as clinical history or results of other tests, in their decision-making processes; as such, it is expected that decision support systems would incorporate these data as well.
Another emerging technique that may assist radiology interpretation is content-based image retrieval (CBIR). In its broadest sense, CBIR helps users find similar image content in a variety of image and multimedia applications. CBIR applications in multimedia can save the user’s time considerably in contrast to tedious, unstructured browsing. The role for CBIR in medical applications is potentially very powerful: in addition to enabling similarity-based indexing, this framework could provide computer-aided diagnostic support based on image content as well as on other metadata associated with medical images. However, despite its success outside medicine, CBIR has had little impact on radiology to date. Current work in image processing, medical informatics, and information retrieval domains provides building blocks that can markedly increase the relevance of CBIR to radiology practice in the future.
At present there is a substantial gap between CBIR, and its focus on raw image information, and decision support systems, which typically enter the workflow beyond the point of image analysis itself. This gap represents what we believe is a major opportunity to develop decision support systems that integrate image features exploited in CBIR systems. With such integration, CBIR may be a starting point for finding similar images based on pixel analysis, but the process would be augmented by inclusion of image and nonimage metadata as well as knowledge models, broadening the system from “image based search” to “patient” or “case-based” reasoning.8
This article reviews the status of CBIR in radiology and highlights some challenges and opportunities to be addressed to achieve significant scientific and clinical impact in the years to come. In “Current CBIR Technology” section, we provide an overview of the CBIR technology along with a general description of its key components. In “State-of-the-Art Medical CBIR Systems” section, we present an analysis of existing medical CBIR systems. The analysis has been carried out along several axes such as the way visual features are employed for content description, the methods to measure/quantify similarity between medical images, the use of statistical classifiers in a user-interactive context, and medical image segmentation. In “Challenges and Opportunities for CBIR in Radiology” section, we first point out the challenges faced by the generic CBIR technology in the radiology domain, and then we describe some of the opportunities to advance CBIR in context. In “Strategies for Moving Forward in the Next Decade” section, we propose a strategic vision to make CBIR operationally useful in radiology in the near future. “Conclusions” section presents our conclusions.
Content-based image retrieval has been a vigorous area of research for at least the last two decades. The abundance of publications within this period reflects diversity among the proposed solutions and the application domains (see the extensive surveys in 9–11). CBIR has been most successful in nonmedical domains, e.g., the QBIC system employed in the Russian Hermitage Museum (http://www.hermitagemuseum.org/fcgi-bin/db2www/qbicSearch.mac/qbic?selLang=English), the ALIPR (http://alipr.com/) system enabling automatic photo tagging and visual search on the web, and the RIYA (http://www.riya.com/; http://www.like.com/) system for visual shopping.
A generic CBIR system has two main components. The first component represents the visual information contained in image pixels in the form of image features/descriptors and aims at bridging the gap between the visual content and its numerical representation. These representations are designed to encode color and texture properties of the image, the spatial layout of objects, and various geometric shape characteristics of perceptually coherent structures within the image. The second component provides for assessment of similarities between image features based on mathematical analyses, which compare descriptors across different images. Ideally, the computed similarity measures should at least partly parallel the similarity between images when judged by the human visual system alone. In a typical CBIR system, database images are returned and displayed in decreasing order of their computed similarity to a query image provided by the user. Thus, whatever the application domain, a CBIR system must provide a means for (a) describing and recording the image content based on pixel/voxel information (image features/descriptors) and (b) assessing the similarity between the query image and the images in the database.
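The two components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any system reviewed here: the descriptor is a coarse normalized intensity histogram, the similarity measure is the Euclidean distance, and the names `intensity_histogram`, `rank_by_similarity`, and the toy database are hypothetical.

```python
import math

def intensity_histogram(pixels, bins=4, max_value=256):
    """Component (a): describe an image (a flat list of gray levels in
    [0, max_value)) by a normalized intensity histogram."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // max_value, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def rank_by_similarity(query_pixels, database):
    """Component (b): return database image names ordered from most to
    least similar to the query (smallest Euclidean distance first)."""
    query_descriptor = intensity_histogram(query_pixels)
    scored = [
        (math.dist(query_descriptor, intensity_histogram(pixels)), name)
        for name, pixels in database.items()
    ]
    return [name for _, name in sorted(scored)]

# Toy database of three tiny "images" (hypothetical data).
database = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
ranking = rank_by_similarity([5, 15, 25, 35], database)
print(ranking)
```

In practice the descriptor would be computed once per database image and stored in an index, rather than recomputed for every query as in this sketch.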
Image features/descriptors are derived from visual cues contained in an image. They are represented as alpha-numeric data in different formats such as vectors or graphs, which stand as compact surrogates for the visual content. One can distinguish two types of visual features. Photometric features exploit color and texture cues and they are derived directly from raw pixel intensities. Geometric features, on the other hand, make use of shape-based cues. In the following, we describe these cues in detail. Table 1 provides a summary of visual features used in the medical domain.
Color The use of color cues in image description dates back to one of the earliest CBIR proposals.12 A global characterization of the image can be obtained by binning pixel color components (in an appropriate color space, e.g., hue–saturation–illumination) into a histogram 13–16 or by dividing the image into subblocks, each of which is then attributed with the average color component vector in that block.17–19 While color is one of the visual cues often used for content description,10,11 most medical images are grayscale. True color-based characterization is applicable only where color photographs are used for diagnosis, such as in ophthalmology, pathology, and dermatology, or when color is used to scale flow velocities or intensity scales such as in nuclear cardiology. Thus for the majority of medical images, color features will not be useful in image retrieval.
Texture Texture features encode spatial organization of pixel values of an image region. The common practice to obtain texture-based descriptors is to invoke standard transform domain analysis tools such as Fourier transform, wavelets, Gabor, or Stockwell filters on local image blocks.14,15,20–22 In addition, one can also derive the so-called Haralick’s texture features such as energy, entropy, coarseness, homogeneity, contrast, etc., from a local image neighborhood16,20,21,23,24 or utilize linear system approaches such as simultaneous autoregressive models.25 In the medical domain, texture-based descriptors become particularly important as they can potentially reflect the fine details contained within an image structure. For example, cysts and solid nodules generally have uniform internal density and signal intensity characteristics, while more complex lesions and infiltrative disorders have heterogeneous characteristics. Some texture features may be below the threshold for humans to appreciate, and computers may be able to extract important texture and pattern information that is not readily visible.
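As an illustration of how such co-occurrence-based texture features are computed, the following sketch builds a symmetric gray-level co-occurrence matrix for one pixel offset and derives four of the features named above. It assumes the image is a small 2D list of integer gray levels already quantized to `levels` values; the function names are hypothetical, and a real system would typically average over several offsets and directions.

```python
import math

def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for the
    pixel offset (dx, dy). `image` is a 2D list of ints in [0, levels)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Energy, entropy, contrast, and homogeneity of a normalized GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }

uniform = texture_features(glcm([[0, 0], [0, 0]], levels=2))
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
```

A uniform region yields zero contrast and maximal energy, whereas the checkerboard yields high contrast, which mirrors the distinction drawn above between homogeneous and heterogeneous lesions.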
Shape We use the term shape to refer to the information that can be deduced directly from images and that cannot be represented by color or texture; as such, shape defines a complementary space to color and texture. A powerful way of representing shape is through perceptually grouped geometric cues such as edges, contours, joints, polylines, and polygonal regions extracted from an image. Such a grouping can serve as a spatial layout or as a rough sketch by additional postprocessing. It has been successfully used in the nonmedical domain by the CIRES (http://amazon.ece.utexas.edu/~qasim/) system,26 which is based upon a combination of higher-level and lower-level computer vision principles. While the term lower-level refers to basic image features (color and/or texture, see above), higher-level analysis benefits from perceptual organization, inference and grouping principles to extract information describing the structural content of an image. In cases when the image contains object(s) that can be clearly separated from the background or surroundings, these geometric cues can be used to extract powerful complete shape descriptors. This approach has empirically proven to be very successful in applications such as visual shopping (http://www.riya.com/; http://www.like.com/), logo and trademark retrieval,27 handwritten digit recognition,27 and retrieval of three-dimensional computer models.28 Single object-based shape description has also been successfully employed in several medical CAD applications such as in CT colonography29,30 and lung nodule detection.31
A shape-based representation of the image content in the form of point sets, contours, curves, regions, or surfaces should be available for the computation of shape-based features. Such representations are not usually available in the data directly. Accordingly, as a first step of geometric feature computation, a suitable shape representation should be extracted from the pixel intensity information by region-of-interest detection, segmentation, and grouping. In the medical context, the shape information is one of the strongest factors in detecting a certain disease/lesion or in understanding its evolution. Thus shape-based descriptors are likely to be useful to fulfill the fine detail requirement of medical image retrieval. However, most of the current medical CBIR systems do not exploit the full potential of the shape information, as they either use indirect correlates of the shape cue such as texture measurements or employ very global and simple shape description schemes that are incapable of capturing the required classification granularity. The segmentation problem can be seen as the main obstacle toward the use of more elaborate methods for shape analysis. Objects of interest such as anatomical structures or lesions are embedded in complex and arbitrary backgrounds, in which case robust and automatic segmentation presents a great challenge.
In addition to the preceding quantitative visual features of the image content, there are semantic image features that can be derived from a human expert’s observations—the radiologist viewing images. Radiologists describe a variety of information in images through annotations that provide essential semantics describing image content. In addition to these, images can be associated with related information about the individual from which the image was obtained, such as laboratory reports, clinical diagnosis, demographic information, etc. These image and nonimage data, acquired automatically or through manual annotations, are very informative to radiologists in providing their interpretations and thus could provide useful information for CBIR systems as well.
Image similarity measures generally assess a distance between (sets of) image features. Intuitively, shorter distances correspond to higher similarity. The choice of metric depends on the type of image features/descriptors as well as on their representation.
The simplest feature representation is a high-dimensional vector space. Metrics defined on vector spaces (e.g., the Euclidean distance) are common similarity measures. Many CBIR systems employ such vector distances due to their computational simplicity. Despite their popularity, these distance definitions require the features to be continuous within a range for the computed distances to correspond to perceptually coherent similarities. This requirement is equivalent to assuming that a linear combination of two feature vectors is a valid feature vector corresponding to a valid shape (and/or a semantically similar image), an assumption that does not hold in general. Instead, the metric used should define a convex space of semantically similar images. This calls for the concept of manifolds and manifold-learning techniques.44–46 Furthermore, in cases where there are possible mismatches between corresponding dimensions of the above-mentioned feature vectors, the standard vector space metrics yield erroneous results. One such case is when the image content is described by multidimensional feature histograms, which are obtained by accumulating the count of feature vectors falling into a fixed number of bins of predefined size. This representation can be rendered more expressive by allowing the bins to vary in number and size. When comparing two such histograms, however, usual vector distances cannot capture true similarity since the bins are of different sizes and they are not aligned. The Earth Mover’s Distance (EMD)47 has been proposed as a solution. EMD takes the variable sizes of the bins into account and remedies the correspondence problem by computing the optimal alignment between the two multidimensional histograms.
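To make the idea concrete, the sketch below computes the EMD for the special case of one-dimensional histograms whose bins may differ in number and position. It is a simplification, not the general formulation: it assumes each histogram is given as a list of (bin position, weight) pairs and that weights are normalized to equal total mass, in which case the optimal transport cost equals the area between the two cumulative distributions. The multidimensional case requires solving a transportation problem. The name `emd_1d` is hypothetical.

```python
def emd_1d(signature_a, signature_b):
    """Earth Mover's Distance between two 1D signatures, each a list of
    (position, weight) pairs. Bins need not be aligned or equal in number."""
    def normalize(signature):
        total = sum(weight for _, weight in signature)
        return [(pos, weight / total) for pos, weight in signature]

    # Mass from A raises the running difference; mass from B lowers it.
    events = sorted(
        normalize(signature_a)
        + [(pos, -weight) for pos, weight in normalize(signature_b)]
    )
    distance = 0.0
    cdf_difference = 0.0
    previous_position = events[0][0]
    for position, signed_weight in events:
        # Surplus mass must be carried across the gap since the last event.
        distance += abs(cdf_difference) * (position - previous_position)
        cdf_difference += signed_weight
        previous_position = position
    return distance

# Two bins at 0 and 2 versus a single bin at 1: half the mass moves one
# unit from each side, so the cost is 1.0.
example = emd_1d([(0, 0.5), (2, 0.5)], [(1, 1.0)])
```

A bin-by-bin Euclidean comparison is not even defined for this pair, since the two histograms have different numbers of bins.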
Alternative to vector-based description, the graph-based representation of image features is a powerful technique that is capable of representing not only local color/texture/shape features, but also their interrelations such as their relative spatial distribution within the region-of-interest. They require a generic class of special computational approaches for similarity assessment, collectively referred to as graph matching.48
In some medical applications, subtle geometrical differences between imaged structures may be of diagnostic importance to experts. In such cases, a promising approach is to define similarity through the notion of elastic deformations required to transform one shape into another.49,50 The energy required to transform one shape into another is assumed to be inversely proportional to similarity.
Similarity can also be measured through the use of statistical classifiers that categorize new instances using high-level information extracted from a training set of instances with known labels. This constitutes a promising attempt to close the so-called semantic gap between the visual description of an image and its meaning.11 The semantics of an image, defined here as its possible interpretations, is task-dependent. It varies based on what is sought. Consequently, different classifiers need to be trained for different tasks, even on the same dataset. Assuming that the images in the database are categorized, one can employ this measure of attachment as a pseudo-similarity measure between the test image and all those in the database. The limitation is that a fixed set of labels is required. Relevance feedback techniques that have been widely investigated in nonmedical domains51–54 address this limitation by more flexible user-centric labeling schemes. In relevance feedback, during the search session, the user labels a few database items that he/she considers relevant or irrelevant to his/her query. These labeled items, when employed with multiple-instance learning-based approaches, can serve as high-level information about what the user has in mind in that particular search session. More specifically, the query model can be iteratively updated so that the similarity measure is refined implicitly, as in the Accio! system,52 or the retrieval machine can directly optimize the parameters of the similarity model via statistical learning.51,54 Either way, the requirement of a fixed set of labels is removed since the labels are user-specific and session-dependent. While tested only in a few medical retrieval systems (see “Relevance Feedback” section), relevance feedback can potentially offer several advantages in the medical domain.
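One simple way to realize the iterative query update is the classical Rocchio rule from text retrieval, sketched below. This is a swapped-in illustration and not the method of the systems cited above: it assumes images are represented by fixed-length feature vectors, and the weights `alpha`, `beta`, and `gamma` are conventional but arbitrary choices.

```python
def rocchio_update(query, relevant, irrelevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Move the query vector toward the mean of the items the user marked
    relevant and away from the mean of those marked irrelevant."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    relevant_mean = mean(relevant)
    irrelevant_mean = mean(irrelevant)
    return [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, relevant_mean, irrelevant_mean)
    ]

# One feedback round: the user marks two results relevant, one irrelevant.
updated_query = rocchio_update(
    query=[1.0, 0.0],
    relevant=[[0.0, 1.0], [0.0, 3.0]],
    irrelevant=[[4.0, 0.0]],
)
```

The updated vector is then used to re-rank the database, and the loop repeats until the user is satisfied, so no fixed label set is needed.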
Table 2 presents a summary of a large set of current systems. The table is organized in five columns addressing the most important four components of a medical CBIR system and the imaging modality the system is designed for. The modality defines the type of information that can be extracted from medical images. This limits usable image representations and features. Consequently, the modality targeted by a medical CBIR system is important. Few medical CBIR systems support multimodality.15,55,56 We have included some selected medical image classification systems as their components are relevant to CBIR, which can alternatively be viewed as a classification task with respect to the query image.
We grouped image descriptors into three operational categories: general, mixed, and specialized descriptors.
General descriptors, such as color, texture, pixel/gradient histograms, etc., are commonly used in all CBIR systems, irrespective of the application domain. They form generic and rather robust feature sets. Since no a priori domain-specific knowledge is exploited in their computation, they are primarily used with vector distance-based similarity analyses14,40,55,57 in high-dimensional feature vector spaces or with statistical classifiers.21,24,32,40,55,57,58 However, in a few cases, these general descriptors have been used with elastic deformation-based59 and graph matching-based55 similarity analyses. In the former case, a descriptor is continuously deformed into another so that the deformation energy serves as a similarity measure between the two descriptors. In the latter case, general descriptors describe segmented image regions that are represented as attributed nodes of a graph.
Mixed descriptors not only use the general descriptors but also the annotations.15,17,56,60 Manual or automatic, the type of annotations may vary from textual descriptions of image to diagnoses. Annotations are very powerful descriptors in filling the semantic gap in CBIR systems. For instance in 56, the system (http://www.dim.hcuge.ch/medgift/) aims at bringing the standard information technology tools and infrastructures within the reach of medical practitioners. The strategy adopted in its retrieval implementation relies on the GNU GIFT tool (http://www.gnu.org/software/gift/), which itself borrows ideas from the text retrieval domain. Accordingly, an image is considered as a document containing several visual words. Visual features derived from local and global color/texture cues are mapped to these keywords in order to describe the image in terms of a set of words. As such, the system relies on an annotation tool. With such a text-based description, the image retrieval problem turns into one of standard text-based information retrieval. In 15, the system adopts a content description approach similar to the one considered above. The description scheme relies upon a statistical learning framework, which extracts vocabularies of meaningful medical terms (VisMed terms) associated with the visual appearance of image samples. These terms are segmentation-free image regions described by color and texture features that are meaningful to medical practitioners and that can be learned statistically using a training set of manually cropped image regions. The learned VisMed terms are used to span a new indexing space. A medical image is indexed in terms of a compact spatial distribution (histogram), where each histogram bin corresponds to a VisMed term from the learned vocabulary, and thus has a semantic interpretation.
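The indexing step shared by both approaches, describing an image as a histogram over a vocabulary of visual terms, can be sketched as follows. This is a generic nearest-term assignment under stated assumptions (local features are plain numeric tuples and the vocabulary is already given); it is not the actual medGIFT or VisMed implementation, both of which learn or derive their vocabularies differently. The name `visual_term_histogram` is hypothetical.

```python
import math

def visual_term_histogram(local_features, vocabulary):
    """Index an image as a normalized histogram over visual terms by
    assigning each local feature vector to its nearest vocabulary term."""
    counts = [0] * len(vocabulary)
    for feature in local_features:
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: math.dist(feature, vocabulary[k]),
        )
        counts[nearest] += 1
    return [count / len(local_features) for count in counts]

# Hypothetical two-term vocabulary and four local features from one image.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
features = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (1.0, 1.0)]
index_vector = visual_term_histogram(features, vocabulary)
```

Once every image is reduced to such a histogram, standard text-retrieval machinery such as term weighting and inverted files can be applied to it.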
Specialized descriptors exploit the interrelations between feature sets based on domain-specific knowledge. The information that can be represented through the use of these interrelations is complementary to the other descriptors.16,36,43,61 For example in 16, the system (http://cobweb.ecn.purdue.edu/~cbirdev/WEB_ASSERT/assert.html) addresses the retrieval of pathology bearing regions (PBRs) in lung CT images by a human-in-the-loop approach. Starting from the well-founded assumption that the extraction of PBRs by automatic image segmentation techniques is difficult, if not impossible, the system lets the user delineate PBRs in lung images. The manually extracted regions are then characterized by their grayscale, texture, and shape attributes, which are organized based on their spatial location with respect to certain anatomical landmarks. The system proposed in 43 describes the images by the so-called attributed relational graphs (ARGs). In an ARG, the image regions are represented by graph nodes and their spatial relationships are represented by edges between these nodes. As in 16, the regions of interest are extracted manually but they are not necessarily pathology bearing. Both nodes and edges are labeled by attributes corresponding to the properties of these regions and their relationships, respectively. The attributes can be derived from any type of the visual cues described above, e.g., texture properties, global geometric features of regions (e.g., size, roundness), or features specified in some transform domain (e.g., Fourier coefficients of the extracted region boundary). Typical attributes for the spatial relationships (the edges) can be the distance between two connected regions, their relative orientation, etc. Specialized descriptors are primarily employed with statistical classifiers; however, in a few cases these descriptors are also used with elastic deformation61 and graph matching-based43 similarity analyses.
One of the biggest challenges in any CBIR system is how to define an appropriate measure assessing the similarity to be used for database indexing and/or similarity-based ranking of the retrieved images with respect to the query. A common and rather straightforward method is to employ vector distances in a high-dimensional normed vector space, commonly a Euclidean space, in which each image is represented with a point corresponding to its image descriptor/feature vector.14,40,55–57,60,62
Procrustes methods are used when descriptors consist of landmark points, which usually delineate a shape boundary.33,34 The shape of an object is considered as a member of an equivalence class formed by removing the effects of translation, rotation, and isotropic scaling, collectively denominated as similarity transformations. Thus, the similarity between two shapes is defined up to a rigid body transformation (translation + rotation) and an isotropic scaling. It is possible to extend this notion of similarity to allow nonrigid deformations of the shapes, for example via elastic matching methods as in 59,61. In fact, approaches based on continuous (and even diffeomorphic) mappings are becoming increasingly popular in the medical domain.49,63 Yet, elastic matching methods are more suitable for cases of abnormalities of overall size and/or shape of an organ, which limits their applicability to some extent.
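For two-dimensional landmarks, the Procrustes comparison has a compact closed form when points are written as complex numbers. The sketch below assumes both shapes have the same number of landmarks listed in corresponding order; it computes the full Procrustes distance, which is zero exactly when one shape is a translated, rotated, and uniformly scaled copy of the other. The function name is hypothetical.

```python
import math

def procrustes_distance(shape_a, shape_b):
    """Full Procrustes distance between two 2D landmark configurations,
    each a list of (x, y) points in corresponding order."""
    def preshape(points):
        z = [complex(x, y) for x, y in points]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]                 # remove translation
        size = math.sqrt(sum(abs(p) ** 2 for p in z))
        return [p / size for p in z]                  # remove scale

    a, b = preshape(shape_a), preshape(shape_b)
    # The modulus of the complex inner product is invariant to rotation,
    # so taking it removes the remaining rotational degree of freedom.
    inner = sum(p.conjugate() * q for p, q in zip(a, b))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, doubled in size, and shifted.
moved_square = [(5, 5), (5, 7), (3, 7), (3, 5)]
rectangle = [(0, 0), (3, 0), (3, 1), (0, 1)]

same = procrustes_distance(square, moved_square)
different = procrustes_distance(square, rectangle)
```

Here `same` is zero up to floating-point rounding, while `different` is clearly positive, because stretching a square into a rectangle is not a similarity transformation.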
Graph matching, as pointed out earlier, refers to a special class of similarity measurement that is only applicable when the image content is represented by a graph.43,55 A graph is a natural way of representing features and their interrelations. For instance in 55, the description (indexing) step uses the selected local features following the Blobworld approach,64 which outputs a graph-based descriptor of the image. Each node of the graph represents an image region and the corresponding local features of that region are used as the attributes of the node. The similarity between the query and database images is computed by an attributed graph-matching scheme.48 Basically, a cost function is first evaluated between the graph nodes in terms of the distance between the attributes of the node. The minimum total cost over possible matchings, defined as the sum of the costs between matched nodes, serves as a similarity measure. Combinatorial algorithms and/or continuous relaxation schemes can then be used to find a pairwise matching of the nodes by (approximately) minimizing the total cost.
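The node-matching step can be sketched as follows. This is a simplified illustration under stated assumptions: node attributes are numeric tuples, the node cost is the Euclidean distance between attributes, edge attributes are ignored, and the optimum is found by exhaustive search, which is feasible only for graphs with a handful of nodes. Practical systems use the combinatorial or relaxation schemes mentioned above. The name `graph_match_cost` is hypothetical.

```python
import math
from itertools import permutations

def graph_match_cost(query_nodes, db_nodes):
    """Minimum total attribute distance over all one-to-one assignments of
    query nodes to database nodes. Requires len(query_nodes) <= len(db_nodes).
    Returns (cost, assignment), where assignment[i] is the index of the
    database node matched to query node i."""
    if len(query_nodes) > len(db_nodes):
        raise ValueError("query graph has more nodes than database graph")
    best_cost, best_assignment = math.inf, None
    for assignment in permutations(range(len(db_nodes)), len(query_nodes)):
        cost = sum(
            math.dist(query_nodes[i], db_nodes[j])
            for i, j in enumerate(assignment)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, assignment
    return best_cost, best_assignment

# Two query regions against three database regions (hypothetical attributes).
cost, assignment = graph_match_cost(
    [(0.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 0.0), (5.0, 5.0)],
)
```

A lower cost indicates a closer match, so database images would be ranked by increasing cost.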
In contrast to the above methods that directly measure the similarity in terms of image information alone, classifier-based similarity measures use the classification of a query image with respect to a fixed set of predetermined labels to assess similarity.15–17,21,24,32,36,40,55,57,58 Statistical classifiers need to be pretrained. The membership of the query image to each class is usually used as a feature set representing the image content. However, classifiers can also be used as a preprocessing step to narrow the search space in CBIR systems. For example, in 55, the classifier serves as an automatic medical image annotation tool on its own. It can thus be used to retrieve similar images on a coarse level, e.g., a lung CT would retrieve other lung CT images in the database.
Textual similarity measures refer to the common text-based query/retrieval scheme. They can be applied to CBIR through the use of manual/automatic annotations represented as words.56,60 The burden of manual annotations may be an obstacle to their widespread use. Accordingly, automatically obtaining medical annotations would be of significant importance. In fact, the medical image annotation track in ImageCLEFmed (http://www.imageclef.org/; http://ir.ohsu.edu/image/), organized as part of the Cross Language Evaluation Forum (CLEF; http://www.clef-campaign.org/) since 2004, serves as a benchmarking platform to streamline the research on medical image annotation.
Segmentation is a key preprocessing step in CBIR systems that describe the image content through regions of interest. The goal is to identify the semantically meaningful regions/objects within an image. Several methods have employed manual segmentation to rule out retrieval errors due to wrong segmentation.17,21,32–34,40,43,58,65 However, despite its advantages, manual segmentation is a tedious task limiting the usability of CBIR systems. This has led to the development of semiautomatic16,57,61 and automatic55,60 segmentation algorithms to extract the regions of interest. In semiautomatic methods, an initial segmentation, usually in the form of boundary delineation, is provided by the user. The segmentation is iteratively refined and the user can intervene between iterations by correcting the boundaries if they tend to move away from the desired solution. Yet, some systems preferred to design segmentation-free image descriptors to avoid the complications due to imperfect segmentation.14,15,24,36,38,56,59
A promising technique to fill the semantic gap in the medical CBIR systems is to adopt an expert-in-the-loop approach. This refers to integrating the physician’s high-level expert knowledge into the retrieval process by acquiring his/her relevance judgments regarding a set of initial retrieval results.24,55,58 A relevance judgment is task-dependent. For instance, an image can be relevant to a certain query in terms of its modality, or the anatomical region that it belongs to, or the disease that it depicts, etc. These judgments, provided in the form of discrete labels or ordinal/continuous ratings, are used in a statistical learning algorithm to obtain an iteratively refined similarity measure, which is expected, at the end of the search, to become more suitable for a particular search task, e.g., finding all images of a particular organ or retrieving all cases related to a certain disease.
A comparative assessment of the performances of the reviewed medical CBIR systems is not possible: not only do their application domains (the imaging modalities that the systems are built for) differ, but there is also no common database on which to evaluate different systems. ImageCLEFmed, as mentioned earlier, is one of the few platforms (if not the only one) for evaluating and comparing different systems. The IRMA,55 the medGIFT,56 and the VisMed15 projects are participants of ImageCLEFmed. The evaluation protocols are largely influenced by the tasks addressed by the IRMA system, which mainly targets modality and body part similarity-based retrieval (http://www.imageclef.org/; http://ir.ohsu.edu/image/; http://www.clef-campaign.org/). Objective evaluation is even more challenging for the retrieval systems based on patient/case similarity as there is currently no consensus on how to rate the case similarity even manually. Consequently, task-specific evaluation platforms are required rather than a generic approach.
In a typical CBIR system (e.g., in multimedia), subtle differences between images are considered irrelevant for matching and are therefore often ignored. In medical images, by contrast, such subtle differences can be highly relevant for diagnosis. Thus one of the challenges differentiating radiology CBIR from general purpose multimedia applications is the granularity of classification; this granularity is closely related to the level of invariance that the CBIR system should guarantee. Consequently, CBIR systems in radiology should, at minimum, capture fine details of the image content, particularly because they are related to diagnostic features used by humans in unassisted diagnosis. In addition, computer-derived features that may not be easily discerned by humans may also be useful.68 The sheer breadth of imaging findings is a major challenge: “diseases” are depicted differently by different modalities, they affect a number of organs and tissues, and the findings themselves are highly varied. For example, some tumors are discrete masses while others are poorly defined and infiltrative. Entire organs may be abnormal due to diffuse disease, such as in cirrhosis or leukemia, and some clinically important features only manifest as abnormalities of overall size or shape of an organ or organ part (e.g., the hippocampus in epilepsy), without changing aspects of the image. For human image interpretation, several image features stand out as likely to be the most relevant, though the relative importance of features will vary across modality and disease targets. Some of the key features include lesion shape, boundaries, density or intensity, presence or absence of enhancement with intravenous contrast material, texture, and whether a lesion is solitary or multiple.
The rich and varied information content in medical images and their implicit knowledge about anatomic structures are not leveraged in image-only approaches to CBIR. Many image search methods in fields other than medicine use index terms (metadata) associated with the images, rather than the image data alone.69,70 The latter type of image search is based on semantics (i.e., meaning) as described by annotations. The class of metadata used in retrieval can be extended to include nonimage data such as laboratory reports, physiological measurements, etc., which are used routinely by radiologists in providing their interpretations, but are widely ignored in current CBIR systems. This domain-specific metadata largely depends on the radiologists’ observations. The current approach to representing these observations/interpretations is unstructured free text reports. Searches based on these reports are limited because there is no enforcement of controlled terminology, and the linkages between image content and report text are loose or nonexistent. Consequently, efficient and effective annotation tools are needed to generate such image metadata. Nonimage clinical data are stored in electronic medical record (EMR) systems. Such data, when linked to images, can be used to associate PACS data with the corresponding EMR, so that the full potential of combined image and metadata approaches can be exploited. In fact, given the complexity of medical images themselves and the richness of the associated metadata, a combined approach is likely to be advantageous. Including both semantic and pixel content would help not only to identify similar images, structures, or lesions, but also to find these within similar patients and clinical situations.
Several attempts have been made to establish the connection between the visual content and domain-specific semantic content in medical images. One notable example is the Essence framework,71 serving as a knowledge repository and exchange platform for medical image databases. In Essence, visual abnormalities and pathologies are extracted and mapped to physician-defined semantic terms using a shared ontology based on the common knowledge from expert radiologists and information from medical references. The system is also capable of refining the shared ontology by adapting the assignment of semantic terms to image features based on individuals’ preferences. Closely related examples include the retrieval strategy adopted in the medGIFT project,56 employing purely text-based schemes for image retrieval, and the VisMed approach,15 which extracts vocabularies of meaningful medical terms associated with visual appearance from image samples (see “Challenges and Opportunities for CBIR in Radiology” section), thus providing a mapping from visual feature space to semantic terms. Controlled terminologies such as RadLex72 have recently been developed, and work is underway to create tools to enable semantic annotation of images using ontologies.73 The emergence of these technologies provides an opportunity to enhance CBIR systems with richer descriptions of images. A system that attempts to combine nonimage clinical data and image data in a real patient-analysis scenario is the Advanced Analytics for Information Management (AALIM) system.74 AALIM is a novel multimodal decision support system that seamlessly extracts, analyzes, and correlates information from patients’ EMRs for purposes of decision support. Since health records today have become multimodal, including images, video, text, and charts, AALIM analyzes multiple modalities to identify similar patient records. Current work in AALIM has focused on the domain of cardiology.
Sophisticated feature extraction and search techniques have been developed to extract structured data from a database (demographics, diagnosis codes, vitals, medications), as well as information from free text (cardiology reports), graphical data (EKG waveforms), and imaging studies (ultrasound videos).74–77 Similar patients are then found based on fusion of information across the multimodal sources. Cohorts of patients with the same condition are extracted from large patient archives, matched on demographics and comorbidities.
We believe that a combination of image-based and metadata-based retrieval will perform much better than either independently, and eventually can help overcome many of the obstacles preventing CBIR from being clinically useful. This is much easier said than done. The image content itself has been extensively studied and ultimately contains a fixed amount of information. Basic metadata is easily linked to the images from clinical information or the DICOM header, such as imaging technique, patient age, weight, and gender. Moving further, there is an enormous amount of (potentially) relevant metadata that can be used to augment the image content itself. This may include structured image descriptors provided by radiologists, laboratory data, pathology, genetic profile, detailed clinical history, family history, and physical exam findings. Such a multidimensional database extends the concept of “image” retrieval to a much broader “case” retrieval, wherein each patient along with their images and associated metadata represents a building block of the database.
If image-based and metadata-based approaches are each challenging enough in isolation, an integrated system may seem intractable, even grandiose. Content representation becomes complex in dealing with the image data itself, mathematical representations of the data, and semantic bridges to visual cues. Nonimage metadata may be text based, alphanumeric (e.g., lab data), or more complex and itself multidimensional (e.g., pathology results, genetic profiles). Inevitably, the content representation in this vast case space has to be heterogeneous, yet represented faithfully in a database.
Beyond data collection and storage, a case-based system markedly increases the complexity of making similarity measurements between cases. Here, the major challenge is to combine different types of information in a single and continuous case similarity metric. A useful metric should (a) be continuous, since no two cases are the same and the similarity measurement therefore cannot be viewed as a classification task; (b) be able to handle the significant differences between types of information, such as scale differences between alphanumeric features and continuous vs. discrete (e.g., class labels) features; (c) be able to deal with missing data, which is expected to be a frequent problem due to the high dimensionality of the case space; and (d) define a convex case space (i.e., the case space should be defined as a manifold within the multidimensional feature space).
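As a rough illustration of requirements (a) to (c), the sketch below computes a Gower-style similarity, a standard way of averaging per-feature scores over mixed numeric and categorical data while skipping missing values. It is offered only as one possible starting point under stated assumptions, not as the metric proposed here, and it does not address requirement (d). The feature names, value ranges, and example cases are hypothetical.

```python
import math

def case_similarity(a, b, schema):
    """Gower-style similarity in [0, 1] between two cases stored as dicts.

    schema maps a feature name to ("num", value_range) or ("cat", None).
    A feature that is missing (absent or None) in either case is skipped,
    so the score is averaged only over features both cases share.
    """
    total, used = 0.0, 0
    for name, (kind, value_range) in schema.items():
        x, y = a.get(name), b.get(name)
        if x is None or y is None:
            continue  # requirement (c): tolerate missing data
        if kind == "num":
            # requirement (b): rescale by the feature's range to remove scale effects
            score = 1.0 - min(abs(x - y) / value_range, 1.0)
        else:
            # discrete features contribute an exact-match score
            score = 1.0 if x == y else 0.0
        total += score
        used += 1
    # requirement (a): the result is a continuous value, not a class label
    return total / used if used else math.nan

# Hypothetical schema and cases, for illustration only.
schema = {
    "age": ("num", 80.0),
    "lesion_size_mm": ("num", 50.0),
    "modality": ("cat", None),
    "margin": ("cat", None),
}
case_a = {"age": 60, "lesion_size_mm": 20.0, "modality": "CT", "margin": "smooth"}
case_b = {"age": 40, "lesion_size_mm": 30.0, "modality": "CT", "margin": None}

similarity = case_similarity(case_a, case_b, schema)
print(round(similarity, 3))
```

Equal weighting of features is itself an assumption; in practice the per-feature weights would need to be chosen or learned for the clinical task at hand.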
CBIR in general has advanced considerably on the heels of the dramatic proliferation of digital media, search technology, and the Internet. Much of what has been learned for nonmedical applications can be ported to the medical domain. Medical images contain varied, rich, and often subtle features that are important clinically and that separate this domain from multimedia applications. We believe that with coordinated efforts and a long-range view, current and future technology will fully enable a system integrating pixel- and metadata-based retrieval of similar images which can then be leveraged for diagnostic decision support. With the appropriate selection of clinically relevant disorders and engagement of front line radiologist users, significant advances in medicine and health can be achieved in the next decades. Progress towards clinically useful CBIR in radiology will require a broad-based and multidisciplinary approach. We propose the following guiding principles as a framework for future work:
This survey has focused on CBIR applications in the medical domain, with a review of state-of-the-art approaches, a discussion of challenges and opportunities in the medical domain, and speculations on future research.
From a technical perspective, we have highlighted the diversity of the proposed solutions for image description and similarity assessment, which are considered the two fundamental components of a CBIR system. The majority of medical CBIR systems have emerged as adaptations of multimedia CBIR systems, the basic technologies of which are discussed in the “Current CBIR Technology” section. The medical CBIR approaches reviewed in the “State-of-the-Art Medical CBIR Systems” section reflect the continued efforts of researchers working at the crossroads of visual content analysis and medical imaging to adapt existing CBIR techniques and to develop dedicated approaches that take into account the special aspects of the radiology domain. However, due to the diversity of medical image content in terms of modalities, human anatomy, and diseases, as well as the challenges specific to the medical domain as compared to multimedia applications, this diversity is expected to increase even further to meet the requirements of different subdomains within medicine.
Our analysis of state-of-the-art medical CBIR systems has shown that, despite the progress in image description and similarity analysis, medical image segmentation remains a formidable challenge, as in other visual domains. Most of the approaches resort to semiautomatic or manual methods. This not only limits the development of more effective solutions for automated and objective analyses of anatomical structures and/or abnormalities but also adversely affects the scalability of the proposed systems. Another challenge is the lack of large verified datasets for reliable benchmarking, especially for systems having diagnostic purposes. Performance evaluations are usually carried out using a handful of examples, which further delays the adoption of medical CBIR systems in clinical routines.
In “Challenges and Opportunities for CBIR in Radiology” section, we have focused on other challenges that concern the inherent variability in imaging findings. While computer-driven image descriptors and similarity analyses can help reduce the variability, it can be conjectured that image-only approaches have limited resolving power especially for diagnostic purposes. Medical diagnosis is a complex process that should also take into account the rich and varied metadata, patient history, and other relevant (not necessarily visual) information. Accordingly, a medical retrieval system should extend its scope in the type of information it uses and be able to seamlessly incorporate such information. We have conceptualized this data integration problem as finding mixed descriptors and similarity measures on a conceptual case space rather than an image-only space. Should appropriate information processing technologies be developed, the information diversity in medicine is not necessarily a curse but may be an opportunity.
Finally, in the “Strategies for Moving Forward in the Next Decade” section, we have drawn attention to operational issues in medical CBIR and proposed strategies to tackle them. We believe and hope that these guiding principles will help researchers in this fast-growing field build more useful and reliable medical search/retrieval systems.
This work has partly been supported by NIH CA72023 and TÜBİTAK KARİYER-DRESS (104E035).