

Med Image Anal. Author manuscript; available in PMC 2013 October 1.


Published online 2012 May 29. doi: 10.1016/j.media.2012.04.010

PMCID: PMC3772630

NIHMSID: NIHMS381401

R. Kwitt,^{a,}^{*} N. Vasconcelos,^{b} N. Rasiwasia,^{b} A. Uhl,^{c} B. Davis,^{a} M. Häfner,^{d} and F. Wrba^{e}

The publisher's final edited version of this article is available at Med Image Anal


A novel approach to the design of a semantic, low-dimensional, encoding for endoscopic imagery is proposed. This encoding is based on recent advances in scene recognition, where semantic modeling of image content has gained considerable attention over the last decade. While the semantics of scenes are mainly comprised of environmental concepts such as *vegetation*, *mountains* or *sky*, the semantics of endoscopic imagery are medically relevant visual elements, such as polyps, special surface patterns, or vascular structures. The proposed semantic encoding differs from the representations commonly used in endoscopic image analysis (for medical decision support) in that it establishes a *semantic space*, where each coordinate axis has a clear human interpretation. It is also shown to establish a connection to Riemannian geometry, which enables principled solutions to a number of problems that arise in both physician training and clinical practice. This connection is exploited by leveraging results from information geometry to solve problems such as 1) recognition of important semantic concepts, 2) semantically-focused image browsing, and 3) estimation of the *average-case* semantic encoding for a collection of images that share a medically relevant visual detail. The approach can provide physicians with an easily interpretable, semantic encoding of visual content, upon which further decisions, or operations, can be naturally carried out. This is contrary to the prevalent practice in endoscopic image analysis for medical decision support, where image content is primarily captured by discriminative, high-dimensional, appearance features, which possess discriminative power but lack human interpretability.

Over the past decade, there has been increased research interest in decision-support systems for endoscopic imagery. In the context of routine examinations of the colon, an important task is to perform *pit pattern* discrimination. This is usually guided by the Kudo criteria [Kudo et al., 1994], based on the observation of a strong correlation between the visual appearance of the highly-magnified mucosa and the visual appearance of dissected specimens under the microscope. Pit pattern analysis not only facilitates *in vivo* predictions of the histology but represents a valuable guideline for treatment strategies in general. The Kudo criteria discriminate between five pit pattern types I-V, where type III is subdivided into III-S and III-L. Types I and II are usually characteristic of *non-neoplastic* lesions, types III and IV indicate adenomatous polyps, and type V is highly indicative of invasive carcinoma. Apart from incidental image structures, such as colon folds, the pit patterns are the predominant concepts upon which histological predictions are made. While images where one particular pit pattern type is prevalent are fairly rare, mixtures of pit patterns are quite commonly found in practice. The development of decision-support systems for endoscopic imagery is desirable for several reasons.

First, routine examinations often involve unnecessary biopsies or polyp resections, both because physicians are under serious time pressure and because the standard protocol dictates the use of biopsies in cases of uncertainty. This is controversial, since resecting metaplastic lesions is time-consuming and the removal of invasive cancer can be hazardous.

Second, the interpretation of the acquired image material can be difficult, due to high variability in image appearance depending on the type of imaging equipment. Novel modalities, such as high-magnification endoscopy, narrow band imaging (NBI) or confocal laser endomicroscopy (CLE) all highlight different mucosal structures; CLE even provides an *in-vivo* view of deeper tissue layers at a microscopic scale. A critical problem is that visual criteria for assessing the malignant potential of colorectal lesions are still under extensive clinical evaluation and substantial experience [Tung et al., 2001] is usually required to achieve good results under these criteria.

Third, decision-support systems can be a helpful aid to the training of future physicians. Due to differences among endoscopic imaging modalities and endoscope brands, it is advisable to train the physician on data from the very device to be used in practice. However, the learning of Kudo's pit pattern classification requires experienced physicians to go through the time-consuming selection of images representative of the different pit pattern types. This is a tedious process, which becomes unmanageable for large-scale datasets.

For all these reasons, there has been increasing interest in decision-support systems for endoscopic imagery over the last decade. This effort has been predominantly directed to the use of automated image content analysis techniques in the prediction of histopathological results (e.g. [André et al., 2009, 2010; Tischendorf et al., 2010; Kwitt et al., 2011; Häfner et al., 2012]). It has led to a plethora of approaches that first compute a collection of localized appearance features and then input these features to a discriminant classifier, usually a support vector machine. From a purely technical point of view, this problem description is similar to scene recognition problems in the computer vision literature, with the difference that invariance properties of the image representation, such as invariance to rotation or translation, are considered more important in the medical field. A relevant research trend in computer vision is to replace the inference of scene labels from appearance descriptors alone by more abstract, intermediate-level, representations [Fei-Fei and Perona, 2005; Lazebnik et al., 2006; Boureau et al., 2010; Rasiwasia et al., 2006; Rasiwasia and Vasconcelos, 2008; Dixit et al., 2011]. The prevalent approach to scene classification is to learn a codebook of so-called *visual words* from a large corpus of appearance descriptors, and represent each image as a histogram of codeword indices, known as the bag-of-words (BoW) histogram. These mid-level representations are input to a discriminant classifier for scene label prediction.
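As an illustration of the generic BoW encoding just described, the following sketch (with hypothetical names; codebook learning, e.g. by k-means, is omitted) quantizes local descriptors against a codebook and counts codeword occurrences:

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Bag-of-words encoding: assign each local appearance descriptor to
    its nearest codeword and count occurrences.

    descriptors: (N, d) array of local features of one image
    codebook:    (V, d) array of V visual words
    """
    # Squared Euclidean distance of every descriptor to every codeword
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)  # index of the nearest codeword
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()   # normalized BoW histogram
```

The resulting histogram would then be fed to a discriminant classifier, as in the pipeline described above.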

In the context of pit-pattern classification, this classification architecture could, in principle, be used to produce a class label, such as *neoplastic* or *non-neoplastic*, to be presented to a physician. However, while BoW histograms have state-of-the-art recognition rates for both medical and computer vision applications, they are not generally amenable to human interpretation. This is due to the fact that they 1) are high-dimensional, and 2) define a space whose coordinate axes lack semantic interpretation. This lack of interpretability raises a number of difficulties to the clinical deployment of the resulting decision-support systems. First, while the resulting predictions are valuable, it is not uncommon for the medical community to reject *black-box* solutions that do not provide interpretable information on how these predictions were reached. Second, the lack of insight on the factors that determine the predicted image labels severely compromises their usefulness for physician training. Third, it has been recently argued that a more semantically-focused mid-level representation is conducive to better recognition results (cf. [Schwaninger et al., 2006; Rasiwasia and Vasconcelos, 2008]). Several works have, in fact, shown that an image representation which captures the occurrence probabilities of predefined semantic concepts is not only competitive with BoW, but computationally more efficient due to its lower dimensionality. Since the semantic concepts can be chosen so as to be interpretable by physicians, the approach is also conducive to a wider acceptability by the medical community. For example, [André et al., 2012] demonstrated that low-dimensional semantic encodings are highly beneficial to the interpretation of CLE imagery.

The goal of this work is to establish a semantic encoding of endoscopic imagery so as to produce systems for automated malignancy assessment of colorectal lesions of greater flexibility than those possible with existing approaches. We demonstrate the benefits of the proposed encoding on image material obtained during routine examinations of the colon mucosa. The imaging modality is high-magnification chromo-endoscopy, which offers a level of visual detail suitable for the categorization of mucosal surface structures into different *pit pattern* types. Some typical images are shown in the top row of Fig. 1. The aforementioned shortcomings of previous approaches are addressed by adapting a recent method [Rasiwasia and Vasconcelos, 2008] from the scene recognition literature to the inference of semantic encodings for endoscopic imagery. Some examples of these encodings are shown in the bottom row of Fig. 1. While the general principle is well established in the computer vision literature, we demonstrate that it is a principled solution for a number of important applications in the domain of endoscopic image analysis. The first is the automated assessment of the malignant potential of colorectal lesions, where the proposed semantic encoding is shown to enable state-of-the-art image classification with substantially increased human interpretability of classifier predictions. The second is a tool to browse endoscopic image databases by typicality of particular pit patterns, allowing trainees in gastroenterology to find *most-representative* cases for each pit pattern class. The third is a strategy to determine images which represent the *average-case* for a particular pit pattern type. This enables physicians to keep track of what they typically see in clinical practice. A preliminary version of this work appeared in Kwitt et al. [2011].

We start by introducing some notation. We denote by 𝒟 = {(*I_i*, **c**_i)} a training set of images *I_i* with associated ground-truth caption vectors **c**_i, whose entries indicate which of the *C* semantic concepts of a vocabulary 𝒯 (here, the pit pattern types) are present in image *I_i*.

The generative model for the proposed semantic encoding is shown in Figure 2(a). Visual features **x**_i are independently drawn from concepts *t* ∈ 𝒯 according to concept-conditional distributions *p*_{X|T}(**x**|*t*). Each feature is assigned to the concept of maximum posterior probability,

$$q_b(\mathbf{x}_i) = \arg\max_{t \in \mathcal{T}} \, p_{T|X}(t \mid \mathbf{x}_i) = \arg\max_{t \in \mathcal{T}} \, \frac{p_{X|T}(\mathbf{x}_i \mid t)}{\sum_{w} p_{X|T}(\mathbf{x}_i \mid w)}.$$

(1)

This assumes equal prior probability for all concepts, but could easily be extended to a non-uniform prior. The mapping *q_b*: 𝒳 → 𝒯 thus labels each visual feature with its most likely concept. Denoting by *o_w* the number of features of an image assigned to concept *w*, and regularizing the counts with a Dirichlet prior of parameter *α*, the semantic encoding of the image is estimated as

$$\hat{\mathbf{s}} = \left( \frac{o_1 + \alpha - 1}{\sum_{w}(o_w + \alpha - 1)}, \, \cdots, \, \frac{o_C + \alpha - 1}{\sum_{w}(o_w + \alpha - 1)} \right)^{\prime}$$

(2)

Note that *α* acts as a regularization parameter. In the terminology of Rasiwasia and Vasconcelos [2008], *ŝ* is denoted the *semantic multinomial (SMN)* of image *I*. This establishes a mapping Π: 𝒳^N → ℙ^{C−1} from the *N* visual features of an image to a point on the (*C* − 1)-dimensional probability simplex, the *semantic space*.
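Eqs. (1) and (2) reduce to a count-and-normalize procedure once per-concept likelihoods are available. A minimal sketch (function and variable names are illustrative; log-likelihoods are assumed precomputed from the concept models):

```python
import numpy as np

def semantic_multinomial(log_likelihoods, alpha=1.5):
    """Compute the SMN of one image from its feature log-likelihoods.

    log_likelihoods: (N, C) array, entry (i, t) = log p_{X|T}(x_i | t)
    for the N local features of the image under the C concept models
    (e.g. per-concept Gaussian mixtures).  alpha >= 1 is the Dirichlet
    regularization parameter of Eq. (2).
    """
    # Eq. (1): assign each feature to its maximum-posterior concept
    # (uniform concept prior, so the posterior argmax equals the
    # likelihood argmax).
    labels = np.argmax(log_likelihoods, axis=1)
    C = log_likelihoods.shape[1]
    # Concept occurrence counts o_1, ..., o_C
    counts = np.bincount(labels, minlength=C).astype(float)
    # Eq. (2): regularized relative frequencies
    s = (counts + alpha - 1) / (counts + alpha - 1).sum()
    return s
```

The returned vector sums to one and can be read directly as the occurrence probabilities of the semantic concepts in the image.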

Learning of the Π mapping requires estimates of the concept-conditional distributions *p*_{X|T}(**x**|*t*), for all concepts *t* ∈ 𝒯.

The proposed implementation of semantic encoding relies on Gaussian mixture models to estimate the concept-conditional probability densities *p*_{X|T}(**x**|*t*), with mixture parameters estimated via the EM algorithm [Dempster et al., 1977].

The semantic image encoding of Section 2 was applied to three application scenarios of potential interest for endoscopic image analysis: 1) assessment of the malignant potential of colorectal lesions by recognizing non-neoplastic and neoplastic lesions, 2) semantically-focused browsing for *most-representative* cases and 3) determination of *average-case* image representatives per semantic concept.

Our data material are colonoscopy images (either 624 × 533 or 586 × 502 pixels), acquired throughout 2005-2009 in the Department of Gastroenterology and Hepatology of the Medical University of Vienna. All images were captured with a high-magnification Olympus Evis Exera CF-Q160ZI/L endoscope, using a magnification factor of up to 150 ×. The original dataset consists of 327 images from a total of 40 patients. To obtain a larger sample size, 256 × 256 pixel regions were manually extracted (with minimum overlap) to obtain a final dataset of 716 images. All images were converted to grayscale for further processing. Examination of the lesions was performed after dye-spraying with indigo-carmine, a routine procedure to enhance structural details. Biopsies were taken of lesions classified as pit pattern types I, II and V, since I and II need not be removed and type V cannot be removed, as explained in Section 1. Lesions of type III-S, III-L and IV were removed endoscopically. Table 1 lists the number of patients and images for a dataset split into non-neoplastic (i.e. types I, II) and neoplastic lesions (types III-V).

As a first application scenario, we demonstrate a method to classify the images into non-neoplastic and neoplastic lesions, given the proposed semantic encoding by SMNs. It is safe to claim that this is the most well-studied application scenario in the endoscopic image analysis literature. Many specifically-tailored low-level appearance features have been proposed recently, mainly considering the problem from a pure pattern classification perspective. While the discriminant classifier at the end of the pipeline is often optimized for a particular feature type, our approach is based on generic SMNs. This leads to a natural classification method that is independent of the underlying appearance-level representation.

Although it would be possible to train a support vector machine with an RBF kernel on the semantic representation, this would not respect the structure of the underlying Riemannian manifold. In fact, an RBF kernel based on the Euclidean distance would correspond to assuming that the SMNs reside in flat Euclidean space. This would ignore the fact that the SMNs are parameters of multinomial distributions, and thus represent points on the multinomial manifold. Better performance can usually be obtained by adapting the similarity measure to the structure of the manifold. For a Riemannian manifold the natural similarity measure is the associated geodesic distance. Although geodesics tend to be difficult to compute, and rarely have a closed-form solution, this is not the case for the semantic manifold. In this case, it is possible to exploit the well-known isometry

$$F: \mathbb{P}^{C-1} \to \mathbb{S}^{C-1}, \quad \mathbf{s} \mapsto 2\sqrt{\mathbf{s}}$$

(3)

between the Riemannian manifolds (ℙ^{C−1}, 𝓘) and (𝕊^{C−1}, *δ*), where 𝓘 denotes the Fisher information metric, 𝕊^{C−1} is the (*C* − 1)-sphere of radius two, and *δ* is the Euclidean metric inherited from the embedding of 𝕊^{C−1} in ℝ^{C}. Under this isometry, the geodesic distance between two SMNs **s**^i and **s**^j is

$$d_{\mathcal{I}}(\mathbf{s}^i, \mathbf{s}^j) = d_{\delta}(F(\mathbf{s}^i), F(\mathbf{s}^j)) = 2 \arccos\left( \langle \sqrt{\mathbf{s}^i}, \sqrt{\mathbf{s}^j} \rangle \right).$$

(4)
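The closed form of Eq. (4) is straightforward to implement; a sketch (illustrative names), with a clip guarding against floating-point rounding:

```python
import numpy as np

def geodesic_distance(s_i, s_j):
    """Geodesic distance between two SMNs on the multinomial manifold.

    Implements Eq. (4): the map F(s) = 2*sqrt(s) of Eq. (3) sends the
    simplex onto a sphere of radius two, where geodesics are great-circle
    arcs, giving d(s_i, s_j) = 2 * arccos(<sqrt(s_i), sqrt(s_j)>).
    """
    inner = np.sqrt(s_i) @ np.sqrt(s_j)
    # Guard against rounding slightly outside [-1, 1]
    return 2.0 * np.arccos(np.clip(inner, -1.0, 1.0))
```

For example, two SMNs concentrated on different corners of the simplex are at the maximal distance π.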

This provides a closed-form solution for computing distances between SMNs on the semantic manifold. It is also possible to prove (see Appendix A) that the kernel defined by the negative of this geodesic distance

$$k(\mathbf{s}^i, \mathbf{s}^j) := -d_{\mathcal{I}}(\mathbf{s}^i, \mathbf{s}^j)$$

(5)

satisfies all the requirements of a conditionally positive-definite (cpd) kernel, see [Schölkopf, 2000]. This is interesting because cpd kernels can be used in the standard SVM architecture and share many of the closure properties of positive-definite kernels [Schölkopf and Smola, 2001]. These properties enable the use of weighted sums of kernels or even a spatial pyramid variant of (5), as proposed in Grauman and Darrell [2005] or Lazebnik et al. [2006].

In summary, the use of the kernel of (5) within a SVM classifier is a principled approach to the combination of 1) a semantic space image representation based on low-level appearance features with 2) a state-of-the-art kernel-based discriminant classifier that respects the structure of the semantic space.
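A sketch of this combination using scikit-learn's precomputed-kernel interface (the random SMNs and labels are placeholders for real image encodings; names are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

def smn_kernel_matrix(S_a, S_b):
    """Gram matrix of the geodesic kernel of Eq. (5), k = -d_I.

    Rows of S_a (n x C) and S_b (m x C) are SMNs on the simplex.
    """
    inner = np.clip(np.sqrt(S_a) @ np.sqrt(S_b).T, -1.0, 1.0)
    return -2.0 * np.arccos(inner)

# Placeholder data: random SMNs drawn from a Dirichlet distribution
rng = np.random.default_rng(0)
S_train = rng.dirichlet(np.ones(6), size=40)
y_train = rng.integers(0, 2, size=40)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(smn_kernel_matrix(S_train, S_train), y_train)
pred = clf.predict(smn_kernel_matrix(S_train, S_train))
```

Since the kernel is only conditionally positive-definite, the Gram matrix is not PSD in general; libsvm-based solvers such as `SVC` accept it nonetheless, which is what makes cpd kernels usable in the standard SVM architecture.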

A quantitative evaluation of the proposed classification strategy was performed with a *leave-one-patient-out* protocol recently adopted by many works [André et al., 2011; Häfner et al., 2012; Kwitt et al., 2011]. This is in contrast to previous studies that have primarily followed a simple *leave-one-sample-out* evaluation protocol [Kwitt et al., 2010; Häfner et al., 2010; André et al., 2009]. Leave-one-patient-out is more restrictive in the sense that *all* images from one patient are left out during SVM training. In a leave-one-sample-out protocol, only *one* image (over the whole collection) is left out per cross-validation run. When there are several images from the same patient, this can bias the reported classification rates. Leave-one-patient-out further ensures that there is no bias from 256 × 256 pixel regions extracted from the same image. For those reasons, the comparison of classification rates is restricted to recently published results, on the same database, that follow the leave-one-patient-out protocol.
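The protocol corresponds to grouped cross-validation with one group per patient; a sketch using scikit-learn's `LeaveOneGroupOut` (the patient IDs, features, and labels below are hypothetical):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Hypothetical patient IDs: all images (and all 256 x 256 regions)
# of one patient share an ID, so no patient appears in both folds.
patient_id = np.array([0, 0, 1, 1, 1, 2, 2, 3])
X = np.arange(len(patient_id), dtype=float).reshape(-1, 1)  # stand-in features
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])

logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=patient_id):
    held_out = set(patient_id[test_idx])
    assert len(held_out) == 1                          # one patient held out
    assert held_out.isdisjoint(patient_id[train_idx])  # no patient leakage
```

Each iteration yields one fold per patient, exactly as in the leave-one-patient-out protocol described above.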

The only parameter of the proposed classifier that requires tuning is the SVM cost factor *C*, which we optimize on the training data in each leave-one-patient-out iteration using ten linearly spaced values of log *C* ∈ [−2, 4]. We note that the proposed kernel is advantageous in the sense that it requires no tuning of kernel parameters such as the bandwidth of RBF kernels. Table 2 lists the average recognition rate (i.e., accuracy) for non-neoplastic vs. neoplastic lesion classification, together with sensitivity and specificity values. Sensitivity is defined as the total number of correctly classified images showing neoplastic lesions divided by the total number of images showing neoplastic lesions. The definition of specificity follows accordingly. The proposed approach is compared to a recent study of Häfner et al. [2012] which implements the *Opponent-Color Local-Binary-Pattern (OCLBP)* features of Mäenpää et al. [2002], the *Joint Color Multiscale LBP (JC-MB-LBP)* features of Häfner et al. [2009] as well as a new feature called *Local Color Vector Pattern (LCVP)*. Note that all the rates reported for these approaches were obtained from color (RGB) images, while we only use grayscale image information^{2}. To assess whether an approach produces statistically significant class predictions with respect to the top result (bold), we employ a McNemar test at 5% significance. Rejection of the null hypothesis (i.e., of *no* statistically significant difference) is indicated by an underlined classification accuracy in Table 2.
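The sensitivity and specificity definitions above amount to simple confusion-matrix ratios; a minimal sketch (label 1 = neoplastic, 0 = non-neoplastic; names illustrative):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN) over neoplastic (label 1) images;
    specificity = TN / (TN + FP) over non-neoplastic (label 0) images."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```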

Comparison of leave-one-patient-out classification results to various state-of-the-art methods, on the database used in this work. In case there is *no* statistically significant difference (at 5% significance) in the class predictions with respect to the **...**

We emphasize that, in contrast to the previous studies, we have made no effort to find the optimal appearance features for pit patterns. The experiment is instead designed to demonstrate that classification is possible with a much lower-dimensional representation (cf. last column of Table 2) that *can* be interpreted by humans. In fact, the proposed approach is not a direct competitor to previously published works, since we pursue a somewhat orthogonal research direction: to facilitate *image understanding* of endoscopic imagery at a semantic level, and not so much to derive new features. The proposed approach is not restricted to SIFT features, and any future improvements in the development of features that capture medically relevant visual structures (e.g. pit patterns) can be used to design improved semantic spaces.

Providing future gastroenterologists with a tool to browse endoscopic imagery by *typicality* of particular semantic concepts is a core motivation for the use of semantic image representations. The proposed representation addresses this problem very naturally. To identify the images most-characteristic of concept *t_i* (e.g. pit pattern III-L), it suffices to identify the subregion of the semantic simplex whose SMNs represent images where the concept *t_i* occurs with high probability, i.e. the region around the simplex corner associated with *t_i*.

Identifying the images, represented by SMNs, which are *most-characteristic* for concept *t*_1 (i.e. pit pattern type I).

To evaluate the proposed strategy for semantically-focused browsing, we first perform a visual comparison of the browsing results to *textbook* illustrations of the different pit pattern types, shown in the top row of Fig. 5. The SMNs were sorted along the dimension corresponding to each concept, and the *K* top-ranked images were extracted. To establish a realistic clinical scenario we ensured that the extracted images do *not* belong to the same patient. We call this *patient pruning* of the result set. Figure 5 shows the images obtained when browsing for the *K* = 5 top-ranked images of each pit pattern. Images that remain after the patient pruning step are highlighted. Since the database images are not uniformly distributed over patients, the pruning step did not produce an equal number of browsing results per concept. Nevertheless, a comparison to the textbook illustrations reveals the desired correspondences, e.g., the characteristic gyrus-like structures of pit pattern IV, the round pits of pit pattern I, and the complete loss of structure for pit pattern V. Also presented is an incorrect browsing result for pit pattern III-L, depicting an image of type IV. This is a typical error due to the difficulty of distinguishing types III-L and IV, which have similar structural elements.
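The browsing-with-patient-pruning procedure can be sketched as follows (illustrative names; SMN rows and patient IDs assumed given):

```python
import numpy as np

def browse_concept(S, patients, concept, K=5):
    """Return indices of the K top-ranked images for one concept,
    keeping at most one image per patient ("patient pruning").

    S:        (n, C) matrix of SMNs
    patients: (n,) array of patient IDs
    concept:  column index of the pit-pattern concept to browse
    """
    order = np.argsort(-S[:, concept])  # descending typicality
    picked, seen = [], set()
    for idx in order:
        if patients[idx] in seen:       # already have this patient
            continue
        picked.append(int(idx))
        seen.add(patients[idx])
        if len(picked) == K:
            break
    return picked
```

Note that pruning *after* ranking is what makes the result-set size per concept depend on how the images are distributed over patients, as observed in the experiment.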

Browsing result for querying the top *K* = 5 *most-characteristic* images per pit pattern. The subset of all images that remains after patient pruning is highlighted. The top row shows a schematic *textbook* visualization of the Kudo criteria for pit pattern **...**

In addition to the visual inspection of the results in Fig. 5, we conducted a more objective evaluation using the ground-truth caption vectors of each image. This was based on the *average error rate* of the system when browsing the *K* top-ranked images per concept. A leave-one-patient-out protocol was used: 1) the patient's images were removed from the database, 2) SMNs were estimated from the remaining images, 3) the *K* top-ranked images per concept were extracted (now using the whole database) and 4) the patient pruning step was performed. The average error rate was then calculated as the percentage of images (averaged over all leave-one-patient-out runs) in the final browsing result of concept *t_i* which do not match the corresponding ground-truth caption vectors (i.e., have a zero entry at the *i*-th position).

The final application scenario considered in this work is to derive a semantic encoding representative of the *average-case* image within a given image collection (e.g., grouped by pit pattern). While Section 3.3 focused on the *corners* of the semantic simplex, i.e., the very characteristic cases, we now focus on the average-case images. Fig. 7 illustrates the difference between searching for the *most-characteristic* image for a particular concept versus searching for its *average-case* representative. Both have a clear merit: the first is of value for training purposes while the second facilitates visualization of what a physician typically sees in clinical practice.

Illustration of the difference between the *most-characteristic* image of a pit pattern type (here, type IV) and its *average-case*.

As in the previous application scenarios, it is possible to draw on resources from information geometry to tackle the problem in a principled way. Rather than computing the arithmetic mean of a collection of SMNs, which would not respect the structure of the semantic manifold, we compute their Fréchet mean. The Fréchet mean of a collection of *N* points **a**_1, …, **a**_N on a general (connected) Riemannian manifold (ℳ, *g*), with weights *ω_i*, is defined as

$$\mu = \arg\min_{\mathbf{a} \in \mathcal{M}} \sum_{i=1}^{N} \omega_i \, d_g^2(\mathbf{a}, \mathbf{a}_i)$$

(6)

where *g* denotes the Riemannian metric and *d_g* denotes the geodesic distance between two points, induced by *g*. The minimizer generally has no closed form, but can be computed by the fixed-point iteration

$$\mu_{k+1} = \exp_{\mu_k}\left[ \frac{1}{N} \sum_{i=1}^{N} \log_{\mu_k}(\mathbf{a}_i) \right]$$

(7)

where **μ**_0 is a suitable initial value (e.g., chosen randomly from the **a**_i), and exp_μ and log_μ denote the Riemannian exponential and log maps at **μ**.

To compute the Fréchet mean of the SMNs corresponding to images labeled with a particular concept, we can again exploit the isometry between (ℙ^{C−1}, 𝓘) and (𝕊^{C−1}, *δ*) (cf. Section 3.2). The *SMN Fréchet mean* is then computed as

$$\mu_{k+1} = \exp_{\mu_k}\left[ \frac{1}{N} \sum_{i=1}^{N} \log_{\mu_k}\left( \frac{F(\mathbf{s}^i)}{\| F(\mathbf{s}^i) \|} \right) \right]$$

(8)

On the unit sphere, the Riemannian exponential and log maps are given by

$$\log_{\mathbf{x}}(\mathbf{y}) = \frac{\arccos(\langle \mathbf{x}, \mathbf{y} \rangle)}{\sqrt{1 - \langle \mathbf{x}, \mathbf{y} \rangle^2}} \left( \mathbf{y} - \langle \mathbf{x}, \mathbf{y} \rangle \mathbf{x} \right)$$

(9)

$$\exp_{\mathbf{x}}(\mathbf{y}) = \cos(\|\mathbf{y}\|)\,\mathbf{x} + \sin(\|\mathbf{y}\|)\,\|\mathbf{y}\|^{-1}\,\mathbf{y}$$

(10)

Since **μ**_k resides on the unit sphere, it is necessary to project the Fréchet mean back onto the simplex ℙ^{C−1}; by the inverse of the mapping of (3), this amounts to squaring its coordinates element-wise.
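A sketch of the SMN Fréchet mean of Eqs. (8)-(10) (illustrative names; the fixed iteration count stands in for a proper convergence test):

```python
import numpy as np

def log_map(x, y):
    """Riemannian log map on the unit sphere, Eq. (9)."""
    c = np.clip(x @ y, -1.0, 1.0)
    if np.isclose(c, 1.0):          # y == x: zero tangent vector
        return np.zeros_like(x)
    return np.arccos(c) / np.sqrt(1.0 - c**2) * (y - c * x)

def exp_map(x, v):
    """Riemannian exponential map on the unit sphere, Eq. (10)."""
    n = np.linalg.norm(v)
    if np.isclose(n, 0.0):
        return x
    return np.cos(n) * x + np.sin(n) * v / n

def frechet_mean_smn(S, iters=50):
    """Frechet mean of a set of SMNs (rows of S), Eq. (8).

    SMNs are mapped to the unit sphere via s -> sqrt(s) (the radius-2
    factor of Eq. (3) cancels under normalization), averaged by the
    fixed-point iteration, and projected back onto the simplex by
    squaring the coordinates element-wise.
    """
    P = np.sqrt(S)                  # points on the unit sphere
    mu = P[0]
    for _ in range(iters):
        tangent = np.mean([log_map(mu, p) for p in P], axis=0)
        mu = exp_map(mu, tangent)
    return mu**2                    # back onto the simplex
```

For two SMNs placed symmetrically about the simplex center, the iteration converges to the geodesic midpoint, i.e. the uniform-looking average one would expect.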

The Fréchet mean computation was used in combination with the geodesic distance to determine the *average-case* image per concept. The strategy is as follows: given the Fréchet mean **s̄**_t of the SMNs 𝒟_t of the images labeled with concept *t*, it is possible to find the image whose SMN is closest to the mean, i.e.

$$r_t = \arg\min_{\mathbf{s}_i \in \mathcal{D}_t} d_{\mathcal{I}}(\bar{\mathbf{s}}_t, \mathbf{s}_i)$$

(11)

Fig. 8 shows the images closest to the average-case (upper part, top row) in terms of the Fréchet mean as well as the Fréchet mean of the corresponding SMNs (upper part, bottom row) itself, for each category. For comparison, the figure also shows the most-characteristic samples (lower part, top row) and corresponding SMNs (lower part, bottom row). When compared to the most-characteristic images, the introductory graphic of Fig. 1, or the browsing result of Fig. 5, the average-case images appear less typical of each pit pattern type. This reflects the observation that pit patterns rarely occur in a pure form, and mixtures of pit patterns are a very common occurrence in clinical practice.

In this work, we have proposed a new approach for endoscopic image analysis. It was argued that focusing on discriminative appearance features to predict histological findings leads to *black-box* decision-support systems which might lack acceptance by the medical community. As a possible solution to this problem, we adopted a recent semantically-focused scene-recognition approach from computer vision to establish a semantic encoding of endoscopic images. Based on this encoding, and the induced semantic space, we demonstrated that classification, semantically-focused browsing, and the computation of class representatives can be implemented in a principled manner by drawing on results from information geometry. Experiments on a collection of high-magnification colonoscopy images have shown that classification in semantic space achieves accuracies similar to highly-optimized, appearance-feature based approaches with significantly lower feature space dimensionality. Moreover, the proposed strategy for browsing images by semantic typicality has been shown capable of retrieving the most-characteristic images per concept with small browsing error. Finally, an analysis of the average-case images per concept revealed that the average-case may in fact be harder to interpret or categorize due to less distinctive structures compared to the textbook illustrations. In the future, we intend to develop a *difficulty* measure, similar to [André et al., 2010], for classifying endoscopic imagery, possibly based on some sort of average geodesic distance to the most-representative samples.

- We establish a principled approach for semantic encoding of endoscopic imagery
- We show how to perform semantically-focused browsing
- We show how to perform histological predictions in semantic space
- We show how to identify average-case images for medically relevant visual details
- All tasks allow a natural formulation using principles from information geometry

This work is sponsored, in part, by the NIH/NIBIB sponsored *National Alliance of Medical Image Computing* (NA-MIC, PI: Kikinis, 1U54EB005149-01, http://www.na-mic.org) and the NIH/NCI sponsored *Image Registration for Ultrasound-Based Neurosurgical Navigation* (NeuralNav/TubeTK, PI: Aylward, Wells, 1R01CA138419-01, http://public.kitware.com/Wiki/NeuralNav).

We provide a thorough proof that the kernel defined in (5) is conditionally positive-definite (cpd).

**Theorem 1** (Power Series of Dot-Product Kernels [Smola et al., 2000; Schölkopf and Smola, 2001]). *Let k*: 𝕊^{d−1} × 𝕊^{d−1} → ℝ *denote a dot-product kernel on the unit sphere in a d-dimensional Hilbert space. The kernel is positive definite (pd) if and only if there is a function f*: ℝ → ℝ *such that k*(**x**, **y**) = *f*(⟨**x**, **y**⟩) *and f admits the power series expansion*

$$f(t) = \sum_{n=0}^{\infty} c_n t^n \quad \text{with} \quad c_n \ge 0.$$

(A.1)

According to this theorem, it suffices to check the coefficients of the Taylor series expansion of *f* for non-negativity to prove that *k* is a pd kernel. Note that by *dot-product* we refer to the standard scalar product in ℝ^{d}.

**Proposition 1**. The semantic kernel

$$k_s(\mathbf{x}, \mathbf{y}) = -2\arccos\left(\langle \sqrt{\mathbf{x}}, \sqrt{\mathbf{y}} \rangle\right) \quad \text{with} \quad \mathbf{x}, \mathbf{y} \in [0, 1)^d$$

(A.2)

and ║**x**║_{1} = 1, ║**y**║_{1} = 1 *is conditionally positive definite (cpd)*.

*Proof*. First, we note that for any **x** ∈ [0, 1)^d with ║**x**║_1 = 1, the vector √**x** lies on the unit sphere 𝕊^{d−1}, since ⟨√**x**, √**x**⟩ = Σ_i *x_i* = 1. Defining

$$g\left(t\right)=\pi -2\text{arccos}\left(t\right),$$

(A.3)

we can write the semantic kernel of (A.2) as

$${k}_{s}(\mathit{\text{x}},\mathit{\text{y}})=g(\langle \sqrt{\mathit{\text{x}}},\sqrt{\mathit{\text{y}}}\rangle )-\pi .$$

(A.4)

Our first step is to show that the dot-product kernel
${k}_{g}(\mathit{\text{x}},\mathit{\text{y}}):=g(\langle \sqrt{\mathit{\text{x}}},\sqrt{\mathit{\text{y}}}\rangle )$ induced by *g* satisfies the condition of Theorem 1 and is thus pd. Hence, we first build the Taylor series expansion of arccos(*x*) around 0, i.e.

$$\arccos(x) = \frac{\pi}{2} - \sum_{n=0}^{\infty} \frac{(2n)!}{2^{2n}(n!)^2} \frac{1}{2n+1} x^{2n+1} \quad \forall x: |x| < 1.$$

(A.5)

The convergence condition |*x*| < 1 is satisfied, as |⟨√**x**, √**y**⟩| < 1 under the stated conditions. Substituting (A.5) into (A.3) yields

$$g(t) = \sum_{n=0}^{\infty} c_n t^{2n+1} \quad \text{with} \quad c_n = \frac{(2n)!}{2^{2n}(n!)^2} \frac{2}{2n+1}$$

(A.6)

and we immediately see that ∀*n*: *c_n* ≥ 0. It follows from Theorem 1 that the dot-product kernel *k_g* is pd. Since every pd kernel is also cpd, and any non-positive constant is cpd, the decomposition

$$k_s(\mathbf{x}, \mathbf{y}) = \underbrace{k_g(\mathbf{x}, \mathbf{y})}_{\text{cpd}} + \underbrace{(-\pi)}_{\text{cpd}}$$

(A.7)

completes the proof.
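The proposition lends itself to a simple numerical sanity check (a sketch under our own choice of sample size and tolerance, not part of the original proof): sample points from [0, 1)^{*d*} normalized to unit ℓ_{1}-norm, build the Gram matrix of *k _{s}*, and verify that it is positive semi-definite on the subspace of coefficient vectors summing to zero, which is precisely conditional positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sample n points from [0,1)^d, normalized to the probability simplex
# so that ||x||_1 = 1, as required by Proposition 1.
n, d = 40, 6
X = rng.random((n, d))
X /= X.sum(axis=1, keepdims=True)

# Semantic kernel of (A.2): k_s(x, y) = -2 arccos(<sqrt(x), sqrt(y)>).
S = np.sqrt(X)
G = np.clip(S @ S.T, -1.0, 1.0)  # guard against rounding outside [-1, 1]
K = -2.0 * np.arccos(G)

# Conditional positive definiteness: c^T K c >= 0 for all c with sum(c) = 0.
# Equivalently, P K P is psd for the centering projection P = I - 11^T / n.
P = np.eye(n) - np.ones((n, n)) / n
eigvals = np.linalg.eigvalsh(P @ K @ P)
print(eigvals.min() >= -1e-9)  # True
```

Note that checking eigenvalues of the centered matrix *PKP* is equivalent to restricting the quadratic form to the zero-sum subspace, since *P* projects onto exactly that subspace.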

^{1}LEAR implementation: http://lear.inrialpes.fr/people/dorko/downloads.html

^{2}The results reported by Häfner et al. [2012] are slightly higher when using the LAB color space.


- André B, Vercauteren T, Buchner AM, Shahid M, Wallace M, Ayache N. An image retrieval approach to setup difficulty levels in training systems for endomicroscopy diagnosis. MICCAI; 2010. [PubMed]
- André B, Vercauteren T, Buchner AM, Wallace M, Ayache N. A smart atlas for endomicroscopy using automated video retrieval. Med Image Anal. 2011;15:460–476. [PubMed]
- André B, Vercauteren T, Ayache N. Endomicroscopic video retrieval using mosaicing and visual words. ISBI; 2010.
- André B, Vercauteren T, Buchner AM, Wallace M, Ayache N. Learning semantic and visual similarity for endomicroscopy video retrieval. IEEE Trans Med Imag. 2012. To appear. [PubMed]
- André B, Vercauteren T, Perchant A, Buchner AM, Wallace MB, Ayache N. Endomicroscopic image retrieval and classification using invariant visual features. ISBI; 2009.
- Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding. SODA; 2007.
- Boureau YL, Bach F, LeCun Y, Ponce J. Learning mid-level features for recognition. CVPR; 2010.
- Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J Roy Statistical Society – Series B. 1977;39:1–38.
- Dixit M, Rasiwasia N, Vasconcelos N. Adapted Gaussian mixtures for image classification. CVPR; 2011.
- Fei-Fei L, Perona P. A Bayesian hierarchical model for learning natural scene categories. CVPR; 2005.
- Grauman K, Darrell T. Pyramid match kernels: Discriminative classification with sets of image features. ICCV; 2005.
- Häfner M, Liedlgruber M, Uhl A, Gangl M, Vecsei A, Wrba F. Pit pattern classification using extended local binary patterns. ITAB; 2009.
- Häfner M, Liedlgruber M, Uhl A, Vecsei A, Wrba F. Color treatment in endoscopic image classification using multi-scale local color vector patterns. Med Image Anal. 2012;16:75–86. [PMC free article] [PubMed]
- Häfner M, Liedlgruber M, Uhl A, Wrba F, Vecsei A, Gangl A. Endoscopic image classification using edge-based features. ICPR; 2010.
- Kudo S, Hirota S, Nakajima T, Hosobe S, Kusaka H, Kobayashi T, Himori M, Yagyuu A. Colorectal tumours and pit pattern. J Clin Pathol. 1994;47:880–885. [PMC free article] [PubMed]
- Kwitt R, Rasiwasia N, Vasconcelos N, Uhl A, Häfner M, Wrba F. Learning pit pattern concepts for gastroenterological training. MICCAI; 2011. [PubMed]
- Kwitt R, Uhl A, Häfner M, Gangl A, Wrba F, Vécsei A. Predicting the histology of colorectal lesions in a probabilistic framework. MMBIA; 2010.
- Lazebnik S, Schmid C, Ponce J. Beyond bags of features: Spatial pyramid matching for recognizing scene categories. CVPR; 2006.
- Lebanon G. Riemannian Geometry and Statistical Machine Learning. PhD thesis. Carnegie Mellon University; 2005.
- Lowe D. Distinctive image features from scale-invariant keypoints. IJCV. 2004;60:91–110.
- Mäenpää T, Pietikäinen M, Viertola J. Separating color and pattern information for color texture discrimination. ICPR; 2002.
- Maron O. Multiple-instance learning for natural scene classification. ICML; 1998.
- Pennec X. Intrinsic statistics on Riemannian manifolds: basic tools for geometric measurements. J Math Imaging Vis. 2006;25:127–154.
- Rasiwasia N, Moreno P, Vasconcelos N. Query by semantic example. ACM CIVR; 2006.
- Rasiwasia N, Vasconcelos N. Scene classification with low-dimensional semantic spaces and weak supervision. CVPR; 2008.
- Schölkopf B. The kernel trick for distances. NIPS; 2000.
- Schölkopf B, Smola A. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press; Cambridge, MA, USA: 2001.
- Schwaninger A, Vogel J, Hofer F, Schiele B. A psychophysically plausible model for typicality ranking of natural scenes. ACM Trans Appl Percept. 2006;3:333–353.
- Smola A, Ovari Z, Williamson R. Regularization with dot-product kernels. NIPS; 2000.
- Tischendorf J, Gross S, Winograd R, Hecker H, Auer R, Behrens A, Trautwein C, Aach T, Stehle T. Computer-aided classification of colorectal polyps based on vascular patterns: a pilot study. Endoscopy. 2010;42:203–207. [PubMed]
- Tung SY, Wu CS, Su MY. Magnifying colonoscopy in differentiating neoplastic from nonneoplastic lesions. Am J Gastroenterol. 2001;96:2628–2632. [PubMed]
