


Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. Author manuscript; available in PMC 2010 July 1.

Published in final edited form as:

Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2008 June 23; 2008: 1–7.

doi: 10.1109/CVPR.2008.4587687

PMCID: PMC2895276

NIHMSID: NIHMS184362


**Abstract**

To correctly estimate the camera motion parameters and reconstruct the structure of the surrounding tissues from endoscopic image sequences, we need not only to deal with outliers (e.g., mismatches), which may involve more than 50% of the data, but also to accurately distinguish inliers (correct matches) from outliers. In this paper, we propose a new robust estimator, Adaptive Scale Kernel Consensus (ASKC), which can tolerate more than 50 percent outliers while automatically estimating the scale of inliers. With ASKC, we develop a reliable feature tracking algorithm. This, in turn, allows us to develop a complete system for estimating endoscopic camera motion and reconstructing anatomical structures from endoscopic image sequences. Preliminary experiments on endoscopic sinus imagery have achieved promising results.

**1. Introduction**

Endoscopic anterior skull base surgery has the potential to significantly reduce patient morbidities associated with operating on the undersurface of the front third of the brain. Of the anterior skull base approaches, the endoscopic transnasal approach to the sphenoid sinus, which is a small structure surrounded by major blood vessels, is the most mature and widely utilized. Surgery in this area is technically challenging and requires an accurate appreciation of the patient's anatomy; failure to correctly interpret it can result in catastrophic outcomes.

Traditional navigation systems [1, 2] rely on an external tracking system and fiducial or anatomical landmarks for registration. These systems have many fundamental limitations [3] in accuracy and in flexibility within the operating-room workflow. Another approach to surgical navigation is to directly register endoscopic images to the patient anatomy [3, 4, 5]. However, this is nontrivial because endoscopic images involve a number of challenges: low texture, abundant specularities, extreme illumination changes from the light source attached to the endoscope, and blurring from the movement of the endoscope. These difficulties may result in a number of outliers (including both feature localization errors and mismatches) which cannot be easily handled by traditional robust statistical methods such as LMedS [6] and RANSAC [7].

To recover the surface structure of surrounding tissues and further to register this information against a preoperative volumetric image (such as CT or MRI), we need not only to accurately estimate the motion of an endoscopic camera from endoscopic image sequences but also to correctly distinguish inliers from outliers. This can be realized by employing advanced techniques from robust statistics.

Various robust estimation techniques have appeared in the literature over the past decades. Maximum-likelihood estimators (M-estimators) [8] minimize the sum of symmetric, positive-definite functions of residuals with a unique minimum at zero. The Least Median of Squares (LMedS) estimator [6] minimizes the median of squared residuals. However, it has been shown that the breakdown points of M-estimators and LMedS are no more than 50%. Chen and Meer [9] modified the cost function of the M-estimators to create a projection based M-estimator (pbM-estimator). The authors of [10] and [11] further improved the performance of the pbM-estimator by modifying its objective function. All of these modifications center on the projection pursuit paradigm [9]. RANSAC [7] and its variant MSAC [12] can resist the influence of more than 50% outliers. However, the performance of RANSAC and MSAC depends on a user-specified *error tolerance* (or the scale of inliers), which is not known *a priori* in many practical environments. MUSE [13], MINPRAN [14], ALKS [15], RESC [16] and ASSC [17] can deal with more than 50% outliers. However, MUSE needs a lookup table for the scale estimator correction. MINPRAN and ALKS are computationally expensive and cannot effectively deal with multiple structures with extreme outliers. RESC needs the user to tune many parameters. ASSC weights all inliers equally, thus it is less efficient.

The main contributions of this paper are: (1) we employ kernel density estimation techniques to create a new robust estimator, Adaptive Scale Kernel Consensus (ASKC), which can simultaneously estimate both the model parameters and the scale of inliers. ASKC can be treated as a generalized form of RANSAC [7] and ASSC [17] (see Section 2 for details); (2) we propose an effective feature tracking approach; and (3) we integrate the robust ASKC estimator and the feature tracking approach into a complete system for estimating endoscopic camera motion and performing surface reconstruction of sinus anatomy from endoscopic image sequences. Experiments show that our system achieves promising results.

**2. The Adaptive Scale Kernel Consensus (ASKC) estimator**

Given a model parameter estimate $\hat{\theta}$, the fixed bandwidth kernel density estimate with the kernel *K*(.) and a bandwidth *h* can be written as [18]:

$${\widehat{f}}_{\widehat{\mathit{\theta}}}(r)=\frac{1}{\mathit{\text{nh}}}{\displaystyle \sum _{i=1}^{n}K\text{}\left(\frac{r-{r}_{i,\widehat{\mathit{\theta}}}}{h}\right)}$$

(1)

where $\{r_{i,\hat{\theta}}\}_{i=1,\dots,n}$ are the residuals and *n* is the number of data points.
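As a concrete illustration, equation (1) can be evaluated directly. The sketch below (Python with NumPy; our choice of language and function names, not the paper's) computes the fixed-bandwidth density of a residual set at a query point:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = (3/4)(1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kde_at(r, residuals, h, kernel):
    """Fixed-bandwidth kernel density estimate of equation (1):
    f_hat(r) = (1/(n*h)) * sum_i K((r - r_i) / h)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    return np.sum(kernel((r - residuals) / h)) / (n * h)
```

For a correct model, most residuals cluster near zero, so the density at the origin is high; a bad model spreads the residuals out and the density at the origin drops.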

An alternative is to select a different bandwidth $h=h(\hat{\theta})\equiv h_{\hat{\theta}}$ for each value of $\hat{\theta}$. The variable bandwidth kernel density estimate can be written as:

$${\widehat{f}}_{\widehat{\mathit{\theta}}}(r)=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\frac{1}{{h}_{\widehat{\mathit{\theta}}}}K\text{}\left(\frac{r-{r}_{i,\widehat{\mathit{\theta}}}}{{h}_{\widehat{\mathit{\theta}}}}\right)}$$

(2)

In this paper, we consider two popular kernels, the Epanechnikov kernel $K_E(r)$ and the normal kernel $K_N(r)$:

$$K_E(r)=\left\{\begin{array}{cc}\frac{3}{4}(1-\Vert r\Vert^{2}) & \Vert r\Vert \le 1\\ 0 & \Vert r\Vert >1\end{array}\right.,\quad \text{with}\;\; k_E(r)=\left\{\begin{array}{cc}1-r & 0\le r\le 1\\ 0 & r>1\end{array}\right.$$

(3)

$$K_N(r)=\frac{1}{(2\pi )^{1/2}}\,\text{exp}\left(-\frac{1}{2}\Vert r\Vert^{2}\right),\quad \text{with}\;\; k_N(r)=\text{exp}\left(-\frac{r}{2}\right),\; r\ge 0$$

(4)
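The two kernels and their profiles in equations (3) and (4) translate directly into code; the following is a minimal NumPy sketch (function names are ours):

```python
import numpy as np

def K_E(r):
    """Epanechnikov kernel (equation (3))."""
    r = np.asarray(r, dtype=float)
    return np.where(np.abs(r) <= 1.0, 0.75 * (1.0 - r**2), 0.0)

def k_E(r):
    """Epanechnikov profile: k_E(r) = 1 - r for 0 <= r <= 1, else 0,
    so that K_E(r) = (3/4) * k_E(||r||^2)."""
    r = np.asarray(r, dtype=float)
    return np.where((r >= 0.0) & (r <= 1.0), 1.0 - r, 0.0)

def K_N(r):
    """Normal kernel (equation (4))."""
    r = np.asarray(r, dtype=float)
    return np.exp(-0.5 * r**2) / np.sqrt(2.0 * np.pi)

def k_N(r):
    """Normal profile: k_N(r) = exp(-r/2), r >= 0."""
    return np.exp(-0.5 * np.asarray(r, dtype=float))
```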

The Epanechnikov kernel yields the minimum asymptotic mean integrated square error (AMISE) measure. However, the Epanechnikov profile is not differentiable at the boundary. As pointed out by the authors of [19], the path of the mean shift procedure employing a normal kernel follows a smooth trajectory.

Although in this paper we investigate the properties of ASKC with the Epanechnikov kernel (termed ASKC1) and the normal kernel (termed ASKC2), our method can employ arbitrary kernels.

As noted above, the bandwidth *h* is a crucial parameter in kernel density estimation. An over-smoothed bandwidth selector with the scale estimate $\hat{\sigma}_{\theta}$ is suggested in [20]:

$${\widehat{h}}_{\mathit{\theta}}={\left[\frac{243{\displaystyle {\int}_{-1}^{1}K{(r)}^{2}\mathit{\text{dr}}}}{35n{\displaystyle {\int}_{-1}^{1}{r}^{2}K(r)\mathit{\text{dr}}}}\right]}^{1/5}{\widehat{\sigma}}_{\mathit{\theta}}$$

(5)

Since this selector is over-smoothed, it is recommended that the bandwidth be set as $c_h\hat{h}_{\theta}$ with a constant factor $0<c_h\le 1$.
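For the Epanechnikov kernel, the integrals in equation (5) have closed forms, $\int_{-1}^{1}K_E(r)^2\,dr=3/5$ and $\int_{-1}^{1}r^2K_E(r)\,dr=1/5$, so the selector reduces to $\hat{h}_{\theta}=(729/(35n))^{1/5}\,\hat{\sigma}_{\theta}$. A one-function sketch:

```python
def oversmoothed_bandwidth_epanechnikov(n, sigma):
    """Equation (5) evaluated for the Epanechnikov kernel:
    int K^2 dr = 3/5 and int r^2 K dr = 1/5, so
    h = [243*(3/5) / (35*n*(1/5))]^(1/5) * sigma = (729/(35*n))^(1/5) * sigma."""
    return (729.0 / (35.0 * n)) ** 0.2 * sigma
```

Note that the bandwidth shrinks slowly (as $n^{-1/5}$) with the number of data points.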

Robust scale estimators (such as the median [6], the MAD [9], or the robust *k* scale estimator [15]) can be employed to yield a scale estimate. The authors of [17] have shown that TSSE (a two-step scale estimator), which employs the mean shift and the mean shift valley procedures, can effectively estimate the scale in the presence of multiple modes. The valley closest to zero detected by the mean shift valley procedure on the ordered absolute residuals provides a sensitive point for determining the inlier/outlier dichotomy.

In our method, we use a procedure similar to TSSE. We use a robust *k* scale estimator (the *k* value is set to 0.1 so that at least 10 percent of the data points are included in the shortest window) to yield an initial scale estimate. In [17], the authors use the Epanechnikov kernel for both the mean shift and the mean shift valley procedures; in our case the kernel may differ, since ASKC can employ either the Epanechnikov or the normal kernel.

Figure 1 shows the procedure of the TSSE-like scale estimator. When the model parameter estimate is incorrect, the detected valley is far away from the origin and the kernel density estimate at the origin is lower. In contrast, when the estimate is correct, the residual value corresponding to the detected valley is closer to the origin and the kernel density at the origin is higher.
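In one dimension, the mean shift and mean shift valley procedures underlying this TSSE-like estimator can be sketched as follows. This is a simplified illustration with a hard Epanechnikov-style window and a fixed valley step size, both our assumptions, not the paper's implementation:

```python
import numpy as np

def mean_shift_1d(x, data, h, iters=200, tol=1e-6):
    """1-D mean shift with an Epanechnikov window: each update moves x to
    the mean of the data points falling inside [x - h, x + h], converging
    to a local mode of the density."""
    for _ in range(iters):
        inside = np.abs(data - x) <= h
        if not np.any(inside):
            break
        x_new = data[inside].mean()
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def mean_shift_valley_1d(x, data, h, step=0.5, iters=200, tol=1e-6):
    """Valley seeking: move *against* the mean shift vector so that x
    converges to a local minimum of the density (a valley between modes).
    An empty window means x already sits in a density gap."""
    for _ in range(iters):
        inside = np.abs(data - x) <= h
        if not np.any(inside):
            return x
        mv = data[inside].mean() - x       # mean shift vector
        if abs(mv) < tol:
            return x
        x = x - step * mv                  # step opposite the density gradient
    return x
```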

We assume that inliers constitute a relative majority of the data, i.e., inliers may comprise less than 50% of the data but they involve more data points than any population of structured pseudo-outliers. Our method takes the kernel density at the origin as its objective function. Given a set of residuals $\{r_{i,\hat{\theta}}\}_{i=1,\dots,n}$ for a parameter estimate $\hat{\theta}$, the objective function of ASKC is:

$${\widehat{f}}_{\widehat{\mathit{\theta}}}\text{}({r}_{0}^{*})=\frac{1}{\mathit{n}}{\displaystyle \sum _{i=1}^{n}\frac{1}{{h}_{\widehat{\mathit{\theta}}}}K\text{}\left(\frac{{r}_{i,\widehat{\mathit{\theta}}}}{{h}_{\widehat{\mathit{\theta}}}}\right)}$$

(6)

The ASKC estimator can be written as:

$$\widehat{\mathit{\theta}}=\underset{\widehat{\mathit{\theta}}}{\text{argmax}}{\widehat{f}}_{\widehat{\mathit{\theta}}}\text{}({r}_{0}^{*})=\underset{\widehat{\mathit{\theta}}}{\text{argmax}}\frac{1}{n}{\displaystyle \sum _{i=1}^{n}\frac{1}{{h}_{\widehat{\mathit{\theta}}}}K\text{}\left(\frac{{r}_{i,\widehat{\mathit{\theta}}}}{{h}_{\widehat{\mathit{\theta}}}}\right)}$$

(7)

If we consider the RANSAC estimator [7]:

$$\widehat{\mathit{\theta}}=\underset{\widehat{\mathit{\theta}}}{\text{argmax}}\text{}{\widehat{\mathbf{n}}}_{\widehat{\mathit{\theta}}}$$

(8)

and the ASSC estimator [17]:

$$\widehat{\mathit{\theta}}=\underset{\widehat{\mathit{\theta}}}{\text{argmax}}\text{}({\widehat{\mathbf{n}}}_{\widehat{\mathit{\theta}}}/{\widehat{S}}_{\widehat{\mathit{\theta}}})$$

(9)

where $\hat{n}_{\hat{\theta}}$ is the number of inliers within an error tolerance (for RANSAC) or within the scale of inliers $\hat{S}_{\hat{\theta}}$ (for ASSC), and $K_U$ is the uniform kernel:

$${K}_{U}(r)=\{\begin{array}{cc}C\hfill & \Vert r\Vert \le 1\hfill \\ 0\hfill & \text{otherwise}\hfill \end{array}$$

(10)

where *C* is a normalization constant.

More specifically, RANSAC is one case of ASKC with the uniform kernel and a fixed bandwidth, and ASSC is another case of ASKC with the uniform kernel and a variable bandwidth. However, the efficiency of the uniform kernel is low as it weights all inliers equally.

To solve equation (7), we need to sample a set of candidate parameter estimates. We can employ a random sampling scheme [6, 7] or a guided sampling technique [21].

Figure 2 shows a histogram of ASKC scores (equation 6) computed from 10,000 random samples from the data in Figure 1 (a). Most of the samples have small score values, which means that they are most likely contaminated by outliers. To improve computational efficiency, it is therefore not necessary to run the TSSE-like procedure for all samples; we run it only for the samples with high ASKC scores. With this strategy, only about 7% of the 10,000 samples are further processed with the TSSE-like procedure.

The procedure of the ASKC estimator is shown in Figure 3. In step 3, the purpose of using data other than the sample candidate is to avoid extremely low scale estimates. In step 5, an additional TSSE-like procedure may refine the scale estimate for heavily contaminated data.
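Putting the pieces together, a much-simplified version of the ASKC loop for 2-D line fitting might look like the following. The quantile-based k-scale approximation, the sample count, and the omission of the score-gated TSSE refinement are all simplifications relative to Figure 3, not the paper's exact procedure:

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def k_scale(residuals, k=0.1):
    """Crude robust k-scale: the k-quantile of the absolute residuals,
    normalized by the matching half-normal quantile (Phi^-1(0.55) ~ 0.1257)."""
    q = np.quantile(np.abs(residuals), k)
    return max(q / 0.1257, 1e-6)

def askc_line(points, n_samples=500, seed=0):
    """ASKC-style robust fit of a line y = a*x + b from minimal 2-point
    samples, scoring each candidate by the kernel density at the origin
    of its residual distribution (equation (6))."""
    rng = np.random.default_rng(seed)
    best_score, best_model = -np.inf, None
    for _ in range(n_samples):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if abs(x2 - x1) < 1e-9:
            continue                      # degenerate (vertical) sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        r = points[:, 1] - (a * points[:, 0] + b)
        h = k_scale(r)                    # adaptive bandwidth from the data
        score = np.mean(epanechnikov(r / h) / h)
        if score > best_score:
            best_score, best_model = score, (a, b)
    return best_model
```

Because the bandwidth adapts to each candidate's residuals, a model aligned with the inlier structure gets a sharp, high peak at the origin even when outliers dominate the data.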

In this subsection, we test the performance of the ASKC estimator employing the Epanechnikov kernel (ASKC1) and the normal kernel (ASKC2) and we compare the performance of ASKC1/ASKC2 with those of several other robust estimators (ASSC, RESC, and LMedS).

In the first example, we generate three lines (each line contains 40 data points) and 380 random outliers. We apply the robust estimators to sequentially extract all three lines. As shown in Figure 4, both RESC and LMedS fail to extract any line. ASSC extracts one line but fails on the other two. ASKC1/ASKC2 successfully extract all three lines.

In the second example, we use 3D data. There are 500 data points including 4 planes (each contains 50 data points) and 300 randomly distributed outliers. Likewise, we sequentially extract all planes with the robust estimators.

Figure 5 shows that ASKC1/ASKC2 correctly extract all planes. In contrast, RESC succeeds on 2 planes while ASSC succeeds on 3. LMedS fails to extract any plane.

**3. Structure and motion recovery with ASKC**

We now consider a camera observing a 3D point **X** on a surface from two camera positions; the point **X** projects to two image locations **x**_{1} = (*u*_{1},*v*_{1},1)^{T} and **x**_{2} = (*u*_{2},*v*_{2},1)^{T}. The following condition holds [22]:

$$\mathbf{x}_{2}^{\mathrm{T}}\,\mathcal{K}_{2}^{-\mathrm{T}}[\mathbf{\Gamma}]_{x}\mathbf{R}\,\mathcal{K}_{1}^{-1}\mathbf{x}_{1}=0$$

(11)

where $\mathcal{K}_1$ and $\mathcal{K}_2$ are respectively the intrinsic camera matrices corresponding to the two images, [**Γ**]_{x} is the skew-symmetric matrix of the translation vector **Γ**, and **R** is the rotation matrix.

The essential matrix **E**=[**Γ**]_{x}**R** encodes the motion information of the camera. Given the camera matrices, the essential matrix **E** can be estimated using the nonlinear five-point algorithm [23]. The camera motion (**R** and **Γ** ) can be recovered from **E** by the Singular Value Decomposition (SVD) approach [22] and the translation can only be estimated up to a scale factor (we use **Γ̃** to represent the estimated scaled translation vector and **Γ** = λ**Γ̃**). The scale λ can be recovered by registering the reconstructed 3D model to a pre-operative CT scan [3].
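The SVD decomposition of **E** mentioned above can be sketched as follows. This is the standard decomposition from [22]; the cheirality test that selects, among the four (**R**, ±**Γ̃**) candidates, the one placing triangulated points in front of both cameras is omitted here:

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x such that [v]_x @ w = cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def decompose_essential(E):
    """Return the two rotation candidates and the translation direction
    (up to sign and scale) encoded by an essential matrix E = [t]_x R."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper orientations (det = +1); this only flips the sign of E,
    # which is defined up to scale anyway.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]          # null direction of E^T: the translation up to sign
    return R1, R2, t
```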

To estimate the motion parameters of a camera between a pair of images, we need to robustly detect features in the images and then match them. We employ the SIFT feature detector [24] and find matches between feature points with the SVD matching algorithm [25]. We use the SVD matching algorithm rather than the SIFT matching function [24] because we have found that it returns more correct matches.

Figure 6 shows one example where ASKC correctly estimates the epipolar geometry and the scale of inliers, and selects most correct matches even when the outlier percentage is larger than 70%.

We need to track SIFT features through a video sequence to derive the projection matrix at each frame and further recover the structure. To track a set of SIFT features $\{\mathcal{S}_j^{F}\}_{j=1,\dots,n}$ detected at frame $F$, we maintain a feature list $L^{F}=\{\ell_i^{F}\}_{i=1,\dots,m'}$, where each entry $\ell_i^{F}=(u_i^{F},v_i^{F},w_i^{F},s_i^{F})$ records a feature's image location, track length and status, together with a chain matrix $\mathrm{C}^{F}$ recording each feature's coordinates over time. The tracked features are handled in three cases:

(1) If a feature $\mathcal{S}_j^{F-1}$ of the match $(\mathcal{S}_j^{F-1},\mathcal{S}_j^{F})$ has a correspondence with $\ell_i^{F-1}=(u_i^{F-1},v_i^{F-1},w_i^{F-1},s_i^{F-1})$ in the feature list $L^{F-1}$, the status of $\mathcal{S}_j^{F}$ in the list $L^{F}$ is labeled as "*active*" and $s_i^{F}=1$. In this case, the location $(u_i^{F},v_i^{F})$ of $\ell_i^{F}$ is updated with the image coordinates $(u_{\mathcal{S}_j}^{F},v_{\mathcal{S}_j}^{F})$ of $\mathcal{S}_j^{F}$, and $w_i^{F}=w_i^{F-1}+1$. The chain matrix $\mathrm{C}^{F}$ is updated with $\mathcal{C}_i^{F}=\mathcal{C}_i^{F-1}\cup (u_{\mathcal{S}_j}^{F},v_{\mathcal{S}_j}^{F})$.

(2) If there is no correspondence between $\ell_i^{F-1}$ and $\{\mathcal{S}_j^{F-1}\}_{j=1,\dots,n}$, the status of $\ell_i^{F}$ is labeled as "*inactive*" and $s_i^{F}=-1$. We set $w_i^{F}=0$. When the number of consecutive frames in which $w_i^{F}$ remains zero exceeds a threshold, we assume the feature is out of view and remove it from the list $L^{F}$ and the chain matrix $\mathrm{C}^{F}$.

(3) If there is no correspondence between $\mathcal{S}_j^{F-1}$ and $\{\ell_i^{F-1}\}_{i=1,\dots,m'}$, we add the new feature $\mathcal{S}_j^{F}$ to $L^{F}$ and its status is labeled as "*new*". We set $(u_{m'+1}^{F},v_{m'+1}^{F})=(u_{\mathcal{S}_j}^{F},v_{\mathcal{S}_j}^{F})$, $w_{m'+1}^{F}=1$ and $s_{m'+1}^{F}=0$. $\mathrm{C}^{F}$ is initialized with $\mathcal{C}_{m'+1}^{F}=(u_{\mathcal{S}_j}^{F},v_{\mathcal{S}_j}^{F})$ and the value of $m'$ is updated ($m'=m'+1$).

Figure 7 summarizes the procedures of the SIFT feature tracking algorithm. The trajectories of the tracked SIFT features on an endoscopic sinus image sequence are shown in Figure 8. We can see that most significant SIFT features are tracked. Even when the image is seriously blurred, there are still sufficient SIFT features tracked.
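The three bookkeeping cases above can be sketched with a simple feature-list structure. The field names, the `misses` counter, and the staleness threshold below are illustrative assumptions, not the paper's exact data structures:

```python
class TrackedFeature:
    """One entry of the feature list: location (u, v), track length w,
    status s (0 = new, 1 = active, -1 = inactive), and the coordinate
    chain (one row of the chain matrix)."""
    def __init__(self, u, v):
        self.u, self.v = u, v
        self.w = 1
        self.s = 0
        self.chain = [(u, v)]
        self.misses = 0          # consecutive frames without a match

def update_tracks(tracks, matches, new_features, max_misses=5):
    """matches: dict {track_index: (u, v)} of features matched in frame F;
    new_features: list of (u, v) with no correspondence in the track list."""
    survivors = []
    for i, f in enumerate(tracks):
        if i in matches:                      # case (1): matched -> active
            f.u, f.v = matches[i]
            f.w += 1
            f.s = 1
            f.misses = 0
            f.chain.append((f.u, f.v))
            survivors.append(f)
        else:                                 # case (2): unmatched -> inactive
            f.s = -1
            f.w = 0
            f.misses += 1
            if f.misses <= max_misses:        # drop features out of view too long
                survivors.append(f)
    for (u, v) in new_features:               # case (3): brand-new features
        survivors.append(TrackedFeature(u, v))
    return survivors
```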

We assume a calibrated camera is used and the optical distortion is removed by undistortion [27].

Let $\mathbf{X}_i=(X_i, Y_i, Z_i, 1)^{\mathrm{T}}$ be a 3D point in homogeneous coordinates and $\mathbf{x}_i^{F}$ its projection in frame $F$. With the projection matrix $\mathbf{P}_F$, we have:

$${\mathbf{x}}_{i}^{F}={\mathbf{P}}_{F}{\mathbf{X}}_{i}$$

(12)

Letting the first camera be at the origin of the world coordinate frame, we have:

$$\mathbf{P}_{1}=\mathcal{K}[\mathbf{I}\,|\,\mathbf{0}]\quad\text{and}\quad \mathbf{P}_{F}=\mathcal{K}[{}^{1}\mathbf{R}_{F}\,|\,{}^{1}\mathbf{\Gamma}_{F}]$$

(13)

where ${}^{1}\mathbf{R}_F$ and ${}^{1}\mathbf{\Gamma}_F$ are respectively the rotation and the translation of the camera at the $F$-th frame with respect to the first frame.

At the beginning, the structure is initialized using two selected frames through triangulation [22].

For a new frame *F*, we relate it to its previous frame *F* − 1. Assuming $\mathbf{P}_{F-1}=\mathcal{K}[{}^{1}\mathbf{R}_{F-1}\,|\,{}^{1}\mathbf{\Gamma}_{F-1}]$ at frame $F-1$ is known, $\mathbf{P}_F$ can be written as:

$${\mathbf{P}}_{F}=\mathcal{K}[{}^{F-1}{\mathbf{R}}_{F}{}^{1}{\mathbf{R}}_{F-1}|{}^{F-1}{\mathbf{R}}_{F}{}^{1}{\mathbf{\Gamma}}_{F-1}+{\mathrm{\lambda}}_{F}{}^{F-1}{\tilde{\mathbf{\Gamma}}}_{F}]$$

(14)

Let:

$$\mathcal{C}_{F}={}^{F-1}\mathbf{R}_{F}\,{}^{1}\mathbf{\Gamma}_{F-1},\quad \mathcal{K}\equiv[\mathbf{k}_{1}\;\mathbf{k}_{2}\;\mathbf{k}_{3}]^{\mathrm{T}}\quad\text{and}\quad \mathbf{P}_{F}=\left[\begin{array}{ccc}p_{11}^{F} & p_{12}^{F} & p_{13}^{F}\\ p_{21}^{F} & p_{22}^{F} & p_{23}^{F}\\ p_{31}^{F} & p_{32}^{F} & p_{33}^{F}\end{array}\right]$$

(15)

From equations (12), (14) and (15), we can derive:

$$\begin{array}{c}u_{i}^{F}=\dfrac{p_{11}^{F}X_{i}+p_{12}^{F}Y_{i}+p_{13}^{F}Z_{i}+\mathbf{k}_{1}\mathcal{C}_{F}+\lambda_{F}(i)\,\mathbf{k}_{1}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}}{p_{31}^{F}X_{i}+p_{32}^{F}Y_{i}+p_{33}^{F}Z_{i}+\mathbf{k}_{3}\mathcal{C}_{F}+\lambda_{F}(i)\,\mathbf{k}_{3}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}}\\[2ex] v_{i}^{F}=\dfrac{p_{21}^{F}X_{i}+p_{22}^{F}Y_{i}+p_{23}^{F}Z_{i}+\mathbf{k}_{2}\mathcal{C}_{F}+\lambda_{F}(i)\,\mathbf{k}_{2}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}}{p_{31}^{F}X_{i}+p_{32}^{F}Y_{i}+p_{33}^{F}Z_{i}+\mathbf{k}_{3}\mathcal{C}_{F}+\lambda_{F}(i)\,\mathbf{k}_{3}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}}\end{array}$$

(16)

If we define the following:

$$\begin{array}{c}\mathbf{A}_{i}^{F}=\left(\begin{array}{c}u_{i}^{F}\mathbf{k}_{3}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}-\mathbf{k}_{1}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}\\ v_{i}^{F}\mathbf{k}_{3}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}-\mathbf{k}_{2}{}^{F-1}\tilde{\mathbf{\Gamma}}_{F}\end{array}\right)\\[2ex] \mathbf{B}_{i}^{F}=\left(\begin{array}{c}p_{11}^{F}X_{i}+p_{12}^{F}Y_{i}+p_{13}^{F}Z_{i}+\mathbf{k}_{1}\mathcal{C}_{F}-u_{i}^{F}p_{31}^{F}X_{i}-u_{i}^{F}p_{32}^{F}Y_{i}-u_{i}^{F}p_{33}^{F}Z_{i}-u_{i}^{F}\mathbf{k}_{3}\mathcal{C}_{F}\\ p_{21}^{F}X_{i}+p_{22}^{F}Y_{i}+p_{23}^{F}Z_{i}+\mathbf{k}_{2}\mathcal{C}_{F}-v_{i}^{F}p_{31}^{F}X_{i}-v_{i}^{F}p_{32}^{F}Y_{i}-v_{i}^{F}p_{33}^{F}Z_{i}-v_{i}^{F}\mathbf{k}_{3}\mathcal{C}_{F}\end{array}\right)\end{array}$$

(17)

We can calculate the scale value λ_{F} by:

$$\lambda_{F}(i)=\left({\mathbf{A}_{i}^{F}}^{\mathrm{T}}\mathbf{A}_{i}^{F}\right)^{-1}{\mathbf{A}_{i}^{F}}^{\mathrm{T}}\mathbf{B}_{i}^{F}$$

(18)

However, as both the feature locations {**x**_{i}} and the 3D points may be in error, we estimate λ_{F} in a robust way:

$${\widehat{\mathrm{\lambda}}}_{F}=\underset{{\mathrm{\lambda}}_{F}(i)}{\text{argmax}}\frac{1}{n}{\displaystyle \sum _{j=1}^{n}\frac{1}{{h}_{j}}K\text{}\left(\frac{{r}_{j}}{{h}_{j}}\right)}$$

(19)

where ${r}_{j}=\sum \left|\mathbf{x}_{j}^{F}-\mathbf{P}_{F}(\lambda_{F}(i))\,\hat{\mathbf{X}}_{j}\right|$ and *h _{j}* is estimated from equation (5) with the robust *k* scale estimator.
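The scale estimation of equations (18)-(19) can be sketched as follows. For simplicity, this sketch scores candidate scales drawn from a coarse grid rather than from the per-correspondence least-squares solutions of equation (18), and the projection helper, the fixed bandwidth, and all names are our assumptions:

```python
import numpy as np

def epanechnikov(u):
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def reprojection_residuals(lam, K, R, Gamma_prev, t_dir, X, x_obs):
    """Residual sum |x - proj(K [R | Gamma_prev + lam * t_dir] X)| per point."""
    P = K @ np.hstack([R, (Gamma_prev + lam * t_dir).reshape(3, 1)])
    Xh = np.hstack([X, np.ones((len(X), 1))])      # homogeneous 3D points
    p = (P @ Xh.T).T
    proj = p[:, :2] / p[:, 2:3]
    return np.abs(proj - x_obs).sum(axis=1)

def robust_scale(K, R, Gamma_prev, t_dir, X, x_obs, h=1.0):
    """Pick the candidate scale maximizing the kernel density of the
    reprojection residuals at the origin (the role of equation (19))."""
    best_lam, best_score = None, -np.inf
    for lam in np.linspace(-5.0, 5.0, 201):        # candidate scales (sketch)
        r = reprojection_residuals(lam, K, R, Gamma_prev, t_dir, X, x_obs)
        score = np.mean(epanechnikov(r / h)) / h   # density at the origin
        if score > best_score:
            best_lam, best_score = lam, score
    return best_lam
```

Because only the density at the origin is maximized, grossly wrong correspondences contribute nothing to the score and do not bias the recovered scale.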

After $\hat{\lambda}_F$ is estimated, the 3D points {**X**_{i}} having correspondences to the tracked SIFT features are refined:

$$\hat{\mathbf{X}}_{i}=\underset{\mathbf{X}_{i}}{\text{argmin}}\,\sum_{j=0}^{w_{i}-1}\sum\left|\mathbf{x}_{i}^{F-j}-\hat{\mathbf{P}}_{F-j}\mathbf{X}_{i}\right|$$

(20)

Newly appearing 3D points are initialized and added to the structure. Figure 9 gives an outline of the reconstruction algorithm.

**4. Experiments**

We collected endoscopic sinus image data on a cadaveric porcine specimen. Images were captured using a Storz Telecam (202212113U, NTSC) with a zero-degree rigid rod monocular endoscope (7210AA). An external tracking system (Optotrak, Northern Digital Corp., Waterloo) was used to measure and record the motion of the endoscope during image acquisition, and we used the Optotrak motion data as the ground truth against which the estimated endoscopic motion was compared. Images of a standard optical calibration target were also recorded with the endoscope before the data collection. We performed an offline calibration [26] of the endoscope using the Matlab Camera Calibration Toolbox [27].

To evaluate the performance of our system, we first compare our proposed robust estimator ASKC (ASKC1/ASKC2) with five other robust estimators (LMedS, MSAC, RANSAC, RESC and ASSC) on motion estimation. Following [12], we use a median scale estimator for MSAC. For RANSAC, we manually specify the error tolerance value with which optimal results are achieved.

To obtain quantitative results, we apply the methods to one hundred pairs of endoscopic sinus images. The distance between the positions of the endoscopic camera in each pair of images is larger than 1 mm. To measure the accuracy of the motion estimation, both the translation error and the rotation error are evaluated, using a formula similar to that of [28].

Each method is run on the 100 pairs of images. The median error values, the mean error values and the standard deviations of the estimation errors in translation and rotation are used to evaluate the performance of the methods.
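Typical choices for such error measures, in the spirit of [28], are the angle of the relative rotation between estimate and ground truth and the angle between the (scale-free) translation directions. The formulas below are our assumption of a reasonable metric, not necessarily the exact one used:

```python
import numpy as np

def rotation_error_deg(R_est, R_gt):
    """Angle (degrees) of the relative rotation R_est @ R_gt^T."""
    cos_angle = (np.trace(R_est @ R_gt.T) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def translation_error_deg(t_est, t_gt):
    """Angle (degrees) between the two translation directions; the absolute
    value makes it sign-invariant, since the translation is recovered only
    up to scale (including sign)."""
    t_est = t_est / np.linalg.norm(t_est)
    t_gt = t_gt / np.linalg.norm(t_gt)
    return np.degrees(np.arccos(np.clip(abs(t_est @ t_gt), -1.0, 1.0)))
```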

As Table 1 shows, our methods (ASKC1/ASKC2) achieve the most accurate results among the compared methods. LMedS and MSAC achieve the worst results, as the median scale estimator is not robust to more than 50% outliers. RANSAC with a user-specified error tolerance achieves better results than LMedS and MSAC, but worse than the rest; this is because RANSAC requires different error tolerance values for different image pairs and it is hard to find a globally optimal value. The results of ASSC are better than those of RESC but less accurate than those of ASKC1/ASKC2. Between the two, ASKC2 outperforms ASKC1 in translation estimation while ASKC1 is slightly better in rotation estimation.

We test our reconstruction algorithm on a sinus image sequence of 130 frames with a frame size of 640×480. The endoscope performed several movements (sideways, forward, and backward) during the acquisition of the sequence, which was digitally captured at a rate of roughly 30 frames per second. As a result, the baselines between consecutive frames are too short, which makes the epipolar geometry estimation ill-conditioned. To avoid this problem, we only consider a set of key frames that are far enough apart for the motion and structure recovery.

We use the reconstruction algorithm proposed in subsection 3.3 to recover the structure of the sinus. We choose ASKC2 for the system, though ASKC1 could equally be employed.

Figure 10 shows the reconstruction results on the sinus image sequence. As we can see, the main structure of the tissues surrounding the sinus is recovered by our system. In comparison, when we use the LMedS estimator and estimate the projection matrices {**P**_{i}}_{i=1,2, …, N} by the approach in [29] (we refer to this combination as M1), it fails to recover the structure and most recovered 3D points are clustered in a small area indicated by the arrow (see the middle and right columns of the bottom row in Figure 10).

**5. Conclusions**

In this paper, we have presented a new robust estimator (ASKC) that can tolerate more than 50% (or even 80%) outliers. We have also proposed a reliable feature tracking algorithm that can track features even when images involve significant blurring, illumination changes and geometric distortion. We integrate ASKC and the feature tracking approach into a complete system for motion and structure recovery from sinus endoscopic image sequences. Preliminary experiments show that ASKC outperforms several other robust estimators (including LMedS, MSAC, RANSAC, RESC, and ASSC) and that our system achieves promising results.

**Acknowledgments**

This work has been supported by the National Institutes of Health under grant number 1R21EB005201-01A1.

**References**

1. Kosugi Y, et al. An Articulated Neurosurgical Navigation System Using MRI and CT Images. T-BME. 1988:147–152. [PubMed]

2. Scholtz M, et al. Development of an Endoscopic Navigating System Based on Digital Image Processing. Journal of Computer Aided Surgery. 1998;3(3):134–143. [PubMed]

3. Burschka D, et al. Scale-Invariant Registration of Monocular Endoscopic Images to CT-Scans for Sinus Surgery. Medical Image Analysis. 2005;9(5):413–439. [PubMed]

4. Helferty JP, Higgins WE. Technique for Registering 3D Virtual CT Images to Endoscopic Video. ICIP. 2001:893–896.

5. Mori K, et al. A Method for Tracking the Camera Motion of Real Endoscope by Epipolar Geometry Analysis and Virtual Endoscopy System. MICCAI. 2001:1–8.

6. Rousseeuw PJ, Leroy A. Robust Regression and outlier detection. New York: John Wiley & Sons; 1987.

7. Fischler MA, Bolles RC. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. ACM. 1981;24(6):381–395.

8. Huber PJ. Robust Statistics. New York: Wiley; 1981.

9. Chen H, Meer P. Robust Regression with Projection Based M-estimators. ICCV. 2003:878–885.

10. Subbarao R, Meer P. Heteroscedastic projection based M-estimators. Workshop on EEMCV. 2005

11. Rozenfeld S, Shimshoni I. The Modified pbM-estimator Method and a Runtime Analysis Technique for the RANSAC Family. CVPR. 2005:1113–1120.

12. Torr P, Murray D. The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix. IJCV. 1997;24(3):271–300.

13. Miller JV, Stewart CV. MUSE: Robust Surface Fitting Using Unbiased Scale Estimates. CVPR. 1996:300–306.

14. Stewart CV. MINPRAN: A New Robust Estimator for Computer Vision. PAMI. 1995;17(10):925–938.

15. Lee K-M, Meer P, Park R-H. Robust Adaptive Segmentation of Range Images. PAMI. 1998;20(2):200–205.

16. Yu X, Bui TD, Krzyzak A. Robust Estimation for Range Image Segmentation and Reconstruction. PAMI. 1994;16(5):530–538.

17. Wang H, Suter D. Robust Adaptive-Scale Parametric Model Estimation for Computer Vision. PAMI. 2004;26(11):1459–1474. [PubMed]

18. Silverman BW. Density Estimation for Statistics and Data Analysis. London: Chapman and Hall; 1986.

19. Comaniciu D, Meer P. Mean Shift: A Robust Approach toward Feature Space Analysis. PAMI. 2002;24(5):603–619.

20. Wand MP, Jones M. Kernel Smoothing. Chapman & Hall; 1995.

21. Tordoff B, Murray DW. Guided Sampling and Consensus for Motion Estimation. ECCV. 2002:82–96.

22. Hartley R, Zisserman A. Multiple View Geometry in Computer Vision. Cambridge University Press; 2004.

23. Nistér D. An Efficient Solution to the Five-point Relative Pose Problem. PAMI. 2004;26(6):756–770. [PubMed]

24. Lowe DG. Distinctive Image Features from Scale-Invariant Keypoints. IJCV. 2004;60(2):91–110.

25. Delponte E, et al. SVD-matching using SIFT Features. Graphical Models. 2006;68(5–6):415–431.

26. Zhang Z. A Flexible New Technique for Camera Calibration. PAMI. 2000;22(11):1330–1334.

27. Bouguet J-Y. Camera Calibration Toolbox for Matlab. http://www.vision.caltech.edu/bouguetj/calib_doc/

28. Tian T, Tomasi C, Heeger D. Comparison of Approaches to Egomotion Computation. CVPR. 1996:315–320.

29. Pollefeys M, et al. Visual Modeling with a Hand-held Camera. IJCV. 2004;59(3):207–232.
