IEEE Trans Pattern Anal Mach Intell. Author manuscript; available in PMC 2010 April 21.


PMCID: PMC2857599

NIHMSID: NIHMS184347


Recommended for acceptance by B. Triggs.



Abstract

In this paper, we present a new Adaptive-Scale Kernel Consensus (ASKC) robust estimator that generalizes popular, state-of-the-art robust estimators such as RANdom SAmple Consensus (RANSAC), Adaptive Scale Sample Consensus (ASSC), and the Maximum Kernel Density Estimator (MKDE). The ASKC framework is grounded on and unifies these robust estimators using nonparametric kernel density estimation theory. In particular, we show that each of these methods is a special case of ASKC with a specific kernel. Like these methods, ASKC can tolerate more than 50 percent outliers, but it can also automatically estimate the scale of inliers. We apply ASKC to two important areas in computer vision, robust motion estimation and pose estimation, and show comparative results on both synthetic and real data.

1 INTRODUCTION

Robust statistical techniques play an important role in many areas of computer vision. Applications include segmentation [1], [11], [28], [29], optical flow calculation [2], [15], and fundamental matrix estimation [24], [28], among many others. The key task of robust methods is to extract the parameters of an underlying structural model while tolerating the influence of outliers, i.e., data that do not obey the assumed model.

A number of robust estimation techniques (e.g., [3], [4], [7], [13], [17], [18], [19], [24], [26], [27], [29]) have been proposed over the past few decades. The Maximum-likelihood estimators (M-estimators) [7], the Least Median of Squares (LMedS) estimator [17], and the RANdom SAmple Consensus (RANSAC) estimator [4] are among the most popular robust estimators and have been widely used in many areas. However, M-estimators and LMedS cannot deal with data involving more than 50 percent outliers. RANSAC can tolerate more than 50 percent outliers, but it requires the user to specify an error tolerance which is not known a priori in many practical environments; the value of this tolerance has a significant influence on the performance of RANSAC [25], [28]. M-estimator SAmple Consensus (MSAC) [24] improves the performance of RANSAC by modifying its cost function, but it also requires a user-specified error tolerance. The Minimum Unbiased Scale Estimator (MUSE) [13], MINimize the Probability of RANdomness (MINPRAN) [19], Adaptive Least Kth order Squares (ALKS) [11], Residual Sample Consensus (RESC) [29], and Adaptive Scale Sample Consensus (ASSC) [28] can tolerate more than 50 percent outliers. However, MUSE is biased in its scale estimation and thus needs a lookup table to correct the estimated scale. MINPRAN and ALKS are computationally expensive and break down for multistructure data with a large percentage of outliers. The RESC estimator requires the user to tune many parameters in its objective function. ASSC assigns equal weights to all inliers and is therefore less effective in estimating the model parameters. Recently, Chen and Meer [3] modified the cost function of the M-estimators to create projection-based M-estimators (pbM-estimators). Rozenfeld and Shimshoni [18] and Subbarao and Meer [20] proposed variants to improve the performance of pbM, and Lavva et al. [10] applied a modified pbM to 3D range image segmentation. All of these modifications [3], [10], [18], [20] are based on the projection pursuit paradigm [8], [9].

In [28], the first author developed the ASSC estimator, which uses the ratio of the number of inlier samples to the scale of inliers as its objective function. We now describe a new robust estimator, Adaptive-Scale Kernel Consensus (ASKC). ASKC assumes that, with the correct model parameters, the mode at the origin of the residual space has the maximum kernel density; ASKC therefore estimates the parameters by maximizing the kernel density value at the origin of the residual space. Because we employ nonparametric kernel density estimation techniques in the objective function, ASKC is based on kernel consensus and is more effective than ASSC and RANSAC, which are based on sample consensus. Moreover, ASKC provides a generalized framework for robust model fitting, from which a set of new robust estimators can be derived by employing different kernels. ASKC unifies a set of popular robust estimators, such as RANSAC, ASSC, and the Maximum Kernel Density Estimator (MKDE) [26].

To show the ability of ASKC, we apply it to two important areas in computer vision: robust motion estimation and robust pose estimation. We compare the performance of ASKC with that of several other popular robust methods (LMedS, RANSAC, MSAC, RESC, pbM, and ASSC). Experimental results on both synthetic and real data show that our method achieves better results than the other methods.

2 THE PROPOSED METHOD

The problem of model fitting can be described as follows: given a set of data points {(**x**_{i}, *y _{i}*)}_{i=1,…,N}, estimate the parameter vector **θ** of the model underlying the inliers.

The classical *linear regression* (LR) model can be written as follows:

$${y}_{i}={x}_{i1}{\theta}_{1}+\cdots +{x}_{id}{\theta}_{d}+{e}_{i}={\mathbf{x}}_{i}\boldsymbol{\theta}+{e}_{i}\quad (i=1,\dots ,N).$$

(1)

The measurement error term *e _{i}* is usually assumed to be independent and identically distributed, and the (identical) distribution is a zero-mean Gaussian with unknown standard deviation.

With an LR model, a *d*-dimensional data point **x**_{i} can be mapped to the 1D residual space. Given an estimate θ̂, the residual *r*_{i,θ̂} and the mapping **M** from the data to the residual space can be written as:

$${r}_{i,\widehat{\theta}}={y}_{i}-{x}_{i1}{\widehat{\theta}}_{1}-\cdots -{x}_{id}{\widehat{\theta}}_{d}\quad \text{and}\quad \mathbf{M}(\widehat{\boldsymbol{\theta}},\mathbf{X})\mapsto {\mathbf{R}}_{\widehat{\boldsymbol{\theta}}},$$

(2)

where **R**_{θ̂} = [*r*_{1,θ̂},…, *r*_{N,θ̂}].

Given an estimate θ̂ and a set of residuals **R**_{θ̂} = {*r*_{1,θ̂},…, *r*_{N,θ̂}}, the *fixed bandwidth* kernel density estimate at γ with a kernel *K*(·) and a bandwidth *h* is computed as follows:

$${\widehat{f}}_{h}(\gamma ;{\mathbf{R}}_{\widehat{\theta}})=\frac{1}{Nh}\sum _{i=1}^{N}K\left(\frac{\gamma -{r}_{i,\widehat{\theta}}}{h}\right).$$

(3)
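
As an illustration, the fixed-bandwidth estimate (3) can be evaluated at a single point with a few lines of Python. This is our own minimal sketch (the function names are ours), using the Epanechnikov kernel for concreteness:

```python
def epanechnikov(u):
    # 1D Epanechnikov kernel: K(u) = 0.75 * (1 - u^2) for |u| <= 1, else 0
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde_at(gamma, residuals, h, kernel=epanechnikov):
    # Fixed-bandwidth kernel density estimate (3), evaluated at gamma
    n = len(residuals)
    return sum(kernel((gamma - r) / h) for r in residuals) / (n * h)

# Residuals tightly clustered near 0 give a high density at the origin
inliers = [-0.1, -0.05, 0.0, 0.02, 0.08]
print(kde_at(0.0, inliers, h=0.5))
```

Evaluating at γ = 0 with the true model parameters is exactly the score that ASKC maximizes later in the paper.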

An alternative is to select a different bandwidth *h* = *h*(θ̂) ≡ *h*_{θ̂} for each parameter estimate θ̂. The *variable bandwidth* kernel density estimate can be written as:

$${\widehat{f}}_{{h}_{\widehat{\mathit{\theta}}}}(\gamma ;{\mathit{R}}_{\widehat{\theta}})=\frac{1}{N}{\displaystyle \sum _{i=1}^{N}\frac{1}{{h}_{\widehat{\mathit{\theta}}}}K\left(\frac{\gamma -{r}_{i,\widehat{\mathit{\theta}}}}{{h}_{\widehat{\theta}}}\right)}.$$

(4)

We will utilize *radial basis function* (RBF) kernels which satisfy the following condition:

$$K(\beta )={c}_{k}k({\Vert \beta \Vert}^{2})\text{\hspace{1em}}k(\beta )\ge 0\text{\hspace{1em}for\hspace{1em}}0\le \beta \le 1,$$

(5)

where *k*(*β*) is the profile of the kernel, and *c _{k}* is a normalization constant which makes *K*(*β*) integrate to one.

There are many kernels available in the literature. In this paper, we only consider two popular RBF kernels: the Epanechnikov kernel *K _{E}*(*β*) and the normal kernel *K _{G}*(*β*), defined as follows:

$${K}_{E}(\beta )=\begin{cases}{c}_{k}(1-{\Vert \beta \Vert}^{2}), & \Vert \beta \Vert \le 1,\\ 0, & \Vert \beta \Vert >1,\end{cases}\quad \text{with}\phantom{\rule{thinmathspace}{0ex}}{k}_{E}(\beta )=\begin{cases}1-\beta , & 0\le \beta \le 1,\\ 0, & \beta >1,\end{cases}$$

(6)

$${K}_{G}(\beta )={c}_{k}\,\text{exp}\left(-\frac{1}{2}{\Vert \beta \Vert}^{2}\right),\quad \text{with}\phantom{\rule{thinmathspace}{0ex}}{k}_{G}(\beta )=\text{exp}\left(-\frac{\beta}{2}\right),\quad \beta \ge 0.$$

(7)
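
The two kernels and their profiles in (6) and (7) can be written down directly. In the sketch below (function names are ours), the 1D normalization constants are the standard values c_k = 3/4 for the Epanechnikov kernel and 1/√(2π) for the normal kernel:

```python
import math

C_E = 0.75                          # 1D Epanechnikov normalization constant
C_G = 1.0 / math.sqrt(2 * math.pi)  # 1D normal-kernel normalization constant

def k_E(b):
    # Profile of the Epanechnikov kernel, as in (6)
    return 1.0 - b if 0.0 <= b <= 1.0 else 0.0

def k_G(b):
    # Profile of the normal kernel, as in (7)
    return math.exp(-b / 2.0)

def K_E(beta):
    # K(beta) = c_k * k(||beta||^2), per the RBF form (5)
    return C_E * k_E(beta * beta)

def K_G(beta):
    return C_G * k_G(beta * beta)

print(K_E(0.0), K_G(0.0))  # both kernels peak at the origin
```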

Although this paper investigates the properties of ASKC with the Epanechnikov kernel (henceforth ASKC1) and the normal kernel (henceforth ASKC2), our method can easily employ an arbitrary kernel.

The bandwidth *h* (or *h*_{θ̂}) is a crucial parameter in kernel density estimation. Let σ̂_{θ̂} be a scale estimate of the inliers for a given **R**_{θ̂}. An oversmoothed bandwidth estimate is written as follows [21], [22], [25]:

$${\widehat{h}}_{\widehat{\theta}}={\left[\frac{243{\int}_{-1}^{1}K{(\beta )}^{2}d\beta}{35N{\left({\int}_{-1}^{1}{\beta}^{2}K(\beta )d\beta \right)}^{2}}\right]}^{1/5}{\widehat{\sigma}}_{\widehat{\theta}}.$$

(8)
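
The bracketed constant in (8) depends only on the kernel, so it can be evaluated numerically. The sketch below (with names of our choosing) computes the two integrals for the 1D Epanechnikov kernel by Simpson's rule and returns the oversmoothed bandwidth:

```python
def simpson(f, a, b, n=1000):
    # Composite Simpson's rule (n even) for the kernel integrals in (8)
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def K_E(u):
    # 1D Epanechnikov kernel
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def oversmoothed_bandwidth(N, sigma, K=K_E):
    RK  = simpson(lambda u: K(u) ** 2, -1, 1)    # integral of K(beta)^2
    mu2 = simpson(lambda u: u * u * K(u), -1, 1) # integral of beta^2 K(beta)
    return (243.0 * RK / (35.0 * N * mu2 ** 2)) ** 0.2 * sigma

print(oversmoothed_bandwidth(N=500, sigma=0.2))
```

For the Epanechnikov kernel the integrals come out to 0.6 and 0.2, so the bandwidth is roughly the inlier scale for moderate N.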

Because (8) tends to oversmooth, it is recommended that the bandwidth be set as *c _{h}*ĥ_{θ̂}, with *c _{h}* a constant factor in (0, 1].

In this paper, we use a TSSE-like procedure to estimate the scale of inliers. TSSE employs a mean-shift procedure and a mean-shift valley (MSV) procedure to estimate the scale for multiple-mode data. Our implementation operates as follows: First, we use a robust *k* scale estimator to compute an initial scale estimate σ̂₀:

$${\widehat{\sigma}}_{0}={|{r}_{\theta}|}_{k}/{\mathrm{\Psi}}^{-1}[(1+k)/2],$$

(9)

where |*r*_{θ}|_{k} is the (*k* × *N*)th of the ordered absolute residuals, and Ψ^{−1}[·] is the inverse of the normal cumulative distribution function. For example, when *k* is set to 0.1 or 0.2, at least 10 or 20 percent of the data points are included in the shortest window.
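A minimal version of the initial scale estimate (9), with names of our choosing, can use the inverse normal CDF from Python's standard library:

```python
import statistics

def initial_scale(residuals, k=0.1):
    # Robust k scale estimate (9): the (k*N)-th smallest absolute residual,
    # normalized by the inverse normal CDF evaluated at (1 + k)/2
    abs_r = sorted(abs(r) for r in residuals)
    idx = max(int(k * len(abs_r)) - 1, 0)
    return abs_r[idx] / statistics.NormalDist().inv_cdf((1 + k) / 2)

# Gaussian-like inliers plus gross outliers: the estimate tracks the inliers
data = [0.01 * i for i in range(-50, 51)] + [10.0] * 100
print(initial_scale(data, k=0.2))
```

Even with roughly half the data being gross outliers at 10.0, the small-k quantile keeps the estimate at the inlier scale.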

Once we have an initial scale estimate, we refine it by running TSSE. However, in addition to the Epanechnikov kernel employed in [28] for the mean-shift and mean-shift valley procedures, we may employ the normal kernel (or other kernels) as well. When the normal kernel is used, the mean-shift mode-seeking vector can be written as:

$${\overrightarrow{\mathbf{m}}}_{{h}_{\theta},G}(\gamma )=\frac{\sum _{i=1}^{N}{r}_{i,\theta}\,\text{exp}\left(-\frac{1}{2}{\left\Vert \frac{\gamma -{r}_{i,\theta}}{{h}_{\theta}}\right\Vert}^{2}\right)}{\sum _{i=1}^{N}\text{exp}\left(-\frac{1}{2}{\left\Vert \frac{\gamma -{r}_{i,\theta}}{{h}_{\theta}}\right\Vert}^{2}\right)}-\gamma .$$

(10)

The sequence of successive locations {*γ _{j}*}_{j=1,2,…} in the mean-shift valley procedure can be written as:

$${\gamma}_{j+1}={\gamma}_{j}+\zeta \left({\gamma}_{j}-\frac{\sum _{i=1}^{N}{r}_{i,\theta}\,\text{exp}\left(-\frac{1}{2}{\left\Vert \frac{{\gamma}_{j}-{r}_{i,\theta}}{{h}_{\theta}}\right\Vert}^{2}\right)}{\sum _{i=1}^{N}\text{exp}\left(-\frac{1}{2}{\left\Vert \frac{{\gamma}_{j}-{r}_{i,\theta}}{{h}_{\theta}}\right\Vert}^{2}\right)}\right),$$

(11)

where ζ is an adjustment factor which keeps the MSV vectors consistent in direction.
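In one dimension, mean-shift mode seeking with the normal kernel reduces to iterating a Gaussian-weighted mean of the residuals. The sketch below is our own illustration of (10), not the TSSE implementation:

```python
import math

def mean_shift_mode(residuals, h, gamma0=0.0, iters=100, tol=1e-8):
    # Seek the nearest mode in the 1D residual space: repeatedly move gamma
    # to the normal-kernel weighted mean of the residuals, as in (10)
    gamma = gamma0
    for _ in range(iters):
        w = [math.exp(-0.5 * ((gamma - r) / h) ** 2) for r in residuals]
        new = sum(wi * r for wi, r in zip(w, residuals)) / sum(w)
        if abs(new - gamma) < tol:
            break
        gamma = new
    return gamma

# Two clusters of residuals; starting at 0 converges to the inlier mode near 0
res = [-0.2, -0.1, 0.0, 0.1, 0.2] + [5.0, 5.1, 5.2]
print(mean_shift_mode(res, h=0.5))
```

Starting the iteration at a different point converges to the other mode, which is what makes the subsequent valley search between modes meaningful.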

Fig. 1 shows that when the parameter estimate is correct (i.e., the parameters of the red horizontal line in Fig. 1a are used), the kernel density at the origin is high, and the ratio of the kernel densities at the detected peak and valley is also high (Fig. 1b). In contrast, when the model parameter estimate is incorrect, both the kernel density at the origin and the peak-to-valley density ratio are low (Fig. 1c).

We take the kernel density value at the origin (𝒪) of the residual space as our objective function. Given a set of residuals *R*_{θ} = {*r*_{i,θ}}_{i=1,…, N} subject to θ, we define the objective function of ASKC based on (4) as:

$${\mathit{F}}_{{h}_{\mathit{\theta}}}({\mathit{R}}_{\theta})={\widehat{f}}_{{h}_{\mathit{\theta}}}(\mathit{\mathcal{O}};{{R}}_{\theta})=\frac{1}{N}{\displaystyle \sum _{i=1}^{N}\frac{1}{{h}_{\theta}}K\left(\frac{{r}_{i,\theta}}{{h}_{\theta}}\right).}$$

(12)

The ASKC estimator can be written as:

$$\widehat{\mathit{\theta}}=\underset{\theta}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\phantom{\rule{thinmathspace}{0ex}}{\mathit{F}}_{{h}_{\mathit{\theta}}}({\mathit{R}}_{\theta})=\underset{\theta}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\frac{1}{N}{\displaystyle \sum _{i=1}^{N}\frac{1}{{h}_{\mathit{\theta}}}K\left(\frac{{r}_{i,\mathit{\theta}}}{{h}_{\theta}}\right).}$$

(13)

In the fixed bandwidth case, (13) can be rewritten as follows (which we termed the MKDE in [26]):

$$\widehat{\mathit{\theta}}=\underset{\mathit{\theta}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\phantom{\rule{thinmathspace}{0ex}}\mathit{F}({\mathit{R}}_{\theta})=\underset{\mathit{\theta}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\phantom{\rule{thinmathspace}{0ex}}\widehat{f}(\mathit{\mathcal{O}};{\mathit{R}}_{\theta})=\underset{\mathit{\theta}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\frac{1}{\mathit{\text{Nh}}}{\displaystyle \sum _{i=1}^{N}K\left(\frac{{r}_{i,\theta}}{h}\right).}$$

(14)

In [26], we showed that the scale estimate of inliers (or the bandwidth) has a much weaker influence on the performance of MKDE than on that of RANSAC, while the computational efficiency of MKDE is comparable to that of RANSAC.

To calculate the solution of (13) (or (14) for the fixed bandwidth case), we employ a sampling procedure to produce a set of candidates. We can employ either a random sampling scheme [17] or a guided sampling technique [23].

Fig. 2 gives the detailed procedure of the ASKC estimator. The purpose of using the residuals of the data other than the sample candidate, rather than of the full set **X**, is to avoid the influence of the sampled subset on the residual distribution. An additional TSSE-like procedure in step 6 may refine the scale estimate for heavily contaminated data. To improve the computational efficiency, it is unnecessary to run the TSSE-like procedure for all samples; in our case, we only run it for the samples with high ASKC scores ${\widehat{f}}_{{\widehat{h}}_{\theta}}^{*}(\ge 0.5{\widehat{f}}_{{\widehat{h}}_{\theta}}^{\text{max}})$.
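The hypothesize-and-score structure of this procedure can be sketched for 2D line fitting as follows. This is a simplified illustration with names of our choosing: the bandwidth is a crude MAD-based stand-in for the TSSE-like scale estimate, and the refinement of step 6 is omitted:

```python
import math, random

def askc_line_fit(points, n_samples=500, seed=0):
    # Hypothesize-and-verify sketch of ASKC for 2D line fitting y = a*x + b.
    # Each hypothesis is scored by the normal-kernel density at the origin
    # of the residual space, as in (13).
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(n_samples):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        res = [y - (a * x + b) for x, y in points]
        med = sorted(abs(r) for r in res)[len(res) // 2]
        h = max(1.4826 * med, 1e-6)  # crude MAD bandwidth stand-in
        score = sum(math.exp(-0.5 * (r / h) ** 2) for r in res) / (len(res) * h)
        if score > best_score:
            best, best_score = (a, b), score
    return best

# About 70% inliers on y = 2x + 1 plus scattered outliers
pts = [(x, 2 * x + 1) for x in range(20)] + [(i % 7, (i * 13) % 40) for i in range(8)]
a, b = askc_line_fit(pts)
print(round(a, 2), round(b, 2))
```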

Consider the RANSAC estimator [4],

$$\widehat{\mathit{\theta}}=\underset{\mathit{\theta}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}\phantom{\rule{thinmathspace}{0ex}}{\mathbf{n}}_{\mathit{\theta}},$$

(15)

and the ASSC estimator [28],

$$\widehat{\mathit{\theta}}=\underset{\mathit{\theta}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}({\mathbf{n}}_{\mathit{\theta}}/{\widehat{S}}_{\mathit{\theta}}),$$

(16)

where **n _{θ}** is the number of data points within the error tolerance (for RANSAC) or within the scale of inliers (for ASSC), and *Ŝ*_{θ} is the estimated scale of inliers.

From the above equations, we can see that RANSAC and ASSC are actually special cases of ASKC with the uniform kernel:

$${K}_{U}(\beta )=\{\begin{array}{cc}{c}_{k}\hfill & \Vert \beta \Vert \le 1,\hfill \\ 0,\hfill & \text{otherwise},\hfill \end{array}\text{\hspace{1em}with}\phantom{\rule{thinmathspace}{0ex}}{k}_{U}(\beta )=\{\begin{array}{cc}1,\hfill & 0\le \beta \le 1,\hfill \\ 0,\hfill & \beta >1.\hfill \end{array}$$

(17)

More specifically, RANSAC is one special case of ASKC with a fixed bandwidth and the uniform kernel. Replacing *h*_{θ} in (13) with *Ŝ*_{θ} in (16), ASSC is another special case of ASKC with a variable bandwidth and the uniform kernel.

The uniform kernel is less effective because it weights all inliers equally. In comparison, ASKC with the Epanechnikov kernel or the normal kernel (or other kernels, such as the triangular or quartic kernel) is more effective than ASSC and RANSAC.
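The special-case relationship can be seen numerically: with the uniform kernel and a fixed bandwidth, the ASKC score of (14) equals the RANSAC inlier count up to a constant factor, so both objectives rank hypotheses identically. A small illustration with hypothetical names:

```python
def uniform_K(u):
    # Uniform kernel (17): c_k inside the window, 0 outside (c_k = 0.5 in 1D)
    return 0.5 if abs(u) <= 1.0 else 0.0

def askc_score(residuals, h, K):
    # Fixed-bandwidth ASKC objective (14)
    return sum(K(r / h) for r in residuals) / (len(residuals) * h)

def ransac_inlier_count(residuals, tol):
    # RANSAC objective (15): count points within the error tolerance
    return sum(1 for r in residuals if abs(r) <= tol)

res = [0.05, -0.3, 0.8, 1.7, -2.4, 0.1]
h = 1.0
# The uniform-kernel score is the inlier count times c_k / (N h)
print(askc_score(res, h, uniform_K), ransac_inlier_count(res, h))
```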

The pbM-estimators [3] and our method share some similarities because both employ the kernel density estimate techniques. However, the differences between our method and pbM are significant:

- pbM places emphasis on the projection pursuit paradigm and works in the projection space along the direction of the parameter vector. ASKC considers the kernel density of the mode in the residual space.
- The methods in [3] find the optimal projection direction by an inner optimization and an outer optimization, i.e., they seek the mode by maximizing the density in the projection space, which, in turn, maximizes the projection index. In contrast, ASKC directly seeks the mode by maximizing the kernel density at the origin in the residual space. Our method is thus more computationally efficient (see Table 1).
- pbM employs the Median Absolute Deviation (MAD) in its bandwidth estimation. However, MAD can break down when outliers occupy more than 50 percent of the data, which leads to oversmoothing in the kernel density estimate. ASKC uses a more robust TSSE-like scale estimator to estimate the bandwidth.
- After finding the mode with the maximum density in the projection space, pbM searches for two local minima (one on each side of the mode) by a heuristic procedure, and the points with values within the two minima are treated as inliers. ASKC searches for one local minimum in the residual space, and this minimum is found adaptively by a mean-shift valley procedure.

3 ASKC FOR MOTION ESTIMATION AND POSE ESTIMATION

Estimating relative camera motion from two views with a calibrated camera, and determining camera poses relating 2D images to known 3D geometry, are two classical problems in computer vision. There are existing solutions in the literature for motion estimation (e.g., [5], [14]) and pose estimation (e.g., [12]). In this section, we show how to apply ASKC to robust motion estimation and pose estimation.

Consider a set of 3D points **P** = {**p**_{i}}_{i=1,…, N} = {(*x _{i}, y_{i}, z_{i}*)^{t}}_{i=1,…, N} and their corresponding image projections {**q**_{i}} and {**q**_{i}*} in two views. The epipolar constraint can be written as:

$${({\mathbf{q}}_{i}^{*})}^{t}{K}_{2}^{-t}\mathbf{E}{K}_{1}^{-1}{\mathbf{q}}_{i}=0,$$

(18)

where **E** is the essential matrix. *K*_{1} and *K*_{2} are, respectively, the intrinsic camera matrices corresponding to the two images. The essential matrix can be written as **E** = [**T**]_{×}**R**, where [**T**]_{×} is the cross-product matrix of the translation vector **T** and **R** is the rotation matrix.

Given the camera matrices and a set of feature matches, **E** can be estimated using the nonlinear five-point algorithm [14]. The camera motion (**R** and **T**) can be recovered from **E** by Singular Value Decomposition (SVD) [6]; **T** can only be estimated up to a scale factor (i.e., ‖**T̃**‖ = 1).

Letting ${\overline{\mathbf{q}}}_{i}={K}_{1}^{-1}{\mathbf{q}}_{i}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{\overline{\mathbf{q}}}_{i}^{*}={K}_{2}^{-1}{\mathbf{q}}_{i}^{*}$, we define the residual of a corresponding pair as:

$${r}_{i}(\mathbf{T},\mathbf{R};{\overline{\mathbf{q}}}_{i},{\overline{\mathbf{q}}}_{i}^{*})={({\overline{\mathbf{q}}}_{i}^{*})}^{t}{[\mathbf{T}]}_{\mathbf{x}}\mathbf{R}{\overline{\mathbf{q}}}_{i}.$$

(19)

Assuming a unique solution exists, the robust estimate of the camera motion can be written as:

$$[\widehat{\tilde{\mathbf{T}}},\widehat{\mathbf{R}}]\propto \underset{\mathbf{T},\mathbf{R}}{\text{arg}\phantom{\rule{thinmathspace}{0ex}}\text{max}}{\displaystyle \sum _{i=1}^{N}\frac{1}{{h}_{(\mathbf{T},\mathbf{R})}}K\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{r}_{i}(\mathbf{T},\mathbf{R};{\overline{\mathbf{q}}}_{i},{\overline{\mathbf{q}}}_{i}^{*})}{{h}_{(\mathbf{T},\mathbf{R})}}\right).}$$

(20)
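
The residual (19) is a direct matrix expression; the sketch below transcribes it in plain Python (function names are ours, and the example pose is hypothetical). For pure translation along x with an identity rotation, normalized points with equal y/z ratios satisfy the epipolar constraint exactly:

```python
def cross_matrix(t):
    # [T]_x : cross-product matrix of the translation vector T
    tx, ty, tz = t
    return [[0.0, -tz,  ty],
            [ tz, 0.0, -tx],
            [-ty,  tx, 0.0]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def motion_residual(T, R, q, q_star):
    # Epipolar residual (19): r = q*^t [T]_x R q, with q, q* in
    # normalized camera coordinates
    E_q = matvec(cross_matrix(T), matvec(R, q))
    return dot(q_star, E_q)

I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(motion_residual([1.0, 0.0, 0.0], I, [0.3, 0.2, 1.0], [0.7, 0.2, 1.0]))  # → 0.0
```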

The pose estimation problem can be described as: Given a set of 2D–3D correspondences {(**q**_{i}, **p**_{i})}_{i=1,…,N′}, estimate the camera pose (rotation **R** and translation vector **T**) which maps the 3D reference points to 2D image coordinates. Let

$$\mathbf{R}={({r}_{1}^{t},{r}_{2}^{t},{r}_{3}^{t})}^{t}\in \mathit{\text{SO}}\phantom{\rule{thinmathspace}{0ex}}(3)\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\mathbf{T}={({\mathbf{t}}_{x},{\mathbf{t}}_{y},{\mathbf{t}}_{z})}^{t}\in {\mathit{R}}^{3}.$$

(21)

We have [12]:

$${u}_{i}=\frac{{r}_{1}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{x}}{{r}_{3}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{z}};{v}_{i}=\frac{{r}_{2}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{y}}{{r}_{3}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{z}}.$$

(22)

Let ${({u}_{i}^{\dagger},{v}_{i}^{\dagger})}^{t}$ be an observed image point. Then, the residual of the corresponding pair is defined as:

$${r}_{i}(\mathbf{T},\mathbf{R};{\mathbf{q}}_{i},{\mathbf{p}}_{i})=\left|{u}_{i}^{\dagger}-\frac{{r}_{1}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{x}}{{r}_{3}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{z}}\right|+\left|{v}_{i}^{\dagger}-\frac{{r}_{2}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{y}}{{r}_{3}^{t}{\mathbf{p}}_{i}+{\mathbf{t}}_{z}}\right|.$$

(23)
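
The residual (23) combines the projection (22) with an L1 error. A minimal sketch with hypothetical names, checked on an identity test pose:

```python
def pose_residual(R, T, p, uv_obs):
    # Reprojection residual (23): L1 error between the observed (u, v) and
    # the projection (22) of the 3D point p under the pose (R, T)
    r1, r2, r3 = R          # rows of the rotation matrix
    tx, ty, tz = T
    denom = sum(a * b for a, b in zip(r3, p)) + tz
    u = (sum(a * b for a, b in zip(r1, p)) + tx) / denom
    v = (sum(a * b for a, b in zip(r2, p)) + ty) / denom
    return abs(uv_obs[0] - u) + abs(uv_obs[1] - v)

# Identity pose, point in front of the camera at depth 2 projects to
# (0.5, 0.25); a perfect observation gives a zero residual
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 0.0]
print(pose_residual(R, T, [1.0, 0.5, 2.0], (0.5, 0.25)))  # → 0.0
```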

We employ [12] to generate solutions from each sample candidate.

4 EXPERIMENTS

In this section, we evaluate the performance of the ASKC estimator with the Epanechnikov kernel (ASKC1) and the normal kernel (ASKC2) on model fitting (including 2D line fitting and 3D plane fitting), motion estimation, and pose estimation. We compare the performance of ASKC1/ASKC2 with that of several other popular robust estimators (LMedS, RANSAC, MSAC, RESC, and ASSC). We also compare with pbM^{1} in line/plane fitting and motion estimation, where the application of pbM is straightforward. As the value of the error tolerance is not known a priori, we use a median scale estimator for RANSAC and MSAC (suggested by [24]). In Section 4.3, we also test the performance of RANSAC with a manually chosen optimal scale (termed RANSAC2). The scale of inliers (σ) for the synthetic data used in this section is set to 0.2.

We generate a four-line signal containing 50 data points in each line and 300 randomly distributed outliers. Thus, the total number of data points is 500, and each line initially has 90 percent outliers (including pseudo-outliers and random outliers).^{2} We apply the robust estimators to sequentially fit and extract all four lines. The results (except for the result of RANSAC, due to space limits) are shown in Fig. 3. LMedS, MSAC, and RANSAC cannot successfully fit/extract any line. RESC and pbM correctly fit one line but fail to extract any line because the scale is wrongly estimated. ASSC extracts three lines but fails on one. ASKC1/ASKC2 successfully fits/extracts all four lines.

We sequentially fit and extract four 3D planes with the comparative robust estimators. Among the total 500 data points, the number of points in each of the four planes is 45. The number of random outliers is 320. Thus, the outlier percentage for each plane is 91 percent at the beginning.

Fig. 4 shows that only the proposed ASKC1/ASKC2 correctly fits and extracts all four planes. In contrast, ASSC succeeds on two planes but wrongly fits/extracts the other two. MSAC and RESC correctly fit one plane but fail to extract any plane due to incorrect scale estimates. LMedS, RANSAC, and pbM fail on all four planes.

It is interesting to investigate the computational efficiency of the robust estimators. We compare the time complexity of the above methods in Table 1. All methods were coded in MATLAB except for pbM which was coded in C++. We use 3,000 random samples (1,000 for pbM) in line fitting and 6,000 random samples (2,000 for pbM) in plane fitting. We only compare the time to fit and extract the first structure (line/plane) because some methods break down after this step.

As shown in Table 1, ASKC1/2 is a little slower than LMedS, RANSAC, and MSAC, but significantly faster than RESC, ASSC, and pbM. ASKC is faster than ASSC because of step 5 in Fig. 2. ASKC is about 20–100 times faster than pbM (even though pbM was coded in C++ while ASKC was coded in MATLAB) because the local simplex optimization step in pbM, which ASKC does not use, is computationally expensive. Between ASKC1 and ASKC2, ASKC1 is slightly faster.

We evaluate the performance of the robust estimators in motion estimation on a set of endoscopic sinus images. The endoscopic sinus image data were collected on a cadaverous porcine specimen with a calibrated monocular endoscope. The ground truth of the endoscope motion was recorded by an external Optotrak tracking system (Northern Digital Corp., Waterloo, Ontario). The endoscopic images involve a number of challenges such as low texture, specularities, illumination changes, and motion blurring (see Figs. 5a and 5b). As a result, the obtained feature-based matches between image pairs contain a large number of outliers or mismatches (see Fig. 5c). Thus, it requires the methods to be highly robust to outliers.

(a), (b) Two examples showing the challenges present in endoscopic sinus images. A snapshot showing the feature matches (c) and the matches selected by ASKC1 (d) on the left undistorted image.

To obtain quantitative results, we apply the comparative methods to 130 endoscopic sinus image pairs. The distance between the positions of the endoscopic camera in each pair of images is larger than 1 mm. The average number of matched features per pair is 922, and the outlier percentage ranges from approximately 61.5 to 84.8 percent. We use the median, the mean, and the standard deviation of the estimate errors in translation and rotation to evaluate the performance of the methods. The translation error (E_{**T**}) is computed as E_{**T**} = ‖**T̂** − **T̂**_{true}‖ × 100, where **T̂** (with ‖**T̂**‖ = 1) is the estimated translation direction vector and **T̂**_{true} (with ‖**T̂**_{true}‖ = 1) is the ground-truth translation direction vector. Similarly, we write the rotation error (**E**_{τ}) as **E**_{τ} = ‖**τ** − **τ**_{true}‖ × 100, where τ is a quaternion representation of the rotation matrix **R**.

Table 2 shows the quantitative results. ASKC1/ASKC2 achieves the most accurate results among the comparative methods (including RANSAC with a manually chosen optimal scale), and ASKC2 generally outperforms ASKC1 in the translation and rotation estimation. LMedS, MSAC, and RANSAC with the median scale estimator achieve the worst results because the median is not robust to data with more than 50 percent outliers. The results of ASSC are better than those of RESC but less accurate than those of ASKC1/ASKC2. ASKC2 with the five-point algorithm [14] achieves better results than ASKC2* with the eight-point algorithm [6]. When the eight-point algorithm is employed for both pbM and ASKC2*, ASKC2* achieves better results than pbM.

We generate a set of synthetic data with various percentages of outliers to evaluate our methods in pose estimation. Among the total 100 2D–3D correspondences, the percentage of outliers (i.e., mismatches) is increased from 1 to 80 percent in 1 percent increments. We repeat the experiments 20 times. We report errors in terms of rotation, translation, and the RMS reprojection error of inliers. We use **E**_{τ} = ‖τ −τ_{true}‖ × 100 for the rotation error measure and ${\mathrm{E}}_{\mathbf{T}}=\frac{2\Vert \mathbf{T}-{\mathbf{T}}_{\text{true}}\Vert}{\Vert \mathbf{T}\Vert +\Vert {\mathbf{T}}_{\text{true}}\Vert}\times 100$ for the translation error measure. The RMS reprojection error is written as

$$\sqrt{\sum _{i\in \text{inliers}}{|{\mathbf{q}}_{i}^{\bullet}-{\widehat{\mathbf{q}}}_{i}|}^{2}/{N}_{\text{inliers}}},$$

where ${\mathbf{q}}_{i}^{\bullet}$ and ${\widehat{\mathbf{q}}}_{i}$ are, respectively, the real reprojection point and the recovered reprojection point. The mean and the standard deviation of the overall estimate errors are shown in Table 3.

From Table 3, we can see that LMedS, MSAC, and RANSAC yield the largest mean errors and standard deviations. The results of RESC are more accurate than those of LMedS, MSAC, and RANSAC, but less accurate than those of ASSC, ASKC1, and ASKC2. ASSC achieves less accurate results than ASKC1/ASKC2. The results of ASKC1 and ASKC2 are similar, with ASKC2 slightly better.

5 CONCLUSIONS

This paper has presented a new robust estimator (ASKC) based on nonparametric kernel density estimation techniques. We have also provided a generalized framework from which a set of new robust estimators can be derived by employing various kernels, and we have shown that ASKC can be treated as a generalized form of several traditional and recently developed robust estimators such as RANSAC, ASSC, and MKDE. We apply ASKC to robust motion and pose estimation and test the proposed method on both synthetic and real data. Experiments show that ASKC achieves better results than several other popular robust estimators.

ACKNOWLEDGMENTS

Parts of the work described in this paper appeared in [26], [27]. The authors thank the anonymous reviewers of those papers, and the *TPAMI* reviewers, Professors David Suter and Russell H. Taylor for their valuable comments and suggestions. The authors also thank Professor Darius Burschka for providing the endoscopic sinus image data. The work was supported by the US National Institutes of Health under grant number R21EB005201 and was done when H. Wang was at Johns Hopkins University.

^{1}The code is available at: http://www.caip.rutgers.edu/riul/research/code/PBM/index.html.

^{2}When the inliers of one line are removed from the data, the percentage of outliers for the remaining lines decreases.

Hanzi Wang, School of Computer Science, The University of Adelaide, Adelaide SA 5005, Australia. Email: hanzi.wang@ieee.org.

Daniel Mirota, Department of Computer Science, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218. Email: dan@cs.jhu.edu.

Gregory D. Hager, Department of Computer Science, The Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218. Email: hager@cs.jhu.edu.

REFERENCES

1. Bab-Hadiashar A, Gheissari N. Range Image Segmentation Using Surface Selection Criterion. IEEE Trans. Image Processing. 2006 July;vol. 15(no. 7):2006–2018.

2. Bab-Hadiashar A, Suter D. Robust Optic Flow Computation. Int’l J. Computer Vision. 1998;vol. 29(no. 1):59–77.

3. Chen H, Meer P. Robust Regression with Projection Based M-Estimators. Proc. IEEE Int’l Conf. Computer Vision. 2003:878–885.

4. Fischler MA, Bolles RC. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. ACM. 1981;vol. 24(no. 6):381–395.

5. Li H, Hartley R. Five Point Algorithm Made Easy. Proc. Int’l Conf. Pattern Recognition. 2006:630–633.

6. Hartley R, Zisserman A. Multiple View Geometry in Computer Vision. second ed. Cambridge Univ. Press; 2004.

7. Huber PJ. Robust Statistics. Wiley; 1981.

8. Huber PJ. Projection Pursuit (with Discussion) Annals of Statistics. 1985;vol. 13:435–525.

9. Jones MC, Sibson R. What Is Projection Pursuit? (with Discussion) J. Royal Statistical Soc. A. 1987;vol. 150:1–37.

10. Lavva I, Hameiri E, Shimshoni I. Robust Methods for Geometric Primitive Recovery and Estimation from Range Images. IEEE Trans. Systems, Man, and Cybernetics, Part B. 2008 June;vol. 38(no. 3):826–845.

11. Lee K-M, Meer P, Park R-H. Robust Adaptive Segmentation of Range Images. IEEE Trans. Pattern Analysis and Machine Intelligence. 1998 Feb;vol. 20(no. 2):200–205.

12. Lu C, Hager GD, Mjolsness E. Fast and Globally Convergent Pose Estimation from Video Images. IEEE Trans. Pattern Analysis and Machine Intelligence. 2000 June;vol. 22(no. 6):610–622.

13. Miller JV, Stewart CV. MUSE: Robust Surface Fitting Using Unbiased Scale Estimates. Proc. IEEE Conf. Computer Vision and Pattern Recognition. 1996:300–306.

14. Nistér D. An Efficient Solution to the Five-Point Relative Pose Problem. IEEE Trans. Pattern Analysis and Machine Intelligence. 2004 June;vol. 26(no. 6):756–770.

15. Papenberg N, Bruhn A, Brox T, Didas S, Weickert J. Highly Accurate Optic Flow Computation with Theoretically Justified Warping. Int’l J. Computer Vision. 2006;vol. 67(no. 2):141–158.

16. Rousseeuw PJ, Croux C. Alternatives to the Median Absolute Deviation. J. Am. Statistical Assoc. 1993;vol. 88(no. 424):1273–1283.

17. Rousseeuw PJ, Leroy A. Robust Regression and Outlier Detection. John Wiley & Sons; 1987.

18. Rozenfeld S, Shimshoni I. The Modified pbM-Estimator Method and a Runtime Analysis Technique for the RANSAC Family. Proc. IEEE Conf. Computer Vision and Pattern Recognition. 2005:1113–1120.

19. Stewart CV. MINPRAN: A New Robust Estimator for Computer Vision. IEEE Trans. Pattern Analysis and Machine Intelligence. 1995 Oct;vol. 17(no. 10):925–938.

20. Subbarao R, Meer P. Heteroscedastic Projection Based M-Estimators. Proc. Workshop Empirical Evaluation Methods in Computer Vision. 2005

21. Terrell G. The Maximal Smoothing Principle in Density Estimation. J. Am. Statistical Assoc. 1990;vol. 85:470–477.

22. Terrell G, Scott DW. Over Smoothed Density Estimates. J. Am. Statistical Assoc. 1985;vol. 80:209–214.

23. Tordoff B, Murray DW. Guided Sampling and Consensus for Motion Estimation. Proc. European Conf. Computer Vision. 2002:82–96.

24. Torr P, Murray D. The Development and Comparison of Robust Methods for Estimating the Fundamental Matrix. Int’l J. Computer Vision. 1997;vol. 24(no. 3):271–300.

25. Wand MP, Jones M. Kernel Smoothing. Chapman & Hall; 1995.

26. Wang H. Maximum Kernel Density Estimator for Robust Fitting. Proc. IEEE Int’l Conf. Acoustics, Speech, and Signal Processing. 2008:3385–3388.

27. Wang H, Mirota D, Ishii M, Hager G. Robust Motion Estimation and Structure Recovery from Endoscopic Image Sequences with an Adaptive Scale Kernel Consensus Estimator. Proc. IEEE Conf. Computer Vision and Pattern Recognition. 2008.

28. Wang H, Suter D. Robust Adaptive-Scale Parametric Model Estimation for Computer Vision. IEEE Trans. Pattern Analysis and Machine Intelligence. 2004 Nov;vol. 26(no. 11):1459–1474.

29. Yu X, Bui TD, Krzyzak A. Robust Estimation for Range Image Segmentation and Reconstruction. IEEE Trans. Pattern Analysis and Machine Intelligence. 1994 May;vol. 16(no. 5):530–538.
