Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. Author manuscript; available in PMC 2010 September 22.
Published in final edited form as:
Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2006 July 5; 1: 1283–1288.
doi:  10.1109/CVPR.2006.131
PMCID: PMC2943644

Groupwise point pattern registration using a novel CDF-based Jensen-Shannon Divergence


In this paper, we propose a novel and robust algorithm for the groupwise non-rigid registration of multiple unlabeled point-sets with no bias toward any of the given point-sets. To quantify the divergence between multiple probability distributions each estimated from the given point sets, we develop a novel measure based on their cumulative distribution functions that we dub the CDF-JS divergence. The measure parallels the well known Jensen-Shannon divergence (defined for probability density functions) but is more regular than the JS divergence since its definition is based on CDFs as opposed to density functions. As a consequence, CDF-JS is more immune to noise and statistically more robust than the JS.

We derive the analytic gradient of the CDF-JS divergence with respect to the non-rigid registration parameters for use in the numerical optimization of the groupwise registration leading a computationally efficient and accurate algorithm. The CDF-JS is symmetric and has no bias toward any of the given point-sets, since there is NO fixed reference data set. Instead, the groupwise registration takes place between the input data sets and an evolving target dubbed the pooled model. This target evolves to a fully registered pooled data set when the CDF-JS defined over this pooled data is minimized. Our algorithm is especially useful for creating atlases of various shapes (represented as point distribution models) as well as for simultaneously registering 3D range data sets without establishing any correspondence. We present experimental results on non-rigid registration of 2D/3D real point set data.

1. Introduction

Matching point patterns is ubiquitous in many fields of Engineering and Science e.g., medical imaging, sports science, archaeology, and others. Point sets are widely used in computer vision to represent boundary points of shapes contained in images or any other salient features of objects contained in images. Given two or more images represented using the salient features contained therein, most often than not, one is interested in matching these (feature) point patterns to determine a linear or a nonlinear transformation between the coordinates of the feature point sets. Such transformations capture the changes in the pattern geometry characterized by the given feature point set. An important problem in this context is how can we achieve an unbiased registration of the two or more point sets given i.e., if one arbitrarily chooses any one of the given data sets as a reference, the estimated registration transformation would be biased toward this chosen reference and it would be desirable to avoid such a bias. In this paper, we address this problem of achieving an unbiased (non-rigid) registration of the given multiple (feature point) data sets.

There are numerous techniques in literature on point pattern matching. Iterated closest point (ICP) technique [1] is a popular approach in the vision community to achieve point set registrations using the assumption that closest points correspond to each other. Despite its popularity, ICP has several shortcomings. The non-differentiable cost function associated with ICP introduces the local convergence problem which requires sufficient overlap between the data-sets and a close initialization. Also, a naive implementation of ICP is know to be prone to outliers which prompted more robust variation such as [5].

In [2], Chui et al., present unlabeled non-rigid registration algorithm for a pair of unlabeled point sets. The main strength of their work is the ability to jointly determine the correspondences and non-rigid transformation between two point sets using a combination of techniques such as deterministic annealing and soft-assign. Their work however is not stable in the presence of outliers and computational complexity of the technique is rather high. Recently, Glaunes et al. [6] have approached the problem of matching point sets by converting the problem into an image matching problem. That is, they represent each point-set as a sum of Dirac delta functions and then ”soften” the Dirac delta functions by using a Gaussian kernel. Subsequently, they use an L2 norm on the ”images” and minimize this distance (plus a diffeomorphic regularizer). Since ”smoothed” delta functions are used to represent each point-set, this approach suffers from not having a noise model and there is essentially no way to estimate the degree of smoothness of the Gaussian kernel.

Another class of approaches involve methods that align the two given point sets without explicitly establishing a point correspondence. The idea is to use a kernel density estimator to estimate the density function fit to each of the two given data sets and then apply an information theoretic measure to compute the (dis)similarity between them. Recently, Tsin and Kanade [9] propose a kernel correlation based point set registration approach where the cost function is proportional to the correlation of two kernel density estimates. In [7], Bing et al., introduced a novel algorithm for rigidly and non-rigidly registering pairs of data sets. They derived a closed form expression for the distance between mixtures of Gaussians reprepresenting the point set data. This analytic cost function is then minimized over the space of desired transformations. Their algorithm is very fast in comparison to existing methods on point set registration and the results shown are quantitatively satisfactory.

All the methods discussed thus far are restricted to registering pairs of point set data. They are not directly extendible to groupwise registration of multiple point sets. Groupwise point set registration has become more popular recently . In [4], a groupwise registration method is presented. However, in their work, the stability of the registration result is not guaranteed in the case of data with outliers, and hence a good stopping criterion is required. Unlike their approach, we do not need to first solve a correspondence problem in order to subsequently solve the non-rigid registration problem.

The rest of paper is organized as follows: In section 2, we present the derivation of the CDF-JS and also the analytic derivative of the CDF-JS with respect to parameters of the non-rigid transformations required to register the multiple data sets. This is followed by a description of the groupwise registration algorithm. In section 3, we present several examples of the groupwise registration algorithm applied to several synthetic as well as real data sets in 2D and 3D. Finally, we conclude in section 4.

2. The Groupwise Registration Model

In this section, we present the details of our proposed non-rigid registration model. The basic idea is to model each point set by a probability distribution, then quantify the distance between these probability distributions using an information-theoretic measure. This similarity measure is then optimized over the space of coordinate transformations yielding the desired transformations. We will begin by deriving this new similarity measure, namely the CDF-JS.

2.1. CDF-based Jensen-Shannon Divergence

Jensen-Shannon (JS) divergence serves as a measure of cohesion between multiple probability distributions. Given N probability distributions Pa, a [set membership] {1,…, N}, the JS-divergence of Pa is defined by


where π = {π1, π2, …, πNa > 0, ∑ πa = 1} are the weights of the probability distributions Pa and H(Pa) is the Shannon entropy. The two terms on the right hand side of Eqn.(1) are: the entropy of P := ∑ πa Pa (the π- convex combination of the Pas ) and the same convex combination of the respective entropies.

It is easy to put the Jensen-Shannon divergence in to the framework of hypothesis testing. To see this, we construct a likelihood ratio between i.i.d. samples drawn from a mixture (∑a πa Pa) and i.i.d. samples drawn from a heterogeneous collection of distributions (P1, P2, …, PN) with the samples being indexed by the specific member distribution in the family from which they are drawn. Assume that N1 samples are drawn from P1, N2 from P2 etc. Let the total number of pooled samples be defined as M=defa=1NNa. The likelihood ratio then is,


In contrast to the typical statistical test relative to a threshold, we seek the maximum of (2) which is equivalent to minimizing the negative logarithm of (2). That is, we seek to maximize the probability that the samples are drawn from the mixture rather than from separate members of the family (P1, P2, …, PN). In the context of groupwise matching of point-sets, this makes eminent sense since maximizing the above ratio is tantamount to increasing the chance that all of the observed point-sets are warped versions of the same underlying warped and pooled data model. The notion of the pooled data model is defined as follows. In our process of groupwise registration, the warping does NOT have a fixed target data set. Instead, the warping is between the input data sets and an evolving target which we call the pooled model. The target evolves to a fully registered pooled data set at the end of the optimization (of the CDF-JS) process. The pooled model then consists of input data sets which have undergone groupwise matching and are now fully registered with each other. The connection to the JS-divergence arises from the fact that the negative logarithm of the above ratio (Eqn.2) asymptotically converges to the JS-divergence when the samples are assumed to be drawn from the mixture ∑a πa Pa.

The JS divergence has many important properties. However, extension of this notion to continuous distribution poses some challenges. A straight forward extension of the discrete case to continuous distributions is to replace the Shannon entropy by the differential entropy. This definition raises the following concerns: (1) The entropy of a discrete distribution is always positive, while the differential entropy of a continuous random variable may take any value on the extended real line. Therefore, entropies are not well defined when defining JS for mixed (continuous and discrete) distributions. (2) It is ”inconsistent” in the sense that the differential entropy of a uniform distribution in an interval of length a is log a, which is zero if a = 1, negative if a < 1, and positive if a > 1. For additional discussion on some of these issues the reader is referred to [8]. In this work we propose an alternative measure of divergence between multiple random variables based on cumulative distribution functions and dub it the CDF-JS divergence. The main objective of our study is to develop a new divergece measure between cumulative distributions of random variables which would be consistent across discrete and continuous domains.

The key idea in the development of the divergence between CDFs is to use the recently introduced concept of entropy defined on CDFs in Wang et al., [10]. For conveninence, we reproduce the definition of this entropy here. Let x be a random variable in R, the cumulative residual entropy (CRE) of x, is defined as:


Where R+ = (x [set membership] R; x ≥ 0). The distribution function is more regular because it is defined in an integral form unlike the density function, which is defined as the derivative of the distribution. The definition also preserves the well established principle that the logarithm of the probability of an event should represent the information content in the event. Using the definition of KL-divergence between CDFs, we derived the formula for the CDF-JS divergence between N probability distributions Pa, a [set membership] {1, …, N} which is summarized in the following theorem:

Theorem 1 Given N probability distributions Pa, a [set membership] {1, …, N}, the CDF-JS divergence of Pa is given by


The proof is not difficult and is omitted due to space constraints.

2.2. CDF-JS Divergence for Groupwise Point-sets Registration

Let the point-set data being registered be denoted by {Xa, a [set membership] {1, …, N}}. Each point-set Xa consists of points {xiaD,i{1,,na}}. Assume that each point set Xa is related to the pooled and registered data via an unknown function transformation fa, μa is the set of the transformation parameters associated with each function fa. To register the all these point data sets, we need to recover these transformation parameters. This problem can modeled as an optimization problem with the objective function being the CDF-JS divergence between the distributions of the deformed point-sets, represented as Pa = p(fa(Xa)), the atlas construction problem can now be formulated as,


In Eqn.(5), the weight parameter λ is a positive constant, the operator L determines the kind of regularization imposed. For example, L could correspond to a thin-plate spline, a Gaussian radial basis function, etc. Each choice of L is in turn related to a kernel and a metric of the deformation.

Following the approach in [3], we choose the thin-plate spline (TPS) to represent the non-rigid deformation. Given n control points x1, … , xn in Rd, a general nonrigid mapping f : RdRd represented by thin-plate spline can be written analytically as: f(x) = WU(x) + Ax + t. Here Ax + t is the linear part of f. The nonlinear part is determined by a d × n matrix, W. And U(x) is an n × 1 vector consisting of n basis functions Ui(x) = U(x, xi) = U(‖xxi‖) where U(r) is the kernel function of thin-plate spline. Therefore, the cost function for non-rigid registration can be formulated as an energy functional in a regularization framework, where the regularization term in Eqn.(5) is governed by the bending energy of the thin-plate spline warping and can be explicitly given by trace(WKWT) where K = (Kij), Kij = U(xi, xj) describes the internal structure of the control point sets. In our experiments, the clusters are used as control points. Other schemes to choose control points may also be considered. Note the linear part can be obtained by an initial affine registration, then an optimization can be performed to find the parameter W.

Having introduced the cost function and the transformation model, now the task is to design an efficient way to estimate the empirical CDF-JS divergence and derive the analytic gradient of the estimated divergence in order to achieve the optimal solution efficiently.

2.3. Estimating the Empirical CDF-JS

We will use the parzen window technique to estimate the cumulative distribution function and its derivative. The calculation of CDF-JS divergence requires the estimation of the cumulative probability distributions of each point-set. Without lost of generality, we only discuss the derivation for the 2D case, which can be extended to 3D case easily. For each point xia with the coordinates [xia,yia] in the point set Xa, it is transformed by the function fa to fa(xia,μa)=[fa(xia,μa),fa(yia,μa)]. Let β(3) be a cubic spline Parzen window. The smoothed probability density function pa(l, k; μa) of the point-set Xa, a [set membership] {1, …, N}} is given by


where αa is a normalization factor that ensures ∫ ∫ p(l, k)dldk = 1, [l, k] are the coordinate values in the X and Y axis respectively, the transformed point coordinates [fa(xia,μa),fa(yia,μa)] is normalized by the minimum coordinate value, x0, y0, and the range of each bin, ΔbX, ΔbY . From the density function, we can calculate the cumulative residual distribution function by the formula Pa(l>λ,k>γ;μa)=λγpa(l,k;μa)dldk, where λ [set membership] Lx, γ [set membership] Ly, Lx, Ly are discrete sets of coordinate values in the X and Y axis respectively. To express it in further detail, we have the following,


where Φ(.) is the cumulative residual function of cubic spline kernel which is defined as follows,


Note that dΦ(u)du=β(3)(u).

Having specified the distribution function of the data, we can then rewrite Eqn. (5) as follows, (For simplicity, we choose πa=1N,a={1,2,,N}. )


where P is the cumulative residual distribution function for the density function 1Npa(l,k;μa), which can be expressed as


2.4. Optimizing the CDF-JS Divergence

Computation of the gradient of the energy function is necessary in the minimization process when employing a gradient-based scheme. If this can be done in analytical form, it leads to an efficient optimization method. We now present the analytic form of the gradient of the CDF-JS divergence (our cost function):


Each component of the gradient maybe found by differentiating Eqn (7) with respect to the transformation parameters. It can be easily shown that P(λ,γ;μa)μa=1NPa(λ,γ;μa)μa. Based on these facts, it is straight forward to derive the gradient of the CDF-JS divergence with respect to the transformation parameters μa, which is given by

𝒞μa=λγ[1+logP]P(λ,γ;μa)μa    +1Nλγ[1+logPa]Pa(λ,γ;μa)μa=1NaλγPa(λ,γ;μa)μalogPaP

As a byproduct of the groupwise registration using the CDF-JS divergence, we get the atlas of the given population of data sets, which is simply obtained by substituting the estimated transformation functions fa in to the formula for the atlas p(𝒜)=a=1NπaPa. Note that our algorithm can be applied to yield a biased registration in situations that demand such a solution. This is achieved by fixing one of the data sets (say the reference) and estimating the transformation from this to the novel scene data. We will present experimental results on point-set alignment between two given point-sets as well as atlas construction from multiple point-sets in the next section.

3. Experiment Results

We now present experimental results on the application of our algorithm to both synthetic and real data sets. First, to demonstrate the robustness and accuracy of our algorithm, we show the alignment results by applying the CDF-JS divergence to the matching of pairs of point-sets. Then, we will present the atlas construction results achieved via groupwise registration of multiple point sets, in the second part of this section.

3.1. Pair-wise Point Set Alignment

First, to see how our method behaves in the presence of noise and outliers, we designed the following procedure to generate a corrupted template point set from a model set. For a model set with n points, we control the degree of corruption by (1) discarding a subset of size (1 − ρ)n from the model point set, (2) applying a rigid transformation (R,t) to the template, (3) perturbing the points of the template with noise (of strength ε), and (4) adding (τ − ρ)n spurious, uniformly distributed points to the template. Thus, after corruption, a template point set will have a total of τn points, of which only ρn correspond to points in the model set. Since ICP is known to be sensitive to outliers, we only compare our method with the more robust KC method in terms of the sensitivity of noise and outliers. The comparison is done via a set of 2D experiments. At each of several noise levels and outliers strengths, we generate five models and six corrupted templates from each model for a total of 30 pairs at each noise and outlier strength setting. For each pair, we use our algorithm and the KC method to estimate the known rigid transformation which was partially responsible for the corruption. Results show when the noise level is low, both KC and our method have strong resistance to outliers. However, we observe that when the noise level is high, our method exhibits stronger resistance to outliers than the KC method, as shown in Figure 1. A 3D example is also presented in Figure 2.

Figure 1
Robustness to outliers in the presence of large noise. Errors in estimated rigid transform vs. proportion of outliers ((τ − ρ)/(ρ)) for both our method and KC method.
Figure 2
Robustness test on 3D swan data. ’o’ and ’+’ indicate the model and scene points respectively. Note that the scene point-set is corrupted by noise and outliers.

3.2. Groupwise Registration for Atlas Construction Results

In this section, we begin with a simple yet illustrative example of the application of our algorithm for 2D atlas construction. After this example, we describe a 3D implementations on real hippocampal data sets. The structure we are interested in this experiment is the corpus callosum as it appears in MR brain images. Constructing an atlas for the corpus callosum and subsequently analyzing the individual shape variation from ”normal” anatomy has been regarded as potentially valuable for the study of brain diseases such as agenesis of the corpus callosum(ACC), and fetal alcohol syndrome(FAS).

We manually extracted points on the outer contour of the corpus callosum from seven normal subjects, (as shown Figure 3, indicated by ”o”). The recovered deformation between each point-set and the mean shape are superimposed on the first two rows and the left image in third row in Figure 3. The resulting atlas (mean point-set) is shown in third row of Figure 3, and is superimposed over all the point-sets. As we described earlier, all these results are computed simultaneously and automatically. This example clearly demonstrate that our joint matching and atlas construction algorithm can simultaneously align multiple shapes (modeled by sample point-sets) and compute a meaningful atlas/mean shape.

Figure 3
Experiment results on 7 2D corpus collasum point sets. The first two rows and the left image in third row show the deformation of each point-set to the atlas, superimposed with initial point set (show in ’o’) and deformed point-set (shown ...

Next, we present results on 3D hippocampal point-sets. Four 3D point-sets were extracted from epilepsy patients with left anterior temporal lobe foci identified with EEG. An interactive segmentation tool was used to segment the hippocampus from the 3D brain MRI scans of 4 subjects. The point-sets differ in shape, with the number of points 450, 421, 376, 307 in each point-set respectively. In the first four images of Figure 4, the recovered nonrigid deformation between each hippocampal point-set to the atlas is shown along with a superimposition on all of the original data sets. In second row of the Figure 4, we also show the scatter plot of original point-sets along with all the point-sets after the non-rigid warping. An examination of the two scatter plots clearly shows the efficacy of our recovered non-rigid warping. Note that validation of what an atlas shape ought to be in the real data case is a difficult problem and we relegate its resolution to a future paper.

Figure 4
Atlas construction from four 3D hippocampal point sets. The first row and the left image in second row shows the deformation of each point-set to the atlas (represented as cluster centers), superimposed with initial point set (show in green ’o’) ...

4. Conclusions

In this paper, we presented a novel and robust algorithm that utilize an information theoretic measure, namely CDF based Jensen-Shannon divergence, to simultaneously register multiple unlabeled point-sets (represented by probability distribution). The primary technical challenge in groupwise non-rigid registration of point-sets is the correspondence problem. Our work avoids the correspondence problem by minimizing the CDF based Jensen-Shannon (JS) divergence between the point sets represented as probability distributions. Other advantages of using the CDF-JS divergence over existing methods in literature for groupwise point-sets non-rigid registration is that, the CDF-JS divergence is symmetric, and allows for use of unequal cardinality of the given point sets to be registered. The cost function optimization is achieved very efficiently by computing analytic gradients of the same and utilizing them in a quasi-Newton scheme. We compared our algorithm performance with competing methods on real and synthetic data sets and showed significantly improved performance in the context of robustness to noise and outliers in the data. Experiments were depicted with both 2D and 3D point sets from medical and non-medical domains. Our future work will focus on generalizing the non-rigid deformations to diffeomorphic mappings.


This research was in part funded by the NIH grant RO1 NS046812 to BCV and the NSF grant NSF0307712 to AR.


1. Besl PJ, McKay ND. A method for registration of 3-D shapes. IEEE Trans. Patt. Anal. Machine Intell. 1992 February;14(2):239–256.
2. Chui H, Rangarajan A. CVPR2000. 2000. A new algorithm for non-rigid point matching; pp. 2044–2051.
3. Chui H, Rangarajan A. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding (CVIU) 2003;89:114–141.
4. Chui H, Rangarajan A, Zhang J, Leonard CM. Unsupervised learning of an atlas from unlabeled point-sets. IEEE Trans. Pattern Anal. Mach. Intell. 2004;26(2):160–172. [PubMed]
5. Fitzgibbon AW. BMVC2001. 2001. Robust registration of 2D and 3D point sets.
6. Glaunes J, Trouve A, Younes L. CVPR. volume 2. 2004. Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching; pp. 712–718.
7. Jian B, Vemuri B. ICCV2005. 2005. A robust algorithm for point set registration using mixture of gaussians; pp. 1246–1251. [PMC free article] [PubMed]
8. Jumarie G. Relative Information. Springer; 1990.
9. Tsin Y, Kanade T. ECCV2004(3) 2004. A correlation-based approach to robust point set registration; pp. 558–569.
10. Wang F, Vemuri BC, Rao M, Chen Y. IPMI. 2003. A new & robust information theoretic measure and its application to image alignment; pp. 388–400. [PubMed]