J Geom Mech. Author manuscript; available in PMC 2010 November 23.
Published in final edited form as: J Geom Mech. 2010 June 1; 2(2): 119–158. doi: 10.3934/jgm.2010.2.119

INFORMATION-THEORETIC INEQUALITIES ON UNIMODULAR LIE GROUPS

Abstract

Classical inequalities used in information theory such as those of de Bruijn, Fisher, Cramér, Rao, and Kullback carry over in a natural way from Euclidean space to unimodular Lie groups. These are groups that possess an integration measure that is simultaneously invariant under left and right shifts. All commutative groups are unimodular. And even in noncommutative cases unimodular Lie groups share many of the useful features of Euclidean space. The rotation and Euclidean motion groups, which are perhaps the most relevant Lie groups to problems in geometric mechanics, are unimodular, as are the unitary groups that play important roles in quantum computing. The extension of core information theoretic inequalities defined in the setting of Euclidean space to this broad class of Lie groups is potentially relevant to a number of problems relating to information gathering in mobile robotics, satellite attitude control, tomographic image reconstruction, biomolecular structure determination, and quantum information theory. In this paper, several definitions are extended from the Euclidean setting to that of Lie groups (including entropy and the Fisher information matrix), and inequalities analogous to those in classical information theory are derived and stated in the form of fifteen small theorems. In all such inequalities, addition of random variables is replaced with the group product, and the appropriate generalization of convolution of probability densities is employed. An example from the field of robotics demonstrates how several of these results can be applied to quantify the amount of information gained by pooling different sensory inputs.

Key words and phrases: Information Theory, Lie Group, Convolution, Inequalities

1. Introduction

Shannon’s brand of information theory is now more than six decades old, and some of the statistical methods developed by Fisher, Kullback, etc., are even older. Similarly, the study of Lie groups is now more than a century old. Despite their relatively long and roughly parallel history, surprisingly few connections appear to have been made between these two vast fields. The only attempts to do so known to the author include those of Johnson and Suhov [37, 38] from an information-theoretic perspective, Willsky [82] from an estimation and controls perspective, and Maksimov [50] and Roy [61] from a probability perspective.

The goal of this paper is therefore to present analytical foundations for “information theory on Lie groups.” As such, fifteen small theorems are presented that involve the structure and/or group operation of Lie groups. Unlike extensions of information theory to manifolds, the added structure inherent in Lie groups allows us to draw much stronger parallels with inequalities of classical information theory, such as those presented in [25, 28].

To the best of the author’s knowledge the only work that uses the concept and properties of information-theoretic entropy on Lie groups is that of Johnson and Suhov [37, 38]. Their goal was to use the Kullback-Leibler divergence between probability density functions on compact Lie groups to study the convergence to uniformity under iterated convolutions, in analogy with what was done by Linnik [47] and Barron [4] in the commutative case. The goal of the present paper is complementary: using some of the same tools, many of the major defined quantities and inequalities of (differential) information theory are extended from $\mathbb{R}^n$ to the context of unimodular Lie groups, which form a broader class of Lie groups than compact ones.

The goal here is to define and formalize probabilistic and information-theoretic quantities that are currently arising in scenarios such as robotics [45, 52, 65, 72, 57, 40, 79], bacterial motion [7, 77], and parts assembly in automated manufacturing systems [9, 22, 63, 68]. The topics of detection, tracking, estimation and control on Lie groups have been studied extensively over the past four decades. For example, see [10, 11, 15, 29, 39, 41, 49, 67, 19, 57, 79, 51, 3, 82] (and references therein). Many of these problems involve probability densities on the group of rigid-body motions. Indeed, the author’s own work on highly articulated robotic arms and polymer/DNA statistical mechanics involves computing probabilities and entropies on the group of rigid-body motions [23, 22, 58, 79, 80, 86]. However, rather than focusing only on rigid-body motions, a general information theory on the much broader class of unimodular Lie groups is presented here with little additional effort.

Several other research areas that would initially appear to be related to the present work have received intensive interest. Decades ago Amari and Csiszár developed the concept of information geometry [1, 26] in which the Fisher information matrix is used to define a Riemannian metric tensor on spaces of probability distributions, thereby allowing those spaces to be viewed as Riemannian manifolds. This provides a connection between information theory and differential geometry. However, in information geometry, the probability distributions themselves (such as Gaussian distributions) are defined on a Euclidean space, rather than on a Lie group. The presentation provided here opens up the possibility of defining information geometries on spaces of functions on Lie groups.

A different kind of connection between information theory and geometry has been established in the context of medical imaging and computer vision in which probability densities on manifolds are analyzed using information-theoretic techniques [59]. However, a manifold generally does not have an associated group operation, and so there is no natural way to “add” random variables.

Very little has been done along the lines of developing information theory on Lie groups, which in addition to possessing the structure of differential manifolds, also are endowed with group operations. Indeed, it would appear that applications such as deconvolution on Lie groups [16] and the field of Simultaneous Localization and Mapping (or SLAM) [72] have preceded the development of formal information inequalities that take advantage of the Lie-group structure of rigid-body motions.

This paper attempts to address this deficit with a two-pronged approach: (1) by collecting some known results from the functional analysis literature and reinterpreting them in information-theoretic terms (e.g. Gross’ log-Sobolev inequality on Lie groups); and (2) by defining information-theoretic quantities such as entropy and the Fisher information matrix, and deriving inequalities involving these quantities that parallel those in classical information theory.

The remainder of this paper is structured as follows: Section 2 starts with two concrete examples of Lie groups. This leads into the more general review of unimodular Lie groups presented in Section 3. A discussion of harmonic analysis and probability theory on unimodular Lie groups is presented in Section 4. Section 5 defines entropy and relative entropy for unimodular Lie groups and proves some of their properties under convolution and marginalization over subgroups and coset spaces. The concept of the Fisher information matrix for probability densities on unimodular Lie groups is defined in Section 6 and several elementary properties are proven. This generalized concept of Fisher information is used in Section 7 to establish the de Bruijn inequality for unimodular Lie groups. These definitions and properties are combined with recent results by others on log-Sobolev inequalities in Section 8. Section 9 derives a version of the Cramér-Rao bound for concentrated pdfs on Lie groups. Section 10 illustrates the efficacy of a subset of these theorems in the context of a mobile-robot localization problem. Finally, Section 11 summarizes the results and reviews how these inequalities might be applied to other problems of practical interest.

2. Definition and examples of unimodular Lie groups

A Lie group G with operation ○ is called unimodular if there exists a volume element dg such that for every h ∈ G and every function $f: G \to \mathbb{C}$ that is integrable with respect to dg,

$$\int_G f(h \circ g)\,dg = \int_G f(g \circ h)\,dg = \int_G f(g)\,dg. \tag{1}$$

It is known [71, 19] that whenever the above condition holds,

$$\int_G f(g^{-1})\,dg = \int_G f(g)\,dg \tag{2}$$

follows. Many spaces of functions on a unimodular Lie group, G, can be defined. These include the functions that are absolutely integrable (which is implicit in the above equations), forming the space $L^1(G)$. There are those that are square integrable, meaning that

$$\int_G |f(g)|^2\,dg < \infty.$$

The space of all such functions is $L^2(G)$. There is also the space of smooth functions, $C^\infty(G)$. Since the matrix Lie groups of interest in applications are also smooth Riemannian manifolds, the space of smooth functions, $C^\infty(G)$, can be defined with respect to this manifold structure. The focus here is probability density functions (i.e., non-negative real-valued functions that integrate to unity),

$$f(g) \ge 0 \;\;\forall\, g \in G \quad\text{and}\quad \int_G f(g)\,dg = 1,$$

which automatically means that $f \in L^1(G)$. In addition, we will restrict the focus to probability densities $f \in N(G)$, the class of “nice” functions defined by

$$N(G) = L^1(G) \cap L^2(G) \cap C^\infty(G).$$

While this may seem like a severe restriction, in practice bandlimited Fourier expansions of any function in $L^2(G)$ are also in $L^1(G)$ and $C^\infty(G)$. Setting the finite bandlimit high enough ensures the desired accuracy in the approximation of any $f \in L^2(G)$ while simultaneously being in $L^1(G)$ and $C^\infty(G)$.

When using coordinates, the volume element for an n-dimensional Lie group is expressed as $dg = |J(q)|\,dq_1\,dq_2 \cdots dq_n$ where J(q) is a Jacobian and |J(q)| is its determinant. The details of how to compute J(q) for specific matrix Lie groups are provided below, and the relationship of J(q) to more standard quantities such as adjoint matrices and the modular function is reviewed in Section 3. The above conditions then can be stated in terms of conditions on |J(q)|. Examples of this computation are provided in the subsections that follow.

Examples of unimodular Lie groups are examined in detail below. Readers who are already familiar with these examples can skip directly to Section 3. Perhaps one reason why there has been little cross-fertilization between the theory of Lie groups and information theory is that the presentation styles in these two fields are very different. Whereas Lie groups belong to pure mathematics, information theory emerged from engineering. Therefore, this section reviews some of the basic properties of Lie groups from a concrete engineering perspective. The focus is on matrix Lie groups, and the treatment of Lie-theoretic concepts is reduced to simple matrix-algebraic notation. All of the groups considered are therefore matrix Lie groups. Likewise, some notations used in information theory, such as H(X) to denote the entropy of a random variable, are replaced with S(ρ), where ρ is the probability density describing X. This notation, while not standard in information theory, fits better in the context of Lie groups.

2.1. Example 1: The rotation group

Consider the set of 3×3 rotation matrices

$$SO(3) = \{R \in \mathbb{R}^{3\times3} \mid R\,R^T = I,\; \det R = +1\}.$$

Here SO(3) denotes the set of special orthogonal 3 × 3 matrices with real entries.

Given any 3-parameter description of rotation, the angular velocity of a rigid body can be obtained from a rotation matrix. Angular velocity in the body-fixed and space-fixed reference frames can be written respectively as1

$$\omega_r = J_r(q)\,\dot{q} \quad\text{and}\quad \omega_l = J_l(q)\,\dot{q}$$

where q is any parametrization of SO(3).

The Lie algebra so(3) consists of skew-symmetric matrices of the form

$$X = \begin{pmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{pmatrix} = \sum_{i=1}^3 x_i X_i. \tag{3}$$

The skew-symmetric matrices $\{X_i\}$ form a basis for the set of all such 3 × 3 skew-symmetric matrices, and the coefficients $\{x_i\}$ are all real. The “vee” operator converts $X_i$ into a natural unit basis vector in $\mathbb{R}^3$. That is, $(X_i)^\vee = e_i$ and so $X^\vee = x$. The inverse of the “vee” operator is the “hat” operator defined by $\hat{x} = X$.

Lie algebras and Lie groups are related in general by the exponential map. For matrix Lie groups (which are the only kind of Lie groups that will be discussed here), the exponential map is the matrix exponential function. In this specific case,

$$\exp: so(3) \to SO(3).$$

It is well known (see [19] for derivation and references) that this yields the parametrization

$$R(x) = e^X = I + \frac{\sin\|x\|}{\|x\|}\,X + \frac{1-\cos\|x\|}{\|x\|^2}\,X^2 \tag{4}$$

where $\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$.
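As a quick numerical illustration (a sketch, assuming NumPy is available; the helper names hat and exp_so3 are illustrative, not from the paper), the closed form (4) can be checked against a truncated matrix-exponential power series:

import numpy as np

def hat(x):
    # The "hat" operator: map x in R^3 to the skew-symmetric matrix X of (3).
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def exp_so3(x):
    # Closed-form exponential map (4).
    X, t = hat(x), np.linalg.norm(x)
    return np.eye(3) + (np.sin(t) / t) * X + ((1.0 - np.cos(t)) / t**2) * (X @ X)

x = np.array([0.3, -0.5, 0.7])
R = exp_so3(x)
E, term = np.eye(3), np.eye(3)
for k in range(1, 20):                       # truncated power series of e^X
    term = term @ hat(x) / k
    E += term
print(np.allclose(R, E))                                                   # True
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # R is in SO(3)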

The Jacobian matrices $J_r(x)$ and $J_l(x)$ are computed from the parametrization R(x) and the definition of the ∨ operation as

$$J_l(x) = \left[\left(\frac{\partial R}{\partial x_1} R^T\right)^{\!\vee}, \left(\frac{\partial R}{\partial x_2} R^T\right)^{\!\vee}, \left(\frac{\partial R}{\partial x_3} R^T\right)^{\!\vee}\right]$$

and

$$J_r(x) = \left[\left(R^T \frac{\partial R}{\partial x_1}\right)^{\!\vee}, \left(R^T \frac{\partial R}{\partial x_2}\right)^{\!\vee}, \left(R^T \frac{\partial R}{\partial x_3}\right)^{\!\vee}\right].$$

This gives a hint as to why the subscripts l and r are used: if derivatives with respect to parameters appear on the ‘right’ of $R^T$, this is denoted with an r, and if they appear on the ‘left’ then a subscript l is used.

Whereas the set of all rotations together with matrix multiplication forms a noncommutative ($R_1 R_2 \neq R_2 R_1$ in general) Lie group, the set of all angular velocity vectors $\omega_r$ and $\omega_l$ (or more precisely, their corresponding matrices, $\hat{\omega}_r$ and $\hat{\omega}_l$) together with the operations of addition and scalar multiplication forms a vector space. Furthermore, this vector space is endowed with an additional operation, the cross product $\omega_1 \times \omega_2$ (or equivalently the matrix commutator $[\hat{\omega}_1, \hat{\omega}_2] = \hat{\omega}_1\hat{\omega}_2 - \hat{\omega}_2\hat{\omega}_1$). This makes the set of all angular velocities a Lie algebra, which is denoted as so(3) (as opposed to the Lie group, SO(3)).

An interesting and useful fact is that, except for a set of measure zero, all elements of SO(3) can be captured with the parameters within the open ball defined by ‖x‖ < π. The nature of the set of values ‖x‖ = π is irrelevant in the computations that are performed here because it constitutes a set of measure zero, which is of no consequence when performing computations with functions in N(G) in general, and for the specific case when G = SO(3). The matrix logarithm of any group element parameterized in the range ‖x‖ < π is also well defined. Therefore, when ‖x‖ < π,

$$\log(R) = \frac{\|x\|}{2\sin\|x\|}\,(R - R^T)$$

where

$$\|x\| = \cos^{-1}\left(\frac{\mathrm{tr}(R) - 1}{2}\right)$$

where tr denotes the trace.

Relatively simple analytical expressions have been derived for the Jacobian Jl and its inverse when rotations are parameterized as in (4):

$$J_l(x) = I + \frac{1-\cos\|x\|}{\|x\|^2}\,X + \frac{\|x\|-\sin\|x\|}{\|x\|^3}\,X^2. \tag{5}$$

The corresponding Jacobian $J_r$ is calculated as [19]

$$J_r(x) = I - \frac{1-\cos\|x\|}{\|x\|^2}\,X + \frac{\|x\|-\sin\|x\|}{\|x\|^3}\,X^2.$$

Note that

$$J_l = J_r^T \quad\text{and}\quad J_l = R\,J_r.$$

The determinants are

$$\det(J_l) = \det(J_r) = \frac{2(1-\cos\|x\|)}{\|x\|^2}.$$
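The formula (5) and the determinant above can be verified numerically (a sketch, assuming NumPy; hat, exp_so3, vee, and the Jl_* names are illustrative helpers) by comparing against the finite-difference form of the definition $J_l = [(\partial R/\partial x_1\,R^T)^\vee, \ldots]$:

import numpy as np

def hat(x):
    return np.array([[0, -x[2], x[1]], [x[2], 0, -x[0]], [-x[1], x[0], 0.0]])

def exp_so3(x):
    X, t = hat(x), np.linalg.norm(x)
    return np.eye(3) + np.sin(t)/t * X + (1 - np.cos(t))/t**2 * (X @ X)

def vee(X):
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

def Jl_closed(x):
    # Closed-form left Jacobian (5).
    X, t = hat(x), np.linalg.norm(x)
    return np.eye(3) + (1 - np.cos(t))/t**2 * X + (t - np.sin(t))/t**3 * (X @ X)

def Jl_numeric(x, h=1e-6):
    # Columns (dR/dx_i R^T)^vee, computed by central differences.
    R, cols = exp_so3(x), []
    for i in range(3):
        dx = np.zeros(3); dx[i] = h
        dR = (exp_so3(x + dx) - exp_so3(x - dx)) / (2 * h)
        cols.append(vee(dR @ R.T))
    return np.column_stack(cols)

x = np.array([0.4, 0.2, -0.9])
t = np.linalg.norm(x)
print(np.allclose(Jl_closed(x), Jl_numeric(x), atol=1e-5))               # True
print(np.isclose(np.linalg.det(Jl_closed(x)), 2*(1 - np.cos(t))/t**2))   # True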

2.2. Example 2: The Euclidean motion group of the plane

The Euclidean motion group of the plane can be thought of as the set of all matrices of the form

$$g(x, y, \theta) = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix} \tag{6}$$

together with the operation of matrix multiplication.

It is straightforward to verify that the form of these matrices is closed under multiplication and inversion, that g(0, 0, 0) = I, and that this set is therefore a group. This is often referred to as the special Euclidean group, and is denoted as SE(2). Like SO(3), SE(2) is three dimensional. However, unlike SO(3), SE(2) is not compact. Nevertheless, it is possible to define a natural integration measure for SE(2) as

$$dg = dx\,dy\,d\theta.$$

And while SE(2) does not have finite volume (and so there is no single natural normalization constant such as $8\pi^2$ in the case of SO(3)), this integration measure nevertheless can be used to compute probabilities from probability densities.

Note that

$$g(x, y, \theta) = \exp(x X_1 + y X_2)\,\exp(\theta X_3)$$

where

$$X_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix};\quad X_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix};\quad X_3 = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

These matrices form a basis for the Lie algebra, se(2). It is convenient to identify these with the natural basis for $\mathbb{R}^3$ by defining $(X_i)^\vee = e_i$. (This use of the ∨ operation for SE(2) is completely analogous to how it was used in the context of SO(3), only now the basis elements and group are different.) In so doing, any element of se(2) can be identified with a vector in $\mathbb{R}^3$.

The Jacobians for this parametrization are then of the form

$$J_l = \left[\left(\frac{\partial g}{\partial x} g^{-1}\right)^{\!\vee}, \left(\frac{\partial g}{\partial y} g^{-1}\right)^{\!\vee}, \left(\frac{\partial g}{\partial \theta} g^{-1}\right)^{\!\vee}\right] = \begin{pmatrix} 1 & 0 & y \\ 0 & 1 & -x \\ 0 & 0 & 1 \end{pmatrix}$$

and

$$J_r = \left[\left(g^{-1}\frac{\partial g}{\partial x}\right)^{\!\vee}, \left(g^{-1}\frac{\partial g}{\partial y}\right)^{\!\vee}, \left(g^{-1}\frac{\partial g}{\partial \theta}\right)^{\!\vee}\right] = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Note that

$$\det(J_l) = \det(J_r) = 1.$$

This parametrization is not unique, though it is probably the most well-known one.
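A short numerical illustration (a sketch, assuming NumPy; function names are illustrative): the matrices in (6) are closed under multiplication, and both Jacobians have unit determinant, consistent with SE(2) being unimodular with dg = dx dy dθ:

import numpy as np

def se2(x, y, th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x], [s, c, y], [0, 0, 1.0]])

g1, g2 = se2(1.0, 2.0, 0.3), se2(-0.5, 0.7, 1.1)
g3 = g1 @ g2
# The product has the same structure: bottom row (0,0,1) and a rotation block.
print(np.allclose(g3[2], [0, 0, 1]), np.isclose(np.linalg.det(g3[:2, :2]), 1.0))

def Jl(x, y, th):
    return np.array([[1, 0, y], [0, 1, -x], [0, 0, 1.0]])

def Jr(x, y, th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1.0]])

print(np.linalg.det(Jl(1.0, 2.0, 0.3)), np.linalg.det(Jr(1.0, 2.0, 0.3)))  # 1.0 1.0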

3. Calculus on Lie groups

Whereas two low-dimensional examples of Lie groups were presented in the previous section to make the discussion concrete, a vast variety of different kinds of Lie groups exist. For example, the same constraints that were used to define SO(3) relative to $\mathbb{R}^{3\times3}$ can be used to define SO(n) from $\mathbb{R}^{n\times n}$. The result is a Lie group of dimension n(n − 1)/2 that has a natural volume element dR. Similarly, the Euclidean motion group generalizes as all (n + 1) × (n + 1) matrices of the form

$$g = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix} = \begin{pmatrix} I & t \\ 0^T & 1 \end{pmatrix}\begin{pmatrix} R & 0 \\ 0^T & 1 \end{pmatrix} \tag{7}$$

resulting in SE(n) having dimension n(n + 1)/2 and natural volume element dg = dR dt where $t \in \mathbb{R}^n$ and $dt = dt_1\,dt_2 \cdots dt_n$ is the natural integration measure for $\mathbb{R}^n$.

Many other unimodular Lie groups arise in applications. All compact Lie groups (including SU(n) and U(n)) are unimodular, as are the noncompact Lie groups GL(n,R), GL(n,C), SL(n,R), SL(n,C), and the Heisenberg groups, H(n). All finite groups can be considered unimodular when summation over the group replaces integration. This is not an exhaustive list, but it does indicate that unimodular Lie groups form quite a substantial fraction of those groups of interest in geometric mechanics.

The following subsections briefly review the general theory of unimodular Lie groups that will be relevant when defining information-theoretic inequalities.

3.1. Exponential, logarithm, and Vee operation

In general an n-dimensional real matrix Lie algebra is defined by a basis consisting of real matrices $\{X_i\}$ for i = 1, …, n that is closed under the matrix commutator. That is, $[X_i, X_j] = \sum_{k=1}^n C_{ij}^k X_k$ for some real numbers $\{C_{ij}^k\}$, which are called the structure constants of the Lie algebra.

In a neighborhood around the identity of the corresponding Lie group, the parametrization

$$g(x_1, \ldots, x_n) = \exp X \quad\text{where}\quad X = \sum_{i=1}^n x_i X_i \tag{8}$$

is always valid. In fact, for the examples discussed, this parametrization is good over almost the whole group, with the exception of a set of measure zero.

The logarithm map

$$\log g(x) = X$$

(which is the inverse of the exponential) is valid except on this set of measure zero. It will be convenient in the analysis to follow to identify a vector $x \in \mathbb{R}^n$ as

$$x = (\log g)^\vee \quad\text{where}\quad (X_i)^\vee = e_i. \tag{9}$$

Here $\{e_i\}$ is the natural basis for $\mathbb{R}^n$.

In terms of quantities that have been defined in the examples, the adjoint matrices Ad and ad are the following matrix-valued functions:

$$\mathrm{Ad}(g) = J_l\,J_r^{-1} \quad\text{and}\quad \mathrm{ad}(X) = \log \mathrm{Ad}(e^X). \tag{10}$$

The dimension of these square matrices is the same as the dimension of the Lie group, which can be very different from the dimension of the matrices that are used to represent the elements of the group. The function Δ(g) = det Ad(g) is called the modular function of G. For a unimodular Lie group, Δ(g) = 1, which is used in many texts as the defining property rather than (1). It follows that for unimodular Lie groups |det(J_l)| = |det(J_r)|, and so the subscripts l and r can be dropped when discussing Jacobian determinants.
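As a concrete check (a sketch, assuming NumPy, with the SE(2) Jacobians of Section 2.2 redefined inline), the relation $\mathrm{Ad}(g) = J_l J_r^{-1}$ yields a matrix of unit determinant, i.e. Δ(g) = 1, confirming that SE(2) is unimodular:

import numpy as np

def Jl(x, y, th):
    return np.array([[1, 0, y], [0, 1, -x], [0, 0, 1.0]])

def Jr(x, y, th):
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1.0]])

x, y, th = 1.0, 2.0, 0.3
Ad = Jl(x, y, th) @ np.linalg.inv(Jr(x, y, th))
print(np.isclose(np.linalg.det(Ad), 1.0))   # True: the modular function is 1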

3.2. Integration and differentiation on unimodular Lie groups

Unimodular Lie groups are defined by the fact that their integration measures are invariant under shifts and inversions. In any parametrization, this measure (or the corresponding volume element) can be expressed as in the examples by first computing a left or right Jacobian matrix and then setting $dg = |J(q)|\,dq_1\,dq_2 \cdots dq_n$ where n is the dimension of the group. In the special case when q = x are the exponential coordinates, then [35]

$$\int_G f(g)\,dg = \int_{\mathcal{G}} f(e^X)\,\left|\det\left(\frac{1 - e^{-\mathrm{ad}(X)}}{\mathrm{ad}(X)}\right)\right|\,dx$$

where $x = X^\vee$ and $dx = dx_1\,dx_2 \cdots dx_n$. In the above expression it makes sense to write the division of one matrix by another because the involved matrices commute. The symbol $\mathcal{G}$ is used to denote the Lie algebra corresponding to G. In practice the integral is performed over a subset of $\mathcal{G}$, which is equivalent to defining $f(e^X)$ to be zero over some portion of $\mathcal{G}$.

Many different kinds of unimodular Lie groups exist. For example, SO(3) is compact and therefore has finite volume; SE(2) is solvable; H(1) is nilpotent; and $SL(n,\mathbb{R})$ is semisimple. Each of these classes of Lie groups has been studied extensively. But for the purpose of this discussion, it is sufficient to treat them all within the larger class of unimodular Lie groups.

Given a function f(g), the left and right Lie derivatives are defined with respect to any basis element of the Lie algebra $X_i \in \mathcal{G}$ as

$$\tilde{X}_i^r f(g) = \left.\frac{d}{dt} f(g \circ \exp(t X_i))\right|_{t=0} \quad\text{and}\quad \tilde{X}_i^l f(g) = \left.\frac{d}{dt} f(\exp(t X_i) \circ g)\right|_{t=0}. \tag{11}$$

The use of l and r mimics the way that the subscripts were used in the Jacobians $J_l$ and $J_r$ in the sense that if exp(tX_i) appears on the left/right then the corresponding derivative is given an l/r designation. This notation, while not standard in the mathematics literature, is useful in computations because when evaluating left/right Lie derivatives in coordinates g = g(q), the left/right Jacobians enter in the computation as [19]

$$\tilde{X}^r f = [J_r(q)]^{-T}\,\nabla_q f \quad\text{and}\quad \tilde{X}^l f = [J_l(q)]^{-T}\,\nabla_q f \tag{12}$$

where $\tilde{X}^r = [\tilde{X}_1^r, \ldots, \tilde{X}_n^r]^T$, $\tilde{X}^l = [\tilde{X}_1^l, \ldots, \tilde{X}_n^l]^T$, and $\nabla_q = [\partial/\partial q_1, \ldots, \partial/\partial q_n]^T$ is the gradient operator treating q like Cartesian coordinates.

Rather than resorting to the fact that a Lie group is a manifold and using general differential geometric techniques, the space of functions $C^\infty(G)$ can be defined in terms of the derivatives (11) in analogy with the way that $C^\infty(\mathbb{R}^n)$ is defined in terms of the usual partial derivatives along coordinate axes.

4. Probability theory and harmonic analysis on unimodular Lie groups

Given two probability density functions f1(g) and f2(g), their convolution is

$$(f_1 * f_2)(g) = \int_G f_1(h)\,f_2(h^{-1} \circ g)\,dh. \tag{13}$$

Here h ∈ G is a dummy variable of integration. Convolution inherits associativity from the group operation, but since in general $g_1 \circ g_2 \neq g_2 \circ g_1$, $(f_1 * f_2)(g) \neq (f_2 * f_1)(g)$.

For a unimodular Lie group, the convolution integral of the form in (13) can be written in the following equivalent ways:

$$(f_1 * f_2)(g) = \int_G f_1(z^{-1})\,f_2(z \circ g)\,dz = \int_G f_1(g \circ k^{-1})\,f_2(k)\,dk \tag{14}$$

where the substitutions $z = h^{-1}$ and $k = h^{-1} \circ g$ have been made, and the invariance of integration under shifts and inversions in (1) and (2) is used.
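Because any finite group is unimodular (with summation in place of integration, as discussed in Section 5.2), the definition (13) can be exercised concretely. The following sketch (assuming NumPy; all function names are illustrative) convolves two distributions on the symmetric group S3 and confirms that the result is again a pdf but that convolution does not commute:

import itertools
import numpy as np

G = list(itertools.permutations(range(3)))            # the six elements of S3
idx = {g: i for i, g in enumerate(G)}
def mul(a, b): return tuple(a[b[i]] for i in range(3))
def inv(a):    return tuple(sorted(range(3), key=lambda i: a[i]))

def convolve(f1, f2):
    # (f1 * f2)(g) = sum_h f1(h) f2(h^{-1} o g), the discrete analog of (13)
    out = np.zeros(len(G))
    for g in G:
        for h in G:
            out[idx[g]] += f1[idx[h]] * f2[idx[mul(inv(h), g)]]
    return out

rng = np.random.default_rng(0)
f1, f2 = rng.random(6), rng.random(6)
f1, f2 = f1 / f1.sum(), f2 / f2.sum()                   # normalize to pdfs
print(np.isclose(convolve(f1, f2).sum(), 1.0))          # True: still a pdf
print(np.allclose(convolve(f1, f2), convolve(f2, f1)))  # False: noncommutative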

A powerful generalization of classical Fourier analysis exists. It is built on families of unitary matrix-valued functions of group-valued argument that are parameterized by values λ drawn from a set Ĝ and satisfy the homomorphism property:

$$U(g_1 \circ g_2, \lambda) = U(g_1, \lambda)\,U(g_2, \lambda). \tag{15}$$

Using * to denote the Hermitian conjugate, it follows that

$$I = U(e, \lambda) = U(g^{-1} \circ g, \lambda) = U(g^{-1}, \lambda)\,U(g, \lambda),$$

and so

$$U(g^{-1}, \lambda) = (U(g, \lambda))^{-1} = U^*(g, \lambda).$$

In this generalized Fourier analysis (called noncommutative harmonic analysis) each U(g, λ) is constructed to be irreducible in the sense that it is not possible to simultaneously block-diagonalize U(g, λ) by the same similarity transformation for all values of g in the group. Such a matrix function U(g, λ) is called an irreducible unitary representation. Completeness of a set of representations means that every (reducible) representation can be decomposed into a direct sum of the representations in the set.

Explicit expressions for matrix entries of the IUR matrices, U(g, λ), are known for many Lie groups in terms of particular parameterizations, and are expressed in terms of the special functions of mathematical physics. For details see the classical works [30, 34, 48, 53, 69, 71, 76, 78, 83] and the more recent [19].

Once a complete set of IURs is known for a unimodular Lie group, the Fourier transform of a function on that group can be defined as

$$\hat{f}(\lambda) = \int_G f(g)\,U(g^{-1}, \lambda)\,dg.$$

Here λ (which is analogous to frequency) indexes the complete set of all IURs. An inversion formula can be used to recover the original function from all of the Fourier transforms as

$$f(g) = \int_{\hat{G}} \mathrm{tr}\left[\hat{f}(\lambda)\,U(g, \lambda)\right]\,d(\lambda). \tag{16}$$

The integration measure d(λ) on the dual (frequency) space Ĝ must be constructed on a case-by-case basis. In the case of a compact Lie group, Ĝ is discrete, and the resulting inversion formula is a series, much like the classical Fourier series for 2π-periodic functions. In this case the integral over Ĝ becomes a sum, and d(λ) becomes the dimension of the finite-dimensional matrix U(g, λ). However, in the case of noncompact groups the structure of Ĝ and the corresponding integral over Ĝ can vary widely. For example, in the case of SE(n), $\hat{G} \ni \lambda$ can be identified with the positive real line with measure $d(\lambda) = \lambda^{n-1}\,d\lambda$.

A convolution theorem follows from (15) as

$$\widehat{(f_1 * f_2)}(\lambda) = \hat{f}_2(\lambda)\,\hat{f}_1(\lambda)$$

and so does the Parseval/Plancherel formula:

$$\int_G |f(g)|^2\,dg = \int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda). \tag{17}$$

Here ‖·‖ is the Hilbert–Schmidt (Frobenius) norm, and in the compact case d(λ) is the dimension of the matrix U(g, λ).
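In the simplest commutative instance, G = SO(2), the IURs are the scalars $U(\theta, n) = e^{in\theta}$ and the transform reduces to ordinary Fourier series. The following sketch (assuming NumPy, with the normalized Haar measure dθ/2π) checks the convolution theorem and the Plancherel formula (17) on a uniform grid using the FFT:

import numpy as np

N = 256
th = 2 * np.pi * np.arange(N) / N
f1 = np.exp(np.cos(th)); f1 /= f1.mean()          # pdfs w.r.t. d(theta)/2pi
f2 = np.exp(np.sin(2 * th)); f2 /= f2.mean()

F1, F2 = np.fft.fft(f1) / N, np.fft.fft(f2) / N   # Fourier coefficients f^(n)
conv = np.real(np.fft.ifft(np.fft.fft(f1) * np.fft.fft(f2))) / N      # (f1*f2)(theta)
print(np.allclose(np.fft.fft(conv) / N, F1 * F2))                     # convolution theorem
print(np.isclose(np.mean(np.abs(f1) ** 2), np.sum(np.abs(F1) ** 2)))  # Plancherel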

A useful definition is

$$u(X_i, \lambda) = \left.\frac{d}{dt}\,U(\exp(t X_i), \lambda)\right|_{t=0}.$$

Explicit expressions for U(g, λ) and u(X_i, λ) using the exponential map and corresponding parameterizations for the groups SO(3), SE(2) and SE(3) are given in [57, 30].

As a consequence of these definitions, it can be shown that the following operational properties result [19]:

$$\widehat{\tilde{X}_i^r f} = u(X_i, \lambda)\,\hat{f}(\lambda) \quad\text{and}\quad \widehat{\tilde{X}_i^l f} = \hat{f}(\lambda)\,u(X_i, \lambda).$$

This is very useful in probability problems because a diffusion equation with drift of the form

$$\frac{\partial \rho(g;t)}{\partial t} = -\sum_{i=1}^d h_i\,\tilde{X}_i^r \rho(g;t) + \frac{1}{2}\sum_{i,j=1}^d D_{ij}\,\tilde{X}_i^r \tilde{X}_j^r \rho(g;t) \tag{18}$$

(where D = [D_{ij}] is symmetric and positive semidefinite and given initial conditions ρ(g; 0) = δ(g)) can be solved in the dual space Ĝ, and then the inversion formula can convert it back. Explicitly,

$$\rho(g;t) = \int_{\hat{G}} \mathrm{tr}\left[\exp(t B(\lambda))\,U(g, \lambda)\right]\,d(\lambda) \tag{19}$$

where

$$B(\lambda) = \frac{1}{2}\sum_{k,l=1}^n D_{lk}\,u(X_l, \lambda)\,u(X_k, \lambda) - \sum_{l=1}^n h_l\,u(X_l, \lambda).$$

The solution of this sort of diffusion equation is important as a generalization of the concept of a Gaussian distribution. It has been studied by the author in the case of G = SE(3) in the context of polymer statistical mechanics and robotic manipulators [17, 18, 85]. The detailed structure of the matrices u(Xl, λ) and U(g, λ) expressed in terms of the classical special functions of mathematical physics are given in those references. As will be shown shortly, some of the classical information-theoretic inequalities that follow from the Gaussian distribution can be computed using the above analysis.

5. Properties of entropy and relative entropy on groups

The entropy of a pdf on a unimodular Lie group is defined as

$$S(f) = -\int_G f(g)\,\log f(g)\,dg.$$

For example, the entropy of a Gaussian distribution on $(G, \circ) = (\mathbb{R}^n, +)$ with covariance Σ is

$$S(\rho(g;t)) = \log\left\{(2\pi e)^{n/2}\,|\Sigma(t)|^{1/2}\right\} \tag{20}$$

where log = log_e.
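As a quick sanity check of (20) in the simplest case n = 1 (a sketch, assuming NumPy), numerical integration of −f log f for a Gaussian recovers $\log((2\pi e)^{1/2}\sigma)$:

import numpy as np

sigma = 0.7
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
S_num = -np.sum(f * np.log(f)) * dx          # Riemann sum for -∫ f log f dx
S_formula = np.log(np.sqrt(2 * np.pi * np.e) * sigma)
print(np.isclose(S_num, S_formula))          # True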

The Kullback–Leibler distance between the pdfs $f_1(g)$ and $f_2(g)$ on a Lie group G naturally generalizes from its form in $\mathbb{R}^n$ as

$$D_{KL}(f_1 \| f_2) = \int_G f_1(g)\,\log\left(\frac{f_1(g)}{f_2(g)}\right)dg. \tag{21}$$

As with the case of pdfs in $\mathbb{R}^n$, $D_{KL}(f_1 \| f_2) \ge 0$. Equality holds in this expression if (but not only if) $f_1 = f_2$. More precisely, if $D_{KL}(f_1 \| f_2) = 0$ then $f_1(g) = f_2(g)$ at “almost all” values of g ∈ G (or, in probability terminology, “$f_1(g) = f_2(g)$ almost surely”). That is, they must be the same up to a set of measure zero.

Something that is not true in $\mathbb{R}^n$ but that holds for a compact Lie group is that the maximum-entropy distribution is the constant function f(g) = 1 relative to the normalized Haar measure. Such a distribution can be considered the limiting distribution of the diffusion process in (18) as time goes to infinity. If $f_2(g) = 1$ is this sort of limiting distribution, then $D_{KL}(f_1 \| 1) = -S(f_1)$.

5.1. Entropy inequalities from Jensen’s inequality

Jensen’s inequality is a fundamental tool that is often used in deriving information-theoretic inequalities, as well as inequalities in the field of convex geometry. In the context of unimodular Lie groups, Jensen’s inequality can be written as

$$\Phi\left(\int_G \phi(g)\,\rho(g)\,dg\right) \le \int_G \Phi(\phi(g))\,\rho(g)\,dg \tag{22}$$

where $\Phi: \mathbb{R}_{\ge 0} \to \mathbb{R}$ is a convex function on the half-infinite line, ρ(g) is a pdf, and ϕ(g) is another nonnegative measurable function on G.

Two important examples of Φ(x) are $\Phi_1(x) = -\log x$ and $\Phi_2(x) = +x\log x$. Using Jensen’s inequality with $\Phi_2$ gives the following result.

Theorem 5.1. Given pdfs f1(g) and f2(g) on the unimodular Lie group G,

$$S(f_1 * f_2) \ge \max\{S(f_1), S(f_2)\} \tag{23}$$

and

$$D_{KL}(f_1 \| f_2) \ge \max\{D_{KL}(f_1 * \phi\,\|\,f_2 * \phi),\; D_{KL}(\phi * f_1\,\|\,\phi * f_2)\}. \tag{24}$$

Proof.

$$-S(f_1 * f_2) = \int_G \Phi_2((f_1 * f_2)(g))\,dg = \int_G \Phi_2\left(\int_G f_2(h^{-1} \circ g)\,f_1(h)\,dh\right)dg \le \int_G\int_G \Phi_2(f_2(h^{-1} \circ g))\,f_1(h)\,dh\,dg = \int_G\left(\int_G \Phi_2(f_2(h^{-1} \circ g))\,dg\right)f_1(h)\,dh = \int_G\left(\int_G \Phi_2(f_2(g))\,dg\right)f_1(h)\,dh = \left(\int_G \Phi_2(f_2(g))\,dg\right)\left(\int_G f_1(h)\,dh\right) = -S(f_2).$$

If, on the other hand, we were to use the version of convolution in (14), and analogous manipulations as above, we would get −S(f1 * f2) ≤ −S(f1), which completes the proof of (23).

The proof of (24) (which is the Lie-group version of the data processing inequality) follows in the usual way from the joint convexity of the functional $D_{KL}(\cdot\,\|\,\cdot)$.

If G is compact, any constant function on G is measurable. Letting ρ(g) = 1 (the uniform pdf relative to the normalized Haar measure), ϕ(g) = f(g), and Φ(x) = Φ₂(x) then gives 0 ≤ −S(f) for a pdf f(g). In contrast, for any unimodular Lie group, letting ρ(g) = f(g), $\phi(g) = [f(g)]^\alpha$ and Φ(x) = Φ₁(x) gives

$$-\log\left(\int_G [f(g)]^{1+\alpha}\,dg\right) \le \alpha\,S(f). \tag{25}$$

This leads to the following theorem.

Theorem 5.2. Let $\|\hat{f}(\lambda)\|$ denote the Frobenius norm and $\|\hat{f}(\lambda)\|_2$ the induced 2-norm of the Fourier transform of f(g), and define

$$D_2(f) = -\int_{\hat{G}} \log\|\hat{f}(\lambda)\|_2^2\,d(\lambda), \qquad D(f) = -\int_{\hat{G}} \log\|\hat{f}(\lambda)\|^2\,d(\lambda), \qquad \tilde{D}(f) = -\log\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda). \tag{26}$$

Then

$$S(f) \ge \tilde{D}(f) \quad\text{and}\quad D(f) \le D_2(f) \tag{27}$$

and

$$D_2(f_1 * f_2) \ge D_2(f_1) + D_2(f_2), \qquad D(f_1 * f_2) \ge D(f_1) + D(f_2). \tag{28}$$

Furthermore, denote the unit Heaviside step function on the real line as u(x) and let

$$B = \int_{\hat{G}} u(\|\hat{f}(\lambda)\|)\,d(\lambda). \quad\text{Then}\quad \tilde{D}(f) + \log B \ge \frac{D(f)}{B}. \tag{29}$$

Proof. Substituting α = 1 into (25) and using the Plancherel formula (17) yields

$$-S(f) \le \log\left(\int_G [f(g)]^2\,dg\right) = \log\left(\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda)\right) = -\tilde{D}(f).$$

The fact that −log x is a decreasing function and $\|A\|_2 \le \|A\|$ for all $A \in \mathbb{C}^{n\times n}$ gives the second inequality in (27).

The convolution theorem together with the facts that both norms are submultiplicative, −log(x) is a decreasing function, and the log of the product is the sum of the logs gives

$$D(f_1 * f_2) = -\int_{\hat{G}} \log\|\widehat{f_1 * f_2}(\lambda)\|^2\,d(\lambda) = -\int_{\hat{G}} \log\|\hat{f}_2(\lambda)\,\hat{f}_1(\lambda)\|^2\,d(\lambda) \ge D(f_1) + D(f_2).$$

An identical calculation follows for $D_2$. The statement in (29) follows from the Plancherel formula (17) and using Jensen’s inequality (22) in the dual space Ĝ rather than on G:

$$\Phi\left(\int_{\hat{G}} \hat{\phi}(\lambda)\,\rho(\lambda)\,d(\lambda)\right) \le \int_{\hat{G}} \Phi(\hat{\phi}(\lambda))\,\rho(\lambda)\,d(\lambda) \tag{30}$$

where

$$\int_{\hat{G}} \rho(\lambda)\,d(\lambda) = 1 \quad\text{and}\quad \rho(\lambda) \ge 0.$$

Recognizing that when B is finite, $\rho(\lambda) = u(\|\hat{f}(\lambda)\|)/B$ becomes a probability measure on this dual space, it follows that

$$\tilde{D}(f) = -\log\left(\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,d(\lambda)\right) = -\log\left(B\int_{\hat{G}} \|\hat{f}(\lambda)\|^2\,\rho(\lambda)\,d(\lambda)\right) \ge -\log B - \int_{\hat{G}} \log\left(\|\hat{f}(\lambda)\|^2\right)\rho(\lambda)\,d(\lambda) = -\log B + \frac{D(f)}{B}.$$

This completes the proof.

Properties of dispersion measures similar to D(f) and $D_2(f)$ were studied in [31], but no connections to entropy were provided previously. By definition, bandlimited expansions have B finite. On the other hand, it is a classical result that for a finite group, Γ, the Plancherel formula is (see, for example, [19]):

$$\sum_{\gamma \in \Gamma} |f(\gamma)|^2 = \frac{1}{|\Gamma|}\sum_{k=1}^{\alpha} d_k^2\,\|\hat{f}_k\|^2$$

where α is the number of conjugacy classes of Γ and $d_k$ is the dimension of $\hat{f}_k$. And by Burnside’s formula $\sum_{k=1}^{\alpha} d_k^2 = |\Gamma|$, it follows that B = 1 when all $\hat{f}_k \neq 0$.

5.2. The entropy produced by convolution on a finite group is bounded

A finite group (or for that matter, any group with countably many elements) can be considered to be a unimodular zero-dimensional Lie group, where integration is replaced by summation over all of the group elements. Here several results that are unique to this case are presented.

Let Γ be a finite group with |Γ| elements $\{g_1, \ldots, g_{|\Gamma|}\}$, and let $\rho_\Gamma(g_i) \ge 0$ with $\sum_{i=1}^{|\Gamma|} \rho_\Gamma(g_i) = 1$ define a probability density/distribution on Γ. In analogy with how convolution and entropy are defined on a Lie group, G, they can also be defined on a finite group, Γ, by using the Dirac delta function for G, denoted here as δ(g). If Γ < G (i.e., if Γ is a subgroup of G), then letting

$$\rho_G(g) = \sum_{i=1}^{|\Gamma|} \rho_\Gamma(g_i)\,\delta(g_i^{-1} \circ g) = \sum_{\gamma \in \Gamma} \rho_\Gamma(\gamma)\,\delta(\gamma^{-1} \circ g)$$

can be used to define a pdf on G that is equivalent to a pdf on Γ in the sense that if the convolution of two pdfs on Γ is

$$(\rho_1^\Gamma * \rho_2^\Gamma)(g_i) = \sum_{j=1}^{|\Gamma|} \rho_1^\Gamma(g_j)\,\rho_2^\Gamma(g_j^{-1} \circ g_i) \tag{31}$$

then

$$(\rho_1^G * \rho_2^G)(g) = \sum_{\gamma \in \Gamma} (\rho_1^\Gamma * \rho_2^\Gamma)(\gamma)\,\delta(\gamma^{-1} \circ g). \tag{32}$$

Given a finite group, Γ, let

$$S(\rho) = -\sum_{i=1}^{|\Gamma|} \rho(g_i)\,\log\rho(g_i) = -\sum_{\gamma \in \Gamma} \rho(\gamma)\,\log\rho(\gamma).$$

Unlike the case of differential/continuous entropy on a Lie group, 0 ≤ S(ρ).

The following theorem describes how the discrete entropy of pdfs on Γ behaves under convolution. Since only finite groups are addressed, the superscript Γ on the discrete values $\rho(g_i)$ is dropped.

Theorem 5.3. The entropy of the convolution of two pdfs on a finite group is at least as great as either of the entropies of the convolved pdfs, and is no greater than the sum of their individual entropies:

$$\max\{S(\rho_1), S(\rho_2)\} \le S(\rho_1 * \rho_2) \le S(\rho_1) + S(\rho_2). \tag{33}$$

Proof. The lower bound follows in the same way as the proof given for Theorem 5.1 with summation in place of integration. The entropy of convolved distributions on a finite group can be bounded from above in the following way.

Since the convolution sum contains products of all pairs, and each product is nonnegative, it follows that

$$\rho_1(g_k)\,\rho_2(g_k^{-1} \circ g_i) \le (\rho_1 * \rho_2)(g_i)$$

for all k ∈ {1, …, |Γ|}. Therefore, since log is a strictly increasing function, it follows that

$$S(\rho_1 * \rho_2) \le -\sum_{i=1}^{|\Gamma|}\left(\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1} \circ g_i)\right)\log\left(\rho_1(g_k)\,\rho_2(g_k^{-1} \circ g_i)\right).$$

Since this is true for all values of k, we can bring the log term inside of the summation sign and choose k = j. Then multiplying by −1, and using the properties of the log function, we get

$$S(\rho_1 * \rho_2) \le -\sum_{i=1}^{|\Gamma|}\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1} \circ g_i)\,\log\rho_1(g_j) - \sum_{i=1}^{|\Gamma|}\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\rho_2(g_j^{-1} \circ g_i)\,\log\rho_2(g_j^{-1} \circ g_i).$$

Rearranging the order of summation signs gives

$$S(\rho_1 * \rho_2) \le -\sum_{j=1}^{|\Gamma|} \rho_1(g_j)\,\log\rho_1(g_j)\left(\sum_{i=1}^{|\Gamma|} \rho_2(g_j^{-1} \circ g_i)\right) - \sum_{j=1}^{|\Gamma|} \rho_1(g_j)\left(\sum_{i=1}^{|\Gamma|} \rho_2(g_j^{-1} \circ g_i)\,\log\rho_2(g_j^{-1} \circ g_i)\right). \tag{34}$$

But summation of a function over a group is invariant under shifts. That is,

$$\sum_{i=1}^{|\Gamma|} F(g_j^{-1} \circ g_i) = \sum_{i=1}^{|\Gamma|} F(g_i) \quad\text{or}\quad \sum_{\gamma \in \Gamma} F(\gamma^{-1} \circ g) = \sum_{\gamma \in \Gamma} F(\gamma).$$

Hence, replacing $g_j^{-1} \circ g_i$ with $g_i$ in the terms in parentheses in (34) gives (33).
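The two bounds in (33) are easy to probe numerically. The sketch below (assuming NumPy; the S3 setup repeats the earlier convolution snippet) checks them on random pdfs over the symmetric group S3:

import itertools
import numpy as np

G = list(itertools.permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
mul = lambda a, b: tuple(a[b[i]] for i in range(3))
inv = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))

def convolve(f1, f2):
    out = np.zeros(len(G))
    for g in G:
        for h in G:
            out[idx[g]] += f1[idx[h]] * f2[idx[mul(inv(h), g)]]
    return out

S = lambda p: -np.sum(p * np.log(p))          # discrete entropy, S(p) >= 0

rng = np.random.default_rng(1)
for _ in range(100):                           # random trials of the two bounds
    p1, p2 = rng.random(6), rng.random(6)
    p1, p2 = p1 / p1.sum(), p2 / p2.sum()
    s12 = S(convolve(p1, p2))
    assert max(S(p1), S(p2)) <= s12 + 1e-12 <= S(p1) + S(p2) + 1e-12
print("bounds (33) hold on all trials")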

5.3. Entropy and decompositions

Aside from the ability to sustain the concept of convolution, one of the fundamental ways that groups resemble Euclidean space is the way in which they can be decomposed. In analogy with the way that an integral over a vector-valued function with argument xRn can be decomposed into integrals over each coordinate, integrals over Lie groups can also be decomposed in natural ways. This has implications with regard to inequalities involving the entropy of pdfs on Lie groups. Analogous expressions hold for finite groups, with volume replaced by the number of group elements.

5.3.1. Coset Decompositions

Given a subgroup H < G, and any element g ∈ G, the left coset gH is defined as $gH = \{g \circ h \mid h \in H\}$. Similarly, the right coset Hg is defined as $Hg = \{h \circ g \mid h \in H\}$. In the special case when g ∈ H, the corresponding left and right cosets are equal to H. More generally, for all g ∈ G, g ∈ gH, and $g_1H = g_2H$ if and only if $g_2^{-1} \circ g_1 \in H$. Likewise for right cosets, $Hg_1 = Hg_2$ if and only if $g_1 \circ g_2^{-1} \in H$. Any group is divided into disjoint left (right) cosets, and the statement “$g_1$ and $g_2$ are in the same left (right) coset” is an equivalence relation.

When G is finite, an important property of gH and Hg is that they each have the same number of elements as H. The set of all left (or right) cosets is called the left (or right) coset space, and is denoted as G/H (or H\G). For finite groups, |G/H| = |H\G| = |G|/|H|, which is Lagrange’s theorem. Similar expressions can be written for Lie groups and Lie subgroups. For example, $e^{\dim(G/H)} = e^{\dim(G)}/e^{\dim(H)}$ where dim(·) denotes the dimension of a manifold, and if G is a compact Lie group with Lie subgroup H, then Vol(G/H) = Vol(G)/Vol(H).

In what follows, it will be convenient to denote a function on G as $f_G(g)$, a function on G/H as $f_{G/H}(gH)$, etc. Let $c_{G/H}: G/H \to G$ be a mapping that generates one representative per coset. That is, $c_{G/H}(gH) \in gH$. Decomposing each g ∈ G as $g = c_{G/H}(gH) \circ h$ for some h ∈ H then means that $f_G(g) = f_G(c_{G/H}(gH) \circ h)$ and [21]2

$$\int_G f_G(g)\,dg = \int_{G/H}\left(\int_H f_G(c_{G/H}(gH) \circ h)\,dh\right)d(gH) \tag{35}$$

where dh and d(gH) are unique up to normalization. In the special case when $f_G(g)$ is a left-coset function (i.e., a function that is constant on left cosets), $f_G(c_{G/H}(gH) \circ h_1) = f_G(c_{G/H}(gH) \circ h_2) = f_{G/H}(gH)$ for all $h_1, h_2 \in H$ and (35) reduces to

$$\int_G f_G(g)\,dg = \int_{G/H} f_{G/H}(gH)\,d(gH)$$

where it is assumed that dh is normalized so that $\int_H dh = 1$. More generally,

$$f_{G/H}(gH) = \int_H f_G(c_{G/H}(gH) \circ h)\,dh$$

is the value of the function $f_G(g)$ averaged over the subgroup H. Similarly,

$$f_H(h) = \int_{G/H} f_G(c_{G/H}(gH) \circ h)\,d(gH)$$

is the average of $f_G(g)$ taken over the coset space G/H.

Theorem 5.4. The entropy of a pdf on a unimodular Lie group is no greater than the sum of the marginal entropies on a subgroup and the corresponding coset space:

$$S(f_G) \le S(f_{G/H}) + S(f_H). \tag{36}$$

Proof. This inequality follows immediately from the nonnegativity of the Kullback–Leibler divergence

$$D_{KL}(f_G\,\|\,f_{G/H} \cdot f_H) \ge 0.$$

For example, if G = SE(n) is a Euclidean motion group and H = SO(n) is the subgroup of pure rotations in n-dimensional Euclidean space, then $G/H \cong \mathbb{R}^n$, and if an arbitrary element of SE(n) is written as a pair $(R, t) \in SO(n) \times \mathbb{R}^n$, then we can write

$$\int_{SE(n)} f(g)\,dg = \int_{\mathbb{R}^n}\int_{SO(n)} f(R, t)\,dR\,dt = \int_{SE(n)/SO(n)}\left(\int_{SO(n)} f((I, t) \circ (R, 0))\,dR\right)dt,$$

and the marginal entropies on the right-hand-side of (36) are those computed for pure rotations and pure translations.

5.3.2. Double Coset Decompositions

Let H < G and K < G. Then for any g [set membership] G, the set

$$HgK = \{h \circ g \circ k \mid h \in H,\; k \in K\} \tag{37}$$

is called the double coset of H and K, and any g′ ∈ HgK (including g′ = g) is called a representative of the double coset. Though a double coset representative often can be described with two or more different pairs $(h_1, k_1)$ and $(h_2, k_2)$ so that $g' = h_1 \circ g \circ k_1 = h_2 \circ g \circ k_2$, we only count g′ once in HgK. Hence |HgK| ≤ |G|, and in general |HgK| ≤ |H| · |K|. The set of all double cosets of H and K is denoted H\G/K, and we have the hierarchy g ∈ HgK ∈ H\G/K. It can be shown that membership in a double coset is an equivalence relation. That is, G is partitioned into disjoint double cosets, and for H < G and K < G either $Hg_1K \cap Hg_2K = \emptyset$ or $Hg_1K = Hg_2K$.

It is possible to define a mapping $c_{K\backslash G/H}: K\backslash G/H \to G$ such that for any $KgH \in K\backslash G/H$, $c_{K\backslash G/H}(KgH) \in KgH$. Such a function defines a rule for selecting one representative per double coset. Equipped with such a function, it becomes possible to write $g = k \circ c_{K\backslash G/H}(KgH) \circ h$ and hence [21]

$$\int_G f_G(g)\,dg = \int_{K\backslash G/H}\int_H\int_K f_G(k \circ c_{K\backslash G/H}(KgH) \circ h)\,dk\,dh\,d(KgH). \tag{38}$$

A particular example of this is the integral over SO(3), which can be written in terms of Euler angles as

$$\int_{SO(3)} f(g)\,dg = \frac{1}{8\pi^2}\int_0^{2\pi}\!\!\int_0^{\pi}\!\!\int_0^{2\pi} f(R_3(\alpha)\,R_1(\beta)\,R_3(\gamma))\,\sin\beta\,d\alpha\,d\beta\,d\gamma = \int_{SO(2)\backslash SO(3)/SO(2)}\int_{SO(2)}\int_{SO(2)} f(h_1(\alpha) \circ c(HgH) \circ h_2(\gamma))\,dh_1\,dh_2\,d(HgH)$$

where $h_1(\alpha) = R_3(\alpha)$, $h_2(\gamma) = R_3(\gamma) \in SO(2)$ and $c(HgH) = R_1(\beta)$ is the coset-representative function, and $dh_1 = d\alpha/2\pi$, $dh_2 = d\gamma/2\pi$, and $d(HgH) = \sin\beta\,d\beta/2$ in this case.

Theorem 5.5. The entropy of a pdf on a group is no greater than the sum of marginal entropies over any two subgroups and the corresponding double-coset space:

$$S(f_G) \le S(f_K) + S(f_{K\backslash G/H}) + S(f_H). \tag{39}$$

Proof. Let

$$f_K(k) = \int_{K\backslash G/H}\int_H f_G(k \circ c_{K\backslash G/H}(KgH) \circ h)\,dh\,d(KgH), \qquad f_H(h) = \int_{K\backslash G/H}\int_K f_G(k \circ c_{K\backslash G/H}(KgH) \circ h)\,dk\,d(KgH)$$

and

$$f_{K\backslash G/H}(KgH) = \int_K\int_H f_G(k \circ c_{K\backslash G/H}(KgH) \circ h)\,dh\,dk,$$

then again using the nonnegativity of the Kullback–Leibler divergence

$$D_{KL}(f_G\,\|\,f_K \cdot f_{K\backslash G/H} \cdot f_H) \ge 0$$

gives (39).

5.3.3. Nested Coset Decompositions

Theorem 5.6. The entropy of a pdf is no greater than the sum of entropies of its marginals over coset spaces defined by nested subgroups:

$$S(f_G) \le S(f_{G/K}) + S(f_{K/H}) + S(f_H). \tag{40}$$

Proof. Given a subgroup H of K, which is itself a subgroup of G (that is, H < K < G), apply (36) twice. Then $S(f_G) \le S(f_{G/K}) + S(f_K)$ and $S(f_K) \le S(f_{K/H}) + S(f_H)$, resulting in (40). Explicitly, $g = c_{G/K}(gK) \circ c_{K/H}(kH) \circ h$, and so $f_G(g) = f_G(c_{G/K}(gK) \circ c_{K/H}(kH) \circ h)$. Therefore,

$$f_{G/K}(gK) = \int_{K/H}\int_H f_G(c_{G/K}(gK) \circ c_{K/H}(kH) \circ h)\,dh\,d(kH), \qquad f_{K/H}(kH) = \int_{G/K}\int_H f_G(c_{G/K}(gK) \circ c_{K/H}(kH) \circ h)\,dh\,d(gK)$$

and

$$f_H(h) = \int_{G/K}\int_{K/H} f_G(c_{G/K}(gK) \circ c_{K/H}(kH) \circ h)\,d(kH)\,d(gK).$$

5.3.4. Class Functions and Normal Subgroups

In analogy with the way a coset is defined, the conjugate of a subgroup H for a given g ∈ G is defined as $gHg^{-1} = \{g \circ h \circ g^{-1} \mid h \in H\}$. Recall that a subgroup N < G is called normal if and only if $gNg^{-1} \subseteq N$ for all g ∈ G. This is equivalent to the condition $N \subseteq g^{-1}Ng$, and so we also write $gNg^{-1} = N$ and $gN = Ng$ for all g ∈ G.

A function, χ(g), that is constant on each conjugacy class has the property that

$$\chi(g) = \chi(h^{-1} \circ g \circ h) \quad\text{or}\quad \chi(h \circ g) = \chi(g \circ h) \tag{41}$$

for any g, h ∈ G. Such functions are called class functions. Though convolution of functions on a noncommutative group is generally noncommutative, the special nature of class functions means that

$$(f * \chi)(g) = \int_G f(h)\,\chi(h^{-1} \circ g)\,dh = \int_G f(h)\,\chi(g \circ h^{-1})\,dh = \int_G \chi(k)\,f(k^{-1} \circ g)\,dk = (\chi * f)(g),$$

where the change of variables k = gh−1 is used together with the unimodularity of G.

5.4. When inequivalent convolutions produce equal entropy

In general $(\rho_1 * \rho_2)(g) \neq (\rho_2 * \rho_1)(g)$. Even so, it can be the case that $S(\rho_1 * \rho_2) = S(\rho_2 * \rho_1)$. This section addresses several special cases when this equality holds.

Let G denote a unimodular Lie group, and for arbitrary $g, g_1 \in G$ define $\rho^\#(g) = \rho(g^{-1})$, $L_{g_1}\rho(g) = \rho(g_1^{-1} \circ g)$, $R_{g_1}\rho(g) = \rho(g \circ g_1)$, and $C_{g_1}\rho(g) = \rho(g_1^{-1} \circ g \circ g_1)$. Then if ρ(g) is a pdf, it follows immediately from (1) and (2) that $\rho^\#(g)$, $L_{g_1}\rho(g)$, $R_{g_1}\rho(g)$, and $C_{g_1}\rho(g)$ are all pdfs. A function for which $\rho^\#(g) = \rho(g)$ is called symmetric, whereas a function for which $C_{g_1}\rho(g) = \rho(g)$ for all $g_1 \in G$ is a class function (i.e., it is constant on conjugacy classes).

Theorem 5.7. For arbitrary pdfs on a unimodular Lie group G and arbitrary $g_1, g_2 \in G$,

$$\rho_1 * \rho_2 \;\neq\; \rho_2^\# * \rho_1^\# \;\neq\; L_{g_1}\rho_1 * R_{g_2}\rho_2 \;\neq\; C_{g_1}\rho_1 * C_{g_1}\rho_2,$$

however, entropy satisfies the following equalities:

$$S(\rho_1 * \rho_2) = S(\rho_2^\# * \rho_1^\#) = S(L_{g_1}\rho_1 * R_{g_2}\rho_2) = S(C_{g_1}\rho_1 * C_{g_1}\rho_2). \tag{42}$$

Proof. Each equality is proven by changing variables and using the unimodularity properties in (1) and (2).

$$(\rho_2^\# * \rho_1^\#)(g) = \int_G \rho_2^\#(h)\,\rho_1^\#(h^{-1} \circ g)\,dh = \int_G \rho_2(h^{-1})\,\rho_1(g^{-1} \circ h)\,dh = \int_G \rho_1(g^{-1} \circ k^{-1})\,\rho_2(k)\,dk = (\rho_1 * \rho_2)(g^{-1}) = (\rho_1 * \rho_2)^\#(g).$$

Let F[ρ] = −ρ log ρ. Then due to (2), the integral over G of $F[\rho(g^{-1})]$ must be the same as that of F[ρ(g)], proving the first equality in (42). The second equality follows from the fact that $(L_{g_1}\rho_1 * R_{g_2}\rho_2)(g) = (\rho_1 * \rho_2)(g_1^{-1} \circ g \circ g_2)$ and the integral of $F[\rho(g_1^{-1} \circ g \circ g_2)]$ must be the same as that of F[ρ(g)], again due to (1) and (2). The final equality follows in a similar way from the fact that $(C_{g_1}\rho_1 * C_{g_1}\rho_2)(g) = (\rho_1 * \rho_2)(g_1^{-1} \circ g \circ g_1)$.

Note that the equalities in (42) can be combined. For example,

$$S(\rho_1 * \rho_2) = S(L_{g_1}\rho_2^\# * R_{g_2}\rho_1^\#) = S(C_{g_1}\rho_2^\# * C_{g_1}\rho_1^\#).$$
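On a finite group these equalities can be observed directly. The sketch below (assuming NumPy; the S3 machinery repeats the earlier snippets) shows convolutions that differ as functions while having identical entropies, as Theorem 5.7 asserts:

import itertools
import numpy as np

G = list(itertools.permutations(range(3)))
idx = {g: i for i, g in enumerate(G)}
mul = lambda a, b: tuple(a[b[i]] for i in range(3))
inv = lambda a: tuple(sorted(range(3), key=lambda i: a[i]))

def convolve(f1, f2):
    out = np.zeros(len(G))
    for g in G:
        for h in G:
            out[idx[g]] += f1[idx[h]] * f2[idx[mul(inv(h), g)]]
    return out

S = lambda p: -np.sum(p * np.log(p))
flip = lambda p: np.array([p[idx[inv(g)]] for g in G])            # rho#(g) = rho(g^{-1})
L = lambda g1, p: np.array([p[idx[mul(inv(g1), g)]] for g in G])  # (L_{g1}rho)(g) = rho(g1^{-1} o g)
R = lambda g1, p: np.array([p[idx[mul(g, g1)]] for g in G])       # (R_{g1}rho)(g) = rho(g o g1)

rng = np.random.default_rng(2)
p1, p2 = rng.random(6), rng.random(6)
p1, p2 = p1 / p1.sum(), p2 / p2.sum()
g1, g2 = G[1], G[4]
a = convolve(p1, p2)
b = convolve(flip(p2), flip(p1))
c = convolve(L(g1, p1), R(g2, p2))
print(np.allclose(a, b), np.allclose(a, c))            # False False (unequal pdfs)
print(np.isclose(S(a), S(b)), np.isclose(S(a), S(c)))  # True True (equal entropy)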

Theorem 5.8. The equality S(ρ1 * ρ2) = S(ρ2 * ρ1) holds for pdfs ρ1(g) and ρ2(g) on a unimodular Lie group G in the following cases: (a) ρi(g) for i = 1 or i = 2 is a class function; (b) ρi(g) for i = 1, 2 are both symmetric functions.

Proof. Statement (a) follows from the fact that if either ρ1 or ρ2 is a class function, then convolutions commute. Statement (b) follows from the first equality in (42) and the definition of a symmetric function.

Theorem 5.9. Given class functions $\chi_1(g)$ and $\chi_2(g)$ that are pdfs, then for general $g_1, g_2 \in G$,

$$(\chi_1 * \chi_2)(g) \;\neq\; (L_{g_1}\chi_1 * L_{g_2}\chi_2)(g) \;\neq\; (R_{g_1}\chi_1 * R_{g_2}\chi_2)(g) \;\neq\; (R_{g_1}\chi_1 * L_{g_2}\chi_2)(g)$$

and yet

$$S(\chi_1 * \chi_2) = S(L_{g_1}\chi_1 * L_{g_2}\chi_2) = S(R_{g_1}\chi_1 * R_{g_2}\chi_2) = S(R_{g_1}\chi_1 * L_{g_2}\chi_2). \tag{43}$$

Proof. Here the first and final equality will be proven. The middle one follows in the same way.

$$(L_{g_1}\chi_1 * L_{g_2}\chi_2)(g) = \int_G (L_{g_1}\chi_1)(h)\,(L_{g_2}\chi_2)(h^{-1} \circ g)\,dh = \int_G \chi_1(g_1^{-1} \circ h)\,\chi_2(g_2^{-1} \circ h^{-1} \circ g)\,dh = \int_G \chi_1(k)\,\chi_2(g_2^{-1} \circ k^{-1} \circ g_1^{-1} \circ g)\,dk = \int_G \chi_1(k)\,\chi_2(k^{-1} \circ g_1^{-1} \circ g \circ g_2^{-1})\,dk = (\chi_1 * \chi_2)(g_1^{-1} \circ g \circ g_2^{-1}).$$

Similarly,

$$(R_{g_1}\chi_1 * L_{g_2}\chi_2)(g) = \int_G (R_{g_1}\chi_1)(h)\,(L_{g_2}\chi_2)(h^{-1} \circ g)\,dh = \int_G \chi_1(h \circ g_1)\,\chi_2(g_2^{-1} \circ h^{-1} \circ g)\,dh = \int_G \chi_1(k)\,\chi_2(g_2^{-1} \circ g_1 \circ k^{-1} \circ g)\,dk = \int_G \chi_1(k)\,\chi_2(k^{-1} \circ g \circ g_2^{-1} \circ g_1)\,dk = (\chi_1 * \chi_2)(g \circ g_2^{-1} \circ g_1).$$

Then, since the entropy integral on a unimodular Lie group is invariant under shifts, the equalities in (43) follow.

6. Fisher information and diffusions on Lie groups

The natural extension of the Fisher information matrix for the case when f(g;θ) is a parametric distribution on a Lie group is

$$F_{ij}(f, \theta) = \int_G \frac{1}{f}\,\frac{\partial f}{\partial \theta_i}\,\frac{\partial f}{\partial \theta_j}\,dg. \tag{44}$$

In the case when θ parameterizes G as $g(\theta) = \exp(\sum_i \theta_i X_i)$ and $f(g, \theta) = f(g \circ \exp(\sum_i \theta_i X_i))$, then

$$\left.\frac{\partial f}{\partial \theta_i}\right|_{\theta=0} = \tilde{X}_i^r f$$

and $F_{ij}(f, 0)$ becomes

$$F_{ij}^r(f) = \int_G \frac{1}{f}\,(\tilde{X}_i^r f)(\tilde{X}_j^r f)\,dg. \tag{45}$$

In a similar way, we can define

$$F_{ij}^l(f) = \int_G \frac{1}{f}\,(\tilde{X}_i^l f)(\tilde{X}_j^l f)\,dg. \tag{46}$$

Theorem 6.1. The matrices with elements defined in (45) and (46) have the properties

$$F_{ij}^r(L(h)f) = F_{ij}^r(f) \quad\text{and}\quad F_{ij}^l(R(h)f) = F_{ij}^l(f) \tag{47}$$

and

$$F_{ij}^r(I(f)) = F_{ij}^l(f) \quad\text{and}\quad F_{ij}^l(I(f)) = F_{ij}^r(f) \tag{48}$$

where $(L(h)f)(g) = f(h^{-1} \circ g)$, $(R(h)f)(g) = f(g \circ h)$, and $I(f)(g) = f(g^{-1})$.

Proof. The operators $\tilde{X}_i^l$ and R(h) commute, and likewise $\tilde{X}_i^r$ and L(h) commute. This together with the invariance of integration under shifts proves (47). From the definitions of $\tilde{X}_i^l$ and $\tilde{X}_i^r$ in (11), it follows that

$$\tilde{X}_i^r(I(f))(g) = \left.\frac{d}{dt} f([g \circ \exp(t X_i)]^{-1})\right|_{t=0} = \left.\frac{d}{dt} f(\exp(-t X_i) \circ g^{-1})\right|_{t=0} = -(\tilde{X}_i^l f)(g^{-1}).$$

Since the minus signs cancel in the products of two such derivatives appearing in (45) and (46), using the invariance of integration under shifts and inversions then gives (48).

As a special case, when f(g) is a symmetric function, the left and right Fisher information matrices will be the same.

Note that the entries of the Fisher matrices $F_{ij}^r(f)$ and $F_{ij}^l(f)$ implicitly depend on the choice of orthonormal Lie algebra basis $\{X_i\}$, and so it would be more descriptive to use the notation $F_{ij}^r(f, X)$ and $F_{ij}^l(f, X)$.

If a different orthonormal basis $\{Y_i\}$ is used, such that $X_i = \sum_k a_{ik} Y_k$, then the orthonormality of both $\{X_i\}$ and $\{Y_i\}$ forces $A = [a_{ij}]$ to be an orthogonal matrix. Furthermore, the linearity of the Lie derivative,

$$\tilde{X}^r f = \sum_i x_i\,\tilde{X}_i^r f \quad\text{where}\quad X = \sum_i x_i X_i,$$

means that

$$F_{ij}^r(f, X) = \int_G \frac{1}{f}\left(\sum_k a_{ik}\,\tilde{Y}_k^r f\right)\left(\sum_l a_{jl}\,\tilde{Y}_l^r f\right)dg = \sum_{k,l} a_{ik}\,a_{jl}\,F_{kl}^r(f, Y).$$

The same holds for $F_{ij}^l$. Summarizing these results in matrix form:

$$F^r(f, X) = A\,F^r(f, Y)\,A^T \quad\text{and}\quad F^l(f, X) = A\,F^l(f, Y)\,A^T \tag{49}$$

where $e_i^T A e_j = (X_i, Y_j)$. This means that the eigenvalues of the Fisher information matrix (and therefore its trace) are invariant under change of orthonormal basis.

Note that it follows immediately from (48) that the left and right Fisher information matrices are equal for class functions (i.e., functions satisfying the condition f(gh) = f(hg) for all h, g [set membership] G) and for symmetric functions (i.e., functions satisfying the condition f(g) = f(g−1) for all g [set membership] G). But in general the left and right Fisher information matrices are not equal. Even the traces of the left and right Fisher information matrices for arbitrary pdfs on a unimodular Lie group will be different from each other in the general case.

6.1. Fisher information and convolution on groups

The decrease of Fisher information as a result of convolution can be studied in much the same way as for pdfs on Euclidean space. Two approaches are taken here. First, a straightforward application of the Cauchy-Bunyakovsky-Schwarz (CBS) inequality is used together with the bi-invariance of the integral over a unimodular Lie group to produce a bound on the Fisher information of the convolution of two probability densities. Then, a tighter bound is obtained using the concept of conditional expectation in the special case when the pdfs commute under convolution.

Theorem 6.2. The following inequalities hold for the diagonal entries of the left and right Fisher information matrices:

$$F_{ii}^r(f_1 * f_2) \le \min\{F_{ii}^r(f_1), F_{ii}^r(f_2)\} \quad\text{and}\quad F_{ii}^l(f_1 * f_2) \le \min\{F_{ii}^l(f_1), F_{ii}^l(f_2)\}. \tag{50}$$

Proof. The CBS inequality holds for groups:

$$\left(\int_G a(g)\,b(g)\,dg\right)^2 \le \int_G a^2(g)\,dg\,\int_G b^2(g)\,dg.$$

If a(g) ≥ 0 for all values of g, then it is possible to define $j(g) = [a(g)]^{1/2}$ and $k(g) = [a(g)]^{1/2}\,b(g)$, and since j(g)k(g) = a(g)b(g),

$$\left(\int_G a(g)\,b(g)\,dg\right)^2 \le \left(\int_G j^2(g)\,dg\right)\left(\int_G k^2(g)\,dg\right) = \left(\int_G a(g)\,dg\right)\left(\int_G a(g)\,[b(g)]^2\,dg\right). \tag{51}$$

Using this version of the CBS inequality, and letting (as functions of h with g fixed) $b(h) = \tilde{X}_i^r f_2(h^{-1} \circ g)/f_2(h^{-1} \circ g)$ and $a(h) = f_1(h)\,f_2(h^{-1} \circ g)$, essentially the same manipulations as in [12] can be used, with the roles of $f_1$ and $f_2$ interchanged due to the fact that in general for convolution on a Lie group $(f_1 * f_2)(g) \neq (f_2 * f_1)(g)$:

$$F_{ii}^r(f_1 * f_2) = \int_G \frac{\left(\int_G \left[\frac{\tilde{X}_i^r f_2(h^{-1} \circ g)}{f_2(h^{-1} \circ g)}\right]\left[f_2(h^{-1} \circ g)\,f_1(h)\right]dh\right)^2}{(f_1 * f_2)(g)}\,dg \le \int_G\left(\int_G \frac{[\tilde{X}_i^r f_2(h^{-1} \circ g)]^2}{f_2(h^{-1} \circ g)}\,f_1(h)\,dh\right)dg = \int_G\left(\int_G \frac{[\tilde{X}_i^r f_2(h^{-1} \circ g)]^2}{f_2(h^{-1} \circ g)}\,dg\right)f_1(h)\,dh = F_{ii}^r(f_2)\int_G f_1(h)\,dh = F_{ii}^r(f_2).$$

Since for a unimodular Lie group it is possible to perform changes of variables and inversion of the variable of integration without affecting the value of an integral, the convolution can be written in the following equivalent ways,

$$(f_1 * f_2)(g) = \int_G f_1(h)\,f_2(h^{-1} \circ g)\,dh \tag{52}$$

$$= \int_G f_1(g \circ h^{-1})\,f_2(h)\,dh \tag{53}$$

$$= \int_G f_1(g \circ h)\,f_2(h^{-1})\,dh \tag{54}$$

$$= \int_G f_1(h^{-1})\,f_2(h \circ g)\,dh. \tag{55}$$

Using (53) and the bi-invariance of integration, it then follows that (50) holds.
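In the commutative special case G = SO(2) the decrease of Fisher information under convolution is easy to verify numerically. A sketch (assuming NumPy; grid-based, with the Fisher information computed as ∫ (f′)²/f dθ/2π via a spectral derivative):

import numpy as np

N = 512
th = 2 * np.pi * np.arange(N) / N
def normalize(f): return f / f.mean()                    # pdf w.r.t. d(theta)/2pi
def deriv(f):                                            # spectral derivative
    n = np.fft.fftfreq(N, d=1.0 / N)
    return np.real(np.fft.ifft(1j * n * np.fft.fft(f)))
def fisher(f): return np.mean(deriv(f) ** 2 / f)
def convolve(f1, f2):
    return np.real(np.fft.ifft(np.fft.fft(f1) * np.fft.fft(f2))) / N

f1 = normalize(np.exp(2 * np.cos(th)))
f2 = normalize(np.exp(np.cos(th - 1.0) + 0.5 * np.sin(3 * th)))
f12 = convolve(f1, f2)
print(fisher(f12) <= min(fisher(f1), fisher(f2)))        # True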

6.1.1. Bounds using conditional expectation and commuting PDFs

In this subsection a better inequality is derived.

Theorem 6.3. The following inequality holds for the right and left Fisher information matrices:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)\,P] \le \mathrm{tr}[F^r(\rho_2)\,P] \quad\text{and}\quad \mathrm{tr}[F^l(\rho_1 * \rho_2)\,P] \le \mathrm{tr}[F^l(\rho_1)\,P] \tag{56}$$

where P is an arbitrary symmetric positive definite matrix with the same dimensions as $F^r$ and $F^l$.

Proof. Let

$$f_{12}(h, g) = \rho_1(h)\,\rho_2(h^{-1} \circ g).$$

Then $f_{12}(h, g)$ is a pdf on G × G with marginal densities

$$f_1(h) = \int_G f_{12}(h, g)\,dg = \rho_1(h) \quad\text{and}\quad f_2(g) = \int_G f_{12}(h, g)\,dh = (\rho_1 * \rho_2)(g).$$

It follows that

$$(\tilde{X}_i^r f_2)(g) = \int_G \rho_1(h)\,\tilde{X}_i^r \rho_2(h^{-1} \circ g)\,dh.$$

Then by the change of variables $k = h^{-1} \circ g$,

$$(\tilde{X}_i^r f_2)(g) = \int_G \rho_1(g \circ k^{-1})\,\tilde{X}_i^r \rho_2(k)\,dk.$$

This means that

$$\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)} = \int_G \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\cdot\frac{\rho_1(g \circ k^{-1})\,\rho_2(k)}{f_2(g)}\,dk = \left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle, \tag{57}$$

where ⟨ · | g⟩ denotes conditional expectation. This notation, which is standard in the literature, includes the functional dependence of whatever is in the place of “·” even though this is integrated out and no longer exists [20, 38, 42].

Therefore, using this notation,

$$F_{ii}^r(f_2) = \left\langle\left(\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)}\right)^2\right\rangle = \left\langle\left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle^2\right\rangle \le \left\langle\left\langle\left(\frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\right)^2\Big|\,g\right\rangle\right\rangle = \left\langle\left(\frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\right)^2\right\rangle = F_{ii}^r(\rho_2).$$

An analogous argument using $f_{12}(h, g) = \rho_1(g \circ h^{-1})\,\rho_2(h)$ and $f_2(g) = (\rho_1 * \rho_2)(g)$ shows that

$$\frac{(\tilde{X}_i^l f_2)(g)}{f_2(g)} = \left\langle \frac{(\tilde{X}_i^l \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle \tag{58}$$

and

$$F_{ii}^l(f_2) \le F_{ii}^l(\rho_1).$$

The above results can be written concisely by introducing an arbitrary positive definite diagonal matrix Λ as follows:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)\,\Lambda] \le \mathrm{tr}[F^r(\rho_2)\,\Lambda] \quad\text{and}\quad \mathrm{tr}[F^l(\rho_1 * \rho_2)\,\Lambda] \le \mathrm{tr}[F^l(\rho_1)\,\Lambda].$$

If this is true in one basis, then using (49) the more general statement in (56) must follow in another basis where P = PT > 0. Since the initial choice of basis is arbitrary, (56) must hold in every basis for an arbitrary positive definite matrix P. This completes the proof.

In some instances, even though the group is not commutative, the functions $\rho_1$ and $\rho_2$ will commute. For example, if $\rho(g \circ h) = \rho(h \circ g)$ for all h, g ∈ G, then $(\rho * \rho_i)(g) = (\rho_i * \rho)(g)$ for any reasonable choice of $\rho_i(g)$. Or if $\rho_2 = \rho_1 * \rho_1 * \cdots * \rho_1$ it will clearly be the case that $\rho_1 * \rho_2 = \rho_2 * \rho_1$. If, for whatever reason, $\rho_1 * \rho_2 = \rho_2 * \rho_1$, then (56) can be rewritten in the following form:

$$\mathrm{tr}[F^r(\rho_1 * \rho_2)\,P] \le \min\{\mathrm{tr}[F^r(\rho_1)\,P],\,\mathrm{tr}[F^r(\rho_2)\,P]\} \quad\text{and}\quad \mathrm{tr}[F^l(\rho_1 * \rho_2)\,P] \le \min\{\mathrm{tr}[F^l(\rho_1)\,P],\,\mathrm{tr}[F^l(\rho_2)\,P]\}. \tag{59}$$

Theorem 6.4. When ρ1 * ρ2 = ρ2 * ρ1 the following inequality holds

$$\frac{2}{\mathrm{tr}[F^r(\rho_1 * \rho_2)\,P]} \ge \frac{1}{\mathrm{tr}[F^r(\rho_1)\,P]} + \frac{1}{\mathrm{tr}[F^r(\rho_2)\,P]} \quad\text{for any}\;\; P = P^T > 0, \tag{60}$$

and likewise for Fl.

Proof. Returning to (57) and (58), in the case when ρ1 * ρ2 = ρ2 * ρ1 it is possible to write

$$\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)} = \left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle = \left\langle \frac{(\tilde{X}_i^r \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle \tag{61}$$

and

$$\frac{(\tilde{X}_i^l f_2)(g)}{f_2(g)} = \left\langle \frac{(\tilde{X}_i^l \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle = \left\langle \frac{(\tilde{X}_i^l \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle.$$

Since the following calculation works the same way for both the ‘l’ and ‘r’ cases, consider only the ‘r’ case for now. Multiplying the first equality in (61) by 1 – β and the second by β, and adding together gives3:

$$\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)} = \beta\left\langle \frac{(\tilde{X}_i^r \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle + (1-\beta)\left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle$$

for an arbitrary value of β.

Now squaring both sides gives

$$\left[\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)}\right]^2 = \left[\beta\left\langle \frac{(\tilde{X}_i^r \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle + (1-\beta)\left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle\right]^2.$$

Taking the (unconditional) expectation, and using Jensen’s inequality, yields:

$$\left\langle\left(\frac{(\tilde{X}_i^r f_2)(g)}{f_2(g)}\right)^2\right\rangle \le \beta^2\left\langle\left(\frac{(\tilde{X}_i^r \rho_1)(k)}{\rho_1(k)}\right)^2\right\rangle + (1-\beta)^2\left\langle\left(\frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\right)^2\right\rangle + 2\beta(1-\beta)\left\langle\left\langle \frac{(\tilde{X}_i^r \rho_1)(k)}{\rho_1(k)}\,\Big|\,g \right\rangle\left\langle \frac{(\tilde{X}_i^r \rho_2)(k)}{\rho_2(k)}\,\Big|\,g \right\rangle\right\rangle. \tag{62}$$

But observing (61), moving the rightmost term to the left, and writing 1 − 2β(1 − β) as (1 − β)² + β² reduces (62) to

$$\left[(1-\beta)^2 + \beta^2\right]F_{ii}^r(\rho_1 * \rho_2) \le \beta^2 F_{ii}^r(\rho_1) + (1-\beta)^2 F_{ii}^r(\rho_2). \tag{63}$$

Dividing both sides by [(1 − β)² + β²], multiplying by $\lambda_i \ge 0$, and summing over i gives

$$\mathrm{tr}[\Lambda F^r(\rho_1 * \rho_2)] \le \frac{\beta^2}{(1-\beta)^2 + \beta^2}\,\mathrm{tr}[\Lambda F^r(\rho_1)] + \frac{(1-\beta)^2}{(1-\beta)^2 + \beta^2}\,\mathrm{tr}[\Lambda F^r(\rho_2)] \tag{64}$$

where Λ = diag(λ₁, …, λₙ).

Clearly,

$$0 \le \frac{\beta^2}{(1-\beta)^2 + \beta^2},\;\frac{(1-\beta)^2}{(1-\beta)^2 + \beta^2} \le 1.$$

Choosing

$$\frac{\beta^2}{(1-\beta)^2 + \beta^2} = \frac{\mathrm{tr}[\Lambda F^r(\rho_2)]}{\mathrm{tr}[\Lambda F^r(\rho_1)] + \mathrm{tr}[\Lambda F^r(\rho_2)]}$$

and

$$\frac{(1-\beta)^2}{(1-\beta)^2 + \beta^2} = \frac{\mathrm{tr}[\Lambda F^r(\rho_1)]}{\mathrm{tr}[\Lambda F^r(\rho_1)] + \mathrm{tr}[\Lambda F^r(\rho_2)]}$$

then gives

$$\mathrm{tr}[\Lambda F^r(\rho_1 * \rho_2)] \le \frac{2\,\mathrm{tr}[\Lambda F^r(\rho_1)]\,\mathrm{tr}[\Lambda F^r(\rho_2)]}{\mathrm{tr}[\Lambda F^r(\rho_1)] + \mathrm{tr}[\Lambda F^r(\rho_2)]}.$$

This can be written as

$$\frac{1}{\mathrm{tr}[\Lambda F^r(\rho_1)]} + \frac{1}{\mathrm{tr}[\Lambda F^r(\rho_2)]} \le \frac{2}{\mathrm{tr}[\Lambda F^r(\rho_1 * \rho_2)]}. \tag{65}$$

Again, since the basis is arbitrary, Λ can be replaced with P, resulting in (60).

Note that in the classical (Abelian) version of this inequality, it is possible to get the stronger result without the factor of 2 in (60).

6.1.2. A Special Case: SO(3)

Consider the group of 3 × 3 orthogonal matrices with determinant +1. Let $\tilde{X}^r = [\tilde{X}_1^r, \tilde{X}_2^r, \tilde{X}_3^r]^T$ and $\tilde{X}^l = [\tilde{X}_1^l, \tilde{X}_2^l, \tilde{X}_3^l]^T$. These two gradient vectors are related to each other by an adjoint matrix, which for this group is a rotation matrix [19]. Therefore, in the case when G = SO(3),

$$\|\tilde{X}^r f\|^2 = \|\tilde{X}^l f\|^2 \quad\Longrightarrow\quad \mathrm{tr}[F^r(f)] = \mathrm{tr}[F^l(f)].$$

Therefore, the inequalities in (59) will hold for pdfs on SO(3) regardless of whether or not the functions commute under convolution, but restricted to the condition P = I.

7. Generalizing the de Bruijn identity to Lie groups

This section generalizes the de Bruijn identity, in which entropy rates are related to Fisher information.

Theorem 7.1. Let fD,h,t(g) = f(g, t; D, h) denote the solution of the diffusion equation (18) with constant h subject to the initial condition f(g, 0; D, h) = δ(g). Then for any well-behaved pdf α(g),

$$\frac{d}{dt}S(\alpha * f_{D,h,t}) = \frac{1}{2}\,\mathrm{tr}\left[D\,F^r(\alpha * f_{D,h,t})\right]. \tag{66}$$

Proof. It is easy to see that the solution of the diffusion equation

$$\frac{\partial\rho}{\partial t} = \frac{1}{2}\sum_{i,j=1}^n D_{ij}\,\tilde{X}_i^r\tilde{X}_j^r\rho - \sum_{k=1}^n h_k\,\tilde{X}_k^r\rho \tag{67}$$

subject to the initial conditions ρ(g, 0) = α(g) is simply $\rho(g, t) = (\alpha * f_{D,h,t})(g)$. This follows because all derivatives “pass through” the convolution integral for ρ(g, t) and act on $f_{D,h,t}(g)$.

Taking the time derivative of S(ρ(g, t)) we get

$$\frac{d}{dt}S(\rho) = -\frac{d}{dt}\int_G \rho(g, t)\,\log\rho(g, t)\,dg = -\int_G\left\{\frac{\partial\rho}{\partial t}\log\rho + \frac{\partial\rho}{\partial t}\right\}dg. \tag{68}$$

Using (67), the partial with respect to time can be replaced with Lie derivatives. But

$$\int_G \tilde{X}_k^r\rho\,dg = \int_G \tilde{X}_i^r\tilde{X}_j^r\rho\,dg = 0,$$

so the second term on the right side of (68) completely disappears. Using the integration-by-parts formula4

$$\int_G f_1\,\tilde{X}_k^r f_2\,dg = -\int_G f_2\,\tilde{X}_k^r f_1\,dg,$$

with $f_1 = \log\rho$ and $f_2 = \rho$ then gives

$$\frac{d}{dt}S(\alpha * f_{D,h,t}) = \frac{1}{2}\sum_{i,j=1}^n D_{ij}\int_G \frac{1}{\alpha * f_{D,h,t}}\,\tilde{X}_j^r(\alpha * f_{D,h,t})\,\tilde{X}_i^r(\alpha * f_{D,h,t})\,dg = \frac{1}{2}\sum_{i,j=1}^n D_{ij}\,F_{ij}^r(\alpha * f_{D,h,t}) = \frac{1}{2}\,\mathrm{tr}\left[D\,F^r(\alpha * f_{D,h,t})\right].$$

The implication of this is that

$$S(\alpha * f_{D,h,t_2}) - S(\alpha * f_{D,h,t_1}) = \frac{1}{2}\int_{t_1}^{t_2} \mathrm{tr}\left[D\,F^r(\alpha * f_{D,h,t})\right]dt.$$
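Theorem 7.1 can be checked numerically in the commutative case G = SO(2) (a sketch, assuming NumPy), where with D = 1 and h = 0 the heat kernel has Fourier coefficients $e^{-n^2 t/2}$ and the claim is that dS/dt equals one half of the Fisher information:

import numpy as np

N = 512
th = 2 * np.pi * np.arange(N) / N
n = np.fft.fftfreq(N, d=1.0 / N)
alpha = np.exp(2 * np.cos(th)); alpha /= alpha.mean()    # initial pdf w.r.t. d(theta)/2pi

def smoothed(t):                                         # (alpha * f_t)(theta) via Fourier
    return np.real(np.fft.ifft(np.fft.fft(alpha) * np.exp(-n**2 * t / 2)))

S = lambda f: -np.mean(f * np.log(f))
def fisher(f):
    fp = np.real(np.fft.ifft(1j * n * np.fft.fft(f)))
    return np.mean(fp ** 2 / f)

t, dt = 0.3, 1e-4
lhs = (S(smoothed(t + dt)) - S(smoothed(t - dt))) / (2 * dt)   # dS/dt by central difference
print(np.isclose(lhs, 0.5 * fisher(smoothed(t)), rtol=1e-4))   # True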

8. Information-theoretic inequalities from Log-Sobolev inequalities

In this section information-theoretic identities are derived from Log-Sobolev inequalities. Subsection 8.1 provides a brief review of Log-Sobolev inequalities. Subsection 8.2 then uses these to write information-theoretic inequalities.

8.1. Log-Sobolev inequalities in Rn and on Lie groups

The log-Sobolev inequality can be stated as [5, 6, 46]:

$$\int_{\mathbb{R}^n} |\psi(x)|^2\,\log|\psi(x)|^2\,dx \le \frac{n}{2}\log\left[\frac{2}{\pi e n}\int_{\mathbb{R}^n} \|\nabla\psi\|^2\,dx\right] \tag{69}$$

where

$$\nabla\psi = \left[\frac{\partial\psi}{\partial x_1}, \ldots, \frac{\partial\psi}{\partial x_n}\right]^T \quad\text{and}\quad \int_{\mathbb{R}^n} |\psi(x)|^2\,dx = 1.$$

Here log = log_e. Actually, there is a whole family of log-Sobolev inequalities, and (69) represents the tightest of these. The original form of the log-Sobolev inequality as introduced by Gross in [32] is

$$\frac{1}{2}\int_{\mathbb{R}^n} |\phi(x)|^2\,\log|\phi(x)|^2\,\rho(x)\,dx \le \int_{\mathbb{R}^n} \|\nabla\phi(x)\|^2\,\rho(x)\,dx + \|\phi\|_{L^2(\mathbb{R}^n,\rho)}^2\,\log\|\phi\|_{L^2(\mathbb{R}^n,\rho)} \tag{70}$$

where

$$\|\phi\|_{L^2(\mathbb{R}^n,\rho)}^2 = \int_{\mathbb{R}^n} |\phi(x)|^2\,\rho(x)\,dx.$$

Here $\rho(x) = (2\pi)^{-n/2}\exp(-\|x\|^2/2)$ is the solution of the heat equation on $\mathbb{R}^n$ evaluated at t = 1.

Several different variations exist. For example, by rescaling, it is possible to rewrite (70) with ρ(x, t) in place of ρ(x) by introducing a multiplicative factor of t in the first term on the right hand side of the equation. Or, by letting $\phi(x) = \rho^{-1/2}(x)\,\psi(x/a)$ for some scaling factor a > 0, substituting into (70), and integrating by parts, one obtains [46]

$$\int_{\mathbb{R}^n} |\psi(x)|^2\,\log\frac{|\psi(x)|^2}{\|\psi\|_2^2}\,dx + n(1 + \log a)\,\|\psi\|_2^2 \le \frac{a^2}{\pi}\int_{\mathbb{R}^n} \|\nabla\psi(x)\|^2\,dx$$

where

$$\|\psi\|_2^2 = \int_{\mathbb{R}^n} |\psi(x)|^2\,dx \quad\text{and}\quad \|\nabla\psi(x)\|^2 = \nabla\psi(x)\cdot\overline{\nabla\psi(x)}.$$

This, together with an optimization over a, gives (69).

Gross subsequently extended (70) to Lie groups [33] as

$$\int_G |\phi(g)|^2\,\log|\phi(g)|\,\rho(g, t)\,dg \le c_G(t)\int_G \|(\tilde{X}\phi)(g)\|^2\,\rho(g, t)\,dg + \|\phi\|_{L^2(G,\rho_t)}^2\,\log\|\phi\|_{L^2(G,\rho_t)} \tag{71}$$

where ρ(g, t) is the solution of the diffusion equation in (67) with $h_i = 0$, $D_{ij} = \delta_{ij}$, initial condition ρ(g, 0) = δ(g), and

$$\tilde{X}\phi = [\tilde{X}_1^r\phi, \ldots, \tilde{X}_n^r\phi]^T \quad\text{and}\quad \|\phi\|_{L^2(G,\rho_t)}^2 = \int_G |\phi(g)|^2\,\rho(g, t)\,dg.$$

In (71) the scalar function $c_G(t)$ depends on the particular group. For $G = (\mathbb{R}^n, +)$ we have $c_{\mathbb{R}^n}(t) = t$, and likewise $c_{SO(n)}(t) = t$.

In analogy with the way that (69) evolved from (70), a descendent of (71) for noncompact unimodular Lie groups is [2, 5, 6]

$$\int_G |\psi(g)|^2\,\log|\psi(g)|^2\,dg \le \frac{n}{2}\log\left[\frac{2 C_G}{\pi e n}\int_G \|\tilde{X}\psi\|^2\,dg\right]. \tag{72}$$

The only difference is that, to the author’s knowledge, the sharp factor $C_G$ in this expression is not known for most Lie groups. The information-theoretic interpretation of these inequalities is provided in the following subsection.

8.2. Information-theoretic inequalities

For our purposes the form in (69) will be most useful. It is interesting to note in passing that Beckner has extended this inequality to the case where the domain, rather than being Rn, is the hyperbolic space H² ≅ SL(2,R)/SO(2) or one of the Heisenberg groups H(n), including H(1) [5, 6]. Our goal here is to provide an information-theoretic interpretation of the inequalities from the previous section.

Theorem 8.1. Entropy powers and Fisher information are related as

[N(f)]^{-1} \le \frac{1}{n}\,\mathrm{tr}(F) \quad \mathrm{where} \quad N(f) = \frac{C_G}{2\pi e} \exp\left[ \frac{2}{n} S(f) \right].
(73)

Proof. We begin by proving (73) for G = (Rn, +). Making the simple substitution f(x) = |ψ(x)|² into (69) and requiring that f(x) be a pdf gives

\int_{\mathbb{R}^n} f(\mathbf{x}) \log f(\mathbf{x})\, d\mathbf{x} \le \frac{n}{2} \log \left[ \frac{1}{2\pi e n} \int_{\mathbb{R}^n} \frac{1}{f} \|\nabla f\|^2\, d\mathbf{x} \right],

or

S(f) \ge -\frac{n}{2} \log \frac{\mathrm{tr}(F)}{2\pi e n} \;\Longleftrightarrow\; \exp\left[ -\frac{2}{n} S(f) \right] \le \frac{\mathrm{tr}(F)}{2\pi e n} \;\Longleftrightarrow\; [N(f)]^{-1} \le \frac{1}{n}\,\mathrm{tr}(F).
(74)

Here S(f) is the Boltzmann-Shannon entropy of f and F is the Fisher information matrix. As is customary in information theory, the entropy power N(f) is defined as in (73) with C_G = 1, and the log-Sobolev inequality in the form (74) becomes (73).
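As a sanity check (an added remark, not part of the proof), the isotropic Gaussian f(x) = (2πσ²)^{−n/2} exp(−‖x‖²/(2σ²)) attains equality in (73) when C_G = 1:

S(f) = \frac{n}{2} \log(2\pi e \sigma^2) \;\Longrightarrow\; N(f) = \frac{1}{2\pi e} \exp\left[\frac{2}{n} S(f)\right] = \sigma^2, \qquad F = \frac{1}{\sigma^2} I \;\Longrightarrow\; \frac{1}{n}\,\mathrm{tr}(F) = \frac{1}{\sigma^2} = [N(f)]^{-1}.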

For the more general case, starting with (72) and letting f(g) = |ψ(g)|² gives

-\int_G f(g) \log f(g)\, dg \ge -\frac{n}{2} \log \left[ \frac{C_G}{2\pi e n} \int_G \frac{1}{f} \|\tilde{\nabla} f\|^2\, dg \right] \;\Longrightarrow\; S \ge -\frac{n}{2} \log \left[ \frac{C_G}{2\pi e n}\,\mathrm{tr}(F) \right].
(75)

The rest is the same as for the case of Rn.

Starting with Gross’s original form of log-Sobolev inequalities involving the heat kernel, the following information-theoretic inequality results:

Theorem 8.2. The Kullback-Leibler divergence and the Fisher-information distance between an arbitrary pdf and the heat kernel are related as

D_{KL}(f \,\|\, \rho_t) \le \frac{c_G(t)}{2}\, D_{FI}(f \,\|\, \rho_t)
(76)

where in general given f1(g) and f2(g),

D_{FI}(f_1 \,\|\, f_2) = \int_G \left\| \frac{1}{f_1} \tilde{\nabla} f_1 - \frac{1}{f_2} \tilde{\nabla} f_2 \right\|^2 f_1\, dg.
(77)

Proof. Starting with (71), let ϕ(g, t) = [ρ(g, t)]^{−1/2} [f(g)]^{1/2} where f(g) is a pdf. Then

\int_G |\phi(g,t)|^2 \rho(g,t)\, dg = \int_G f(g)\, dg = 1

and so \log \|\phi\|^2_{L^2(G,\rho_t)} = 0, and we have

\frac{1}{2} \int_G f(g) \log \frac{f(g)}{\rho(g,t)}\, dg \le c_G(t) \int_G \left\| \tilde{\nabla} \left( [\rho(g,t)]^{-\frac{1}{2}} [f(g)]^{\frac{1}{2}} \right) \right\|^2 \rho(g,t)\, dg.

By using the chain rule and product rule for differentiation,

\tilde{\nabla} \left( [\rho(g,t)]^{-\frac{1}{2}} [f(g)]^{\frac{1}{2}} \right) = \frac{1}{2}\, [\rho(g,t)]^{-\frac{1}{2}} [f(g)]^{\frac{1}{2}} \left( \frac{1}{f} \tilde{\nabla} f - \frac{1}{\rho_t} \tilde{\nabla} \rho_t \right).

Taking the squared norm and multiplying by ρ(g, t) leaves (1/4) f ‖(1/f)∇̃f − (1/ρ_t)∇̃ρ_t‖², so substitution into the right-hand side of (71) gives (76).

In the functional-analysis community several connections between log-Sobolev inequalities on Rn and information theory have emerged. For example, Carlen [14] addresses Theorem 7.1 for the case G = (Rn, +). Ledoux [43, 44], Dembo [27], Talagrand [70], and Otto and Villani [56] address the connection between entropy and gradients of pdfs in the context of so-called “concentration of measure” phenomena related to logarithmic Sobolev inequalities. However, these studies are not usually concerned with the Lie-group setting. Moreover, the author has not found analogs of (74) in the context of Lie groups in the literature.

9. Covariance, the Cramér-Rao Bound, and maximum entropy distributions on unimodular Lie groups

Given a pdf f ∈ N(G) that is in addition unimodal and decays rapidly from its mode (in the precise sense described in [69]), its mean is defined here as the point μ ∈ G such that [79]

\int_G (\log g)^{\vee}\, f(\mu \circ g)\, dg = \mathbf{0}.
(78)

Unlike in Rn, in which a mean can be computed for any pdf, in the Lie-group setting it is important to restrict the class of pdfs for the concept of mean to make sense. If not for such restrictions, the usefulness of the concept of the mean would diminish. For example, for the uniform distribution on SO(3) every point could be called a mean.

The covariance of a concentrated probability density centered around μ can be defined as [79]

\Sigma = \int_G (\log g)^{\vee} \left[ (\log g)^{\vee} \right]^T f(\mu \circ g)\, dg.
(79)

This matrix will have finite values when f(g) is rapidly decreasing. Note that this concept of covariance differs from those presented in [31, 36], which are more akin to the dispersion defined in Theorem 5.2. The definitions in (78) and (79) are used in the theorem below.
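To illustrate how (78) and (79) are used computationally, the following is a minimal sketch (illustrative code with hypothetical names and parameter values, specialized to SO(3)) that estimates the mean by fixed-point iteration on the sample version of (78), and then forms the sample covariance (79) in exponential coordinates:

import numpy as np
from scipy.linalg import expm, logm

def vee(X):
    # so(3) -> R^3
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

def hat(v):
    # R^3 -> so(3)
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def group_mean(samples, iters=20):
    # Fixed-point iteration on the sample version of (78):
    # sum_i (log(mu^{-1} g_i))^vee = 0.
    mu = samples[0]
    for _ in range(iters):
        delta = np.mean([vee(np.real(logm(mu.T @ g))) for g in samples], axis=0)
        mu = mu @ expm(hat(delta))
    return mu

def group_covariance(samples, mu):
    # Sample version of (79).
    xs = np.array([vee(np.real(logm(mu.T @ g))) for g in samples])
    return xs.T @ xs / len(xs)

# Usage: a concentrated cloud of random rotations around a known mean.
rng = np.random.default_rng(0)
mu_true = expm(hat(np.array([0.3, -0.2, 0.5])))
samples = [mu_true @ expm(hat(0.1 * rng.standard_normal(3))) for _ in range(500)]
mu_hat = group_mean(samples)
Sigma_hat = group_covariance(samples, mu_hat)   # approximately 0.01 * identity

Here mu.T plays the role of μ⁻¹ since rotation matrices are orthogonal, and the iteration converges only for concentrated clouds, consistent with the restrictions stated above.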

9.1. The Cramér-Rao bound

Theorem 9.1. Let ρ(g) ∈ N(G) be a pdf with the additional symmetry condition ρ(g) = ρ(g⁻¹), and set f(g; μ) = ρ(μ⁻¹ ∘ g). Given an unbiased estimator of μ, the Cramér-Rao bound

\Sigma \ge F^{-1}
(80)

holds for sufficiently small ‖Σ‖, where Σ and F are defined in (79) and (44), and the above matrix inequality is interpreted as each eigenvalue of Σ − F⁻¹ being non-negative.

Proof. For a symmetric pdf, ρ(g) = ρ(g⁻¹), the mean is at the identity, and so

\int_G (\log g)^{\vee} \rho(g)\, dg = \mathbf{0}.
(81)

The invariance of integration under shifts then gives

\boldsymbol{\phi}(\mu) = \int_G \left( \log(\mu^{-1} \circ g) \right)^{\vee} \rho(\mu^{-1} \circ g)\, dg = \mathbf{0}.
(82)

Applying the derivatives X̃_i^r gives an expression for X̃_i^r ϕ(μ) = 0 that can be expanded under the integral using the product rule X̃_i^r(ab) = (X̃_i^r a)b + a(X̃_i^r b), where in the present case a = (log(μ⁻¹ ∘ g))^∨ and b = ρ(μ⁻¹ ∘ g). Note that when ρ(·) is highly concentrated, the only values of g that contribute significantly to the integral are those for which μ⁻¹ ∘ g ≈ e. By definition

\tilde{X}_i^r \left( \log(\mu^{-1} \circ g) \right)^{\vee} = \frac{d}{dt} \left( \log\left( (\mu \circ e^{t X_i})^{-1} \circ g \right) \right)^{\vee} \Big|_{t=0} = \frac{d}{dt} \left[ \left( \log\left( e^{-t X_i} \mu^{-1} \circ g \right) \right)^{\vee} \right]_{t=0}.

Using the Baker-Campbell-Hausdorff formula

\log(e^X e^Y) \approx X + Y + \frac{1}{2}[X, Y]

with X = −tX_i and Y = log(μ⁻¹ ∘ g), together with the fact that μ⁻¹ ∘ g ≈ e, then gives

\int_G \left[ \tilde{X}_i^r \left( \log(\mu^{-1} \circ g) \right)^{\vee} \right] \rho(\mu^{-1} \circ g)\, dg \approx -\mathbf{e}_i.
(83)

The second term in the expansion of X̃_i^r ϕ(μ) is

\frac{d}{dt} \int_G \left[ \log(\mu^{-1} \circ g) \right]^{\vee} \rho\left( e^{-t X_i} \mu^{-1} \circ g \right) dg \Big|_{t=0} = \frac{d}{dt} \int_G \left[ \log h \right]^{\vee} \rho\left( e^{-t X_i} h \right) dh \Big|_{t=0}

where the change of variables h = μ⁻¹ ∘ g has been made. Using the symmetry of ρ gives ρ(e^{−tX_i} h) = ρ(h⁻¹ e^{tX_i}), and making the change of variables h → k⁻¹ then reduces this term to ∫_G (log k⁻¹)^∨ (X̃_i^r ρ)(k) dk. Letting i range over 1, …, n and recombining all of the parts means that X̃_i^r ϕ(μ) = 0 can be written in the form ∫_G a_i(k) b_j(k) dk = δ_ij, where a_i(k) = [ρ(k)]^{1/2} (log k⁻¹)^∨ · e_i and b_j(k) = [ρ(k)]^{1/2} X̃_j^r [log ρ(k)]. Then, as in the proof of the classical Cramér-Rao bound, using the Cauchy-Schwarz inequality gives the result in (80).
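For completeness, the final Cauchy-Schwarz step can be spelled out (a standard argument, included here for the reader's convenience). Writing a(k) = [ρ(k)]^{1/2} (log k⁻¹)^∨ and b(k) = [ρ(k)]^{1/2} ∇̃ log ρ(k), the computations above say that

\int_G a(k)\, b(k)^T\, dk = I, \qquad \int_G a\, a^T\, dk = \Sigma, \qquad \int_G b\, b^T\, dk = F.

Then for any u ∈ Rn, the scalar Cauchy-Schwarz inequality gives

u^T F^{-1} u = \int_G (u^T a)(b^T F^{-1} u)\, dk \le \left( u^T \Sigma\, u \right)^{1/2} \left( u^T F^{-1} F F^{-1} u \right)^{1/2} = \left( u^T \Sigma\, u \right)^{1/2} \left( u^T F^{-1} u \right)^{1/2},

so that u^T F⁻¹ u ≤ u^T Σ u for every u, which is (80).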

9.2. Maximum entropy distributions

Recall that the Gaussian distribution on Rn has a number of remarkable properties, including: (1) it is closed under the operation of convolution; (2) it solves a linear diffusion equation with constant coefficients; (3) it is the maximum entropy distribution subject to constraints on the mean and covariance. A natural question to ask is whether such a distribution exists on unimodular Lie groups. With regard to (1) and (2), the answer is certainly yes, and this kind of Gaussian distribution appeared as the solution of (18) subject to Dirac delta initial conditions. However, this is not necessarily the maximum entropy distribution subject to covariance constraints.

Equipped with a concept of mean and covariance, the concept of a maximum entropy distribution on a unimodular Lie group subject to constraints on the mean and covariance can be defined and computed in the usual way using Lagrange multipliers, and the result is of the form

\rho(g; \mu, \Sigma) = \frac{1}{c(\mu, \Sigma)} \exp\left( -\frac{1}{2} \left[ \left( \log(\mu^{-1} \circ g) \right)^{\vee} \right]^T \Sigma^{-1} \left( \log(\mu^{-1} \circ g) \right)^{\vee} \right)
(84)

where

c(\mu, \Sigma) = \int_G \exp\left( -\frac{1}{2} \left[ \left( \log(\mu^{-1} \circ g) \right)^{\vee} \right]^T \Sigma^{-1} \left( \log(\mu^{-1} \circ g) \right)^{\vee} \right) dg.

Such distributions have been studied in [80, 79] in the context of how the covariance of convolved Gaussians can be obtained from the covariances of those being convolved. As ‖Σ‖ becomes small, ρ(g; e, Σ) converges to the solution of a driftless diffusion with Dirac-delta initial conditions at the identity, and Σ = tD. In this case exponential coordinates g = exp X become Cartesian coordinates near the identity and dg ≈ dx (the Lebesgue measure) by identifying the Lie algebra with Rn. In this limit the usual Gaussian on Rn results, with c(μ, Σ) = (2π)^{n/2} |Σ|^{1/2} and S(ρ) = log[(2πe)^{n/2} |Σ|^{1/2}].
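As an illustration (hypothetical code, not from the references), the density (84) can be evaluated on SE(2) with μ = e by combining a matrix logarithm with the ∨ map; the normalizing constant c(μ, Σ) must in general be computed by quadrature over the group, and only in the concentrated limit does it approach the Euclidean value (2π)^{3/2}|Σ|^{1/2}:

import numpy as np
from scipy.linalg import logm

def se2_matrix(x, y, theta):
    # Homogeneous-transform representation of g(x, y, theta) in SE(2).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

def vee_se2(X):
    # (log g)^vee for se(2): translational part of the Lie algebra element,
    # followed by the rotation angle.  These are exponential coordinates,
    # not the pose parameters (x, y, theta) themselves.
    return np.array([X[0, 2], X[1, 2], X[1, 0]])

def gaussian_se2_unnormalized(g, Sigma_inv):
    xi = vee_se2(np.real(logm(g)))             # exponential coordinates of g
    return np.exp(-0.5 * xi @ Sigma_inv @ xi)  # divide by c(mu, Sigma) to normalize

Sigma = np.diag([0.01, 0.01, 0.02])            # illustrative covariance
val = gaussian_se2_unnormalized(se2_matrix(0.05, -0.02, 0.1), np.linalg.inv(Sigma))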

However, as ‖Σ‖ (or tD) becomes larger, the concepts of Gaussians on Lie groups as solutions to diffusion equations and as maximum entropy distributions become inconsistent with each other. Each of these concepts of Gaussian distribution has its advantages in different scenarios, and both are used in the following section.

10. An application: Sensor fusion in mobile robotics

Assume that the nonholonomic kinematic-cart robot with two independently actuated wheels shown in Figure 1 moves in a square room of known size. Relative to a frame of reference fixed in the room, the velocity of a frame of reference fixed in the robot is a function of the wheel speeds. The equations for this system are well-known; see for example [13, 55, 84]. The reference frame that results from numerically integrating these nonholonomic governing equations can be thought of as the time-dependent rigid-body motion g(t) = g(x(t), y(t), θ(t)) ∈ SE(2). If the robot’s motion could be observed with infinite precision, then g(t) would be known for all times t ≥ 0. But, of course, infinitely precise observations are not possible, and the question becomes one of estimating g(t) given whatever sensory data is available. Such problems have become popular over the past decade, as reviewed in [54, 72].

Figure 1
A Kinematic Cart with an Uncertain Future Position and Orientation

In the present example, it is assumed that two noisy sensing modalities are present on the robot: (1) the wheel speeds can be measured (or calculated from wheel angle measurements) with sensors co-located with the motors; (2) two range sensors fixed on the robot (one in front and one in back) point directly forward and behind the robot along its symmetry axis, and measure the distance to the walls ahead of and behind the robot. The scenario is that the robot starts at the center of the room with known initial heading, θ = 0, estimates g(t) from odometry for t ∈ [0, T], and then switches on its range sensors at time T. Given models for the noise in both sensing modalities, how should these measurements be pooled to obtain the best estimate of the robot’s current pose? And how can the quality of this estimate be compared with the estimates obtained from each individual sensing modality? Several of the theorems derived earlier in this paper can be used to address these problems.

This is the question of fusing odometric (or “dead-reckoning”) data with range data. In odometry the nonholonomic kinematic equations of motion are numerically integrated given knowledge of the wheel speeds as a function of time. However, such measurements are noisy, and so the path that the robot actually takes diverges from the predicted one as time increases. The result is a conditional probability density f(g_act | g_odo) ∈ N(SE(2) × SE(2)), where g_act denotes the actual pose of the robot and g_odo denotes the pose predicted by odometry. Similarly, if the actual pose of the robot (which is unknown to the robot) is g_act, and the range sensors take measurements from that pose, they will generate noisy distance measurements that can be used together with elementary geometry and trigonometry to develop a cloud of reference frames. If the sensors are accurate, this cloud will be tightly clustered around g_act. The resulting histogram, normalized to be a pdf in its first argument for any fixed value of g_act ∈ SE(2), will be f(g_dis | g_act), where g_dis is the pose of the robot predicted from distance measurements. The peak value of g_dis need not coincide exactly with g_act; any such discrepancy represents bias in the range sensors. Models for f(g_act | g_odo) and f(g_dis | g_act) are developed in the following subsections and then fused to obtain f(g_act | g_odo, g_dis), the estimate of the actual robot position and orientation given measurements from both odometry and distance sensors.

10.1. Dead Reckoning as a diffusion process on SE(2)

Let the two wheels each have radius r, and let the wheelbase be denoted as L. Imagine that the angles through which the wheels turn about their axes are governed by stochastic differential equations of the form

d\phi_1 = \omega\, dt + \sqrt{D}\, dw_1
(85)

d\phi_2 = \omega\, dt + \sqrt{D}\, dw_2
(86)

where the dw_i represent uncorrelated unit-strength white noises, D scales the strength of the noise, and the constant rate of rotation of both wheels would be ω if there were no noise, in which case the robot would move straight forward and g_odo = g(rωT, 0, 0). In contrast, when there is noise the trajectories will look like those depicted in Figure 1. In this case a single stochastic trajectory for g(t) is defined by stochastic differential equations (SDEs) of the form [84]

\begin{pmatrix} dx \\ dy \\ d\theta \end{pmatrix} = \begin{pmatrix} r\omega \cos\theta \\ r\omega \sin\theta \\ 0 \end{pmatrix} dt + \sqrt{D} \begin{pmatrix} \frac{r}{2} \cos\theta & \frac{r}{2} \cos\theta \\ \frac{r}{2} \sin\theta & \frac{r}{2} \sin\theta \\ \frac{r}{L} & -\frac{r}{L} \end{pmatrix} \begin{pmatrix} dw_1 \\ dw_2 \end{pmatrix}.
(87)

Usually, to be precise, one must specify whether an SDE is of Itô or Stratonovich type. In this example both interpretations yield the same equation, so the distinction is unimportant; for the sake of definiteness, take (87) to be an Itô equation.

If such an equation is simulated many times, each time starting from the same initial conditions (say, x = y = θ = 0), then a function, f(x, y, θ; t) that records the distribution of positions and orientations of the cart at the same value of time, t, in each trajectory can be defined. (This pdf also depends on r, L, ω, and D, but the dependence on these constants is suppressed.)
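A minimal Euler-Maruyama sketch of this Monte Carlo procedure is given below (all constants are illustrative assumptions, and the update equations are read directly off of (87)):

import numpy as np

r, L, omega, D = 0.05, 0.2, 2.0, 0.1      # assumed wheel radius, wheelbase, rate, noise
T, dt = 5.0, 1e-3
steps = int(T / dt)
rng = np.random.default_rng(1)

def simulate_cart():
    x = y = th = 0.0                       # initial conditions x = y = theta = 0
    for _ in range(steps):
        dw = np.sqrt(dt) * rng.standard_normal(2)   # unit-strength Wiener increments
        x += r * omega * np.cos(th) * dt + np.sqrt(D) * (r / 2) * np.cos(th) * (dw[0] + dw[1])
        y += r * omega * np.sin(th) * dt + np.sqrt(D) * (r / 2) * np.sin(th) * (dw[0] + dw[1])
        th += np.sqrt(D) * (r / L) * (dw[0] - dw[1])
    return (x, y, th)

ensemble = np.array([simulate_cart() for _ in range(2000)])
# A histogram of the ensemble, e.g. np.histogramdd(ensemble, bins=(50, 50, 36)),
# then approximates f(x, y, theta; T).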

As explained in detail in [84, 20], a well-developed theory for linking stochastic differential equations such as (87) to functions such as f(x, y, θ; t) exists. This theory produces a Fokker-Planck equation for f(x, y, θ; t). In the present context, this equation is of the form [84]

\frac{\partial f}{\partial t} = -r\omega \cos\theta\, \frac{\partial f}{\partial x} - r\omega \sin\theta\, \frac{\partial f}{\partial y} + \frac{D}{2} \left( \frac{r^2}{2} \cos^2\theta\, \frac{\partial^2 f}{\partial x^2} + \frac{r^2}{2} \sin 2\theta\, \frac{\partial^2 f}{\partial x \partial y} + \frac{r^2}{2} \sin^2\theta\, \frac{\partial^2 f}{\partial y^2} + \frac{2 r^2}{L^2}\, \frac{\partial^2 f}{\partial \theta^2} \right),

which is subject to the initial conditions f(x, y, θ; 0) = δ(x)δ(y)δ(θ).

There is a coordinate-free way of writing (87) and the above equation. Namely,

\left( g^{-1} \frac{dg}{dt} \right)^{\vee} dt = r\omega\, \mathbf{e}_1\, dt + \frac{r \sqrt{D}}{2} \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ \frac{2}{L} & -\frac{2}{L} \end{pmatrix} \begin{pmatrix} dw_1 \\ dw_2 \end{pmatrix}.

The coordinate-free version of the Fokker-Planck equation above can be written compactly in terms of these Lie derivatives as [84]

\frac{\partial f}{\partial t} = -r\omega\, \tilde{X}_1 f + \frac{r^2 D}{4} (\tilde{X}_1)^2 f + \frac{r^2 D}{L^2} (\tilde{X}_3)^2 f
(88)

with initial conditions f(g; 0) = δ(g). The resulting time-evolving pdf is denoted as f(g; t), or with the shorthand ft(g).

While efficient techniques for solving this sort of equation exist for both the long-time and short-time cases (see e.g. [84, 79] and references therein), the emphasis in the current paper is not on solution techniques, but rather on an assessment of how pose information can be obtained from (88) directly.

If the robot continues to move for an additional amount of time, t2, then the distribution will be updated as a convolution over SE(2) of the form

f_{t_1 + t_2}(g) = (f_{t_1} \ast f_{t_2})(g).
(89)

That is, solutions to (88) (or more generally (18)) subject to Dirac delta initial conditions form a commutative semigroup under the operation of group convolution.

The function f(g; T) generated by solving (88) is in fact f(g_act | g_odo), because the dead-reckoning path for a robot with fixed L, r, and D is completely defined by ω and T, which in turn define the shape of f(g; T).

10.2. A maximum entropy model for Range-Finder measurements

In the scenario described at the beginning of this section, the robot is equipped with two range sensors arranged with their beams pointing in diametrically opposed directions along the x-axis passing through the center of the robot in Figure 1. If the range sensors could measure the distance to the walls exactly, and if the robot behaved exactly as the nonholonomic kinematic cart, then the robot could spin around its z-axis through 180 degrees, generate an exact map of the environment, and know its location exactly, modulo symmetries. It is these symmetries that make the discussions of discrete groups, cosets, and double cosets in Sections 5.2 and 5.3 relevant to this application. For a robot that has 180-degree rotational symmetry around its z-axis, there is no distinction between g_act = g(x, y, θ) and g(x, y, θ + π) = g_act ∘ g(0, 0, π). This means that no matter what shape room the robot is in, it will localize in a bounded subset of the coset space SE(2)/C2 (where Cn is the cyclic group, isomorphic to the group of rotational symmetry operations of the regular n-gon, and the boundaries result from the walls of the room). Similarly, for a robot in a square room with no landmarks, there is no distinction between range sensor data when the robot is at the pose g_act and when it is at g(0, 0, kπ/2) ∘ g_act for any k ∈ {0, 1, 2, 3}. This is true regardless of how the sensors are arranged on the robot. Therefore, the localization problem for a robot in a square room is one of finding a point in the coset space C4\SE(2). Now if the robot and its range finders have C2 symmetry and the room is square, then the localization problem becomes one in a bounded subset of the double coset space C4\SE(2)/C2.

Suppose that the robot is placed at the pose g_act, where it remains while its front and back range sensors take a stream of synchronized distance measurements. A natural estimator for the pair of distances from the front and back of the robot to the walls that the two beams hit results from simply computing the sample mean and covariance of the pairs of distances. A roboticist might then define a bivariate Gaussian distribution with mean and variance given by the sample mean and sample variance of these two sets of measured distances. This is an implicit use of the maximum entropy principle, i.e., with all other things being unknown, choose the maximum entropy distribution with specified mean and covariance, which is the Gaussian. The result is f(g_dis | g_act), a distribution of measured distances (which can be converted to a distribution of reference frames) when g_act is specified (though not known to the robot).

Taking a stream of measurements when the robot is at a fixed pose certainly does not offer as much information as when it does a full sweep. But since it is implicit in this scenario that the geometry of the robot and the room are both known in advance, the above situation can be turned around. If we assume that g_act is not known, then for each pair of measured front/back distances we can ask what all possible poses of the robot could be in order for such distance measurements to be replicated, under the constraint that the robot fits in the room without penetrating any walls. From this, a maximum entropy distribution f(g_act | d_front, d_back) can be constructed, where d_front and d_back are respectively the measured distances to the walls in front of and behind the robot.

In summary, an error model for distance measurements is presented that takes into account known symmetries of the robot and the room. Even if range finders could measure distance perfectly, due to symmetries they would not be sufficient to localize in the room, but only to within a bounded region in the double coset space described above. The odometry data, as imperfect as it is, provides a means to resolve the ambiguity in the range measurements. And information theory on groups provides a language in which to address questions such as: How much information is being provided by the range sensors vs. odometry? And how much improvement is there in estimates of g_act when data is pooled vs. using each sensor modality independently? These questions can only be answered using the theorems presented previously in this paper after a strategy for pooling the measurements is articulated. One such strategy is the Bayesian filter (see [72] and references therein), a version of which is described in the following subsection.

10.3. Sensor fusion

One form of Bayes’ rule, which holds for probabilities and pdfs alike, is

p(A \mid B, C) = \frac{p(B \mid A, C)\; p(A \mid C)}{p(B \mid C)}.

Taking A = gact, B = gdis, and C = godo and using the pdfs generated in the previous sections gives

f(g_{act} \mid g_{dis}, g_{odo}) = \frac{f(g_{dis} \mid g_{act}, g_{odo})\; f(g_{act} \mid g_{odo})}{f(g_{dis} \mid g_{odo})}.

If g_act is known, then knowledge of g_odo adds nothing. Therefore, in this application f(g_dis | g_act, g_odo) = f(g_dis | g_act). This is analogous to how conditioning on states at all prior times reduces to conditioning on the immediate predecessor for a Markov process. Therefore, the sensor fusion equation becomes

f(g_{act} \mid g_{dis}, g_{odo}) = \frac{f(g_{dis} \mid g_{act})\; f(g_{act} \mid g_{odo})}{f(g_{dis} \mid g_{odo})}.
(90)

Each term on the right-hand side of this equation can be evaluated using the individual sensor models presented in the previous two subsections.
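On a grid over a chart of SE(2) (with Haar measure dg = dx dy dθ), the fusion rule (90) amounts to a normalized pointwise product, as in the following sketch (array names and grid sizes are illustrative assumptions):

import numpy as np

def fuse(f_dis_given_act, f_act_given_odo, dV):
    # Numerator of (90); the denominator f(g_dis | g_odo) is a constant
    # once g_dis and g_odo are fixed, so it is recovered by normalization.
    unnormalized = f_dis_given_act * f_act_given_odo
    return unnormalized / (unnormalized.sum() * dV)

def entropy(f, dV):
    p = f[f > 0]
    return -np.sum(p * np.log(p)) * dV

# e.g., with f1, f2 sampled on a 50 x 50 x 36 grid over (x, y, theta):
# f_fused = fuse(f1, f2, dV)
# entropy(f_fused, dV) can then be compared with entropy(f2, dV).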

If nothing were known other than the speed of the robot, ω, and the duration of travel, T, and if rωT is less than the distance to the nearest wall, then the robot position will be constrained to lie within a circular disk of radius rωT. If its orientation is completely unknown, then an upper bound on the entropy of f(g_act) (without conditioning on any odometry data) can be computed from the maximum entropy distribution on SE(2) that is independent of θ, constant on this circular disk in the x-y plane, and zero for √(x² + y²) > rωT. The entropy associated with this distribution must necessarily be greater than that obtained when sensor measurements are available:

S(f_{maxent}(g_{act})) \ge S(f(g_{act} \mid g_{odo})) \ge S(f(g_{act} \mid g_{dis}, g_{odo})).

This follows from the general information-theoretic property that conditioning reduces entropy, and does not use any Lie-group properties. In contrast, inequalities that directly use the results of the theorems presented earlier in this paper are reviewed in the following subsection.

10.4. Application of the theorems

The theorems presented earlier in the paper describe relationships between several measures of dispersion for pdfs on Lie groups, including entropy, covariance, and the (inverse) Fisher information matrix. The reason for performing the sensor fusion described above is to reduce the dispersion of the resulting pdf in the variable g_act ∈ SE(2) as compared to the pdfs obtained from individual sensor modalities. This is now verified using (90), where g_dis and g_odo are fixed values and g_act is the unknown to be estimated. Below, examples of the applicability of the theorems presented earlier are given.

Example 1

The dead-reckoning distribution f(g_act | g_odo) is the solution f(g_act; t) of (88), and so it satisfies (89). From Theorem 5.1 we know that the entropy of a pdf increases under convolution, and from Theorem 6.2 the Fisher information decreases under convolution. These indicate that an estimate of the pose of the robot obtained by selecting the value of g that maximizes f_t(g) will become more and more unreliable as an estimator of the actual pose as t increases, and the amount of this unreliability is quantifiable.

Example 2

Due to the symmetry of the room and of the range sensors, the entropy of the range-only sensing modality can be bounded using the result of Theorem 5.5, which holds both when the subgroups K and H are Lie groups and when they are finite. If both elements of C2, and all four elements of C4, are equally likely, computation of the entropies S(f_{C2}) and S(f_{C4}) becomes trivial, and S(f_{C4\SE(2)/C2}) is computed by focusing on a single asymmetrical region of the configuration space. Theorem 5.5 then allows the actual entropy to be bounded by the sum of these individual quantities, each of which is easier to compute than working directly with a mixture model of the form

f(g_{act} \mid d_{front}, d_{back}) = \frac{1}{8} \sum_{i=0}^{3} \sum_{j=0}^{1} f_{C_4 \backslash SE(2) / C_2}\left( g(0, 0, \tfrac{i\pi}{2}) \circ g_{act} \circ g(0, 0, j\pi) \right).

This mixture model reflects maximal uncertainty induced by the symmetries, but does not lend itself to closed-form entropy evaluations, which is one reason why the result of Theorem 5.5 is useful.

Example 3

Since f(g_act | g_odo) = f(g_act; t) in (88), which is a specific example of (18), the de Bruijn inequality in Section 7 (with α set to the Dirac delta function on SE(2)) gives the rate of entropy increase of the odometry model in terms of the Fisher information matrix.

Example 4

Using nothing more than the product rule for the Lie derivatives, the Fisher information matrix for f(g_act | g_dis, g_odo) in (90) can be computed from models of f(g_dis | g_act) and f(g_act | g_odo), and through the Cramér-Rao bound in Theorem 9.1 this can be used to bound the covariance of f(g_act | g_dis, g_odo).

Example 5

The Fourier-space solution for dead-reckoning models is described in [84], the SE(2) and SE(3) group-Fourier transforms, their properties, and applications are described in detail in [18, 19, 81], and f̂(λ) is completely characterized for equations such as the odometry model in (88). Therefore, the bounds in Theorem 5.2 can be applied.

Example 6

Theorems 5.7 and 5.8(b) are applicable to symmetric functions, including Gaussians on SE(2) with μ = e. The solution of the odometry equation (88) is not a symmetric function; however, the convolution of f_t with its inverted version f̃_t(g) = f_t(g⁻¹) is. Therefore, if a robot does an initial sensor sweep in a room without symmetry, which results in an SE(2) Gaussian, and then the robot moves under the open-loop odometry model for time t and tries to return to its starting location, the resulting pose distribution will be the convolution of the SE(2) Gaussian constructed from range-finder data and the symmetric function (f_t ∗ f̃_t)(g). Theorems 5.7 and 5.8(b) allow for the isentropic reordering of some of these convolutions, even though convolution on noncommutative groups is generally order-dependent.

Example 7

The bounds on entropy powers and relative information in (73) and (76) resulting from log-Sobolev inequalities provide a means to evaluate and compare additional informational quantities associated with f(g_act | g_dis, g_odo) and f(g_act | g_odo).

In summary, the efficacy of nine out of the fifteen theorems presented has been illustrated in the context of a single example in robotics. By establishing the language and principles with which to articulate information theory on Lie groups, other applications that utilize all of these theorems can be explored in the future. For example, Theorems 5.8(a) and 5.9, pertaining to convolution and class functions, are not relevant to robot localization in SE(2) because there are no pdfs in N(SE(2)) that are class functions [19]. However, solutions to the heat equation on SO(3) are both symmetric and class functions, and so these theorems become immediately applicable to spacecraft and submarine orientational estimation problems.

11. Conclusions

By collecting and reinterpreting results relating to the study of diffusion processes, harmonic analysis, and log-Sobolev inequalities on Lie groups, and merging these results with definitions of the Fisher information matrix and covariance, many inequalities of information theory were extended here to the context of probability densities on unimodular Lie groups. In addition, the natural decomposition of groups into cosets and double cosets, and the nesting of subgroups, provide some inequalities that result from the Kullback-Leibler divergence of probability densities on Lie groups. Some special inequalities related to finite groups (which are also unimodular) were also provided. One open issue is determining the conditions under which the entropy power inequality [8, 66] will hold for Lie groups.

While the emphasis of this paper was on the discovery of fundamental inequalities, the motivation for this study originated with applications in robotics and other areas. Indeed, it was the problem of quantifying the difficulty of robotic assembly [22, 45, 9] and self-repair [40] tasks using the concept of “parts entropy” [63] that led the author to link group theory and information theory. A detailed example in mobile robot localization was provided here to illustrate the efficacy of the presented theorems.

As another example, the idea that a robot or animal should move (in SE(2)) so as to maximize the rate at which its sensors gain information is attracting attention [73, 74, 75, 15, 52, 65, 77, 24, 60, 62]. Analogous problems can be formulated in medical imaging, in which only x-rays in directions that maximize the generation of new information need be taken, rather than exposing a patient to the radiation of a whole CT scan. Related to this problem is that of biomolecular structure determination from disparate data sets such as cryo-electron microscopy, NMR, x-ray crystallography, etc. Each is related to the structure of molecules, their ensemble motion, and/or their quantum state, all of which are described in terms of probability densities on Lie groups. A first step toward information fusion of data on Lie groups is the version of information theory developed here and demonstrated on an example in the context of a mobile robot.

Acknowledgments

Comments from Graham Beck, Valentina Staneva, and the reviewers were helpful in improving the presentation.

The author is supported by NIH Grant R01 GM075310 and NSF Grant IIS-0915542.

Footnotes

2000 Mathematics Subject Classification. Primary: 22E15, 94A15; Secondary: 22E70.

1Here and throughout this paper, r denotes the fact that derivatives used in defining expressions appear on the ‘right’ and l denotes that they appear on the ‘left.’ This means that expressions with r are left invariant and those with l are right invariant. This convention is opposite that used in much of the mathematics literature.

2It is not required that G = (G/H) × H for this to be true. For example, the integral over SO(3) decomposes into one over S2 and one over S1 even though SO(3) ≠ S2 × S1.

3The names of the dummy variables k and k’ are unimportant. However, at this stage it is important that the names be different in order to emphasize their statistical independence.

4There are no surface terms because, like the circle and real line, each coordinate in the integral either wraps around or goes to infinity.

REFERENCES

[1] Amari S, Nagaoka H. Methods of Information Geometry, Translations of Mathematical Monographs 191. American Mathematical Society; Providence, RI: 2000.
[2] Bakry D, Concordet D, Ledoux M. Optimal heat Kernel bounds under logarithmic Sobolev inequalities. ESAIM: Probability and Statistics. 1997;1:391–407.
[3] Baldwin G, Mahony R, Trumpf J. A Nonlinear Observer for 6 DOF Pose Estimation from Inertial and Bearing Measurements. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.
[4] Barron AR. Entropy and the central limit theorem. Ann. Prob. 1986;14:336–342.
[5] Beckner W. Sharp inequalities and geometric manifolds. J. Fourier Anal. Appl. 1997;3:825–836.
[6] Beckner W. Essays on Fourier Analysis in Honor of Elias M. Stein. Princeton University Press; 1995. Geometric inequalities in Fourier analysis; pp. 36–68.
[7] Berg HC. E. coli in Motion. Springer; New York: 2003.
[8] Blachman NM. The convolution inequality for entropy powers. IEEE Trans. Inform. Theory. 1965;11:267–271.
[9] Boothroyd G. Assembly Automation and Product Design. Second Edition CRC Press; Boca Raton, FL: 2005.
[10] Brockett RW. System theory on group manifolds and coset spaces. SIAM J. Control. 1972;10:265–284.
[11] Brockett RW. Lie Algebras and Lie Groups in Control Theory. In: Mayne DQ, Brockett RW, editors. Geometric Methods in System Theory. Reidel Publishing Company; Dordrecht-Holland: 1973.
[12] Brown LD. A proof of the Central Limit Theorem motivated by the Cramér Rao inequality. In: Kallianpur G, Krishnaiah PR, Ghosh JK, editors. Statistics and Probability: Essays in Honour of C.R. Rao. North-Holland, New York: 1982. pp. 141–148.
[13] Bullo F, Lewis AD. Geometric Control of Mechanical Systems. Springer; 2004.
[14] Carlen EA. Superadditivity of Fishers Information and Logarithmic Sobolev Inequalities. Journal Of Functional Analysis. 1991;101:194–211.
[15] Censi A. On Achievable Accuracy for Pose Tracking. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.
[16] Chirikjian GS. Fredholm integral equations on the Euclidean motion group. Inverse Problems. 1996;12:579–599.
[17] Chirikjian GS, Wang YF. Conformational statistics of stiff macromolecules as solutions to PDEs on the rotation and motion groups. Physical Review E. 2000;62:880–892. [PubMed]
[18] Chirikjian GS, Kyatkin AB. An operational calculus for the Euclidean motion group with applications in robotics and polymer science. J. Fourier Analysis and Applications. 2000;6:583–606.
[19] Chirikjian GS, Kyatkin AB. Engineering Applications of Noncommutative Harmonic Analysis. CRC Press; Boca Raton, FL: 2001.
[20] Chirikjian GS. Stochastic Models, Information Theory, and Lie Groups. Vol. 1. Birkhäuser; Boston: 2009.
[21] Chirikjian GS. Stochastic Models, Information Theory, and Lie Groups. Vol. 2. Birkhäuser; Boston: 2011.
[22] Chirikjian GS. Parts Entropy and the Principal Kinematic Formula. IEEE Conference on Automation Science and Engineering; Washington D.C.. August 2008.
[23] Chirikjian GS. The stochastic elastica and excluded-volume perturbations of DNA conformational ensembles. International Journal of Non-Linear Mechanics. 2008;43:1108–1120. [PMC free article] [PubMed]
[24] Cortez RA, Tanner HG, Lumia R. Distributed Robotic Radiation Mapping; ISER’08.
[25] Cover TM, Thomas JA. Elements of Information Theory. 2nd ed Wiley-Interscience; Hoboken, NJ: 2006.
[26] Csiszár I. I-Divergence Geometry of Probability Distributions and Minimization Problems. Ann. Prob. 1975;3:146–158.
[27] Dembo A. Information Inequalities and Concentration of Measure. Ann. Prob. 1997;25:527–539.
[28] Dembo A, Cover TM, Thomas JA. Information theoretic inequalities. IEEE Transactions On Information Theory. 1991;37:1501–1518.
[29] Duncan TE. An Estimation problem in compact Lie groups. Syst. Control Lett. 1998;10:257–263.
[30] Gelfand IM, Minlos RA, Ya. Shapiro Z. Representations of the Rotation and Lorentz Groups and Their Applications. Pergamon Press; New York: 1963.
[31] Grenander U. Probabilities on Algebraic Structures. Wiley; 1963. Dover edition: 2008.
[32] Gross L. Logarithmic Sobolev inequalities. Amer. J. Math. 1975;97:1061–1083.
[33] Gross L. Logarithmic Sobolev inequalities on Lie groups. Illinois J. Math. 1992;36:447–490.
[34] Gurarie D. Symmetry and Laplacians. Introduction to Harmonic Analysis, Group Representations and Applications. Elsevier Science Publisher; The Netherlands: 1992.
[35] Helgason S. Groups and Geometric Analysis. American Mathematical Society; Providence, RI: 2000.
[36] Heyer H. Probability Measures on Locally Compact Groups. Springer-Verlag; New York: 1977.
[37] Johnson O, Suhov Y. Entropy and convergence on compact groups. Journal of Theoretical Probability. 2000;13:843–857.
[38] Johnson O. Information Theory and the Central Limit Theorem. Imperial College Press; London: 2004.
[39] Jurdjevic V, Sussmann HJ. Control systems on Lie groups. Journal of Differential Equations. 1972;12:313–329.
[40] Kutzer MDM, Armand M, Lin E, Scheidt D, Chirikjian GS. Toward cooperative team-diagnosis in multi-robot systems. International Journal of Robotics Research. 2008;27:1069–1090.
[41] Kwon J, Choi M, Park FC, Chu C. Particle filtering on the Euclidean group: framework and applications. Robotica. 2007;25:725–737.
[42] Lawler GF. Introduction to Stochastic Processes. 2nd ed CRC Press; Boca Raton, FL: 2006.
[43] Ledoux M. Concentration of Measure and Logarithmic Sobolev Inequalities. Volume 1709. Springer; Berlin: 1999. (Lecture Notes in Mathematics).
[44] Ledoux M. The Concentration of Measure Phenomenon. Amer. Math. Soc.; Providence, RI: 2001. (Math. Surveys and Monographs 89).
[45] Lee K, Chirikjian GS. Robotic self-replication from low-complexity parts. IEEE Robotics and Automation Magazine. 2007;14:34–43.
[46] Lieb EH, Loss M. Analysis. 2nd ed American Mathematical Society; Providence, RI: 2001.
[47] Linnik YV. An information-theoretic proof of the Central Limit Theorem with the Lindeberg condition. Theory of Probability and its Applications. 1959;4:288–299.
[48] Mackey GW. Induced Representations of Groups and Quantum Mechanics. W. A. Benjamin, Inc.; New York and Amsterdam: 1968.
[49] Makadia A, Daniilidis K. Rotation estimation from spherical images. IEEE Trans. Pattern Anal. Mach. Intell. 2006;28:1170–1175. [PubMed]
[50] Maksimov VM. Necessary and sufficient statistics for the family of shifts of probability distributions on continuous bicompact groups. Theory Of Probability And Its Applications. 1967;12:267–280.
[51] Malis E, Hamel T, Mahony R, Morin P. Dynamic Estimation of Homography Transformations on the Special Linear Group for Visual Servo Control. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.
[52] Manyika J, Durrant-Whyte H. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood; New York: 1994.
[53] Miller W. Lie Theory and Special Functions. Academic Press; New York: 1968.
[54] Mourikis A, Roumeliotis S. On the treatment of relative-pose measurements for mobile robot localization; ICRA’06.
[55] Murray R, Li Z, Sastry S. A Mathematical Introduction to Robotics. CRC Press; 1994.
[56] Otto F, Villani C. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 2000;173:361–400.
[57] Park W, Liu Y, Zhou Y, Moses M, Chirikjian GS. Kinematic state estimation and motion planning for stochastic nonholonomic systems using the exponential map. Robotica. 2008;26:419–434. [PMC free article] [PubMed]
[58] Park W, Wang Y, Chirikjian GS. The path-of-probability algorithm for steering and feedback control of flexible needles. International Journal of Robotics Research. 2010;29:813–830. [PMC free article] [PubMed]
[59] Pennec X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision. 2006;25:127–154.
[60] Porat B, Nehorai A. Localizing vapor-emitting sources by moving sensors. IEEE Trans. Signal Processing. 1996;44:1018–1021.
[61] Roy KK. Exponential families of densities on an analytic group and sufficient statistics. Sankhyā: The Indian Journal of Statistics, Series A. 1975;37:82–92.
[62] Russell RA. Odour Detection by Mobile Robots. World Scientific; Singapore: 1999.
[63] Sanderson AC. Part Entropy Method for Robotic Assembly Design. Proceedings of International Conference on Robotics; 1984.
[64] Shannon CE, Weaver W. The Mathematical Theory of Communication. University of Illinois Press; Urbana: 1949.
[65] Smith P, Drummond T, Roussopoulos K. Computing MAP trajectories by representing, propagating and combining PDFs over groups. Proceedings of the 9th IEEE International Conference on Computer Vision, 2; Nice, France. 2003. pp. 1275–1282.
[66] Stam AJ. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Information and Control. 1959;2:101–112.
[67] Steele JM. Fisher information and detection of a Euclidean perturbation of an independent stationary process. The Annals of Probability. 1986;14:326–335.
[68] Su S, Lee CSG. Manipulation and propagation of uncertainty and verification of applicability of actions in assembly tasks. IEEE Trans. Syst. Man Cybern. 1992;22:1376–1389.
[69] Sugiura M. Unitary Representations and Harmonic Analysis. 2nd edition Elsevier Science Publisher; The Netherlands: 1990.
[70] Talagrand M. New concentration inequalities in product spaces. Invent. Math. 1996;126:505–563.
[71] Taylor ME. Noncommutative Harmonic Analysis. Mathematical Surveys and Monographs, American Mathematical Society; Providence, RI: 1986.
[72] Thrun S, Burgard W, Fox D. Probabilistic Robotics. MIT Press; Cambridge, MA: 2005.
[73] Tzanos P, Zefran M, Nehorai A. Information based distributed control for biochemical source detection and localization; ICRA’05.pp. 4457–4462.
[74] Tzanos P, Zefran M. Stability analysis of information based control for biochemical source localization; ICRA’06.pp. 3116–3121.
[75] Tzanos P, Zefran M. Locating a circular biochemical source: Modeling and control. :523–528.
[76] Varadarajan VS. An Introduction to Harmonic Analysis on Semisimple Lie Groups. Cambridge University Press; Cambridge, England: 1999.
[77] Vergassola M, Villermaux E, Shraiman BI. Infotaxis as a strategy for searching without gradients. Nature. 2007;445:406–409. [PubMed]
[78] Vilenkin NJ, Klimyk AU. Representation of Lie Group and Special Functions. Vol. 1-3. Kluwer Academic Publishers; The Netherlands: 1991.
[79] Wang Y, Chirikjian GS. Nonparametric second-order theory of error propagation on the Euclidean group. International Journal of Robotics Research. 2008;27:1258–1273. [PMC free article] [PubMed]
[80] Wang Y, Chirikjian GS. Error propagation on the Euclidean group with applications to manipulator kinematics. IEEE Transactions on Robotics. 2006;22:591–602.
[81] Wang Y, Zhou Y, Maslen DK, Chirikjian GS. Solving the phase-noise Fokker-Planck equation using the motion-group Fourier transform. IEEE Transactions on Communications. 2006;54:868–877.
[82] Willsky AS. Ph.D. dissertation. Dept. Aeronautics and Astronautics, M.I.T; Cambridge, Mass.: Jun, 1973. Dynamical Systems Defined on Groups: Structural Properties and Estimation.
[83] Želobenko DP. Compact Lie Groups and their Representations. American Mathematical Society; Rhode Island: 1973.
[84] Zhou Y, Chirikjian GS. Probabilistic Models of Dead-Reckoning Error in Nonholonomic Mobile Robots; ICRA’03; Taipei, Taiwan. September, 2003.
[85] Zhou Y, Chirikjian GS. Conformational statistics of bent semiflexible polymers. Journal of Chemical Physics. 2003;119:4962–4970.
[86] Zhou Y, Chirikjian GS. Conformational statistics of semi-flexible macromolecular chains with internal joints. Macromolecules. 2006;39:1950–1960. [PMC free article] [PubMed]