
Article sections

- Abstract
- 1. Introduction
- 2. Definition and examples of unimodular Lie groups
- 3. Calculus on Lie groups
- 4. Probability theory and harmonic analysis on unimodular Lie groups
- 5. Properties of entropy and relative entropy on groups
- 6. Fisher information and diffusions on Lie groups
- 7. Generalizing the de Bruijn identity to Lie groups
- 8. Information-theoretic inequalities from Log-Sobolev inequalities
- 9. Covariance, the Cramér-Rao Bound, and maximum entropy distributions on unimodular Lie groups
- 10. An application: Sensor fusion in mobile robotics
- 11. Conclusions
- REFERENCES


J Geom Mech. Author manuscript; available in PMC 2010 November 23.

Published in final edited form as:

J Geom Mech. 2010 June 1; 2(2): 119–158.

doi: 10.3934/jgm.2010.2.119 | PMCID: PMC2990239

NIHMSID: NIHMS224205

Department of Mechanical Engineering Johns Hopkins University 3400 N. Charles St. Baltimore, MD 21218, USA

Communicated by Andrew D. Lewis

Classical inequalities used in information theory such as those of de Bruijn, Fisher, Cramér, Rao, and Kullback carry over in a natural way from Euclidean space to unimodular Lie groups. These are groups that possess an integration measure that is simultaneously invariant under left and right shifts. All commutative groups are unimodular. And even in noncommutative cases unimodular Lie groups share many of the useful features of Euclidean space. The rotation and Euclidean motion groups, which are perhaps the most relevant Lie groups to problems in geometric mechanics, are unimodular, as are the unitary groups that play important roles in quantum computing. The extension of core information theoretic inequalities defined in the setting of Euclidean space to this broad class of Lie groups is potentially relevant to a number of problems relating to information gathering in mobile robotics, satellite attitude control, tomographic image reconstruction, biomolecular structure determination, and quantum information theory. In this paper, several definitions are extended from the Euclidean setting to that of Lie groups (including entropy and the Fisher information matrix), and inequalities analogous to those in classical information theory are derived and stated in the form of fifteen small theorems. In all such inequalities, addition of random variables is replaced with the group product, and the appropriate generalization of convolution of probability densities is employed. An example from the field of robotics demonstrates how several of these results can be applied to quantify the amount of information gained by pooling different sensory inputs.

Shannon’s brand of information theory is now more than six decades old, and some of the statistical methods developed by Fisher, Kullback, etc., are even older. Similarly, the study of Lie groups is now more than a century old. Despite their relatively long and roughly parallel history, surprisingly few connections appear to have been made between these two vast fields. The only attempts to do so known to the author include those of Johnson and Suhov [37, 38] from an information-theoretic perspective, Willsky [82] from an estimation and controls perspective, and Maksimov [50] and Roy [61] from a probability perspective.

The goal of this paper is therefore to present analytical foundations for “information theory on Lie groups.” As such, fifteen small theorems are presented that involve the structure and/or group operation of Lie groups. Unlike extensions of information theory to manifolds, the added structure inherent in Lie groups allows us to draw much stronger parallels with inequalities of classical information theory, such as those presented in [25, 28].

To the best of the author’s knowledge the only work that uses the concept and properties of information-theoretic entropy on Lie groups is that of Johnson and Suhov [37, 38]. Their goal was to use the Kullback-Leibler divergence between probability density functions on compact Lie groups to study the convergence to uniformity under iterated convolutions, in analogy with what was done by Linnik [47] and Barron [4] in the commutative case. The goal of the present paper is complementary: using some of the same tools, many of the major defined quantities and inequalities of (differential) information theory are extended from ${\mathbb{R}}^{n}$ to the context of unimodular Lie groups, which form a broader class of Lie groups than compact ones.

The goal here is to define and formalize probabilistic and information-theoretic quantities that are currently arising in scenarios such as robotics [45, 52, 65, 72, 57, 40, 79], bacterial motion [7, 77], and parts assembly in automated manufacturing systems [9, 22, 63, 68]. The topics of detection, tracking, estimation and control on Lie groups have been studied extensively over the past four decades. For example, see [10, 11, 15, 29, 39, 41, 49, 67, 19, 57, 79, 51, 3, 82] (and references therein). Many of these problems involve probability densities on the group of rigid-body motions. Indeed, the author’s own work on highly articulated robotic arms and polymer/DNA statistical mechanics involve computing probabilities and entropies on the group of rigid-body motions [23, 22, 58, 79, 80, 86]. However, rather than focusing only on rigid-body motions, a general information theory on the much broader class of unimodular Lie groups is presented here with little additional effort.

Several other research areas that would initially appear to be related to the present work have received intensive interest. Decades ago Amari and Csiszár developed the concept of information geometry [1, 26] in which the Fisher information matrix is used to define a Riemannian metric tensor on spaces of probability distributions, thereby allowing those spaces to be viewed as Riemannian manifolds. This provides a connection between information theory and differential geometry. However, in information geometry, the probability distributions themselves (such as Gaussian distributions) are defined on a Euclidean space, rather than on a Lie group. The presentation provided here opens up the possibility of defining information geometries on spaces of functions on Lie groups.

A different kind of connection between information theory and geometry has been established in the context of medical imaging and computer vision in which probability densities on manifolds are analyzed using information-theoretic techniques [59]. However, a manifold generally does not have an associated group operation, and so there is no natural way to “add” random variables.

Very little has been done along the lines of developing information theory on Lie groups, which in addition to possessing the structure of differential manifolds, also are endowed with group operations. Indeed, it would appear that applications such as deconvolution on Lie groups [16] and the field of Simultaneous Localization and Mapping (or SLAM) [72] have preceded the development of formal information inequalities that take advantage of the Lie-group structure of rigid-body motions.

This paper attempts to address this deficit with a two-pronged approach: (1) by collecting some known results from the functional analysis literature and reinterpreting them in information-theoretic terms (e.g. Gross’ log-Sobolev inequality on Lie groups); and (2) by defining information-theoretic quantities such as entropy and the Fisher information matrix, and deriving inequalities involving these quantities that parallel those in classical information theory.

The remainder of this paper is structured as follows: Section 2 starts with two concrete examples of Lie groups. This leads in to the more general review of unimodular Lie groups presented in Section 3. A discussion of harmonic analysis and probability theory on unimodular Lie groups is presented in Section 4. Section 5 defines entropy and relative entropy for unimodular Lie groups and proves some of their properties under convolution and marginalization over subgroups and coset spaces. The concept of the Fisher information matrix for probability densities on unimodular Lie groups is defined in Section 6 and several elementary properties are proven. This generalized concept of Fisher information is used in Section 7 to establish the de Bruijn inequality for unimodular Lie groups. These definitions and properties are combined with recent results by others on log-Sobolev inequalities in Section 8. Section 9 derives a version of the Cramér-Rao bound for concentrated pdfs on Lie groups. Section 10 illustrates the efficacy of a subset of these theorems in the context of a mobile-robot localization problem. Finally, Section 11 summarizes the results and reviews how these equalities might be applied to other problems of practical interest.

A Lie group *G* with operation ○ is called unimodular if there exists a volume element *dg* such that for every *h* ∈ *G* and every function $f:G\to \mathbb{C}$ integrable with respect to *dg*,

$${\int}_{G}f(h\circ g)dg={\int}_{G}f(g\circ h)dg={\int}_{G}f\left(g\right)dg.$$

(1)

It is known [71, 19] that whenever the above condition holds,

$${\int}_{G}f\left({g}^{-1}\right)dg={\int}_{G}f\left(g\right)dg$$

(2)

follows. Several spaces of functions on a unimodular Lie group, *G*, are of interest. These include the absolutely integrable functions (integrability is implicit in the above equations), which form the space *L*^{1}(*G*). There are also those that are square integrable, meaning that

$${\int}_{G}{\mid f\left(g\right)\mid}^{2}dg<\infty .$$

The space of all such functions is *L*^{2}(*G*). There is also the space of smooth functions, *C*^{∞}(*G*); since the matrix Lie groups of interest in applications are also smooth Riemannian manifolds, this space can be defined with respect to the manifold structure. The focus here is on probability density functions (i.e., non-negative real-valued functions that integrate to unity),

$$f\left(g\right)\ge 0\phantom{\rule{thinmathspace}{0ex}}\forall \phantom{\rule{thinmathspace}{0ex}}g\in G\phantom{\rule{thinmathspace}{0ex}}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\int}_{G}f\left(g\right)dg=1,$$

which automatically means that *f* ∈ *L*^{1}(*G*). In addition, we will restrict the focus to probability densities $f\in \mathcal{N}\left(G\right)$, the class of “nice” functions defined by

$$\mathcal{N}\left(G\right)={L}^{1}\left(G\right)\cap {L}^{2}\left(G\right)\cap {C}^{\infty}\left(G\right).$$

While this may seem like a severe restriction, in practice bandlimited Fourier expansions of any function in *L*^{2}(*G*) are also in *L*^{1}(*G*) and *C*^{∞}(*G*). Setting the finite bandlimit high enough ensures the desired accuracy in the approximation of any *f* *L*^{2}(*G*) while simultaneously being in *L*^{1}(*G*) and *C*^{∞}(*G*).

When using coordinates, the volume element for an *n*-dimensional Lie group is expressed as *dg* = *J*(**q**)*dq*_{1}*dq*_{2} ⋯ *dq*_{n}, where the scalar *J*(**q**) is the absolute value of the determinant of a Jacobian matrix of the kind computed in the examples below.

Examples of unimodular Lie groups are examined in detail below. Readers who are already familiar with these examples can skip directly to Section 3. Perhaps one reason why there has been little cross-fertilization between the theory of Lie groups and information theory is that the presentation styles in these two fields are very different. Whereas Lie groups belong to pure mathematics, information theory emerged from engineering. Therefore, this section reviews some of the basic properties of Lie groups from a concrete engineering perspective. The focus is on matrix Lie groups, and the treatment of Lie-theoretic concepts is reduced to simple matrix-algebraic notation. All of the groups considered are therefore matrix Lie groups. Likewise, some notations used in information theory, such as *H*(*X*) to denote the entropy of a random variable, are replaced with *S*(*ρ*), where *ρ* is the probability density describing *X*. This notation, while not standard in information theory, fits better in the context of Lie groups.

Consider the set of 3×3 rotation matrices

$$\mathit{SO}\left(3\right)=\{R\in {\mathbb{R}}^{3\times 3}\mid \phantom{\rule{thinmathspace}{0ex}}R{R}^{T}=\mathbb{I},\mathrm{det}R=+1\}.$$

Here *SO*(3) denotes the set of special orthogonal 3 × 3 matrices with real entries.

Given any 3-parameter description of rotation, the angular velocity of a rigid body can be obtained from a rotation matrix. Angular velocity in the body-fixed and space-fixed reference frames can be written respectively as^{1}

$${\omega}_{r}={J}_{r}\left(\mathbf{q}\right)\stackrel{.}{\mathbf{q}}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\omega}_{l}={J}_{l}\left(\mathbf{q}\right)\stackrel{.}{\mathbf{q}}$$

where **q** is any parametrization of *SO*(3).

The Lie algebra *so*(3) consists of skew-symmetric matrices of the form

$$X=\left(\begin{array}{ccc}\hfill 0\hfill & \hfill -{x}_{3}\hfill & \hfill {x}_{2}\hfill \\ \hfill {x}_{3}\hfill & \hfill 0\hfill & \hfill -{x}_{1}\hfill \\ \hfill -{x}_{2}\hfill & \hfill {x}_{1}\hfill & \hfill 0\hfill \end{array}\right)=\sum _{i=1}^{3}{x}_{i}{X}_{i}.$$

(3)

The skew-symmetric matrices {*X _{i}*} form a basis for the set of all such 3 × 3 skew-symmetric matrices, and the coefficients {*x _{i}*} can be identified with the components of a vector $\mathbf{x}\in {\mathbb{R}}^{3}$.

Lie algebras and Lie groups are related in general by the exponential map. For matrix Lie groups (which are the only kind of Lie groups that will be discussed here), the exponential map is the matrix exponential function. In this specific case,

$$\mathrm{exp}:\mathit{so}\left(3\right)\to \mathit{SO}\left(3\right).$$

It is well known (see [19] for derivation and references) that this yields the parametrization

$$R\left(\mathbf{x}\right)={e}^{X}=I+\frac{\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}{\Vert \mathbf{x}\Vert}X+\frac{(1-\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert )}{{\Vert \mathbf{x}\Vert}^{2}}{X}^{2}$$

(4)

where $\Vert \mathbf{x}\Vert ={({x}_{1}^{2}+{x}_{2}^{2}+{x}_{3}^{2})}^{\frac{1}{2}}$.
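Since the discussion is in terms of concrete matrices, equation (4) is easy to check numerically. The following sketch (an illustration with an arbitrarily chosen **x**, not part of the original development) compares the Rodrigues formula against a truncated power-series matrix exponential:

```python
import numpy as np

def hat(x):
    # the map x -> X of equation (3)
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def exp_so3(x):
    # Rodrigues formula (4)
    X = hat(x)
    n = np.linalg.norm(x)
    return np.eye(3) + (np.sin(n) / n) * X + ((1.0 - np.cos(n)) / n**2) * (X @ X)

def expm_series(X, terms=30):
    # brute-force matrix exponential for comparison
    out, P = np.eye(3), np.eye(3)
    for k in range(1, terms):
        P = P @ X / k
        out = out + P
    return out

x = np.array([0.3, -1.1, 0.6])
R = exp_so3(x)
print(np.allclose(R @ R.T, np.eye(3)))      # R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))    # det R = +1
print(np.allclose(R, expm_series(hat(x))))  # matches the series exponential
```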

The Jacobian matrices *J _{l}*(**x**) and *J _{r}*(**x**) appearing in the angular-velocity relations above can be computed directly from *R*(**x**) as

$${J}_{l}\left(\mathbf{x}\right)=\left[{\left(\frac{\partial R}{\partial {x}_{1}}{R}^{T}\right)}^{\vee},{\left(\frac{\partial R}{\partial {x}_{2}}{R}^{T}\right)}^{\vee},{\left(\frac{\partial R}{\partial {x}_{3}}{R}^{T}\right)}^{\vee}\right]$$

and

$${J}_{r}\left(\mathbf{x}\right)=\left[{\left({R}^{T}\frac{\partial R}{\partial {x}_{1}}\right)}^{\vee},{\left({R}^{T}\frac{\partial R}{\partial {x}_{2}}\right)}^{\vee},{\left({R}^{T}\frac{\partial R}{\partial {x}_{3}}\right)}^{\vee}\right].$$

This gives a hint as to why the subscripts *l* and *r* are used: when the derivatives with respect to the parameters appear on the ‘right’ of *R ^{T}* (as in *R ^{T}*∂*R*/∂*x _{i}*), the result is denoted with an *r*; when they appear on the ‘left’ (as in (∂*R*/∂*x _{i}*)*R ^{T}*), it is denoted with an *l*.

Whereas the set of all rotations together with matrix multiplication forms a noncommutative (*R*_{1}*R*_{2} ≠ *R*_{2}*R*_{1} in general) Lie group, the set of all angular velocity vectors *ω*_{r} and *ω*_{l} (or more precisely, their corresponding matrices, ${\widehat{\omega}}_{r}$ and ${\widehat{\omega}}_{l}$) together with the operations of addition and scalar multiplication form a vector space. Furthermore, this vector space is endowed with an additional operation, the cross product *ω*_{1} × *ω*_{2} (or equivalently the matrix commutator $[{\widehat{\omega}}_{1},{\widehat{\omega}}_{2}]={\widehat{\omega}}_{1}{\widehat{\omega}}_{2}-{\widehat{\omega}}_{2}{\widehat{\omega}}_{1}$). This makes the set of all angular velocities a Lie algebra, which is denoted as *so*(3) (as opposed to the Lie group, *SO*(3)).

An interesting and useful fact is that, except for a set of measure zero, all elements of *SO*(3) can be captured with parameters in the open ball defined by ‖**x**‖ < *π*. The nature of the set of values with ‖**x**‖ = *π* is irrelevant in the computations performed here because it constitutes a set of measure zero, which is of no consequence when working with functions in $\mathcal{N}\left(G\right)$ in general, and for the specific case *G* = *SO*(3) in particular. The matrix logarithm of any group element parameterized in the range ‖**x**‖ < *π* is also well defined. Therefore, when ‖**x**‖ < *π*,

$$\mathrm{log}\left(R\right)=\frac{1}{2}\frac{\Vert \mathbf{x}\Vert}{\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}(R-{R}^{T})$$

where

$$\Vert \mathbf{x}\Vert ={\mathrm{cos}}^{-1}\left(\frac{\mathrm{tr}\left(R\right)-1}{2}\right)$$

and tr(·) denotes the matrix trace.
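A minimal numerical sketch of the exp/log round trip above, for an arbitrarily chosen **x** with ‖**x**‖ < *π* (the test point is an illustrative assumption):

```python
import numpy as np

def hat(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def exp_so3(x):
    X = hat(x); n = np.linalg.norm(x)
    return np.eye(3) + (np.sin(n) / n) * X + ((1.0 - np.cos(n)) / n**2) * (X @ X)

def log_so3(R):
    n = np.arccos((np.trace(R) - 1.0) / 2.0)   # ||x||, from the trace formula
    return 0.5 * (n / np.sin(n)) * (R - R.T)   # the skew matrix X = log(R)

def vee(X):
    # inverse of hat: extract x from the skew matrix X
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

x = np.array([0.7, 0.2, -0.5])                 # ||x|| < pi
x_back = vee(log_so3(exp_so3(x)))
print(np.allclose(x_back, x))                  # round trip recovers x
```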

Relatively simple analytical expressions have been derived for the Jacobian *J _{l}* and its inverse when rotations are parameterized as in (4):

$${J}_{l}\left(\mathbf{x}\right)=I+\frac{1-\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}{{\Vert \mathbf{x}\Vert}^{2}}X+\frac{\Vert \mathbf{x}\Vert -\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}{{\Vert \mathbf{x}\Vert}^{3}}{X}^{2}.$$

(5)

The corresponding Jacobian *J _{r}* is calculated as [19]

$${J}_{r}\left(\mathbf{x}\right)=I-\frac{1-\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}{{\Vert \mathbf{x}\Vert}^{2}}X+\frac{\Vert \mathbf{x}\Vert -\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert}{{\Vert \mathbf{x}\Vert}^{3}}{X}^{2}.$$

Note that

$${J}_{l}={J}_{r}^{T}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{J}_{l}=R{J}_{r}.$$

The determinants are

$$\mid \mathrm{det}\left({J}_{l}\right)\mid =\mid \mathrm{det}\left({J}_{r}\right)\mid =\frac{2(1-\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\Vert \mathbf{x}\Vert )}{{\Vert \mathbf{x}\Vert}^{2}}.$$
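This determinant formula can be checked numerically by building *J _{l}* column-by-column from central differences of (∂*R*/∂*x _{i}*)*R ^{T}* and applying the ∨ map; the step size and test point below are arbitrary illustrative choices:

```python
import numpy as np

def hat(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def exp_so3(x):
    X = hat(x); n = np.linalg.norm(x)
    return np.eye(3) + (np.sin(n) / n) * X + ((1.0 - np.cos(n)) / n**2) * (X @ X)

def vee(X):
    return np.array([X[2, 1], X[0, 2], X[1, 0]])

def Jl_numeric(x, h=1e-6):
    # J_l column i is ((dR/dx_i) R^T)^vee, via central differences
    R = exp_so3(x)
    cols = []
    for i in range(3):
        e = np.zeros(3); e[i] = h
        dR = (exp_so3(x + e) - exp_so3(x - e)) / (2.0 * h)
        cols.append(vee(dR @ R.T))
    return np.column_stack(cols)

x = np.array([0.4, -0.9, 0.3])
n = np.linalg.norm(x)
det_formula = 2.0 * (1.0 - np.cos(n)) / n**2
print(np.isclose(abs(np.linalg.det(Jl_numeric(x))), det_formula))   # True
```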

The Euclidean motion group of the plane can be thought of as the set of all matrices of the form

$$g(x,y,\theta )=\left(\begin{array}{ccc}\hfill \mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill -\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill x\hfill \\ \hfill \mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill \mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill y\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \end{array}\right)$$

(6)

together with the operation of matrix multiplication.

It is straightforward to verify that the form of these matrices is closed under multiplication and inversion, and that $g(0,0,0)=\mathbb{I}$, and that it is therefore a group. This is often referred to as the special Euclidean group, and is denoted as *SE*(2). Like *SO*(3), *SE*(2) is three dimensional. However, unlike *SO*(3), *SE*(2) is not compact. Nevertheless, it is possible to define a natural integration measure for *SE*(2) as

$$dg=dxdyd\theta .$$

And while *SE*(2) does not have finite volume (and so there is no single natural normalization constant such as 8*π*^{2} in the case of *SO*(3)), this integration measure nevertheless can be used to compute probabilities from probability densities.
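As an illustration (not part of the original development), the bi-invariance (1) of *dg* = *dxdyd*θ can be verified numerically for an integrable test density; the Gaussian-in-(*x*, *y*) density, the shift *h*, and the grid bounds below are arbitrary choices:

```python
import numpy as np

def f(x, y, theta):
    # integrable test density on SE(2); integrates to 1
    return np.exp(-(x**2 + y**2) / 2.0) / (2.0 * np.pi) ** 2

xs = np.linspace(-12.0, 12.0, 121)                # spacing 0.2
ths = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
dx = xs[1] - xs[0]
dth = ths[1] - ths[0]
X, Y, TH = np.meshgrid(xs, xs, ths, indexing="ij")

a, b, alpha = 1.5, -0.7, 0.9                      # a fixed shift h = (a, b, alpha)

# parameters of h o g (left shift) and g o h (right shift)
Xl = a + X * np.cos(alpha) - Y * np.sin(alpha)
Yl = b + X * np.sin(alpha) + Y * np.cos(alpha)
Xr = X + a * np.cos(TH) - b * np.sin(TH)
Yr = Y + a * np.sin(TH) + b * np.cos(TH)

vol = dx * dx * dth
I_plain = np.sum(f(X, Y, TH)) * vol
I_left = np.sum(f(Xl, Yl, TH + alpha)) * vol
I_right = np.sum(f(Xr, Yr, TH + alpha)) * vol
print(I_plain, I_left, I_right)                   # all three ≈ 1
```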

Note that

$$g(x,y,\theta )=\mathrm{exp}(x{X}_{1}+y{X}_{2})\phantom{\rule{thinmathspace}{0ex}}\mathrm{exp}\left(\theta {X}_{3}\right)$$

where

$${X}_{1}=\left(\begin{array}{ccc}\hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \end{array}\right);\phantom{\rule{1em}{0ex}}{X}_{2}=\left(\begin{array}{ccc}\hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \end{array}\right);\phantom{\rule{1em}{0ex}}{X}_{3}=\left(\begin{array}{ccc}\hfill 0\hfill & \hfill -1\hfill & \hfill 0\hfill \\ \hfill 1\hfill & \hfill 0\hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 0\hfill \end{array}\right).$$

These matrices form a basis for the Lie algebra, *se*(2). It is convenient to identify them with the natural basis for ${\mathbb{R}}^{3}$ by defining ${\left({X}_{i}\right)}^{\vee}={\mathbf{e}}_{i}$.
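The factorization *g*(*x*, *y*, θ) = exp(*xX*_{1} + *yX*_{2}) exp(θ*X*_{3}) can be confirmed numerically; the following minimal sketch uses a truncated power series for the matrix exponential and an arbitrary test point:

```python
import numpy as np

# the se(2) basis given above
X1 = np.array([[0, 0, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
X2 = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 0]], dtype=float)
X3 = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)

def expm_series(X, terms=30):
    # brute-force matrix exponential
    out, P = np.eye(3), np.eye(3)
    for k in range(1, terms):
        P = P @ X / k
        out = out + P
    return out

def g(x, y, theta):
    # the homogeneous matrix of equation (6)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])

x, y, theta = 1.2, -0.4, 0.8
g_direct = g(x, y, theta)
g_exp = expm_series(x * X1 + y * X2) @ expm_series(theta * X3)
print(np.allclose(g_direct, g_exp))   # True
```

(Here exp(*xX*_{1} + *yX*_{2}) is exact after two series terms because *X*_{1} and *X*_{2} are nilpotent.)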

The Jacobians for this parametrization are then of the form

$${J}_{l}=\left[{\left(\frac{\partial g}{\partial x}{g}^{-1}\right)}^{\vee},\phantom{\rule{thickmathspace}{0ex}}{\left(\frac{\partial g}{\partial y}{g}^{-1}\right)}^{\vee},\phantom{\rule{thickmathspace}{0ex}}{\left(\frac{\partial g}{\partial \theta}{g}^{-1}\right)}^{\vee}\right]=\left(\begin{array}{ccc}\hfill \mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill \mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill 0\hfill \\ \hfill -\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill \mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill 0\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \end{array}\right)$$

and

$${J}_{r}=\left[{\left({g}^{-1}\frac{\partial g}{\partial x}\right)}^{\vee},\phantom{\rule{thickmathspace}{0ex}}{\left({g}^{-1}\frac{\partial g}{\partial y}\right)}^{\vee},\phantom{\rule{thickmathspace}{0ex}}{\left({g}^{-1}\frac{\partial g}{\partial \theta}\right)}^{\vee}\right]=\left(\begin{array}{ccc}\hfill 1\hfill & \hfill 0\hfill & \hfill y\hfill \\ \hfill 0\hfill & \hfill 1\hfill & \hfill -x\hfill \\ \hfill 0\hfill & \hfill 0\hfill & \hfill 1\hfill \end{array}\right).$$

Note that

$$\mid \mathrm{det}\left({J}_{l}\right)\mid =\mid \mathrm{det}\left({J}_{r}\right)\mid =1.$$

This parametrization is not unique, though it is probably the best-known one.

Whereas two low-dimensional examples of Lie groups were presented in the previous section to make the discussion concrete, a vast variety of different kinds of Lie groups exist. For example, the same constraints that were used to define *SO*(3) relative to ${\mathbb{R}}^{3\times 3}$ can be used to define *SO*(*n*) from ${\mathbb{R}}^{n\times n}$. The result is a Lie group of dimension *n*(*n* − 1)/2 and has a natural volume element *dR*. Similarly, the Euclidean motion group generalizes as all (*n* + 1) × (*n* + 1) matrices of the form

$$g=\left(\begin{array}{cc}\hfill R\hfill & \hfill \mathbf{t}\hfill \\ \hfill {0}^{T}\hfill & \hfill 1\hfill \end{array}\right)=\left(\begin{array}{cc}\hfill \mathbb{I}\hfill & \hfill \mathbf{t}\hfill \\ \hfill {0}^{T}\hfill & \hfill 1\hfill \end{array}\right)\left(\begin{array}{cc}\hfill R\hfill & \hfill 0\hfill \\ \hfill {0}^{T}\hfill & \hfill 1\hfill \end{array}\right)$$

(7)

resulting in *SE*(*n*) having dimension *n*(*n* + 1)/2 and natural volume element *dg* = *dR* *d***t** where $\mathbf{t}\in {\mathbb{R}}^{n}$ and *d***t** = *dt*_{1}*dt*_{2} ⋯ *dt*_{n} is the natural integration measure for ${\mathbb{R}}^{n}$.

Many other unimodular Lie groups arise in applications. All compact Lie groups (including *SU*(*n*) and *U*(*n*)) are unimodular, as are the noncompact Lie groups $\mathit{GL}(n,\mathbb{R})$, $\mathit{GL}(n,\mathbb{C})$, $\mathit{SL}(n,\mathbb{R})$, $\mathit{SL}(n,\mathbb{C})$, and the Heisenberg groups, *H*(*n*). All finite groups can be considered unimodular when summation over the group replaces integration. This is not an exhaustive list, but it does indicate that unimodular Lie groups form quite a substantial fraction of those groups of interest in geometric mechanics.

The following subsections briefly review the general theory of unimodular Lie groups that will be relevant when defining information-theoretic inequalities.

In general, an *n*-dimensional real matrix Lie algebra is defined by a basis of real matrices {*X _{i}*} for *i* = 1, …, *n* that is closed under the matrix commutator [*X _{i}*, *X _{j}*] = *X _{i}X _{j}* − *X _{j}X _{i}*.

In a neighborhood around the identity of the corresponding Lie group, the parametrization

$$g({x}_{1},\dots ,{x}_{n})=\mathrm{exp}\phantom{\rule{thinmathspace}{0ex}}X\phantom{\rule{1em}{0ex}}\text{where}\phantom{\rule{1em}{0ex}}X=\sum _{i=1}^{n}{x}_{i}{X}_{i}$$

(8)

is valid. In fact, for the examples discussed here, this parametrization is good over almost the whole group, with the exception of a set of measure zero.

The logarithm map

$$\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\left(\mathbf{x}\right)=X$$

(which is the inverse of the exponential) is valid except on this set of measure zero. It will be convenient in the analysis to follow to identify a vector $\mathbf{x}\in {\mathbb{R}}^{n}$ as

$$\mathbf{x}={\left(\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\right)}^{\vee}\phantom{\rule{1em}{0ex}}\text{where}\phantom{\rule{1em}{0ex}}{\left({X}_{i}\right)}^{\vee}={\mathbf{e}}_{i}.$$

(9)

Here {**e**_{i}} is the natural basis for ${\mathbb{R}}^{n}$.

In terms of quantities that have been defined in the examples, the adjoint matrices *Ad* and *ad* are the following matrix-valued functions:

$$\mathit{Ad}\left(g\right)={J}_{l}{J}_{r}^{-1}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\mathit{ad}\left(X\right)=\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\mathit{Ad}\left({e}^{X}\right).$$

(10)

The dimensions of these square matrices are the same as the dimension of the Lie group, which can be very different from the dimensions of the matrices used to represent elements of the group. The function Δ(*g*) = det *Ad*(*g*) is called the modular function of *G*. For a unimodular Lie group, Δ(*g*) = 1, which is used in many texts as the defining property rather than (1). It follows that for unimodular Lie groups det(*J _{l}*) = det(*J _{r}*).
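For *SO*(3), combining (10) with the identity *J _{l}* = *RJ _{r}* noted earlier gives *Ad*(*R*) = *R*, and det *Ad*(*g*) = 1 then confirms unimodularity for this example. A numerical sketch using the closed forms (5) and the corresponding *J _{r}* (the test point is arbitrary):

```python
import numpy as np

def hat(x):
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def exp_so3(x):
    X = hat(x); n = np.linalg.norm(x)
    return np.eye(3) + (np.sin(n) / n) * X + ((1.0 - np.cos(n)) / n**2) * (X @ X)

def Jl(x):
    # closed form (5)
    X = hat(x); n = np.linalg.norm(x)
    return np.eye(3) + ((1.0 - np.cos(n)) / n**2) * X + ((n - np.sin(n)) / n**3) * (X @ X)

def Jr(x):
    X = hat(x); n = np.linalg.norm(x)
    return np.eye(3) - ((1.0 - np.cos(n)) / n**2) * X + ((n - np.sin(n)) / n**3) * (X @ X)

x = np.array([0.5, -0.2, 1.0])
Ad = Jl(x) @ np.linalg.inv(Jr(x))   # Ad(g) = J_l J_r^{-1}, equation (10)
R = exp_so3(x)
print(np.allclose(Ad, R))                     # Ad(R) = R on SO(3)
print(np.isclose(np.linalg.det(Ad), 1.0))     # modular function = 1
```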

Unimodular Lie groups are defined by the fact that their integration measures are invariant under shifts and inversions. In any parametrization, this measure (or the corresponding volume element) can be expressed as in the examples by first computing a left or right Jacobian matrix and then setting *dg* = *J*(**q**)*dq*_{1}*dq*_{2} ⋯ *dq*_{n} where *J*(**q**) = |det *J _{l}*(**q**)| = |det *J _{r}*(**q**)|. In particular, in exponential coordinates,

$${\int}_{G}f\left(g\right)dg={\int}_{\mathfrak{G}}f\left({e}^{X}\right)\mathrm{det}\left(\frac{1-{e}^{-\mathit{ad}\left(X\right)}}{\mathit{ad}\left(X\right)}\right)d\mathbf{x}$$

where **x** = *X*^{∨} and *d***x** = *dx*_{1}*dx*_{2} ⋯ *dx*_{n}. In the above expression it makes sense to write the division of one matrix by another because the matrices involved commute. The symbol $\mathfrak{G}$ is used to denote the Lie algebra corresponding to the Lie group *G*.

Many different kinds of unimodular Lie groups exist. For example, *SO*(3) is compact and therefore has finite volume; *SE*(2) is solvable; *H*(1) is nilpotent; and $\mathit{SL}(n,\mathbb{R})$ is semisimple. Each of these classes of Lie groups has been studied extensively, but for the purpose of this discussion it is sufficient to treat them all within the larger class of unimodular Lie groups.

Given a function *f*(*g*), the left and right Lie derivatives are defined with respect to any basis element of the Lie algebra ${X}_{i}\in \mathfrak{G}$ as

$${\stackrel{~}{X}}_{i}^{r}f\left(g\right)={\left.\frac{d}{dt}f\left(g\circ \mathrm{exp}\left(t{X}_{i}\right)\right)\right|}_{t=0}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\stackrel{~}{X}}_{i}^{l}f\left(g\right)={\left.\frac{d}{dt}f\left(\mathrm{exp}\left(-t{X}_{i}\right)\circ g\right)\right|}_{t=0}.$$

(11)

The use of *l* and *r* mimics the way the subscripts were used in the Jacobians *J _{l}* and *J _{r}*. In terms of these Jacobians, in any parametrization *g* = *g*(**q**),

$${\stackrel{~}{\mathbf{X}}}^{r}f={\left[{J}_{r}\right(\mathbf{q}\left)\right]}^{-T}{\nabla}_{\mathbf{q}}f\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\stackrel{~}{\mathbf{X}}}^{l}f=-{\left[{J}_{l}\right(\mathbf{q}\left)\right]}^{-T}{\nabla}_{\mathbf{q}}f$$

(12)

where ${\stackrel{~}{\mathbf{X}}}^{r}={[{\stackrel{~}{X}}_{1}^{r},\dots ,{\stackrel{~}{X}}_{n}^{r}]}^{T}$, ${\stackrel{~}{\mathbf{X}}}^{l}={[{\stackrel{~}{X}}_{1}^{l},\dots ,{\stackrel{~}{X}}_{n}^{l}]}^{T}$, and ${\nabla}_{\mathbf{q}}={[\partial /\partial {q}_{1},\dots ,\partial /\partial {q}_{n}]}^{T}$.

Rather than resorting to the fact that a Lie group is a manifold and using general differential geometric techniques, the space of functions *C*^{∞}(*G*) can be defined in terms of the derivatives (11) in analogy with the way that ${C}^{\infty}\left({\mathbb{R}}^{n}\right)$ is defined in terms of the usual partial derivatives along coordinate axes.

Given two probability density functions *f*_{1}(*g*) and *f*_{2}(*g*), their convolution is

$$({f}_{1}\ast {f}_{2})\left(g\right)={\int}_{G}{f}_{1}\left(h\right){f}_{2}({h}^{-1}\circ g)dh.$$

(13)

Here *h* ∈ *G* is a dummy variable of integration. Convolution inherits associativity from the group operation, but since in general *g*_{1} ○ *g*_{2} ≠ *g*_{2} ○ *g*_{1}, also (*f*_{1} * *f*_{2})(*g*) ≠ (*f*_{2} * *f*_{1})(*g*) in general.
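Since finite groups are unimodular when summation replaces integration (as noted earlier), the noncommutativity of convolution is easy to see in a toy computation on the symmetric group *S*_{3}; the two probability mass functions below are arbitrary illustrative choices:

```python
from itertools import permutations

G = list(permutations(range(3)))                 # the six elements of S3
def op(a, b):                                    # group product (a o b)(i) = a[b[i]]
    return tuple(a[i] for i in b)
def inv(a):                                      # group inverse
    out = [0, 0, 0]
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)

def convolve(f1, f2):
    # discrete analogue of (13): (f1 * f2)(g) = sum_h f1(h) f2(h^{-1} o g)
    return {g: sum(f1[h] * f2[op(inv(h), g)] for h in G) for g in G}

# two arbitrary probability mass functions on S3
f1 = {g: w / 21.0 for g, w in zip(G, [1, 2, 3, 4, 5, 6])}
f2 = {g: w / 21.0 for g, w in zip(G, [3, 1, 4, 1, 5, 7])}

c12, c21 = convolve(f1, f2), convolve(f2, f1)
print(abs(sum(c12.values()) - 1.0) < 1e-12)      # convolution preserves total mass
print(c12 != c21)                                # order matters on a noncommutative group
```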

For a unimodular Lie group, the convolution integral of the form in (13) can be written in the following equivalent ways:

$$\begin{array}{cc}\hfill ({f}_{1}\ast {f}_{2})\left(g\right)& ={\int}_{G}{f}_{1}\left({z}^{-1}\right){f}_{2}(z\circ g)dz\hfill \\ \hfill & ={\int}_{G}{f}_{1}(g\circ {k}^{-1}){f}_{2}\left(k\right)dk\hfill \end{array}$$

(14)

where the substitutions *z* = *h*^{−1} and *k* = *h*^{−1}○*g* have been made, and the invariance of integration under shifts and inversions in (1) and (2) is used.

A powerful generalization of classical Fourier analysis exists. It is built on families of unitary matrix-valued functions of group-valued argument that are parameterized by values λ drawn from a set *Ĝ* and satisfy the homomorphism property:

$$U({g}_{1}\circ {g}_{2},\lambda )=U({g}_{1},\lambda )U({g}_{2},\lambda ).$$

(15)

Using * to denote the Hermitian conjugate, it follows that

$$\mathbb{I}=U(e,\lambda )=U({g}^{-1}\circ g,\lambda )=U({g}^{-1},\lambda )U(g,\lambda ),$$

and so

$$U({g}^{-1},\lambda )={\left(U\right(g,\lambda \left)\right)}^{-1}={U}^{\ast}(g,\lambda ).$$

In this generalized Fourier analysis (called noncommutative harmonic analysis) each *U*(*g*, λ) is constructed to be *irreducible* in the sense that it is not possible to simultaneously block-diagonalize *U*(*g*, λ) by the same similarity transformation for all values of *g* in the group. Such a matrix function *U*(*g*, λ) is called an *irreducible unitary representation*. Completeness of a set of representations means that every (reducible) representation can be decomposed into a direct sum of the representations in the set.

Explicit expressions for matrix entries of the IUR matrices, *U*(*g*, λ), are known for many Lie groups in terms of particular parameterizations, and are expressed in terms of the special functions of mathematical physics. For details see the classical works [30, 34, 48, 53, 69, 71, 76, 78, 83] and the more recent [19].

Once a complete set of IURs is known for a unimodular Lie group, the Fourier transform of a function on that group can be defined as

$$\widehat{f}\left(\lambda \right)={\int}_{G}f\left(g\right)U({g}^{-1},\lambda )dg.$$

Here λ (which is analogous to frequency) indexes the complete set of all IURs. An inversion formula can be used to recover the original function from all of the Fourier transforms as

$$f\left(g\right)={\int}_{\widehat{G}}\mathrm{tr}\left[\widehat{f}\left(\lambda \right)U\left(g,\lambda \right)\right]d\left(\lambda \right).$$

(16)

The integration measure *d*(λ) on the dual (frequency) space *Ĝ* must be constructed on a case-by-case basis. In the case of a compact Lie group, *Ĝ* is discrete, and the resulting inversion formula is a series, much like the classical Fourier series for 2*π*-periodic functions. In this case the integral over *Ĝ* becomes a sum, and *d*(λ) becomes the dimension of the finite-dimensional matrix *U*(*g*, λ). However, in the case of noncompact groups the structure of *Ĝ* and the corresponding integral over *Ĝ* can vary widely. For example, in the case of *SE*(*n*), λ ∈ *Ĝ* can be identified with the positive real line with measure *d*(λ) = λ^{n–1}*d*λ.

A convolution theorem follows from (15) as

$$\left(\widehat{{f}_{1}\ast {f}_{2}}\right)\left(\lambda \right)={\widehat{f}}_{2}\left(\lambda \right){\widehat{f}}_{1}\left(\lambda \right)$$

and so does the Parseval/Plancherel formula:

$${\int}_{G}{\mid f\left(g\right)\mid}^{2}dg={\int}_{\widehat{G}}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right).$$

(17)

Here ‖ · ‖ is the Hilbert-Schmidt (Frobenius) norm, and *d*(λ) is the measure from (16) (which, for compact groups, is the dimension of *U*(*g*, λ)).
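In the simplest case *G* = *SO*(2), the IURs are the scalars *U*(θ, *n*) = *e ^{inθ}* with *Ĝ* = ℤ, and with *dg* = *d*θ the weight is *d*(*n*) = 1/(2π) in this normalization (conventions vary). A numerical sketch of (16) and (17) with an arbitrary band-limited test density:

```python
import numpy as np

N = 512
theta = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)
dth = 2.0 * np.pi / N

# an arbitrary band-limited test density on the circle
f = (1.0 + 0.6 * np.cos(theta) + 0.3 * np.sin(2.0 * theta)) / (2.0 * np.pi)

# Fourier transform: fhat(n) = integral of f(theta) e^{-i n theta} dtheta
ns = np.arange(-8, 9)
fhat = np.array([np.sum(f * np.exp(-1j * n * theta)) * dth for n in ns])

# inversion (16): f(theta) = sum_n fhat(n) e^{i n theta} d(n), d(n) = 1/(2 pi)
f_rec = np.real(sum(fh * np.exp(1j * n * theta) for n, fh in zip(ns, fhat)))
f_rec /= 2.0 * np.pi

# Parseval/Plancherel (17)
lhs = np.sum(np.abs(f) ** 2) * dth
rhs = np.sum(np.abs(fhat) ** 2) / (2.0 * np.pi)
print(np.allclose(f, f_rec), np.isclose(lhs, rhs))
```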

A useful definition is

$$u({X}_{i},\lambda )=\frac{d}{dt}{\left(U\right(\mathrm{exp}\left(t{X}_{i}\right),\lambda \left)\right)\mid}_{t=0}.$$

Explicit expressions for *U*(*g*, λ) and *u*(*X _{i}*, λ) using the exponential map and corresponding parameterizations for the groups of interest here can be found in the references cited above.

As a consequence of these definitions, it can be shown that the following operational properties result [19]:

$$\widehat{{X}_{i}^{r}f}=u({X}_{i},\lambda )\widehat{f}\left(\lambda \right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\widehat{{X}_{i}^{l}f}=-\widehat{f}\left(\lambda \right)u({X}_{i},\lambda ).$$

This is very useful in probability problems because a diffusion equation with drift of the form

$$\frac{\partial \rho (g;t)}{\partial t}=-\sum _{i=1}^{d}{h}_{i}{\stackrel{~}{X}}_{i}^{r}\rho (g;t)+\frac{1}{2}\sum _{i,j=1}^{d}{D}_{ij}{\stackrel{~}{X}}_{i}^{r}{\stackrel{~}{X}}_{j}^{r}\phantom{\rule{thinmathspace}{0ex}}\rho (g;t)$$

(18)

(where *D* = [*D _{ij}*] is symmetric and positive semidefinite), given the initial condition *ρ*(*g*; 0) = *δ*(*g*), can be solved using the Fourier transform as

$$\rho (g;t)={\int}_{\widehat{G}}\mathrm{tr}\left[\mathrm{exp}\right(t\mathcal{B}\left(\lambda \right)\left)U\right(g,\lambda \left)\right]d\left(\lambda \right)$$

(19)

where

$$\mathcal{B}\left(\lambda \right)=\frac{1}{2}\sum _{k,l=1}^{n}{D}_{lk}\phantom{\rule{thinmathspace}{0ex}}u({X}_{l},\lambda )u({X}_{k},\lambda )-\sum _{l=1}^{n}{h}_{l}\phantom{\rule{thinmathspace}{0ex}}u({X}_{l},\lambda ).$$

The solution of this sort of diffusion equation is important as a generalization of the concept of a Gaussian distribution. It has been studied by the author in the case of *G* = *SE*(3) in the context of polymer statistical mechanics and robotic manipulators [17, 18, 85]. The detailed structure of the matrices *u*(*X _{l}*, λ) and $\mathcal{B}\left(\lambda \right)$ for this case can be found in those references.

The entropy of a pdf on a unimodular Lie group is defined as

$$S\left(f\right)=-{\int}_{G}f\left(g\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}f\left(g\right)\phantom{\rule{thinmathspace}{0ex}}dg.$$

For example, the entropy of a Gaussian distribution on $(G,\circ )=({\mathbb{R}}^{n},+)$ with covariance ∑ is

$$S\left(\rho \right(g;t\left)\right)=\mathrm{log}\left\{{\left(2\pi e\right)}^{n\u22152}{\mid \Sigma \left(t\right)\mid}^{\frac{1}{2}}\right\}$$

(20)

where log = log_{e}.
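As a sanity check of (20), the closed-form entropy of a one-dimensional Gaussian (*n* = 1, |Σ| = σ²) can be compared against direct numerical quadrature of −∫ *f* log *f*; the grid bounds and step below are arbitrary illustrative choices:

```python
import numpy as np

sigma = 1.7
x = np.linspace(-30.0, 30.0, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# differential entropy -∫ f log f by a simple Riemann sum
S_num = np.sum(-f * np.log(f)) * dx

# closed form (20) with n = 1 and |Sigma| = sigma^2
S_closed = np.log(np.sqrt(2 * np.pi * np.e) * sigma)
```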

The Kullback-Leibler distance between the pdfs *f*_{1}(*g*) and *f*_{2}(*g*) on a Lie group *G* naturally generalizes from its form in ${\mathbb{R}}^{n}$ as

$${D}_{\mathit{KL}}({f}_{1}\Vert {f}_{2})={\int}_{G}{f}_{1}\left(g\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\left(\frac{{f}_{1}\left(g\right)}{{f}_{2}\left(g\right)}\right)dg.$$

(21)

As with the case of pdfs in ${\mathbb{R}}^{n}$, *D _{KL}*(*f*_{1} ∥ *f*_{2}) ≥ 0, with equality if and only if *f*_{1} = *f*_{2} almost everywhere.

Something that holds for a compact Lie group but not in ${\mathbb{R}}^{n}$ is that the maximum-entropy distribution is the constant function *f*(*g*) = 1 relative to the normalized Haar measure. Such a distribution can be considered the limiting distribution of the diffusion process in (18) as time goes to infinity. If *f*_{2}(*g*) = 1 is this sort of limiting distribution, then *D _{KL}*(*f*_{1} ∥ *f*_{2}) = −*S*(*f*_{1}) ≥ 0, so that *S*(*f*_{1}) ≤ 0 for any pdf *f*_{1}(*g*) on a compact group.

Jensen’s inequality is a fundamental tool that is often used in deriving information-theoretic inequalities, as well as inequalities in the field of convex geometry. In the context of unimodular Lie groups, Jensen’s inequality can be written as

$$\Phi \left({\int}_{G}\varphi \left(g\right)\rho \left(g\right)dg\right)\le {\int}_{G}\Phi \left(\varphi \right(g\left)\right)\rho \left(g\right)dg$$

(22)

where $\Phi :{\mathbb{R}}_{\ge 0}\to \mathbb{R}$ is a convex function on the half infinite line, *ρ*(*g*) is a pdf, and *ϕ*(*g*) is another nonnegative measurable function on *G*.

Two important examples of Φ(*x*) are Φ_{1}(*x*) = −log *x* and Φ_{2}(*x*) = *x* log *x*. Using Jensen’s inequality with Φ_{2} gives the following result.

**Theorem 5.1. ***Given pdfs f*_{1}(*g*) *and f*_{2}(*g*) *on the unimodular Lie group G*,

$$S({f}_{1}\ast {f}_{2})\ge \mathrm{max}\left\{S\right({f}_{1}),S({f}_{2}\left)\right\}$$

(23)

*and*

$${D}_{\mathit{KL}}({f}_{1}\Vert {f}_{2})\ge \mathrm{max}\phantom{\rule{thinmathspace}{0ex}}\left\{{D}_{\mathit{KL}}\right({f}_{1}\ast \varphi \Vert {f}_{2}\ast \varphi ),{D}_{\mathit{KL}}(\varphi \ast {f}_{1}\Vert \varphi \ast {f}_{2}\left)\right\}.$$

(24)

*Proof*.

$$\begin{array}{cc}\hfill -S({f}_{1}\ast {f}_{2})& ={\int}_{G}{\Phi}_{2}\left(\right({f}_{1}\ast {f}_{2}\left)\right(g\left)\right)dg\hfill \\ \hfill & ={\int}_{G}{\Phi}_{2}\left({\int}_{G}{f}_{2}({h}^{-1}\circ g){f}_{1}\left(h\right)dh\right)dg\hfill \\ \hfill & \le {\int}_{G}{\int}_{G}{\Phi}_{2}\left({f}_{2}\right({h}^{-1}\circ g\left)\right){f}_{1}\left(h\right)dhdg\hfill \\ \hfill & ={\int}_{G}\left({\int}_{G}{\Phi}_{2}\left({f}_{2}\right({h}^{-1}\circ g\left)\right)dg\right){f}_{1}\left(h\right)dh\hfill \\ \hfill & ={\int}_{G}\left({\int}_{G}{\Phi}_{2}\left({f}_{2}\right(g\left)\right)dg\right){f}_{1}\left(h\right)dh\hfill \\ \hfill & =\left({\int}_{G}{\Phi}_{2}\left({f}_{2}\right(g\left)\right)dg\right)\left({\int}_{G}{f}_{1}\left(h\right)dh\right)\hfill \\ \hfill & =-S\left({f}_{2}\right).\hfill \end{array}$$

If, on the other hand, we were to use the version of convolution in (14), and analogous manipulations as above, we would get −*S*(*f*_{1} * *f*_{2}) ≤ −*S*(*f*_{1}), which completes the proof of (23).

The proof of (24) (which is the Lie-group version of the *data processing inequality*) follows in the usual way from the joint convexity of the functional *D _{KL}*(· ∥ ·).

If *G* is compact, any constant function on *G* is measurable, and with respect to the normalized Haar measure the constant function 1 is a pdf. Letting *ρ*(*g*) = 1, *ϕ*(*g*) = *f*(*g*), and Φ(*x*) = Φ_{2}(*x*) then gives 0 ≤ −*S*(*f*) for a pdf *f*(*g*). In contrast, for any unimodular Lie group, letting *ρ*(*g*) = *f*(*g*), *ϕ*(*g*) = [*f*(*g*)]^{α} and Φ(*x*) = Φ_{1}(*x*) gives

$$-\mathrm{log}\left({\int}_{G}{\left[f\right(g\left)\right]}^{1+\alpha}dg\right)\le \alpha S\left(f\right).$$

(25)
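Inequality (25) can be verified numerically, e.g., for a Gaussian pdf on the unimodular group (ℝ, +); the values of σ and α below are arbitrary illustrative choices:

```python
import numpy as np

sigma, alpha = 1.3, 0.7
x = np.linspace(-30.0, 30.0, 400001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# left side of (25): -log ∫ f^{1+alpha} dg
lhs = -np.log(np.sum(f ** (1 + alpha)) * dx)

# right side of (25): alpha * S(f)
rhs = alpha * np.sum(-f * np.log(f)) * dx
```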

This leads to the following theorem.

**Theorem 5.2. ***Let* $\Vert \widehat{f}\left(\lambda \right)\Vert $ *denote the Frobenius norm and* ${\Vert \widehat{f}\left(\lambda \right)\Vert}_{2}$ *denote the induced* 2-*norm of the Fourier transform of f*(*g*) *and define*

$$\begin{array}{cc}\hfill {D}_{2}\left(f\right)& =-{\int}_{\widehat{G}}\mathrm{log}{\Vert \widehat{f}\left(\lambda \right)\Vert}_{2}^{2}d\left(\lambda \right)\hfill \\ \hfill D\left(f\right)& =-{\int}_{\widehat{G}}\mathrm{log}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right)\hfill \\ \hfill \stackrel{~}{D}\left(f\right)& =-\mathrm{log}{\int}_{\widehat{G}}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right).\hfill \end{array}$$

(26)

*Then*

$$S\left(f\right)\ge \stackrel{~}{D}\left(f\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}D\left(f\right)\le {D}_{2}\left(f\right)$$

(27)

*and*

$$\begin{array}{cc}\hfill {D}_{2}({f}_{1}\ast {f}_{2})\phantom{\rule{1em}{0ex}}& \ge \phantom{\rule{1em}{0ex}}{D}_{2}\left({f}_{1}\right)+{D}_{2}\left({f}_{2}\right)\hfill \\ \hfill D({f}_{1}\ast {f}_{2})\phantom{\rule{1em}{0ex}}& \ge \phantom{\rule{1em}{0ex}}D\left({f}_{1}\right)+D\left({f}_{2}\right).\hfill \end{array}$$

(28)

Furthermore, denote the unit Heaviside step function on the real line as u(*x*) *and let*

$$B={\int}_{\widehat{G}}u\left(\Vert \widehat{f}\left(\lambda \right)\Vert \right)d\left(\lambda \right).\phantom{\rule{1em}{0ex}}\text{Then}\phantom{\rule{1em}{0ex}}\stackrel{~}{D}\left(f\right)+\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}B\le D\left(f\right)\u2215B.$$

(29)

*Proof*. Substituting *α* = 1 into (25) and using the Plancherel formula (17) yields

$$S\left(f\right)\ge -\mathrm{log}\left({\int}_{G}{\left[f\right(g\left)\right]}^{2}dg\right)=-\mathrm{log}\left({\int}_{\widehat{G}}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right)\right)=\stackrel{~}{D}\left(f\right).$$

The fact that −log *x* is a decreasing function and ‖*A*‖_{2} ≤ ‖*A*‖ for all $A\in {\mathbb{C}}^{n\times n}$ gives the second inequality in (27).

The convolution theorem together with the facts that both norms are submultiplicative, −log(*x*) is a decreasing function, and the log of the product is the sum of the logs gives

$$\begin{array}{cc}\hfill D({f}_{1}\ast {f}_{2})& =-{\int}_{\widehat{G}}\mathrm{log}{\Vert \widehat{{f}_{1}\ast {f}_{2}}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right)=-{\int}_{\widehat{G}}\mathrm{log}{\Vert {\widehat{f}}_{1}\left(\lambda \right){\widehat{f}}_{2}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right)\hfill \\ \hfill & \ge D\left({f}_{1}\right)+D\left({f}_{2}\right).\hfill \end{array}$$

An identical calculation follows for *D*_{2}. The statement in (29) follows from the Plancherel formula (17) and using Jensen’s inequality (22) in the dual space *Ĝ* rather than on *G*:

$$\Phi \left({\int}_{\widehat{G}}\Vert \widehat{\varphi}\left(\lambda \right)\Vert \rho \left(\lambda \right)d\left(\lambda \right)\right)\le {\int}_{\widehat{G}}\Phi (\Vert \widehat{\varphi}(\lambda )\Vert )\rho \left(\lambda \right)d\left(\lambda \right)$$

(30)

where

$${\int}_{\widehat{G}}\rho \left(\lambda \right)d\left(\lambda \right)=1\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\rho \left(\lambda \right)\ge 0.$$

Recognizing that when *B* is finite $\rho \left(\lambda \right)=u\left(\Vert \widehat{f}\left(\lambda \right)\Vert \right)\u2215B$ becomes a probability measure on this dual space, it follows that

$$\begin{array}{cc}\hfill \stackrel{~}{D}\left(f\right)& =-\mathrm{log}\left({\int}_{\widehat{G}}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}d\left(\lambda \right)\right)=-\mathrm{log}\left(B{\int}_{\widehat{G}}{\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}\rho \left(\lambda \right)d\left(\lambda \right)\right)\hfill \\ \hfill & \le -\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}B-{\int}_{\widehat{G}}\mathrm{log}\left({\Vert \widehat{f}\left(\lambda \right)\Vert}^{2}\right)\rho \left(\lambda \right)d\left(\lambda \right)=-\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}B+D\left(f\right)\u2215B.\hfill \end{array}$$

This completes the proof.

Properties of dispersion measures similar to *D*(*f*) and *D*_{2}(*f*) were studied in [31], but no connections to entropy were provided previously. By definition, bandlimited expansions have *B* finite. On the other hand, it is a classical result that for a finite group, Γ, the Plancherel formula is (see, for example, [19]):

$$\sum _{\gamma \in \Gamma}{\mid f\left(\gamma \right)\mid}^{2}=\frac{1}{\mid \Gamma \mid}\sum _{k=1}^{\alpha}{d}_{k}^{2}{\Vert {\widehat{f}}_{k}\Vert}^{2}$$

where *α* is the number of conjugacy classes of Γ and *d _{k}* is the dimension of ${\widehat{f}}_{k}$. And by Burnside’s formula ${\sum}_{k=1}^{\alpha}{d}_{k}^{2}=\mid \Gamma \mid $, it follows that *B* ≤ 1; that is, *B* is automatically finite for any finite group.

A finite group (or, for that matter, any group with countably many elements) can be considered to be a unimodular zero-dimensional Lie group, where integration is replaced by summation over all of the group elements. Here several results that are unique to this case are presented.

Let Γ be a finite group with |Γ| elements {*g*_{1}, …, *g*_{|Γ|}}, and let *ρ*^{Γ}(*g _{i}*) ≥ 0 with ${\sum}_{i=1}^{\mid \Gamma \mid}{\rho}^{\Gamma}\left({g}_{i}\right)=1$ define a probability density/distribution on Γ. In analogy with how convolution and entropy are defined on a Lie group,

$${\rho}^{G}\left(g\right)=\sum _{i=1}^{\mid \Gamma \mid}{\rho}^{\Gamma}\left({g}_{i}\right)\delta ({g}_{i}^{-1}\circ g)=\sum _{\gamma \in \Gamma}{\rho}^{\Gamma}\left(\gamma \right)\delta ({\gamma}^{-1}\circ g)$$

can be used to define a pdf on *G* that is equivalent to a pdf on Γ in the sense that if the convolution of two pdfs on Γ is

$$({\rho}_{1}^{\Gamma}\ast {\rho}_{2}^{\Gamma})\left({g}_{i}\right)=\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}^{\Gamma}\left({g}_{j}\right){\rho}_{2}^{\Gamma}({g}_{j}^{-1}\circ {g}_{i})$$

(31)

then

$$({\rho}_{1}^{G}\ast {\rho}_{2}^{G})\left(g\right)=\sum _{\gamma \in \Gamma}({\rho}_{1}^{\Gamma}\ast {\rho}_{2}^{\Gamma})\left(\gamma \right)\delta ({\gamma}^{-1}\circ g).$$

(32)

Given a finite group, Γ, let

$$S\left(\rho \right)=-\sum _{i=1}^{\mid \Gamma \mid}\rho \left({g}_{i}\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\rho \left({g}_{i}\right)=-\sum _{\gamma \in \Gamma}\rho \left(\gamma \right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\rho \left(\gamma \right).$$

Unlike the case of differential/continuous entropy on a Lie group, 0 ≤ *S*(*ρ*).

The following theorem describes how the discrete entropy of pdfs on Γ behaves under convolution. Since only finite groups are addressed, the superscript Γ on the discrete values *ρ*(*g _{i}*) is dropped.

**Theorem 5.3.** The entropy of the convolution of two pdfs on a finite group is no less than either of the individual entropies, and no greater than their sum:

$$\mathrm{max}\left\{S\right({\rho}_{1}),S({\rho}_{2}\left)\right\}\le S({\rho}_{1}\ast {\rho}_{2})\le S\left({\rho}_{1}\right)+S\left({\rho}_{2}\right).$$

(33)

*Proof*. The lower bound follows in the same way as the proof given for Theorem 5.1 with summation in place of integration. The entropy of convolved distributions on a finite group can be bounded from above in the following way.

Since the convolution sum contains products of all pairs, and each such product is nonnegative, it follows that

$${\rho}_{1}\left({g}_{k}\right){\rho}_{2}({g}_{k}^{-1}\circ {g}_{i})\le ({\rho}_{1}\ast {\rho}_{2})\left({g}_{i}\right)$$

for all *k* ∈ {1, …, |Γ|}. Therefore, since log is a strictly increasing function, it follows that

$$-S({\rho}_{1}\ast {\rho}_{2})\ge \sum _{i=1}^{\mid \Gamma \mid}\left(\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}\left({g}_{j}\right){\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\left({\rho}_{1}\left({g}_{k}\right){\rho}_{2}({g}_{k}^{-1}\circ {g}_{i})\right).$$

Since this is true for all values of *k*, we can bring the log term inside of the summation sign and choose *k* = *j*. Then multiplying by −1, and using the properties of the log function, we get

$$\begin{array}{cc}\hfill S({\rho}_{1}\ast {\rho}_{2})\le & -\sum _{i=1}^{\mid \Gamma \mid}\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}\left({g}_{j}\right){\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\rho}_{1}\left({g}_{j}\right)\hfill \\ \hfill & -\sum _{i=1}^{\mid \Gamma \mid}\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}\left({g}_{j}\right){\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\rho}_{2}({g}_{j}^{-1}\circ {g}_{i}).\hfill \end{array}$$

Rearranging the order of summation signs gives

$$\begin{array}{cc}\hfill S({\rho}_{1}\ast {\rho}_{2})\le & -\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}\left({g}_{j}\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\rho}_{1}\left({g}_{j}\right)\left(\sum _{i=1}^{\mid \Gamma \mid}{\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\right)\hfill \\ \hfill & -\sum _{j=1}^{\mid \Gamma \mid}{\rho}_{1}\left({g}_{j}\right)\left(\sum _{i=1}^{\mid \Gamma \mid}{\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\rho}_{2}({g}_{j}^{-1}\circ {g}_{i})\right).\hfill \end{array}$$

(34)

But summation of a function over a group is invariant under shifts. That is,

$$\sum _{i=1}^{\mid \Gamma \mid}F({g}_{j}^{-1}\circ {g}_{i})=\sum _{i=1}^{\mid \Gamma \mid}F\left({g}_{i}\right)\phantom{\rule{1em}{0ex}}\text{or}\phantom{\rule{1em}{0ex}}\sum _{\gamma \in \Gamma}F({\gamma}^{-1}\circ g)=\sum _{\gamma \in \Gamma}F\left(g\right).$$

Hence, replacing ${g}_{j}^{-1}\circ {g}_{i}$ with *g _{i}* in the terms in parentheses in (34) gives (33).
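Both bounds in (33) can be checked numerically on the smallest noncommutative group, *S*_{3}, with randomly generated pdfs; the helper names below (`rand_pdf`, `compose`) are purely illustrative:

```python
import itertools, math, random

# S_3: permutations of (0,1,2); composition (p∘q)(i) = p[q[i]]
G = list(itertools.permutations(range(3)))
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
def inverse(p):
    inv = [0] * 3
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

random.seed(1)
def rand_pdf():
    w = [random.random() for _ in G]
    s = sum(w)
    return {g: wi / s for g, wi in zip(G, w)}

r1, r2 = rand_pdf(), rand_pdf()

# convolution (31) and discrete entropy on the finite group
conv = {g: sum(r1[h] * r2[compose(inverse(h), g)] for h in G) for g in G}
S = lambda p: -sum(v * math.log(v) for v in p.values() if v > 0)
S1, S2, S12 = S(r1), S(r2), S(conv)
```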

Aside from the ability to sustain the concept of convolution, one of the fundamental ways that groups resemble Euclidean space is the way in which they can be decomposed. In analogy with the way that an integral over a vector-valued function with argument $\mathbf{x}\in {\mathbb{R}}^{n}$ can be decomposed into integrals over each coordinate, integrals over Lie groups can also be decomposed in natural ways. This has implications with regard to inequalities involving the entropy of pdfs on Lie groups. Analogous expressions hold for finite groups, with volume replaced by the number of group elements.

Given a subgroup *H* ≤ *G* and any element *g* ∈ *G*, the *left coset gH* is defined as *gH* = {*g* ○ *h* | *h* ∈ *H*}. Similarly, the right coset *Hg* is defined as *Hg* = {*h* ○ *g* | *h* ∈ *H*}. In the special case when *g* ∈ *H*, the corresponding left and right cosets are equal to *H*. More generally, for all *g* ∈ *G*, *g* ∈ *gH*, and *g*_{1}*H* = *g*_{2}*H* if and only if ${g}_{2}^{-1}\circ {g}_{1}\in H$. Likewise, for right cosets, *Hg*_{1} = *Hg*_{2} if and only if ${g}_{1}\circ {g}_{2}^{-1}\in H$. Any group is divided into disjoint left (right) cosets, and the statement “*g*_{1} and *g*_{2} are in the same left (right) coset” is an equivalence relation.

When *G* is finite, an important property of *gH* and *Hg* is that they each have the same number of elements as *H*. The set of all left (or right) cosets is called the left (or right) *coset space*, and is denoted as *G*/*H* (or *H*\*G*). For finite groups, |*G*/*H*| = |*H*\*G*| = |*G*|/|*H*|, which is *Lagrange’s theorem*. Similar expressions can be written for Lie groups and Lie subgroups. For example, *e*^{dim(G/H)} = *e*^{dim(G)}/*e*^{dim(H)} where dim(·) denotes the dimension of a manifold, and if *G* is a compact Lie group with Lie subgroup *H*, then *Vol*(*G*/*H*) = *Vol*(*G*)/*Vol*(*H*).
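Lagrange’s theorem and the partition of a group into disjoint cosets are easy to verify computationally; a small sketch for *G* = *S*_{3} with *H* = *A*_{3} (the even permutations):

```python
import itertools

G = list(itertools.permutations(range(3)))          # the group S_3
compose = lambda p, q: tuple(p[q[i]] for i in range(3))

# H = A_3: the three even permutations (identity and the two 3-cycles)
H = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

# left cosets gH = {g ∘ h | h ∈ H}; collecting them in a set removes duplicates
cosets = {frozenset(compose(g, h) for h in H) for g in G}
n_cosets = len(cosets)                              # |G/H|
```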

In what follows, it will be convenient to denote a function on *G* as *f _{G}*(*g*). Given a subgroup *H* ≤ *G*, it is possible to define a mapping *c*_{G/H} : *G*/*H* → *G* that selects one representative *c*_{G/H}(*gH*) ∈ *gH* from each left coset, so that every *g* ∈ *G* can be written as *g* = *c*_{G/H}(*gH*) ○ *h* for some *h* ∈ *H*. The integral of *f _{G}* over *G* then decomposes as

$${\int}_{G}{f}_{G}\left(g\right)dg={\int}_{G\u2215H}\left({\int}_{H}{f}_{G}\left({c}_{G\u2215H}\right(gH)\circ h)dh\right)d\left(gH\right)$$

(35)

where *dh* and *d*(*gH*) are unique up to normalization. In the special case when *f _{G}*(*g*) depends only on the coset to which *g* belongs, so that *f _{G}*(*c*_{G/H}(*gH*) ○ *h*) = *f*_{G/H}(*gH*) for all *h* ∈ *H*, this reduces to

$${\int}_{G}{f}_{G}\left(g\right)dg={\int}_{G\u2215H}{f}_{G\u2215H}\left(gH\right)d\left(gH\right)$$

where it is assumed that *dh* is normalized so that *∫ _{H} dh* = 1. More generally,

$${f}_{G\u2215H}\left(gH\right)={\int}_{H}{f}_{G}\left({c}_{G\u2215H}\right(gH)\circ h)dh$$

is the marginal density of *f _{G}*(*g*) on the coset space *G*/*H*, obtained by integrating over each coset. Similarly,

$${f}_{H}\left(h\right)={\int}_{G\u2215H}{f}_{G}\left({c}_{G\u2215H}\right(gH)\circ h)d\left(gH\right)$$

is the marginal density of *f _{G}*(*g*) on the subgroup *H*, obtained by averaging over the coset space.

**Theorem 5.4.** The entropy of a pdf on a unimodular Lie group is no greater than the sum of the marginal entropies on a subgroup and the corresponding coset space:

$$S\left({f}_{G}\right)\le S\left({f}_{G\u2215H}\right)+S\left({f}_{H}\right).$$

(36)

*Proof*. This inequality follows immediately from the nonnegativity of the Kullback-Leibler divergence

$${D}_{\mathit{KL}}({f}_{G}\phantom{\rule{thinmathspace}{0ex}}\Vert \phantom{\rule{thinmathspace}{0ex}}{f}_{G\u2215H}\cdot {f}_{H})\ge 0.$$

For example, if *G* = *SE*(*n*) is a Euclidean motion group and *H* = *SO*(*n*) is the subgroup of pure rotations in *n*-dimensional Euclidean space, then $G\u2215H\cong {\mathbb{R}}^{n}$. Writing an arbitrary element of *SE*(*n*) as a pair $(R,\mathbf{t})\in \mathit{SO}\left(n\right)\times {\mathbb{R}}^{n}$, we have

$$\begin{array}{cc}\hfill {\int}_{\mathit{SE}\left(n\right)}f\left(g\right)dg=& {\int}_{{\mathbb{R}}^{n}}{\int}_{\mathit{SO}\left(n\right)}f(R,\mathbf{t})dRd\mathbf{t}\hfill \\ \hfill =& {\int}_{\mathit{SE}\left(n\right)\u2215\mathit{SO}\left(n\right)}\left({\int}_{\mathit{SO}\left(n\right)}f\left(\right(\mathbb{I},\mathbf{t})\circ (R,0\left)\right)dR\right)d\mathbf{t},\hfill \end{array}$$

and the marginal entropies on the right-hand-side of (36) are those computed for pure rotations and pure translations.
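A discrete analog of Theorem 5.4 can be checked on a finite group: taking *G* = *S*_{3}, *H* = *A*_{3}, and one representative per left coset, the marginal entropies on the subgroup and coset space bound the total entropy from above. The representative list `reps` below is one particular choice of the function *c*_{G/H}:

```python
import itertools, math, random

G = list(itertools.permutations(range(3)))          # S_3
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
H = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]               # subgroup A_3 < S_3
reps = [(0, 1, 2), (1, 0, 2)]                       # one representative per left coset

random.seed(2)
w = [random.random() for _ in G]
s = sum(w)
f = {g: wi / s for g, wi in zip(G, w)}              # a random pdf on G

# discrete marginals on the coset space and on the subgroup
f_GH = [sum(f[compose(c, h)] for h in H) for c in reps]
f_H = [sum(f[compose(c, h)] for c in reps) for h in H]

S = lambda vals: -sum(v * math.log(v) for v in vals if v > 0)
S_G = S(f.values())
S_marg = S(f_GH) + S(f_H)
```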

Let *H* < *G* and *K* < *G*. Then for any *g* ∈ *G*, the set

$$HgK=\{h\circ g\circ k\mid h\in H,k\in K\}$$

(37)

is called the *double coset* of *H* and *K*, and any *g*′ ∈ *HgK* (including *g*′ = *g*) is called a *representative* of the double coset. Though a double coset representative often can be described with two or more different pairs (*h*_{1}, *k*_{1}) and (*h*_{2}, *k*_{2}) so that *g*′ = *h*_{1} ○ *g* ○ *k*_{1} = *h*_{2} ○ *g* ○ *k*_{2}, we only count *g*′ once in *HgK*. Hence |*HgK*| ≤ |*H*| · |*K*|, and in general *HgK* ≠ *H* · *K*. The set of all double cosets of *H* and *K* is denoted *H*\*G*/*K*, and we have the hierarchy *g* ∈ *HgK* ∈ *H*\*G*/*K*. It can be shown that membership in a double coset is an equivalence relation. That is, *G* is partitioned into disjoint double cosets, and for *H* < *G* and *K* < *G* either *Hg*_{1}*K* ∩ *Hg*_{2}*K* = ∅ or *Hg*_{1}*K* = *Hg*_{2}*K*.

It is possible to define a mapping *c*_{K\G/H} : *K*\*G*/*H* → *G* such that for any *HgK* ∈ *K*\*G*/*H*, *c*_{K\G/H}(*HgK*) ∈ *HgK*. Such a function defines a rule for selecting one representative per double coset. Equipped with such a function, it becomes possible to write *g* = *k* ○ *c*_{K\G/H}(*HgK*) ○ *h* and hence [21]

$${\int}_{G}{f}_{G}\left(g\right)dg={\int}_{K\u2215G\u2215H}{\int}_{H}{\int}_{K}{f}_{G}(k\circ {c}_{K\u2215G\u2215H}(HgK)\circ h)dkdhd\left(KgH\right).$$

(38)

A particular example of this is the integral over *SO*(3), which can be written in terms of Euler angles as

$$\begin{array}{c}\hfill {\int}_{\mathit{SO}\left(3\right)}f\left(g\right)dg=\frac{1}{8{\pi}^{2}}{\int}_{0}^{2\pi}{\int}_{0}^{\pi}{\int}_{0}^{2\pi}f\left({R}_{3}\right(\alpha \left){R}_{1}\right(\beta \left){R}_{3}\right(\gamma \left)\right)\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\beta d\alpha d\beta d\gamma =\hfill \\ \hfill {\int}_{\mathit{SO}\left(2\right)\u2215\mathit{SO}\left(3\right)\u2215\mathit{SO}\left(2\right)}{\int}_{\mathit{SO}\left(2\right)}{\int}_{\mathit{SO}\left(2\right)}f\left({h}_{1}\right(\alpha )\circ c(HgH)\circ {h}_{2}(\gamma \left)\right)d{h}_{1}d{h}_{2}d\left(HgH\right)\hfill \end{array}$$

where *h*_{1}(*α*) = *R*_{3}(*α*), *h*_{2}(*γ*) = *R*_{3}(*γ*) ∈ *SO*(2), *c*(*HgH*) = *R*_{1}(*β*) is the coset-representative function, and *dh*_{1} = *dα*/2*π*, *dh*_{2} = *dγ*/2*π*, and *d*(*HgH*) = sin *β* *dβ*/2 in this case.
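The factorization of the Haar measure on *SO*(3) into the factors *dα*/2*π*, sin *β* *dβ*/2, and *dγ*/2*π* can be checked numerically: each factor integrates to one, so the product measure (1/8*π*²) sin *β* *dα* *dβ* *dγ* has total mass one. A minimal sketch:

```python
import numpy as np

a = np.linspace(0.0, 2 * np.pi, 20001)   # alpha (and gamma) grid
b = np.linspace(0.0, np.pi, 20001)       # beta grid
da, db = a[1] - a[0], b[1] - b[0]

# simple trapezoid rule
trap = lambda y, d: (y.sum() - 0.5 * (y[0] + y[-1])) * d

m_h1 = trap(np.ones_like(a), da) / (2 * np.pi)   # ∫ dh_1 = ∫ dα/2π
m_coset = trap(np.sin(b), db) / 2                # ∫ d(HgH) = ∫ sinβ dβ/2
total = m_h1 * m_coset * m_h1                    # total Haar mass of SO(3)
```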

**Theorem 5.5.** The entropy of a pdf on a group is no greater than the sum of marginal entropies over any two subgroups and the corresponding double-coset space:

$$S\left({f}_{G}\right)\le S\left({f}_{K}\right)+S\left({f}_{K\u2215G\u2215H}\right)+S\left({f}_{H}\right).$$

(39)

*Proof*. Let

$$\begin{array}{cc}\hfill {f}_{K}\left(k\right)=& {\int}_{K\u2215G\u2215H}{\int}_{H}{f}_{G}(k\circ {c}_{K\u2215G\u2215H}(HgK)\circ h)dhd\left(KgH\right)\hfill \\ \hfill {f}_{H}\left(h\right)=& {\int}_{K\u2215G\u2215H}{\int}_{K}{f}_{G}(k\circ {c}_{K\u2215G\u2215H}(HgK)\circ h)dkd\left(KgH\right)\hfill \end{array}$$

and

$${f}_{K\u2215G\u2215H}\left(KgH\right)={\int}_{K}{\int}_{H}{f}_{G}(k\circ {c}_{K\u2215G\u2215H}(HgK)\circ h)dhdk,$$

then again using the nonnegativity of the Kullback-Leibler divergence

$${D}_{KL}({f}_{G}\parallel {f}_{K}\cdot {f}_{K\u2215G\u2215H}\cdot {f}_{H})\ge 0$$

gives (39).

**Theorem 5.6.** The entropy of a pdf is no greater than the sum of entropies of its marginals over coset spaces defined by nested subgroups:

$$S\left({f}_{G}\right)\le S\left({f}_{G\u2215K}\right)+S\left({f}_{K\u2215H}\right)+S\left({f}_{H}\right).$$

(40)

*Proof*. Given a subgroup *H* of *K*, which is itself a subgroup of *G* (that is, *H* < *K* < *G*), apply (36) twice. Then *S*(*f _{G}*) ≤ *S*(*f*_{G/K}) + *S*(*f _{K}*) and *S*(*f _{K}*) ≤ *S*(*f*_{K/H}) + *S*(*f _{H}*), which together give (40), where

$$\begin{array}{cc}\hfill {f}_{G\u2215K}\left(gK\right)=& {\int}_{K\u2215H}{\int}_{H}{f}_{G}\left({c}_{G\u2215K}\right(gK)\circ {c}_{K\u2215H}(kH)\circ h)dhd\left(kH\right)\hfill \\ \hfill {f}_{K\u2215H}\left(kH\right)=& {\int}_{G\u2215K}{\int}_{H}{f}_{G}\left({c}_{G\u2215K}\right(gK)\circ {c}_{K\u2215H}(kH)\circ h)dhd\left(gK\right)\hfill \end{array}$$

and

$${f}_{H}\left(h\right)={\int}_{G\u2215K}{\int}_{K\u2215H}{f}_{G}\left({c}_{G\u2215K}\right(gK)\circ {c}_{K\u2215H}(kH)\circ h)d\left(kH\right)d\left(gK\right).$$

In analogy with the way a coset is defined, the conjugate of a subgroup *H* for a given *g* ∈ *G* is defined as *gHg*^{−1} = {*g* ○ *h* ○ *g*^{−1} | *h* ∈ *H*}. Recall that a subgroup *N* ≤ *G* is called *normal* if and only if *gNg*^{−1} ⊆ *N* for all *g* ∈ *G*. This is equivalent to the condition *N* ⊆ *g*^{−1}*Ng* for all *g* ∈ *G*, and so we also write *gNg*^{−1} = *N* and *gN* = *Ng* for all *g* ∈ *G*.

A function χ(*g*) that is constant on each conjugacy class has the property that

$$\chi \left(g\right)=\chi ({h}^{-1}\circ g\circ h)\phantom{\rule{1em}{0ex}}\text{or}\phantom{\rule{1em}{0ex}}\chi (h\circ g)=\chi (g\circ h)$$

(41)

for any *g*, *h* *G*. Such functions are called class functions. Though convolution of functions on a noncommutative group is generally noncommutative, the special nature of class functions means that

$$\begin{array}{cc}\hfill (f\ast \chi )\left(g\right)=& {\int}_{G}f\left(h\right)\chi ({h}^{-1}\circ g)dh={\int}_{G}f\left(h\right)\chi (g\circ {h}^{-1})dh\hfill \\ \hfill =& {\int}_{G}\chi \left(k\right)f({k}^{-1}\circ g)dk=(\chi \ast f)\left(g\right).\hfill \end{array}$$

where the change of variables *k* = *g* ○ *h*^{−1} is used together with the unimodularity of *G*.
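The commutation of convolution with a class function can be confirmed numerically on *S*_{3}, where a class function is one that depends only on cycle type; the specific values assigned to each class below are arbitrary illustrative choices:

```python
import itertools, random

G = list(itertools.permutations(range(3)))          # S_3
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
def inverse(p):
    inv = [0] * 3
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

# a class function: its value depends only on the cycle type, detected here
# via the number of fixed points (3: identity, 1: transposition, 0: 3-cycle)
fixed = lambda p: sum(p[i] == i for i in range(3))
chi = {g: {3: 0.5, 1: 0.1, 0: 0.1}[fixed(g)] for g in G}   # sums to 1, hence a pdf

random.seed(3)
w = [random.random() for _ in G]
s = sum(w)
f = {g: wi / s for g, wi in zip(G, w)}              # a random (non-class) pdf

conv = lambda p, q: {g: sum(p[h] * q[compose(inverse(h), g)] for h in G) for g in G}
left, right = conv(f, chi), conv(chi, f)            # f * chi and chi * f
```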

In general, (*ρ*_{1} * *ρ*_{2})(*g*) ≠ (*ρ*_{2} * *ρ*_{1})(*g*). Even so, it can be the case that *S*(*ρ*_{1} * *ρ*_{2}) = *S*(*ρ*_{2} * *ρ*_{1}). This section addresses several special cases in which this equality holds.

Let *G* denote a unimodular Lie group and for arbitrary *g*, *g*_{1} ∈ *G* define *ρ*^{#}(*g*) = *ρ*(*g*^{−1}), ${L}_{{g}_{1}}\rho \left(g\right)=\rho ({g}_{1}^{-1}\circ g)$, *R*_{g1}*ρ*(*g*) = *ρ*(*g* ○ *g*_{1}), ${C}_{{g}_{1}}\rho \left(g\right)=\rho ({g}_{1}^{-1}\circ g\circ {g}_{1})$. Then if *ρ*(*g*) is a pdf, it follows immediately from (1) and (2) that *ρ*^{#}(*g*), *L*_{g1}*ρ*(*g*), *R*_{g1}*ρ*(*g*), and *C*_{g1}*ρ*(*g*) are all pdfs. A function for which *ρ*^{#}(*g*) = *ρ*(*g*) is called symmetric, whereas a function for which *C*_{g1}*ρ*(*g*) = *ρ*(*g*) for all *g*_{1} ∈ *G* is a class function.

**Theorem 5.7.** For arbitrary pdfs on a unimodular Lie group G and arbitrary g_{1}, *g*_{2} *G*,

$${\rho}_{1}\ast {\rho}_{2}\ne {\rho}_{2}^{\#}\ast {\rho}_{1}^{\#}\ne {L}_{{g}_{1}}{\rho}_{1}\ast {R}_{{g}_{2}}{\rho}_{2}\ne {C}_{{g}_{1}}{\rho}_{1}\ast {C}_{{g}_{1}}{\rho}_{2},$$

however, entropy satisfies the following equalities

$$S({\rho}_{1}\ast {\rho}_{2})=S({\rho}_{2}^{\#}\ast {\rho}_{1}^{\#})=S({L}_{{g}_{1}}{\rho}_{1}\ast {R}_{{g}_{2}}{\rho}_{2})=S({C}_{{g}_{1}}{\rho}_{1}\ast {C}_{{g}_{1}}{\rho}_{2}).$$

(42)

*Proof*. Each equality is proven by changing variables and using the unimodularity properties in (1) and (2).

$$\begin{array}{cc}\hfill ({\rho}_{2}^{\#}\ast {\rho}_{1}^{\#})\left(g\right)=& {\int}_{G}{\rho}_{2}^{\#}\left(h\right){\rho}_{1}^{\#}({h}^{-1}\circ g)dh={\int}_{G}{\rho}_{2}\left({h}^{-1}\right){\rho}_{1}({g}^{-1}\circ h)dh\hfill \\ \hfill =& {\int}_{G}{\rho}_{1}({g}^{-1}\circ {k}^{-1}){\rho}_{2}\left(k\right)dk=({\rho}_{1}\ast {\rho}_{2})\left({g}^{-1}\right)={({\rho}_{1}\ast {\rho}_{2})}^{\#}\left(g\right).\hfill \end{array}$$

Let *F*[*ρ*] = −*ρ* log *ρ*. Then due to (2), the integral over *G* of *F*[*ρ*(*g*^{−1})] must be the same as that of *F*[*ρ*(*g*)], proving the first equality in (42). The second equality follows from the fact that (*L*_{g1}*ρ*_{1} * *R*_{g2}*ρ*_{2})(*g*) = (*ρ*_{1} * *ρ*_{2})(${g}_{1}^{-1}\circ g\circ {g}_{2}$) and the integral of the shifted integrand must be the same as that of *F*[*ρ*(*g*)], again due to (1) and (2). The final equality follows in a similar way from the fact that $({C}_{{g}_{1}}{\rho}_{1}\ast {C}_{{g}_{1}}{\rho}_{2})\left(g\right)=({\rho}_{1}\ast {\rho}_{2})({g}_{1}^{-1}\circ g\circ {g}_{1})$.

Note that the equalities in (42) can be combined. For example,

$$S({\rho}_{1}\ast {\rho}_{2})=S({L}_{{g}_{1}}{\rho}_{2}^{\#}\ast {R}_{{g}_{2}}{\rho}_{1}^{\#})=S({C}_{{g}_{1}}{\rho}_{2}^{\#}\ast {C}_{{g}_{1}}{\rho}_{1}^{\#}).$$
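The first equality in (42) can be verified numerically on *S*_{3}: the convolutions *ρ*_{1} * *ρ*_{2} and ${\rho}_{2}^{\#}\ast {\rho}_{1}^{\#}$ generally differ as functions, yet their entropies agree to machine precision. A sketch with illustrative helper names:

```python
import itertools, math, random

G = list(itertools.permutations(range(3)))          # S_3
compose = lambda p, q: tuple(p[q[i]] for i in range(3))
def inverse(p):
    inv = [0] * 3
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)

random.seed(4)
def rand_pdf():
    w = [random.random() for _ in G]
    s = sum(w)
    return {g: wi / s for g, wi in zip(G, w)}

r1, r2 = rand_pdf(), rand_pdf()
sharp = lambda p: {g: p[inverse(g)] for g in G}      # ρ#(g) = ρ(g^{-1})
conv = lambda p, q: {g: sum(p[h] * q[compose(inverse(h), g)] for h in G) for g in G}
S = lambda p: -sum(v * math.log(v) for v in p.values() if v > 0)

S_a = S(conv(r1, r2))
S_b = S(conv(sharp(r2), sharp(r1)))
```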

**Theorem 5.8. ***The equality S*(*ρ*_{1} * *ρ*_{2}) = *S*(*ρ*_{2} * *ρ*_{1}) *holds for pdfs ρ*_{1}(*g*) *and ρ*_{2}(*g*) *on a unimodular Lie group G in the following cases: (a) ρ _{i}*(*g*) *for i* = 1 *or i* = 2 *is a class function; (b) ρ _{i}*(*g*) *for both i* = 1 *and i* = 2 *are symmetric functions, i.e.,* ${\rho}_{i}^{\#}={\rho}_{i}$.

*Proof*. Statement (a) follows from the fact that if either *ρ*_{1} or *ρ*_{2} is a class function, then convolutions commute. Statement (b) follows from the first equality in (42) and the definition of a symmetric function.

**Theorem 5.9. ***Given class functions χ*_{1}(*g*) *and χ*_{2}(*g*) *that are pdfs, then for general g*_{1}, *g*_{2} *G*,

$$({\chi}_{1}\ast {\chi}_{2})\left(g\right)\ne ({L}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2})\left(g\right)\ne ({R}_{{g}_{1}}{\chi}_{1}\ast {R}_{{g}_{2}}{\chi}_{2})\left(g\right)\ne ({R}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2})\left(g\right)$$

*and yet*

$$S({\chi}_{1}\ast {\chi}_{2})=S({L}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2})=S({R}_{{g}_{1}}{\chi}_{1}\ast {R}_{{g}_{2}}{\chi}_{2})=S({R}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2}).$$

(43)

*Proof*. Here the first and final equality will be proven. The middle one follows in the same way.

$$\begin{array}{cc}\hfill ({L}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2})\left(g\right)=& {\int}_{G}\left({L}_{{g}_{1}}{\chi}_{1}\right)\left(h\right)\left({L}_{{g}_{2}}{\chi}_{2}\right)({h}^{-1}\circ g)dh\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}({g}_{1}^{-1}\circ h){\chi}_{2}({g}_{2}^{-1}\circ {h}^{-1}\circ g)dh\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}\left(k\right){\chi}_{2}({g}_{2}^{-1}\circ {k}^{-1}\circ {g}_{1}^{-1}\circ g)dk\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}\left(k\right){\chi}_{2}({k}^{-1}\circ {g}_{1}^{-1}\circ g\circ {g}_{2}^{-1})dk\hfill \\ \hfill =& ({\chi}_{1}\ast {\chi}_{2})({g}_{1}^{-1}\circ g\circ {g}_{2}^{-1}).\hfill \end{array}$$

Similarly,

$$\begin{array}{cc}\hfill ({R}_{{g}_{1}}{\chi}_{1}\ast {L}_{{g}_{2}}{\chi}_{2})\left(g\right)=& {\int}_{G}\left({R}_{{g}_{1}}{\chi}_{1}\right)\left(h\right)\left({L}_{{g}_{2}}{\chi}_{2}\right)({h}^{-1}\circ g)dh\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}(h\circ {g}_{1}){\chi}_{2}({g}_{2}^{-1}\circ {h}^{-1}\circ g)dh\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}\left(k\right){\chi}_{2}({g}_{2}^{-1}\circ {g}_{1}\circ {k}^{-1}\circ g)dk\hfill \\ \hfill =& {\int}_{G}{\chi}_{1}\left(k\right){\chi}_{2}({k}^{-1}\circ g\circ {g}_{2}^{-1}\circ {g}_{1})dk\hfill \\ \hfill =& ({\chi}_{1}\ast {\chi}_{2})(g\circ {g}_{2}^{-1}\circ {g}_{1}).\hfill \end{array}$$

Then, since the entropy integral on a unimodular Lie group is invariant under shifts, the equalities in (43) follow.

The natural extension of the Fisher information matrix for the case when *f*(*g*;** θ**) is a parametric distribution on a Lie group is

$${F}_{ij}(f,\theta )={\int}_{G}\frac{1}{f}\frac{\partial f}{\partial {\theta}_{i}}\frac{\partial f}{\partial {\theta}_{j}}dg.$$

(44)

In the case when ** θ** parameterizes a right shift of the pdf, i.e., *f*(*g*; ** θ**) = *f*(*g* ○ exp(*θ*_{1}*X*_{1} + ⋯ + *θ _{d}X _{d}*)), then

$${\phantom{\mid}\frac{\partial f}{\partial {\theta}_{i}}\mid}_{\theta =0}={\stackrel{~}{X}}_{i}^{r}f$$

and *F _{ij}*(*f*, ** θ**) evaluated at ** θ** = **0** becomes

$${F}_{ij}^{r}\left(f\right)={\int}_{G}\frac{1}{f}\left({\stackrel{~}{X}}_{i}^{r}f\right)\left({\stackrel{~}{X}}_{j}^{r}f\right)dg.$$

(45)

In a similar way, we can define

$${F}_{ij}^{l}\left(f\right)={\int}_{G}\frac{1}{f}\left({\stackrel{~}{X}}_{i}^{l}f\right)\left({\stackrel{~}{X}}_{j}^{l}f\right)dg.$$

(46)
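For the simplest unimodular group (ℝ, +), the right derivative ${\stackrel{~}{X}}^{r}$ reduces to *d*/*dx*, and (45) reduces to the classical Fisher information ∫ (*f*′)²/*f* *dx*, which equals 1/σ² for a Gaussian. A numerical sketch (grid bounds are arbitrary choices):

```python
import numpy as np

sigma = 0.9
x = np.linspace(-12.0, 12.0, 400001)
dx = x[1] - x[0]
f = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
df = -x / sigma**2 * f               # on (R, +), the operator X̃^r is just d/dx

# (45) specialized to (R, +): F = ∫ (f')^2 / f dx
F_num = np.sum(df**2 / f) * dx
F_exact = 1.0 / sigma**2
```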

**Theorem 6.1.** The matrices with elements defined in (45) and (46) have the properties

$${F}_{ij}^{r}\left(L\right(h\left)f\right)={F}_{ij}^{r}\left(f\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{F}_{ij}^{l}\left(R\right(h\left)f\right)={F}_{ij}^{l}\left(f\right)$$

(47)

*and*

$${F}_{ij}^{r}\left(I\right(f\left)\right)={F}_{ij}^{l}\left(f\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{F}_{ij}^{l}\left(I\right(f\left)\right)={F}_{ij}^{r}\left(f\right)$$

(48)

*where* (*L*(*h*)*f*)(*g*) = *f*(*h*^{−1} ○ *g*), (*R*(*h*)*f*)(*g*) = *f*(*g* ○ *h*), and *I*(*f*)(*g*) = *f*(*g*^{−1}).

*Proof*. The operators ${\stackrel{~}{X}}_{i}^{l}$ and *R*(*h*) commute, and likewise ${\stackrel{~}{X}}_{i}^{r}$ and *L*(*h*) commute. This together with the invariance of integration under shifts proves (47). From the definitions of ${\stackrel{~}{X}}_{i}^{l}$ and ${\stackrel{~}{X}}_{i}^{r}$ in (11), it follows that

$$\begin{array}{cc}\hfill {\stackrel{~}{X}}_{i}^{r}\left(I\right(f\left)\right)\left(g\right)=& {\phantom{\mid}\left(\frac{d}{dt}f\left({[g\circ \mathrm{exp}(t{X}_{i}\left)\right]}^{-1}\right)\right)\mid}_{t=0}\hfill \\ \hfill =& {\phantom{\mid}\left(\frac{d}{dt}f\left(\mathrm{exp}\right(-t{X}_{i})\circ {g}^{-1})\right)\mid}_{t=0}\hfill \\ \hfill =& \left({\stackrel{~}{X}}_{i}^{l}f\right)\left({g}^{-1}\right).\hfill \end{array}$$

Using the invariance of integration under shifts then gives (48).

As a special case, when *f*(*g*) is a symmetric function, the left and right Fisher information matrices will be the same.

Note that the entries of Fisher matrices ${F}_{ij}^{r}\left(f\right)$ and ${F}_{ij}^{l}\left(f\right)$ implicitly depend on the choice of orthonormal Lie algebra basis {*X _{i}*}, and so it would be more descriptive to use the notation ${F}_{ij}^{r}(f,X)$ and ${F}_{ij}^{l}(f,X)$.

If a different orthonormal basis {*Y _{i}*} is used, with ${X}_{i}={\sum}_{k}{a}_{ik}{Y}_{k}$, then the linearity of the Lie derivative,

$${\stackrel{~}{X}}^{r}f=\sum _{i}{x}_{i}{\stackrel{~}{X}}_{i}^{r}f\phantom{\rule{1em}{0ex}}\text{where}\phantom{\rule{1em}{0ex}}X=\sum _{i}{x}_{i}{X}_{i},$$

gives ${\stackrel{~}{X}}_{i}^{r}f={\sum}_{k}{a}_{ik}{\stackrel{~}{Y}}_{k}^{r}f$, and therefore

$${F}_{ij}^{r}(f,X)={\int}_{G}\frac{1}{f}\left(\sum _{k}{a}_{ik}{\stackrel{~}{Y}}_{k}^{r}f\right)\left(\sum _{l}{a}_{jl}{\stackrel{~}{Y}}_{l}^{r}f\right)dg=\sum _{k,l}{a}_{ik}{a}_{jl}{F}_{kl}^{r}(f,Y).$$

The same holds for ${F}_{ij}^{l}$. Summarizing these results in matrix form:

$${F}^{r}(f,X)=A{F}^{r}(f,Y){A}^{T}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{F}^{l}(f,X)=A{F}^{l}(f,Y){A}^{T}$$

(49)

where ${\mathbf{e}}_{i}^{T}A{\mathbf{e}}_{j}=({X}_{i},{Y}_{j})$. This means that the eigenvalues of the Fisher information matrix (and therefore its trace) are invariant under change of orthonormal basis.

Note that it follows immediately from (48) that the left and right Fisher information matrices are equal for class functions (i.e., functions satisfying the condition *f*(*g* ○ *h*) = *f*(*h* ○ *g*) for all *h*, *g* ∈ *G*) and for symmetric functions (i.e., functions satisfying the condition *f*(*g*) = *f*(*g*^{−1}) for all *g* ∈ *G*). In general, however, the left and right Fisher information matrices are not equal; even their traces will generally differ for an arbitrary pdf on a unimodular Lie group.

The decrease of Fisher information as a result of convolution can be studied in much the same way as for pdfs on Euclidean space. Two approaches are taken here. First, a straightforward application of the Cauchy-Bunyakovsky-Schwarz (CBS) inequality is used together with the bi-invariance of the integral over a unimodular Lie group to produce a bound on the Fisher information of the convolution of two probability densities. Then, a tighter bound is obtained using the concept of conditional expectation in the special case when the pdfs commute under convolution.

**Theorem 6.2.** The following inequalities hold for the diagonal entries of the left and right Fisher information matrices:

$${F}_{ii}^{r}({f}_{1}\ast {f}_{2})\le \mathrm{min}\left\{{F}_{ii}^{r}\right({f}_{1}),{F}_{ii}^{r}({f}_{2}\left)\right\}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{F}_{ii}^{l}({f}_{1}\ast {f}_{2})\le \mathrm{min}\left\{{F}_{ii}^{l}\right({f}_{1}),{F}_{ii}^{l}({f}_{2}\left)\right\}.$$

(50)

*Proof*. The CBS inequality holds for groups:

$${\left({\int}_{G}a\left(g\right)b\left(g\right)dg\right)}^{2}\le {\int}_{G}{a}^{2}\left(g\right)dg{\int}_{G}{b}^{2}\left(g\right)dg.$$

If *a*(*g*) ≥ 0 for all values of *g*, then it is possible to define *j*(*g*) = [*a*(*g*)]^{½} and *k*(*g*) = [*a*(*g*)]^{½ }*b*(*g*), and since *j*(*g*)*k*(*g*) = *a*(*g*)*b*(*g*),

$$\begin{array}{cc}\hfill {\left({\int}_{G}a\left(g\right)b\left(g\right)dg\right)}^{2}\le & \left({\int}_{G}{j}^{2}\left(g\right)dg\right)\left({\int}_{G}{k}^{2}\left(g\right)dg\right)\hfill \\ \hfill =& \left({\int}_{G}a\left(g\right)dg\right)\left({\int}_{G}a\left(g\right){\left[b\left(g\right)\right]}^{2}dg\right).\hfill \end{array}$$

(51)

Using this version of the CBS inequality, and letting $b\left(g\right)={\stackrel{~}{X}}_{i}^{r}{f}_{2}({h}^{-1}\circ g)/{f}_{2}({h}^{-1}\circ g)$ and *a*(*g*) = *f*_{1}(*h*)*f*_{2}(*h*^{−1} ○ *g*), essentially the same manipulations as in [12] can be used, with the roles of *f*_{1} and *f*_{2} interchanged due to the fact that in general for convolution on a Lie group (*f*_{1} * *f*_{2})(*g*) ≠ (*f*_{2} * *f*_{1})(*g*):

$$\begin{array}{cc}\hfill {F}_{ii}^{r}({f}_{1}\ast {f}_{2})=& {\int}_{G}\frac{{\left({\int}_{G}\left[{\stackrel{~}{X}}_{i}^{r}{f}_{2}({h}^{-1}\circ g)/{f}_{2}({h}^{-1}\circ g)\right]\cdot \left[{f}_{2}({h}^{-1}\circ g){f}_{1}(h)\right]dh\right)}^{2}}{({f}_{1}\ast {f}_{2})\left(g\right)}dg\hfill \\ \hfill \le & {\int}_{G}\left({\int}_{G}\left\{{\left[{\stackrel{~}{X}}_{i}^{r}{f}_{2}({h}^{-1}\circ g)\right]}^{2}/{f}_{2}({h}^{-1}\circ g)\right\}{f}_{1}\left(h\right)dh\right)dg\hfill \\ \hfill =& {\int}_{G}\left({\int}_{G}\left\{{\left[{\stackrel{~}{X}}_{i}^{r}{f}_{2}({h}^{-1}\circ g)\right]}^{2}/{f}_{2}({h}^{-1}\circ g)\right\}dg\right){f}_{1}\left(h\right)dh\hfill \\ \hfill =& {F}_{ii}^{r}\left({f}_{2}\right){\int}_{G}{f}_{1}\left(h\right)dh\hfill \\ \hfill =& {F}_{ii}^{r}\left({f}_{2}\right)\hfill \end{array}$$

Since for a unimodular Lie group it is possible to perform changes of variables and inversion of the variable of integration without affecting the value of an integral, the convolution can be written in the following equivalent ways,

$$({f}_{1}\ast {f}_{2})\left(g\right)={\int}_{G}{f}_{1}\left(h\right){f}_{2}({h}^{-1}\circ g)dh$$

(52)

$$={\int}_{G}{f}_{1}(g\circ {h}^{-1}){f}_{2}\left(h\right)dh$$

(53)

$$={\int}_{G}{f}_{1}(g\circ h){f}_{2}\left({h}^{-1}\right)dh$$

(54)

$$={\int}_{G}{f}_{1}\left({h}^{-1}\right){f}_{2}(h\circ g)dh$$

(55)

It then follows, using (53) and the bi-invariance of integration, that (50) holds.
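Since every finite group is unimodular (counting measure is the bi-invariant Haar measure), the equivalence of (52)-(55) can be checked directly on the symmetric group *S*_{3}. The sketch below is purely an illustration, assuming nothing beyond the group axioms; the densities `w1` and `w2` are arbitrary choices of this note.

```python
from itertools import permutations

# Elements of S3 as permutation tuples; composition (a o b)(i) = a[b[i]].
G = list(permutations(range(3)))

def compose(a, b):
    return tuple(a[b[i]] for i in range(3))

def inverse(a):
    inv = [0] * 3
    for i, ai in enumerate(a):
        inv[ai] = i
    return tuple(inv)

# Two arbitrary pdfs on S3 (positive weights, normalized, counting measure).
w1 = [1, 2, 3, 4, 5, 6]
w2 = [6, 1, 5, 2, 4, 3]
f1 = {g: w / sum(w1) for g, w in zip(G, w1)}
f2 = {g: w / sum(w2) for g, w in zip(G, w2)}

# The four equivalent expressions (52)-(55) for (f1 * f2)(g).
def conv52(g): return sum(f1[h] * f2[compose(inverse(h), g)] for h in G)
def conv53(g): return sum(f1[compose(g, inverse(h))] * f2[h] for h in G)
def conv54(g): return sum(f1[compose(g, h)] * f2[inverse(h)] for h in G)
def conv55(g): return sum(f1[inverse(h)] * f2[compose(h, g)] for h in G)

all_equal = all(
    abs(conv52(g) - c(g)) < 1e-12 for g in G for c in (conv53, conv54, conv55)
)
# all_equal is True: the substitutions h -> g o h^{-1} and h -> h^{-1}
# are bijections of G, so all four sums agree at every g.
```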

In this subsection a tighter bound is derived.

**Theorem 6.3.** The following inequality holds for the right and left Fisher information matrices:

$$\mathrm{tr}\left[{F}^{r}\right({\rho}_{1}\ast {\rho}_{2}\left)P\right]\le \mathrm{tr}\left[{F}^{r}\right({\rho}_{2}\left)P\right]\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\mathrm{tr}\left[{F}^{l}\right({\rho}_{1}\ast {\rho}_{2}\left)P\right]\le \mathrm{tr}\left[{F}^{l}\right({\rho}_{1}\left)P\right]$$

(56)

*where ρ _{i} for i* = 1, 2 *are pdfs on G, and P is an arbitrary symmetric positive definite matrix with the same dimensions as F*.

*Proof*. Let

$${f}_{12}(h,g)={\rho}_{1}\left(h\right){\rho}_{2}({h}^{-1}\circ g).$$

Then *f*_{12}(*h*, *g*) is a pdf on *G* × *G* with marginal densities

$${f}_{1}\left(h\right)={\int}_{G}{f}_{12}(h,g)dg={\rho}_{1}\left(h\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{f}_{2}\left(g\right)={\int}_{G}{f}_{12}(h,g)dh=({\rho}_{1}\ast {\rho}_{2})\left(g\right).$$

It follows that

$$\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)={\int}_{G}{\rho}_{1}\left(h\right){\stackrel{~}{X}}_{i}^{r}{\rho}_{2}({h}^{-1}\circ g)dh.$$

Then by the change of variables *k* = *h*^{−1} ○ *g*,

$$\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)={\int}_{G}{\rho}_{1}(g\circ {k}^{-1}){\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\left(k\right)dk.$$

This means that

$$\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}={\int}_{G}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left(k\right)}{{\rho}_{2}\left(k\right)}\frac{{\rho}_{1}(g\circ {k}^{-1}){\rho}_{2}\left(k\right)}{{f}_{2}\left(g\right)}dk=\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left(k\right)}{{\rho}_{2}\left(k\right)}\mid g\rangle ,$$

(57)

where ⟨ · ∣ *g*⟩ denotes conditional expectation. This notation, which is standard in the literature, retains the functional dependence of whatever is in the place of "·" even though this variable is integrated out and no longer appears [20, 38, 42].

Therefore, using this notation

$$\begin{array}{cc}\hfill {F}_{ii}^{r}\left({f}_{2}\right)=& \langle {\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}\right)}^{2}\rangle =\langle {\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left(k\right)}{{\rho}_{2}\left(k\right)}\mid g\rangle}^{2}\rangle \hfill \\ \hfill \le & \langle \langle \phantom{\mid}{\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left(k\right)}{{\rho}_{2}\left(k\right)}\right)}^{2}\mid g\rangle \rangle =\langle {\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left(k\right)}{{\rho}_{2}\left(k\right)}\right)}^{2}\rangle \hfill \\ \hfill =& {F}_{ii}^{r}\left({\rho}_{2}\right).\hfill \end{array}$$

An analogous argument using *f*_{12}(*h*, *g*) = *ρ*_{1}(*g* ○ *h*^{−1})*ρ*_{2}(*h*) and *f*_{2}(*g*) = (*ρ*_{1} ∗ *ρ*_{2})(*g*) shows that

$$\frac{\left({\stackrel{~}{X}}_{i}^{l}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}=\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{l}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle $$

(58)

and

$${F}_{ii}^{l}\left({f}_{2}\right)\le {F}_{ii}^{l}\left({\rho}_{1}\right).$$

The above results can be written concisely by introducing an arbitrary positive definite diagonal matrix Λ as follows:

$$\mathrm{tr}\left[{F}^{r}\right({\rho}_{1}\ast {\rho}_{2}\left)\Lambda \right]\le \mathrm{tr}\left[{F}^{r}\right({\rho}_{2}\left)\Lambda \right]\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}\mathrm{tr}\left[{F}^{l}\right({\rho}_{1}\ast {\rho}_{2}\left)\Lambda \right]\le \mathrm{tr}\left[{F}^{l}\right({\rho}_{1}\left)\Lambda \right].$$

If this is true in one basis, then using (49) the more general statement in (56) must follow in another basis where *P* = *P ^{T}* > 0. Since the initial choice of basis is arbitrary, (56) must hold in every basis for an arbitrary positive definite matrix *P*.

In some instances, even though the group is not commutative, the functions *ρ*_{1} and *ρ*_{2} will commute under convolution. For example, if *ρ*(*g* ○ *h*) = *ρ*(*h* ○ *g*) for all *h*, *g* ∈ *G* (i.e., *ρ* is a class function), then (*ρ* ∗ *ρ _{i}*)(*g*) = (*ρ _{i}* ∗ *ρ*)(*g*). When *ρ*_{1} ∗ *ρ*_{2} = *ρ*_{2} ∗ *ρ*_{1}, the two inequalities in (56) can be combined to give

$$\begin{array}{cc}\hfill \mathrm{tr}\left[{F}^{r}\right({\rho}_{1}\ast {\rho}_{2}\left)P\right]\phantom{\rule{1em}{0ex}}& \le \phantom{\rule{1em}{0ex}}\mathrm{min}\left\{\mathrm{tr}\right[{F}^{r}\left({\rho}_{1}\right)P],\mathrm{tr}[{F}^{r}\left({\rho}_{2}\right)P\left]\right\}\hfill \\ \hfill & \text{and}\hfill \\ \hfill \mathrm{tr}\left[{F}^{l}\right({\rho}_{1}\ast {\rho}_{2}\left)P\right]\phantom{\rule{1em}{0ex}}& \le \phantom{\rule{1em}{0ex}}\mathrm{min}\left\{\mathrm{tr}\right[{F}^{l}\left({\rho}_{1}\right)P],\mathrm{tr}[{F}^{l}\left({\rho}_{2}\right)P\left]\right\}\hfill \end{array}$$

(59)

**Theorem 6.4. ***When ρ*_{1} * *ρ*_{2} = *ρ*_{2} * *ρ*_{1 }*the following inequality holds*

$$\frac{2}{\mathrm{tr}\left[{F}^{r}({\rho}_{1}\ast {\rho}_{2})P\right]}\ge \frac{1}{\mathrm{tr}\left[{F}^{r}({\rho}_{1})P\right]}+\frac{1}{\mathrm{tr}\left[{F}^{r}({\rho}_{2})P\right]}\phantom{\rule{1em}{0ex}}\text{for any}\phantom{\rule{1em}{0ex}}P={P}^{T}>0,$$

(60)

*and likewise for F ^{l}*.

*Proof*. Returning to (57) and (58), in the case when *ρ*_{1} * *ρ*_{2} = *ρ*_{2} * *ρ*_{1} it is possible to write

$$\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}=\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle =\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle $$

(61)

and

$$\frac{\left({\stackrel{~}{X}}_{i}^{l}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}=\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{l}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle =\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{l}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle .$$

Since the following calculation works the same way for both the ‘l’ and ‘r’ cases, consider only the ‘r’ case for now. Multiplying the first equality in (61) by 1 – *β* and the second by *β*, and adding together gives^{3}:

$$\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}=\beta \langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle +(1-\beta )\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle $$

for an arbitrary value of *β*.

Now squaring both sides gives

$${\left[\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}\right]}^{2}={\left[\beta \langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle +(1-\beta )\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle \right]}^{2}.$$

Taking the (unconditional) expectation, and using Jensen’s inequality yields:

$$\begin{array}{cc}\hfill \langle {\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{f}_{2}\right)\left(g\right)}{{f}_{2}\left(g\right)}\right)}^{2}\rangle =& \langle {\left[\beta \langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle +(1-\beta )\langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle \right]}^{2}\rangle \hfill \\ \hfill \le & {\beta}^{2}\langle {\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\right)}^{2}\rangle +{(1-\beta )}^{2}\langle {\left(\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\right)}^{2}\rangle \hfill \\ \hfill & +2\beta (1-\beta )\langle \langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{1}\right)\left(k\right)}{{\rho}_{1}\left(k\right)}\mid g\rangle \cdot \langle \phantom{\mid}\frac{\left({\stackrel{~}{X}}_{i}^{r}{\rho}_{2}\right)\left({k}^{\prime}\right)}{{\rho}_{2}\left({k}^{\prime}\right)}\mid g\rangle \rangle .\hfill \end{array}$$

(62)

But observing (61), moving the rightmost term to the left, and writing 1–2*β*(1 – *β*) as (1 – *β*)^{2} + *β*^{2} reduces (62) to

$$[{(1-\beta )}^{2}+{\beta}^{2}]{F}_{ii}^{r}({\rho}_{1}\ast {\rho}_{2})\le {\beta}^{2}{F}_{ii}^{r}\left({\rho}_{1}\right)+{(1-\beta )}^{2}{F}_{ii}^{r}\left({\rho}_{2}\right).$$

(63)

Dividing both sides by [(1 – *β*)^{2} + *β*^{2}], multiplying by λ_{i} ≥ 0, and summing over *i* gives

$$\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\ast {\rho}_{2}\left)\right]\le \frac{{\beta}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]}\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]+\frac{{(1-\beta )}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]}\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]$$

(64)

where Λ = diag(λ_{1}, …, λ_{n}).

Clearly,

$$0\le \frac{{\beta}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]},\frac{{(1-\beta )}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]}\le 1.$$

Choosing

$$\frac{{\beta}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]}=\frac{\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]}{\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]+\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]}$$

and

$$\frac{{(1-\beta )}^{2}}{[{(1-\beta )}^{2}+{\beta}^{2}]}=\frac{\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]}{\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]+\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]}$$

then gives

$$\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\ast {\rho}_{2}\left)\right]\le \frac{2\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]}{\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{1}\left)\right]+\mathrm{tr}\left[\Lambda {F}^{r}\right({\rho}_{2}\left)\right]}$$

This can be written as

$$\frac{1}{\mathrm{tr}\left[\Lambda {F}^{r}({\rho}_{1})\right]}+\frac{1}{\mathrm{tr}\left[\Lambda {F}^{r}({\rho}_{2})\right]}\le \frac{2}{\mathrm{tr}\left[\Lambda {F}^{r}({\rho}_{1}\ast {\rho}_{2})\right]}.$$

(65)

Again, since the basis is arbitrary, Λ can be replaced with *P*, resulting in (60).

Note that in the classical (Abelian) version of this inequality, it is possible to obtain the stronger bound without the factor of 2 in (60).

Consider *G* = *SO*(3), the group of 3 × 3 orthogonal matrices with determinant +1. Let ${\stackrel{~}{\mathbf{X}}}^{\mathbf{r}}={[{\stackrel{~}{X}}_{1}^{r},{\stackrel{~}{X}}_{2}^{r},{\stackrel{~}{X}}_{3}^{r}]}^{T}$ and ${\stackrel{~}{\mathbf{X}}}^{\mathbf{l}}={[{\stackrel{~}{X}}_{1}^{l},{\stackrel{~}{X}}_{2}^{l},{\stackrel{~}{X}}_{3}^{l}]}^{T}$. These two gradient operators are related to each other by the adjoint matrix, which for this group is a rotation matrix [19]. Therefore, in the case when *G* = *SO*(3),

$${\Vert {\stackrel{~}{\mathbf{X}}}^{\mathbf{r}}f\Vert}^{2}={\Vert {\stackrel{~}{\mathbf{X}}}^{\mathbf{l}}f\Vert}^{2}\Rightarrow \mathrm{tr}\left[{F}^{r}\left(f\right)\right]=\mathrm{tr}\left[{F}^{l}\left(f\right)\right].$$

Therefore, the inequalities in (59) will hold for pdfs on *SO*(3) regardless of whether or not the functions commute under convolution, but restricted to the condition *P* = *I*.
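On the commutative group *SO*(2), every pair of densities commutes under convolution, so both (50) and (60) apply and can be checked numerically. In the sketch below (an illustration only; the von Mises-type test densities and grid resolution are arbitrary choices of this note), group convolution is realized as circular convolution:

```python
import math

N = 512
H = 2 * math.pi / N

def normalize(vals):
    s = sum(vals) * H
    return [v / s for v in vals]

def fisher(vals):
    # F(f) = int (f')^2 / f dtheta, centered differences on the circle.
    return sum(
        ((vals[(i + 1) % N] - vals[(i - 1) % N]) / (2 * H)) ** 2 / vals[i] * H
        for i in range(N)
    )

def convolve(a, b):
    # (f1 * f2)(g) = int_G f1(h) f2(h^{-1} o g) dh is circular convolution here.
    return [sum(a[j] * b[(i - j) % N] for j in range(N)) * H for i in range(N)]

# Two smooth test densities on the circle with different concentrations.
f1 = normalize([math.exp(4.0 * math.cos(i * H)) for i in range(N)])
f2 = normalize([math.exp(2.0 * math.cos(i * H - 1.0)) for i in range(N)])

F1, F2, F12 = fisher(f1), fisher(f2), fisher(convolve(f1, f2))
# Theorem 6.2: F12 <= min(F1, F2); Theorem 6.4: 2/F12 >= 1/F1 + 1/F2.
```

Both inequalities hold with a comfortable margin here, since convolution spreads probability mass and lowers Fisher information.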

This section generalizes the de Bruijn identity, in which entropy rates are related to Fisher information.

**Theorem 7.1. ***Let f*_{D,h,t}(*g*) = *f*(*g*, *t*; *D*, **h**) denote the solution of the diffusion equation (18) with constant **h ***subject to the initial condition f*(*g*, 0; *D*, **h**) = *δ*(*g*). *Then for any well-behaved pdf α*(*g*),

$$\frac{d}{dt}S(\alpha \ast {f}_{D,\mathbf{h},t})=\frac{1}{2}\mathrm{tr}\left[D{F}^{r}\right(\alpha \ast {f}_{D,\mathbf{h},t}\left)\right].$$

(66)

*Proof*. It is easy to see that the solution of the diffusion equation

$$\frac{\partial \rho}{\partial t}=\frac{1}{2}\sum _{i,j=1}^{n}{D}_{ij}{\stackrel{~}{X}}_{i}^{r}{\stackrel{~}{X}}_{j}^{r}\rho -\sum _{k=1}^{n}{h}_{k}{\stackrel{~}{X}}_{k}^{r}\rho $$

(67)

subject to the initial condition *ρ*(*g*, 0) = *α*(*g*) is simply *ρ*(*g*, *t*) = (*α* ∗ *f*_{D,**h**,t})(*g*). This follows because all derivatives "pass through" the convolution integral for *ρ*(*g*, *t*) and act on *f*_{D,**h**,t}(*g*).

Taking the time derivative of *S*(*ρ*(*g*, *t*)) we get

$$\frac{d}{dt}S\left(\rho \right)=-\frac{d}{dt}{\int}_{G}\rho (g,t)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\rho (g,t)dg=-{\int}_{G}\left\{\frac{\partial \rho}{\partial t}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\rho +\frac{\partial \rho}{\partial t}\right\}dg.$$

(68)

Using (67), the partial with respect to time can be replaced with Lie derivatives. But

$${\int}_{G}{\stackrel{~}{X}}_{k}^{r}\rho dg={\int}_{G}{\stackrel{~}{X}}_{i}^{r}{\stackrel{~}{X}}_{j}^{r}\rho dg=0,$$

so the second term on the right-hand side of (68) vanishes. Using the integration-by-parts formula^{4}

$${\int}_{G}{f}_{1}{\stackrel{~}{X}}_{k}^{r}{f}_{2}dg=-{\int}_{G}{f}_{2}{\stackrel{~}{X}}_{k}^{r}{f}_{1}dg,$$

with *f*_{1} = log *ρ* and *f*_{2} = *ρ* then gives

$$\begin{array}{cc}\hfill \frac{d}{dt}S(\alpha \ast {f}_{D,\mathbf{h},t})=& \frac{1}{2}\sum _{i,j=1}^{n}{D}_{ij}{\int}_{G}\frac{1}{\alpha \ast {f}_{D,\mathbf{h},t}}{\stackrel{~}{X}}_{j}^{r}(\alpha \ast {f}_{D,\mathbf{h},t}){\stackrel{~}{X}}_{i}^{r}(\alpha \ast {f}_{D,\mathbf{h},t})dg\hfill \\ \hfill =& \frac{1}{2}\sum _{i,j=1}^{n}{D}_{ij}{F}_{ij}^{r}(\alpha \ast {f}_{D,\mathbf{h},t})\hfill \\ \hfill =& \frac{1}{2}\mathrm{tr}\left[D{F}^{r}\right(\alpha \ast {f}_{D,\mathbf{h},t}\left)\right].\hfill \end{array}$$

The implication of this is that

$$S(\alpha \ast {f}_{D,\mathbf{h},{t}_{2}})-S(\alpha \ast {f}_{D,\mathbf{h},{t}_{1}})=\frac{1}{2}{\int}_{{t}_{1}}^{{t}_{2}}\mathrm{tr}\left[D{F}^{r}\right(\alpha \ast {f}_{D,\mathbf{h},t}\left)\right]dt$$
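For *G* = *SO*(2) with scalar *D* = *d* and **h** = **0**, the heat kernel is the wrapped Gaussian of variance *dt*, and (66) can be checked numerically by comparing a finite-difference entropy rate with ½ *d* *F*(*α* ∗ *f*_{t}). The following sketch is illustrative only; the grid sizes, the test density *α*, and the step sizes are this note's own choices.

```python
import math

N = 512
H = 2 * math.pi / N

def wrapped_gaussian(var):
    # Heat kernel on SO(2) at variance var, sampled on the grid.
    return [
        sum(math.exp(-(i * H - 2 * math.pi * k) ** 2 / (2 * var))
            for k in range(-3, 4)) / math.sqrt(2 * math.pi * var)
        for i in range(N)
    ]

def convolve(a, b):
    return [sum(a[j] * b[(i - j) % N] for j in range(N)) * H for i in range(N)]

def entropy(f):
    return -sum(v * math.log(v) * H for v in f)

def fisher(f):
    return sum(
        ((f[(i + 1) % N] - f[(i - 1) % N]) / (2 * H)) ** 2 / f[i] * H
        for i in range(N)
    )

# Initial density alpha, scalar diffusion coefficient d, no drift.
alpha = [math.exp(math.cos(i * H)) for i in range(N)]
s = sum(alpha) * H
alpha = [a / s for a in alpha]
d, t, dt = 1.0, 0.3, 1e-3

# Entropy rate by central difference vs. (1/2) d F, as in the identity (66).
lhs = (entropy(convolve(alpha, wrapped_gaussian(d * (t + dt))))
       - entropy(convolve(alpha, wrapped_gaussian(d * (t - dt))))) / (2 * dt)
rhs = 0.5 * d * fisher(convolve(alpha, wrapped_gaussian(d * t)))
```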

In this section information-theoretic identities are derived from Log-Sobolev inequalities. Subsection 8.1 provides a brief review of Log-Sobolev inequalities. Subsection 8.2 then uses these to write information-theoretic inequalities.

The log-Sobolev inequality can be stated as [5, 6, 46]:

$${\int}_{{\mathbb{R}}^{n}}{\mid \psi \left(\mathbf{x}\right)\mid}^{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\mid \psi \left(\mathbf{x}\right)\mid}^{2}\phantom{\rule{thinmathspace}{0ex}}d\mathbf{x}\le \frac{n}{2}\mathrm{log}\left[\frac{2}{\pi en}{\int}_{{\mathbb{R}}^{n}}{\Vert \nabla \psi \Vert}^{2}d\mathbf{x}\right]$$

(69)

where

$$\nabla \psi ={\left[\frac{\partial \psi}{\partial {x}_{1}},\dots ,\frac{\partial \psi}{\partial {x}_{n}}\right]}^{T}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\int}_{{\mathbb{R}}^{n}}{\mid \psi \left(\mathbf{x}\right)\mid}^{2}d\mathbf{x}=1.$$

Here log = log_{e}. Actually, there is a whole family of log-Sobolev inequalities, and (69) represents the tightest of these. The original form of the log-Sobolev inequality as introduced by Gross in [32] is

$$\frac{1}{2}{\int}_{{\mathbb{R}}^{n}}{\mid \varphi \left(\mathbf{x}\right)\mid}^{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\mid \varphi \left(\mathbf{x}\right)\mid}^{2}\phantom{\rule{thinmathspace}{0ex}}\rho \left(\mathbf{x}\right)d\mathbf{x}\le {\int}_{{\mathbb{R}}^{n}}{\Vert \nabla \varphi \left(\mathbf{x}\right)\Vert}^{2}\phantom{\rule{thinmathspace}{0ex}}\rho \left(\mathbf{x}\right)d\mathbf{x}+{\Vert \varphi \Vert}_{{L}^{2}({\mathbb{R}}^{n},\rho )}^{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\Vert \varphi \Vert}_{{L}^{2}({\mathbb{R}}^{n},\rho )}^{2}$$

(70)

where

$${\Vert \varphi \Vert}_{{L}^{2}({\mathbb{R}}^{n},\rho )}^{2}={\int}_{{\mathbb{R}}^{n}}{\mid \varphi \left(\mathbf{x}\right)\mid}^{2}\rho \left(\mathbf{x}\right)d\mathbf{x}.$$

Here *ρ*(**x**) = *ρ*(**x**, 1) = (2*π*)^{−n/2} exp(−‖**x**‖^{2}/2) is the solution of the heat equation on ${\mathbb{R}}^{n}$ evaluated at *t* = 1.

Several different variations exist. For example, by rescaling, it is possible to rewrite (70) with *ρ*(**x**, *t*) in place of *ρ*(**x**) by introducing a multiplicative factor of *t* in the first term on the right hand side of the equation. Or, by letting *ϕ*(**x**) = *ρ*^{−½} (**x**)*ψ*(**x**/*a*) for some scaling factor *a* > 0, substituting into (70), and integrating by parts then gives [46]

$${\int}_{{\mathbb{R}}^{n}}{\mid \psi \left(\mathbf{x}\right)\mid}^{2}\mathrm{log}\frac{{\mid \psi \left(\mathbf{x}\right)\mid}^{2}}{{\Vert \psi \Vert}_{2}^{2}}d\mathbf{x}+n(1+\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}a){\Vert \psi \Vert}_{2}^{2}\le \frac{{a}^{2}}{\pi}{\int}_{{\mathbb{R}}^{n}}{\Vert \nabla \psi \left(\mathbf{x}\right)\Vert}^{2}d\mathbf{x}$$

where

$${\Vert \psi \Vert}_{2}^{2}={\int}_{{\mathbb{R}}^{n}}{\mid \psi \left(\mathbf{x}\right)\mid}^{2}d\mathbf{x}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\Vert \nabla \psi \left(\mathbf{x}\right)\Vert}^{2}=\nabla \psi \left(\mathbf{x}\right)\cdot \nabla \psi \left(\mathbf{x}\right).$$

This, together with an optimization over *a* gives (69).

Gross subsequently extended (70) to Lie groups [33] as

$$\begin{array}{cc}\hfill {\int}_{G}\left\{{\mid \varphi \left(g\right)\mid}^{2}\mathrm{log}\mid \varphi \left(g\right)\mid \right\}\rho (g,t)dg\le & {c}_{G}\left(t\right){\int}_{G}{\Vert \left(\stackrel{~}{X}\varphi \right)\left(g\right)\Vert}^{2}\rho (g,t)dg\hfill \\ \hfill & +{\Vert \varphi \Vert}_{{L}^{2}(G,{\rho}_{t})}^{2}\mathrm{log}{\Vert \varphi \Vert}_{{L}^{2}(G,{\rho}_{t})}^{2}\hfill \end{array}$$

(71)

where *ρ*(*g*, *t*) is the solution of the diffusion equation in (67) with *h _{i}* = 0,

$$\stackrel{~}{X}\varphi ={[{\stackrel{~}{X}}_{1}^{r}\varphi ,\dots ,{\stackrel{~}{X}}_{n}^{r}\varphi ]}^{T}\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{1em}{0ex}}{\Vert \varphi \Vert}_{{L}^{2}(G,{\rho}_{t})}^{2}={\int}_{G}{\mid \varphi \left(g\right)\mid}^{2}\rho (g,t)dg.$$

In (71) the scalar function *c _{G}*(*t*) depends both on the group *G* and on the time *t* at which the heat kernel *ρ*(*g*, *t*) is evaluated.

In analogy with the way that (69) evolved from (70), a descendent of (71) for noncompact unimodular Lie groups is [2, 5, 6]

$${\int}_{G}{\mid \psi \left(g\right)\mid}^{2}\mathrm{log}{\mid \psi \left(g\right)\mid}^{2}dg\le \frac{n}{2}\mathrm{log}\left[\frac{2{C}_{G}}{\pi en}{\int}_{G}{\Vert \stackrel{~}{X}\psi \Vert}^{2}dg\right]$$

(72)

The only difference is that, to the author’s knowledge, the sharp factor *C _{G}* in this expression is not known for most Lie groups. The information-theoretic interpretation of these inequalities is provided in the following subsection.

For our purposes the form in (69) will be most useful. It is interesting to note in passing that Beckner has extended this inequality to the case where the domain, rather than being ${\mathbb{R}}^{n}$, is the hyperbolic space ${\mathbb{H}}^{2}\cong \mathit{SL}(2,\mathbb{R})\u2215\mathit{SO}\left(2\right)$ and the Heisenberg groups *H*(*n*), including *H*(1) [5, 6]. Our goal here is to provide an information-theoretic interpretation of the inequalities from the previous section.

**Theorem 8.1.** Entropy powers and Fisher information are related as

$${\left[N\right(f\left)\right]}^{-1}\le \frac{1}{n}\mathrm{tr}\left(F\right)\phantom{\rule{1em}{0ex}}\text{where}\phantom{\rule{1em}{0ex}}N\left(f\right)=\frac{{C}_{G}}{2\pi e}\mathrm{exp}\left[\frac{2}{n}S\left(f\right)\right].$$

(73)

*Proof*. We begin by proving (73) for $G=({\mathbb{R}}^{n},+)$. Making the simple substitution *f*(**x**) = *ψ*(**x**)^{2} into (69) and requiring that *f*(**x**) be a pdf gives

$${\int}_{{\mathbb{R}}^{n}}f\left(\mathbf{x}\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}f\left(\mathbf{x}\right)d\mathbf{x}\le \frac{n}{2}\mathrm{log}\left[\frac{1}{2\pi en}{\int}_{{\mathbb{R}}^{n}}\frac{1}{f}{\Vert \nabla f\Vert}^{2}d\mathbf{x}\right].$$

or

$$-S\left(f\right)\le \frac{n}{2}\mathrm{log}\frac{\mathrm{tr}\left(F\right)}{2\pi en}\Rightarrow \mathrm{exp}\left[-\frac{2}{n}S\left(f\right)\right]\le \frac{\mathrm{tr}\left(F\right)}{2\pi en}\Rightarrow {\left[N\right(f\left)\right]}^{-1}\le \frac{1}{n}\mathrm{tr}\left(F\right).$$

(74)

Here *S*(*f*) is the Boltzmann-Shannon entropy of *f* and *F* is the Fisher information matrix. As is customary in information theory, the entropy power can be defined as *N*(*f*) in (73) with *C _{G}* = 1. Then the log-Sobolev inequality in the form in (74) is written as (73).
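As a quick numerical sanity check of (73) with *C _{G}* = 1 on $G=(\mathbb{R},+)$ (a hedged sketch, not part of the original argument): for a Gaussian the bound is attained, since *S* = ½ log(2*πeσ*^{2}) and *F* = 1/*σ*^{2}, while a bimodal mixture satisfies it strictly. All numerical choices below (grid, test densities) are illustrative.

```python
import math

# Quadrature grid on [-L, L]; wide enough that tails are negligible here.
L, N = 12.0, 6000
H = 2 * L / N
xs = [-L + (i + 0.5) * H for i in range(N)]

def gaussian(x, mu, sig):
    return math.exp(-(x - mu) ** 2 / (2 * sig ** 2)) / (sig * math.sqrt(2 * math.pi))

def entropy_and_fisher(f):
    vals = [f(x) for x in xs]
    S = -sum(v * math.log(v) * H for v in vals)
    F = sum(
        ((vals[i + 1] - vals[i - 1]) / (2 * H)) ** 2 / vals[i] * H
        for i in range(1, N - 1)
    )
    return S, F

def inv_entropy_power(S, n=1):
    # 1/N(f) with C_G = 1, i.e. 2*pi*e*exp(-2*S/n).
    return 2 * math.pi * math.e * math.exp(-2 * S / n)

# Gaussian: equality in (73); bimodal mixture: strict inequality 1/N(f) < F.
S_g, F_g = entropy_and_fisher(lambda x: gaussian(x, 0.0, 1.5))
S_m, F_m = entropy_and_fisher(
    lambda x: 0.5 * gaussian(x, -2.0, 1.0) + 0.5 * gaussian(x, 2.0, 1.0))
```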

For the more general case, starting with (72) and letting *f*(*g*) = *ψ*(*g*)^{2} gives

$${\int}_{G}f\left(g\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}f\left(g\right)dg\le \frac{n}{2}\mathrm{log}\left[\frac{{C}_{G}}{2\pi en}{\int}_{G}\frac{1}{f}{\Vert \stackrel{~}{X}f\Vert}^{2}dg\right]\Rightarrow -S\le \frac{n}{2}\mathrm{log}\left[\frac{{C}_{G}}{2\pi en}\mathrm{tr}\left(F\right)\right]$$

(75)

The rest is the same as for the case of ${\mathbb{R}}^{n}$.

Starting with Gross’s original form of log-Sobolev inequalities involving the heat kernel, the following information-theoretic inequality results:

**Theorem 8.2.** The Kullback-Leibler divergence and the Fisher information distance between an arbitrary pdf and the heat kernel are related as

$${D}_{KL}(f\parallel {\rho}_{t})\le \frac{{c}_{G}\left(t\right)}{2}{D}_{FI}(f\parallel {\rho}_{t})$$

(76)

*where in general given f*_{1}(*g*) *and f*_{2}(*g*),

$${D}_{FI}({f}_{1}\parallel {f}_{2})={\int}_{G}{\Vert \frac{1}{{f}_{1}}\stackrel{~}{X}{f}_{1}-\frac{1}{{f}_{2}}\stackrel{~}{X}{f}_{2}\Vert}^{2}{f}_{1}dg.$$

(77)

*Proof*. Starting with (71), let *φ*(*g*, *t*) = [*ρ*(*g*, *t*)]^{−½} [*f*(*g*)]^{½} where *f*(*g*) is a pdf. Then

$${\int}_{G}{\mid \varphi (g,t)\mid}^{2}\rho (g,t)dg={\int}_{G}f\left(g\right)dg=1$$

and so $\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}{\Vert \varphi \Vert}_{{L}^{2}(G,{\rho}_{t})}^{2}=0$, and (71) reduces to

$$\frac{1}{2}{\int}_{G}f\left(g\right)\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\frac{f\left(g\right)}{\rho (g,t)}dg\le {c}_{G}\left(t\right){\int}_{G}{\Vert \stackrel{~}{X}\left({\left[\rho (g,t)\right]}^{-\frac{1}{2}}{\left[f\left(g\right)\right]}^{\frac{1}{2}}\right)\Vert}^{2}\rho (g,t)dg.$$

By using the chain rule and product rule for differentiation,

$$\stackrel{~}{X}\left({\left[\rho (g,t)\right]}^{-\frac{1}{2}}{\left[f\left(g\right)\right]}^{\frac{1}{2}}\right)=\frac{1}{2}{\rho}_{t}^{-\frac{1}{2}}\left({f}^{-\frac{1}{2}}\stackrel{~}{X}f-{f}^{\frac{1}{2}}{\rho}_{t}^{-1}\stackrel{~}{X}{\rho}_{t}\right).$$

Substitution into the right-hand side of (71), together with the observation that ${\Vert \stackrel{~}{X}\varphi \Vert}^{2}{\rho}_{t}=\frac{1}{4}f{\Vert \frac{1}{f}\stackrel{~}{X}f-\frac{1}{{\rho}_{t}}\stackrel{~}{X}{\rho}_{t}\Vert}^{2}$, then gives (76).

In the functional analysis community several connections between log-Sobolev inequalities on ${\mathbb{R}}^{n}$ and information theory have emerged. For example, Carlen [14] addresses Theorem 7.1 for the case of $G={\mathbb{R}}^{n}$. Ledoux [43, 44], Dembo [27], Talagrand [70], and Otto and Villani [56] address the connection between entropy and gradients of pdfs in the context of so-called "concentration of measure" phenomena related to logarithmic Sobolev inequalities. However, these studies are not usually concerned with the Lie-group setting. Moreover, the author has not found analogs of (74) in the context of Lie groups in the literature.

Given a pdf $f\in \mathcal{N}\left(G\right)$ that is in addition unimodal and decays rapidly from its mode (in the precise sense described in [69]), its *mean* is defined here as the point *μ* ∈ *G* such that [79]

$${\int}_{G}{\left(\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\right)}^{\vee}f(\mu \circ g)dg=0.$$

(78)

Unlike in ${\mathbb{R}}^{n}$, in which a mean can be computed for any pdf, in the Lie-group setting it is important to restrict the class of pdfs for the concept of mean to make sense. If not for such restrictions, the usefulness of the concept of the mean would diminish. For example, for the uniform distribution on *SO*(3) every point could be called a mean.

The covariance of a concentrated probability density centered around *μ* can be defined as [79]

$$\Sigma ={\int}_{G}{\left(\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\right)}^{\vee}{\left[{\left(\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\right)}^{\vee}\right]}^{T}f(\mu \circ g)dg.$$

(79)

This matrix will have finite values when *f*(*g*) is rapidly decreasing. Note that this concept of covariance differs from those presented in [31, 36], which are more akin to the dispersion defined in Theorem 5.2. The definitions in (78) and (79) are used in the theorem below.
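Before stating the theorem, the definitions (78) and (79) can be made concrete on *SO*(2), where log *g* is just the rotation angle *θ* ∈ (−*π*, *π*] and Σ is a 1 × 1 matrix. The sketch below (the bisection solver, test density, and all numerical parameters are illustrative assumptions of this note) recovers the mean and covariance of a concentrated wrapped Gaussian:

```python
import math

N = 1024
H = 2 * math.pi / N

def wrapped_gaussian(theta, mu, sigma):
    return sum(
        math.exp(-((theta - mu) + 2 * math.pi * k) ** 2 / (2 * sigma ** 2))
        for k in range(-2, 3)
    ) / (sigma * math.sqrt(2 * math.pi))

mu_true, sigma_true = 0.7, 0.15

def phi(mu):
    # phi(mu) = int (log g)^vee f(mu o g) dg, with log g = theta in (-pi, pi].
    return sum(
        (i * H - math.pi) * wrapped_gaussian(mu + i * H - math.pi,
                                             mu_true, sigma_true) * H
        for i in range(N)
    )

# Solve phi(mu) = 0, eq. (78), by bisection; phi decreases in mu near the mode.
lo, hi = mu_true - 0.5, mu_true + 0.5
for _ in range(40):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if phi(mid) > 0 else (lo, mid)
mu_hat = 0.5 * (lo + hi)

# Covariance (79): Sigma = int theta^2 f(mu o g) dg, a scalar on SO(2).
Sigma = sum(
    (i * H - math.pi) ** 2 * wrapped_gaussian(mu_hat + i * H - math.pi,
                                              mu_true, sigma_true) * H
    for i in range(N)
)
```

For a concentrated density the recovered mean and covariance should be close to the parameters *μ* = 0.7 and *σ*^{2} = 0.0225 used to build it.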

**Theorem 9.1. ***Let* $\rho \left(g\right)\in \mathcal{N}\left(G\right)$ *be a pdf with the additional symmetry condition ρ*(*g*) = *ρ*(*g*^{−1}) *and set f*(*g*; *μ*) = *ρ*(*μ*^{−1} ○ *g*). *Given an unbiased estimator of μ, the Cramér-Rao bound*

$$\Sigma \ge {F}^{-1}$$

(80)

*holds for sufficiently small* Σ, *where* Σ *and F are defined in* (79) *and* (44), *and the above matrix inequality is interpreted as each eigenvalue of* Σ − *F*^{−1} *being non-negative*.

*Proof*. For a symmetric pdf, *ρ*(*g*) = *ρ*(*g*^{−1}), the mean is at the identity, and so

$${\int}_{G}{\left(\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}g\right)}^{\vee}\rho \left(g\right)dg=0.$$

(81)

The invariance of integration under shifts then gives

$$\varphi \left(\mu \right)={\int}_{G}{\left(\mathrm{log}\right({\mu}^{-1}\circ g\left)\right)}^{\vee}\rho ({\mu}^{-1}\circ g)dg=0.$$

(82)

Applying the derivatives ${\stackrel{~}{X}}_{i}^{r}$ gives an expression for ${\stackrel{~}{X}}_{i}^{r}\varphi \left(\mu \right)=0$ that can be expanded under the integral using the product rule ${\stackrel{~}{X}}_{i}^{r}(a\cdot b)=\left({\stackrel{~}{X}}_{i}^{r}a\right)\cdot b+a\cdot \left({\stackrel{~}{X}}_{i}^{r}b\right)$ where in the present case *a* = (log(*μ*^{−1} ○ *g*))^{∨} and *b* = *ρ*(*μ*^{−1} ○ *g*). Note that when *ρ*(·) is highly concentrated, the only values of *g* that significantly contribute to the integral are those for which *μ*^{−1} ○ *g* ≈ *e*. By definition

$$\begin{array}{cc}\hfill {\stackrel{~}{X}}_{i}^{r}{\left(\mathrm{log}({\mu}^{-1}\circ g)\right)}^{\vee}=& {\phantom{\mid}\frac{d}{dt}{\left(\mathrm{log}({(\mu \circ {e}^{t{X}_{i}})}^{-1}\circ g)\right)}^{\vee}\mid}_{t=0}\hfill \\ \hfill =& {\phantom{\mid}\frac{d}{dt}{\left[\mathrm{log}({e}^{-t{X}_{i}}\circ {\mu}^{-1}\circ g)\right]}^{\vee}\mid}_{t=0}.\hfill \end{array}$$

Using the Baker-Campbell-Hausdorff formula

$$\mathrm{log}\left({e}^{X}{e}^{Y}\right)\approx X+Y+\frac{1}{2}[X,Y]$$

with *X* = −*tX _{i}* and *Y* = log(*μ*^{−1} ○ *g*), together with the vanishing-mean condition (81), gives

$${\int}_{G}\left[{\stackrel{~}{X}}_{i}^{r}{\left(\mathrm{log}({\mu}^{-1}\circ g)\right)}^{\vee}\right]\rho ({\mu}^{-1}\circ g)dg\approx -{\mathbf{e}}_{i}.$$

(83)
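As a numerical aside, the second-order Baker-Campbell-Hausdorff truncation invoked above is easy to sanity-check: for Lie-algebra elements of size ε the neglected terms are *O*(ε³). The sketch below does this on se(2) with generic matrix exponentials/logarithms (the helper name `hat` and the specific coefficients are illustrative, not from the paper):

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    """se(2) Lie-algebra element with coordinates (v1, v2, omega)."""
    return np.array([[0.0, -v[2], v[0]], [v[2], 0.0, v[1]], [0.0, 0.0, 0.0]])

eps = 1e-2
X = hat(eps * np.array([1.0, 2.0, 0.5]))
Y = hat(eps * np.array([-0.3, 1.0, 0.7]))

exact = logm(expm(X) @ expm(Y)).real            # log(e^X e^Y), computed numerically
bch2 = X + Y + 0.5 * (X @ Y - Y @ X)            # second-order BCH approximation
err2 = np.linalg.norm(exact - bch2)             # should be O(eps^3)
err1 = np.linalg.norm(exact - (X + Y))          # first-order error, O(eps^2)
```

Including the commutator term reduces the error by more than an order of magnitude at this ε, which is the accuracy needed for the expansion leading to (83).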

The second term in the expansion of ${\stackrel{~}{X}}_{i}^{r}\varphi \left(\mu \right)$ is

$$\begin{array}{c}\hfill {\phantom{\mid}\frac{d}{dt}{\int}_{G}{\left[\mathrm{log}({\mu}^{-1}\circ g)\right]}^{\vee}\rho ({e}^{-t{X}_{i}}\circ {\mu}^{-1}\circ g)dg\mid}_{t=0}=\hfill \\ \hfill {\phantom{\mid}\frac{d}{dt}{\int}_{G}{\left[\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}h\right]}^{\vee}\rho ({e}^{-t{X}_{i}}\circ h)dh\mid}_{t=0}\hfill \end{array}$$

where the change of variables *h* = *μ*^{−1} ○ *g* has been made. Using the symmetry of *ρ* gives *ρ*(*e*^{−tXi} ○ *h*) = *ρ*(*h*^{−1} ○*e ^{tXi}*), and making the change of variables

Recall that the Gaussian distribution on ${\mathbb{R}}^{n}$ has a number of remarkable properties including: (1) it is closed under the operation of convolution; (2) it solves a linear diffusion equation with constant coefficients; (3) it is the maximum entropy distribution subject to constraints on the mean and covariance. A natural question to ask is whether such a distribution exists on unimodular Lie groups. With regard to (1) and (2), the answer is certainly yes, and this kind of Gaussian distribution appeared as the solution of (18) subject to Dirac delta initial conditions. However, this is not necessarily the maximum entropy distribution subject to covariance constraints.

Equipped with a concept of mean and covariance, the concept of a maximum entropy distribution on a unimodular Lie group subject to constraints on the mean and covariance can be defined and computed in the usual way using Lagrange multipliers, and the result is of the form

$$\rho (g;\mu ,\Sigma )=\frac{1}{c(\mu ,\Sigma )}\mathrm{exp}\left(-\frac{1}{2}{\left[\mathrm{log}({\mu}^{-1}\circ g)\right]}^{\vee}\cdot {\Sigma}^{-1}{\left[\mathrm{log}({\mu}^{-1}\circ g)\right]}^{\vee}\right)$$

(84)

where

$$c(\mu ,\Sigma )={\int}_{G}\mathrm{exp}\left(-\frac{1}{2}{\left[\mathrm{log}({\mu}^{-1}\circ g)\right]}^{\vee}\cdot {\Sigma}^{-1}{\left[\mathrm{log}({\mu}^{-1}\circ g)\right]}^{\vee}\right)dg.$$

Such distributions have been studied in [80, 79] in the context of how the covariance of convolved Gaussians can be obtained from the covariances of those being convolved. As Σ becomes small, *ρ*(*g*; *e*, Σ) converges to the solution of a driftless diffusion with Dirac-delta initial conditions at the identity, and Σ = *tD*. In this case exponential coordinates *g* = exp *X* become Cartesian coordinates near the identity and *dg* ≈ *dx* (the Lebesgue measure) by identifying the Lie algebra with ${\mathbb{R}}^{n}$. In this limit the usual Gaussian on ${\mathbb{R}}^{n}$ results, with *c*(*μ*, Σ) = (2*π*)^{*n*/2}|det Σ|^{1/2}.
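The small-Σ behavior of the normalizer can be checked by direct numerical integration: for concentrated Σ, *c*(*e*, Σ) should approach the Euclidean Gaussian value (2*π*)^{*n*/2}|Σ|^{1/2} (here *n* = 3 for *SE*(2), whose bi-invariant Haar measure is *dx dy dθ*). The sketch below uses the standard closed-form se(2) logarithm and a diagonal Σ for simplicity; all names and grid parameters are illustrative:

```python
import numpy as np

Sigma = 0.01 * np.eye(3)                  # small, diagonal covariance: near-Euclidean regime
Sinv = np.linalg.inv(Sigma)

n, half = 50, 0.5                         # midpoint grid over (x, y, theta); avoids theta = 0
h = 2 * half / n
axis = -half + h * (np.arange(n) + 0.5)
x, y, th = np.meshgrid(axis, axis, axis, indexing="ij")

# closed-form exponential coordinates (v1, v2, omega) of g = (x, y, theta):
# (v1, v2) = V(theta)^{-1} (x, y) with V = (sin/theta) I + ((1-cos)/theta) J
a = np.sinc(th / np.pi)                   # sin(theta)/theta
b = (1 - np.cos(th)) / th                 # safe: midpoints never hit theta = 0
d = a * a + b * b                         # det V(theta)
v1 = (a * x + b * y) / d
v2 = (-b * x + a * y) / d

q = Sinv[0, 0] * v1**2 + Sinv[1, 1] * v2**2 + Sinv[2, 2] * th**2   # diagonal Sigma only
c_num = np.sum(np.exp(-0.5 * q)) * h**3   # integrate against Haar measure dx dy dtheta
c_gauss = (2 * np.pi) ** 1.5 * np.sqrt(np.linalg.det(Sigma))
```

With standard deviations of 0.1 in each coordinate, the numerically integrated normalizer agrees with the flat-space value to well under a percent, consistent with the limit described above.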

However, as Σ (or *tD*) becomes larger, the concepts of Gaussians on Lie groups as solutions to diffusion equations and as maximum entropy distributions become inconsistent with each other. Each of these concepts of Gaussian distribution has its advantages in different scenarios, and both are used in the following section.

Assume that the nonholonomic kinematic-cart robot with two independently actuated wheels shown in Figure 1 moves in a square room of known size. Relative to a frame of reference fixed in the room, the velocity of a frame of reference fixed in the robot is a function of the wheel speeds. The equations for this system are well-known. See for example [13, 55, 84]. The reference frame that results from numerically integrating these nonholonomic governing equations can be thought of as the time-dependent rigid-body motion *g*(*t*) = *g*(*x*(*t*), *y*(*t*), *θ*(*t*)) ∈ *SE*(2). If the robot’s motion could be observed with infinite precision, then *g*(*t*) would be known for all times *t* ≥ 0. But, of course, infinitely precise observations are not possible. And the question becomes one of estimating *g*(*t*) given whatever sensory data is available. Such problems have become popular over the past decade, as reviewed in [54, 72].

In the present example, it is assumed that two noisy sensing modalities are present on the robot: (1) the wheel speeds can be measured (or calculated from wheel angle measurements) with sensors co-located with the motors; (2) two range sensors fixed on the robot (one in front and one in back) point directly forward and behind the robot along its symmetry axis, and measure the distance to the walls ahead of and behind the robot. The scenario is that the robot starts at the center of the room with known initial heading, *θ* = 0, estimates *g*(*t*) from odometry for *t* ∈ [0, *T*], and then switches on its range sensors at time *T*. Given models for the noise in both sensing modalities, how should these measurements be pooled to obtain the best estimate of the robot’s current pose? And how can the quality of this estimate be compared with the estimates obtained from each individual sensing modality? Several of the theorems derived earlier in this paper can be used to address these problems.

This is the question of fusing odometric (or “dead-reckoning”) data with range data. In odometry the nonholonomic kinematic equations of motion are numerically integrated given knowledge of the wheel speeds as a function of time. However such measurements are noisy, and so the path that the robot actually takes diverges from the predicted one as time increases. The result is a conditional probability density $f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})\in \mathcal{N}\left(\mathit{SE}(2)\times \mathit{SE}(2)\right)$ where *g _{act}* denotes the actual pose of the robot and *g _{odo}* denotes the pose predicted by dead reckoning.

Let the two wheels each have radii *r*, and let the wheelbase be denoted as *L*. Imagine that the angles through which the wheels turn around their axes are governed by stochastic differential equations of the form

$$d{\varphi}_{1}=\omega dt+\sqrt{D}d{w}_{1}$$

(85)

$$d{\varphi}_{2}=\omega dt+\sqrt{D}d{w}_{2}$$

(86)

where *dw*_{1} and *dw*_{2} represent uncorrelated unit-strength white noises and *D* is a constant that scales the noise strength. Substituting these wheel rates into the nonholonomic kinematic equations gives

$$\left(\begin{array}{c}\hfill dx\hfill \\ \hfill dy\hfill \\ \hfill d\theta \hfill \end{array}\right)=\left(\begin{array}{c}\hfill r\omega \phantom{\rule{thinmathspace}{0ex}}\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill \\ \hfill r\omega \phantom{\rule{thinmathspace}{0ex}}\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill \\ \hfill 0\hfill \end{array}\right)dt+\sqrt{D}\left(\begin{array}{cc}\hfill \frac{r}{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill \frac{r}{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill \\ \hfill \frac{r}{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill & \hfill \frac{r}{2}\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \hfill \\ \hfill \frac{r}{L}\hfill & \hfill -\frac{r}{L}\hfill \end{array}\right)\left(\begin{array}{c}\hfill d{w}_{1}\hfill \\ \hfill d{w}_{2}\hfill \end{array}\right).$$

(87)

Usually to be precise one must specify whether an SDE is of Itô or Stratonovich type. In this example both interpretations yield the same equation and this distinction is unimportant. But for the sake of definiteness, take (87) to be an Itô equation.

If such an equation is simulated many times, each time starting from the same initial conditions (say, *x* = *y* = *θ* = 0), then a function, *f*(*x*, *y*, *θ*; *t*) that records the distribution of positions and orientations of the cart at the same value of time, *t*, in each trajectory can be defined. (This pdf also depends on *r*, *L*, *ω*, and *D*, but the dependence on these constants is suppressed.)
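This Monte Carlo construction of *f*(*x*, *y*, *θ*; *t*) can be carried out directly by simulating (87) with the Euler-Maruyama method over many independent trajectories. One useful check: because *dθ* = (*r*/*L*)√*D*(*dw*₁ − *dw*₂) is exactly linear in the noise, the ensemble variance of *θ* at time *T* should approach 2*r*²*DT*/*L*², matching the *θ*-diffusion coefficient in the Fokker-Planck equation below. A sketch with illustrative parameter values:

```python
import numpy as np

r, L, omega, D = 0.1, 0.3, 2.0, 0.5        # wheel radius, wheelbase, wheel speed, noise level
T, dt, N = 1.0, 1e-3, 20000                # horizon, time step, number of trajectories

rng = np.random.default_rng(1)
x = np.zeros(N); y = np.zeros(N); th = np.zeros(N)
for _ in range(int(T / dt)):
    # independent Wiener increments dw_1, dw_2 for every trajectory
    dw = np.sqrt(dt) * rng.standard_normal((2, N))
    # Euler-Maruyama step of (87); x, y use the pre-update heading th
    x += r * omega * np.cos(th) * dt + np.sqrt(D) * (r / 2) * np.cos(th) * (dw[0] + dw[1])
    y += r * omega * np.sin(th) * dt + np.sqrt(D) * (r / 2) * np.sin(th) * (dw[0] + dw[1])
    th += np.sqrt(D) * (r / L) * (dw[0] - dw[1])

var_th = th.var()
var_th_exact = 2 * r**2 * D * T / L**2     # theta is linear in the noise, so this is exact
```

Histogramming the final (*x*, *y*, *θ*) samples gives a direct estimate of *f*(*x*, *y*, *θ*; *T*).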

As explained in detail in [84, 20], a well-developed theory for linking stochastic differential equations such as (87) to functions such as *f*(*x*, *y*, *θ*; *t*) exists. This theory produces a *Fokker-Planck equation* for *f*(*x*, *y*, *θ*; *t*). In the present context, this equation is of the form [84]

$$\begin{array}{c}\hfill \frac{\partial f}{\partial t}=-r\omega \phantom{\rule{thinmathspace}{0ex}}\mathrm{cos}\phantom{\rule{thinmathspace}{0ex}}\theta \frac{\partial f}{\partial x}-r\omega \phantom{\rule{thinmathspace}{0ex}}\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}\theta \frac{\partial f}{\partial y}+\hfill \\ \hfill \frac{D}{2}\left(\frac{{r}^{2}}{2}\phantom{\rule{thinmathspace}{0ex}}{\mathrm{cos}}^{2}\phantom{\rule{thinmathspace}{0ex}}\theta \frac{{\partial}^{2}f}{\partial {x}^{2}}+\frac{{r}^{2}}{2}\phantom{\rule{thinmathspace}{0ex}}\mathrm{sin}\phantom{\rule{thinmathspace}{0ex}}2\theta \frac{{\partial}^{2}f}{\partial x\partial y}+\frac{{r}^{2}}{2}\phantom{\rule{thinmathspace}{0ex}}{\mathrm{sin}}^{2}\phantom{\rule{thinmathspace}{0ex}}\theta \frac{{\partial}^{2}f}{\partial {y}^{2}}+\frac{2{r}^{2}}{{L}^{2}}\frac{{\partial}^{2}f}{\partial {\theta}^{2}}\right),\hfill \end{array}$$

which is subject to the initial conditions *f*(*x*, *y*, *θ*; 0) = *δ*(*x*)*δ*(*y*)*δ*(*θ*).

There is a coordinate-free way of writing (87) and the above equation. Namely,

$${\left({g}^{-1}\frac{dg}{dt}\right)}^{\vee}dt=r\omega {\mathbf{e}}_{1}dt+\frac{r\sqrt{D}}{2}\left(\begin{array}{cc}\hfill 1\hfill & \hfill 1\hfill \\ \hfill 0\hfill & \hfill 0\hfill \\ \hfill 2\u2215L\hfill & \hfill -2\u2215L\hfill \end{array}\right)d\mathbf{w}.$$

The coordinate-free version of the Fokker-Planck equation above can be written compactly in terms of these Lie derivatives as [84]

$$\frac{\partial f}{\partial t}=-r\omega {\stackrel{~}{X}}_{1}f+\frac{{r}^{2}D}{4}{\left({\stackrel{~}{X}}_{1}\right)}^{2}f+\frac{{r}^{2}D}{{L}^{2}}{\left({\stackrel{~}{X}}_{3}\right)}^{2}f$$

(88)

with initial conditions *f*(*g*; 0) = *δ*(*g*). The resulting time-evolving pdf is denoted as *f*(*g*; *t*), or with the shorthand *f _{t}*(*g*).

While efficient techniques for solving this sort of equation exist for both the long-time and short-time cases (see e.g. [84, 79] and references therein), the emphasis in the current paper is not on solution techniques, but rather on an assessment of how pose information can be obtained from (88) directly.

If the robot moves for a time *t*_{1} and then continues to move for an additional amount of time, *t*_{2}, then the distribution will be updated as a convolution over *SE*(2) of the form

$${f}_{{t}_{1}+{t}_{2}}\left(g\right)=({f}_{{t}_{1}}\ast {f}_{{t}_{2}})\left(g\right).$$

(89)

That is, solutions to (88), or more generally (18), subject to Dirac delta initial conditions form a commutative semigroup under the operation of group convolution.
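The semigroup property (89) can be spot-checked by Monte Carlo: the group product of independent samples drawn from *f*_{t₁} and *f*_{t₂} is distributed as the convolution *f*_{t₁} ∗ *f*_{t₂}, so its statistics should match a direct simulation to time *t*₁ + *t*₂. The sketch below reuses an Euler-Maruyama rollout of (87); the helper name `rollout` and all parameter values are illustrative:

```python
import numpy as np

def rollout(T, N, seed, r=0.1, L=0.3, omega=2.0, D=0.5, dt=1e-3):
    """Euler-Maruyama samples of the cart pose (x, y, theta) at time T, per (87)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(N); y = np.zeros(N); th = np.zeros(N)
    for _ in range(int(T / dt)):
        dw = np.sqrt(dt) * rng.standard_normal((2, N))
        x += r * omega * np.cos(th) * dt + np.sqrt(D) * (r / 2) * np.cos(th) * (dw[0] + dw[1])
        y += r * omega * np.sin(th) * dt + np.sqrt(D) * (r / 2) * np.sin(th) * (dw[0] + dw[1])
        th += np.sqrt(D) * (r / L) * (dw[0] - dw[1])
    return x, y, th

t1, t2, N = 0.5, 0.7, 20000
x1, y1, a1 = rollout(t1, N, seed=2)         # samples of f_{t1}
x2, y2, a2 = rollout(t2, N, seed=3)         # independent samples of f_{t2}

# SE(2) group product g1 o g2: rotate/translate the increment by the first pose
xc = x1 + np.cos(a1) * x2 - np.sin(a1) * y2
yc = y1 + np.sin(a1) * x2 + np.cos(a1) * y2
ac = a1 + a2

xd, yd, ad = rollout(t1 + t2, N, seed=4)    # direct samples of f_{t1 + t2}
```

The composed samples and the direct samples agree in their low-order moments, as (89) requires.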

The function *f*(*g*; *T*) generated by solving (88) is in fact the conditional density *f*(*g _{act}* ∣ *g _{odo}*) discussed above.

In the scenario described at the beginning of this section, the robot is equipped with two range sensors arranged with their beams pointing in diametrically opposed directions on the *x*-axis passing through the center of the robot in Figure 1. If the range sensors could measure the distance to the walls exactly, and if the robot behaved exactly as the nonholonomic kinematic cart, then the robot could spin around its *z* axis through 180 degrees, generate an exact map of the environment and exactly know its location modulo symmetries. It is these symmetries that make the discussions of discrete groups, cosets, and double cosets in Sections 5.2 and 5.3 relevant to this application. For a robot that has 180-degree rotational symmetry around its *z*-axis, there is no distinction between *g _{act}* and *g _{act}* ○ *g*(0, 0, *π*); likewise, the fourfold symmetry of the square room means that the poses *g*(0, 0, *iπ*/2) ○ *g _{act}* for *i* = 0, 1, 2, 3 all yield identical range measurements.

Suppose that the robot is placed at the pose *g _{act}* where it remains while its front and back range sensors take a stream of synchronized distance measurements. A natural estimator for the pairs of distances of the front and back of the robot to the walls that the two beams hit results from simply computing the sample mean and covariance of pairs of distances. A roboticist might then define a bivariate Gaussian distribution with mean and variance given by the sample mean and sample variance of these two sets of measured distances. This is an implicit use of the maximum entropy principle, i.e., with all other things being unknown, choose the maximum entropy distribution with specified mean and covariance, which is the Gaussian. The result is a bivariate measurement model *f*(*d*_{front}, *d*_{back} ∣ *g _{act}*).
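The roboticist's procedure just described amounts to fitting the sample mean and covariance and then invoking the Gaussian as the maximum-entropy density with those moments. A toy version with synthetic front/back readings (the true distances, noise level, and function name `density` are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
true_d = np.array([2.0, 3.0])             # notional true front/back distances to the walls
meas = true_d + 0.05 * rng.standard_normal((500, 2))   # noisy synchronized readings

d_bar = meas.mean(axis=0)                 # sample mean
S = np.cov(meas, rowvar=False)            # sample covariance (2 x 2)

def density(d):
    """Bivariate Gaussian with moments (d_bar, S): the maximum-entropy
    density among all densities with that fixed mean and covariance."""
    e = d - d_bar
    return np.exp(-0.5 * e @ np.linalg.solve(S, e)) / (2 * np.pi * np.sqrt(np.linalg.det(S)))
```

The fitted density peaks at the sample mean, which for a well-behaved sensor sits close to the true distances.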

Taking a stream of measurements when the robot is at a fixed pose certainly does not offer as much information as when it does a full sweep. But since it is implicit in this scenario that the geometry of the robot and the room are both known in advance, the above situation can be turned around. If we assume that *g _{act}* is not known, then we can ask for each pair of measured front/back distances what all possible poses of the robot can be in order for such distance measurements to be replicated, under the constraint that the robot fits in the room without penetrating any walls. From this, a maximum entropy distribution *f*(*g _{act}* ∣ *d*_{front}, *d*_{back}) supported on this set of consistent poses can be constructed.

In summary, an error model for distance measurements is presented that takes into account known symmetries of the robot and the room. Even if range finders could measure distance perfectly, due to symmetries, they would not be sufficient to localize in the room, but only to within a bounded region in the double coset described above. The odometry data, as imperfect as it is, provides a means to resolve the ambiguity in the range measurements. And information theory on groups provides a language in which to address questions such as: How much information is being provided by the range sensors vs. odometry? And, how much improvement is there in estimates of *g _{act}* when data is pooled vs. using each sensor modality independently? These questions can only be answered using the theorems presented previously in this paper after a strategy for pooling the measurements is articulated. One such strategy is the Bayesian filter (see [72] and references therein), a version of which is described in the following subsection.

One form of Bayes’ rule, which holds for probabilities and pdfs alike, is

$$P(A\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}B,C)=\frac{p(B\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}A,C)p(A\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}C)}{p(B\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}C)}.$$

Taking *A* = *g _{act}*, *B* = *g _{dis}*, and *C* = *g _{odo}* gives

$$f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{dis}},{g}_{\mathit{odo}})=\frac{f({g}_{\mathit{dis}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\text{act}},{g}_{\mathit{odo}})f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})}{f({g}_{\mathit{dis}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})}.$$

If *g _{act}* is known, then knowledge of *g _{odo}* provides no additional information about *g _{dis}*; that is, *f*(*g _{dis}* ∣ *g _{act}*, *g _{odo}*) = *f*(*g _{dis}* ∣ *g _{act}*), and so

$$f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{dis}},{g}_{\mathit{odo}})=\frac{f({g}_{\mathit{dis}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\text{act}})f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})}{f({g}_{\mathit{dis}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})}.$$

(90)

Each term on the right-hand side of this equation can be evaluated using the individual sensor models presented in the previous two subsections.
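A minimal numerical illustration of the fusion rule (90): discretize the pose variable (reduced here to a one-dimensional slice for brevity), multiply the odometry prior by the range likelihood pointwise, and renormalize. Because Gaussian precisions add under this product, the fused posterior is sharper than the prior, previewing the entropy chain discussed below. All means and noise levels are illustrative:

```python
import numpy as np

xs = np.linspace(-2, 2, 801)              # 1-D grid standing in for the pose space
h = xs[1] - xs[0]

def gaussian(x, mu, sig):
    return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (np.sqrt(2 * np.pi) * sig)

prior = gaussian(xs, 0.30, 0.40)          # stand-in for f(g_act | g_odo): broad odometry estimate
lik = gaussian(xs, 0.10, 0.15)            # stand-in for f(g_dis | g_act): sharper range model
post = prior * lik
post /= post.sum() * h                    # Bayes rule (90): normalize the pointwise product

def entropy(p):
    """Differential entropy of a gridded pdf, approximated by a Riemann sum."""
    p = np.maximum(p, 1e-300)
    return -np.sum(p * np.log(p)) * h

S_prior, S_post = entropy(prior), entropy(post)
```

The posterior mean lands between the two input means, weighted by their precisions, and S_post < S_prior: conditioning on the range data has reduced the dispersion of the estimate.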

If nothing were known other than the speed of the robot, *ω*, and the duration of travel, *T*, and if *rωT* is less than the distance to the nearest wall, then the robot position will be constrained to be within a circular disk of radius *rωT*. If its orientation is completely unknown, then an upper bound on the entropy of *f*(*g _{act}*) (without conditioning on any odometry data) can be computed from the maximum entropy distribution on this set, which is uniform over the disk of radius *rωT* and over all orientations. Denoting this distribution by *f*_{max ent}(*g _{act}*) gives the chain of inequalities

$$S\left({f}_{\mathrm{max}\phantom{\rule{thinmathspace}{0ex}}\mathit{ent}}({g}_{\text{act}})\right)\ge S\left(f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{odo}})\right)\ge S\left(f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{g}_{\mathit{dis}},{g}_{\mathit{odo}})\right).$$

This follows from the general information-theoretic property that conditioning reduces entropy, and does not use any Lie-group properties. In contrast, inequalities that directly use the results of the theorems presented earlier in this paper are reviewed in the following subsection.

The theorems presented earlier in the paper describe relationships between several measures of dispersion for pdfs on Lie groups including entropy, covariance, and the (inverse) Fisher information matrix. The reason for performing the sensor fusion described above is to reduce the dispersion of the resulting pdf in the variable *g _{act}*.

Since the dead reckoning distribution *f*(*g _{act}*

Due to the symmetry of the room and of the range-sensors, the entropy of the range-only sensing modality can be bounded using the result of Theorem 5.5, which holds both when the subgroups *K* and *H* are Lie groups and when they are finite. If both elements of *C*_{2}, and all four elements of *C*_{4}, are equally likely, computation of the entropies *S*(*f*_{C2}) and *S*(*f*_{C4}) becomes trivial, and *S*(*f*_{C4\SE(2)/C2}) is computed by focusing on a single asymmetrical region of the configuration space. Theorem 5.5 then allows for the bounding of the actual entropy as the sum of these individual quantities, each of which is easier to compute than a mixture model of the form

$$f({g}_{\text{act}}\phantom{\rule{thinmathspace}{0ex}}\mid \phantom{\rule{thinmathspace}{0ex}}{d}_{\text{front}},{d}_{\text{back}})=\frac{1}{8}\sum _{i=0}^{3}\sum _{j=0}^{1}{f}_{{C}_{4}\backslash \mathit{SE}\left(2\right)\u2215{C}_{2}}\left(g(0,0,i\pi \u22152)\circ {g}_{\text{act}}\circ g(0,0,j\pi )\right).$$

This mixture model reflects maximal uncertainty induced by the symmetries, but does not lend itself to closed-form entropy evaluations, which is one reason why the result of Theorem 5.5 is useful.

Since *f*(*g _{act}*

Using nothing more than the product rule for the Lie derivatives, the Fisher information matrix for *f*(*g _{act}*

The Fourier-space solution for dead-reckoning models is described in [84], and the *SE*(2) and *SE*(3) group-Fourier transforms, their properties, and applications are described in detail in [18, 19, 81]; $\widehat{f}\left(\lambda \right)$ is completely characterized for equations such as the odometry model in (88). Therefore, the bounds in Theorem 5.2 can be applied.

Theorems 5.7 and 5.8(b) are applicable to symmetric functions, including Gaussians on *SE*(2) with *μ* = *e*. The solution of the odometry equation (88) is not a symmetric function. However, *f _{t}*(

The bounds on entropy powers and relative information in (73) and (76) resulting from log-Sobolev inequalities provide a means to respectively evaluate and compare additional informational quantities associated with *f*(*g _{act}* ∣ *g _{odo}*) and *f*(*g _{act}* ∣ *g _{dis}*, *g _{odo}*).

In summary, the efficacy of nine of the fifteen theorems presented has been illustrated in the context of a single example in robotics. By establishing the language and principles with which to articulate information theory on Lie groups, other applications that utilize all of these theorems will be explored in the future. For example, Theorems 5.8(a) and 5.9 pertaining to convolution and class functions are not relevant to robot localization in *SE*(2) because there are no pdfs that are class functions in $\mathcal{N}\left(\mathit{SE}(2)\right)$ [19]. However, solutions to the heat equation on *SO*(3) are both symmetric and class functions, and so these theorems become immediately applicable to spacecraft and submarine orientation-estimation problems.

By collecting and reinterpreting results relating to the study of diffusion processes, harmonic analysis, and log-Sobolev inequalities on Lie groups, and merging these results with definitions of the Fisher information matrix and covariance, many inequalities of information theory were extended here to the context of probability densities on unimodular Lie groups. In addition, the natural decomposition of groups into cosets, double cosets, and the nesting of subgroups provides some inequalities that result from the Kullback-Leibler divergence of probability densities on Lie groups. Some special inequalities related to finite groups (which are also unimodular) were also provided. One open issue is determining the conditions under which the entropy power inequality [8, 66] will hold for Lie groups.

While the emphasis of this paper was on the discovery of fundamental inequalities, the motivation for this study originated with applications in robotics and other areas. Indeed, it was the problem of quantifying the difficulty of robotic assembly [22, 45, 9] and self-repair [40] tasks using the concept of “parts entropy” [63] that led the author to link group theory and information theory. A detailed example in mobile robot localization was provided here to illustrate the efficacy of the presented theorems.

As another example, the idea that a robot or animal should move (in *SE*(2)) so as to maximize the rate at which its sensors gain information is attracting attention [73, 74, 75, 15, 52, 65, 77, 24, 60, 62]. And analogous problems can be formulated in medical imaging in which only x-rays in directions that maximize the generation of new information need be taken rather than exposing a patient to the radiation of a whole CT scan. Related to this problem is that of biomolecular structure determination from disparate data sets such as cryo-electron microscopy, NMR, x-ray crystallography, etc. Each is related to the structure of molecules, their ensemble motion, and/or their quantum state - all of which are described in terms of probability densities on Lie groups. A first step toward information fusion of data on Lie groups is the version of information theory developed here and demonstrated on an example in the context of a mobile robot.

Comments from Graham Beck, Valentina Staneva, and the reviewers were helpful in improving the presentation.

The author is supported by NIH Grant R01 GM075310 and NSF Grant IIS-0915542.

*2000 Mathematics Subject Classification*. Primary: 22E15, 94A15; Secondary: 22E70.

^{1}Here and throughout this paper, *r* denotes the fact that derivatives used in defining expressions appear on the ‘right’ and *l* denotes that they appear on the ‘left.’ This means that expressions with *r* are left invariant and those with *l* are right invariant. This convention is opposite that used in much of the mathematics literature.

^{2}It is not required that *G* = (*G*/*H*) × *H* for this to be true. For example, the integral over *SO*(3) decomposes into one over *S*^{2} and one over *S*^{1} even though *SO*(3) ≠ *S*^{3}.

^{3}The names of the dummy variables *k* and *k*’ are unimportant. However, at this stage it is important that the names be different in order to emphasize their statistical independence.

^{4}There are no surface terms because, like the circle and real line, each coordinate in the integral either wraps around or goes to infinity.

[1] Amari S, Nagaoka H. Methods of Information Geometry, Translations of Mathematical Monographs 191. American Mathematical Society; Providence, RI: 2000.

[2] Bakry D, Concordet D, Ledoux M. Optimal heat kernel bounds under logarithmic Sobolev inequalities. ESAIM: Probability and Statistics. 1997;1:391–407.

[3] Baldwin G, Mahony R, Trumpf J. A Nonlinear Observer for 6 DOF Pose Estimation from Inertial and Bearing Measurements. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.

[4] Barron AR. Entropy and the central limit theorem. Ann. Prob. 1986;14:336–342.

[5] Beckner W. Sharp inequalities and geometric manifolds. J. Fourier Anal. Appl. 1997;3:825–836.

[6] Beckner W. Essays on Fourier Analysis in Honor of Elias M. Stein. Princeton University Press; 1995. Geometric inequalities in Fourier analysis; pp. 36–68.

[7] Berg HC. E. coli in Motion. Springer; New York: 2003.

[8] Blachman NM. The convolution inequality for entropy powers. IEEE Trans. Inform. Theory. 1965;11:267–271.

[9] Boothroyd G. Assembly Automation and Product Design. Second Edition CRC Press; Boca Raton, FL: 2005.

[10] Brockett RW. System theory on group manifolds and coset spaces. SIAM J. Control. 1972;10:265–284.

[11] Brockett RW. Lie Algebras and Lie Groups in Control Theory. In: Mayne DQ, Brockett RW, editors. Geometric Methods in System Theory. Reidel Publishing Company; Dordrecht-Holland: 1973.

[12] Brown LD. A proof of the Central Limit Theorem motivated by the Cramér Rao inequality. In: Kallianpur G, Krishnaiah PR, Ghosh JK, editors. Statistics and Probability: Essays in Honour of C.R. Rao. North-Holland, New York: 1982. pp. 141–148.

[13] Bullo F, Lewis AD. Geometric Control of Mechanical Systems. Springer; 2004.

[14] Carlen EA. Superadditivity of Fishers Information and Logarithmic Sobolev Inequalities. Journal Of Functional Analysis. 1991;101:194–211.

[15] Censi A. On Achievable Accuracy for Pose Tracking. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.

[16] Chirikjian GS. Fredholm integral equations on the Euclidean motion group. Inverse Problems. 1996;12:579–599.

[17] Chirikjian GS, Wang YF. Conformational statistics of stiff macromolecules as solutions to PDEs on the rotation and motion groups. Physical Review E. 2000;62:880–892. [PubMed]

[18] Chirikjian GS, Kyatkin AB. An operational calculus for the Euclidean motion group with applications in robotics and polymer science. J. Fourier Analysis and Applications. 2000;6:583–606.

[19] Chirikjian GS, Kyatkin AB. Engineering Applications of Noncommutative Harmonic Analysis. CRC Press; Boca Raton, FL: 2001.

[20] Chirikjian GS. Stochastic Models, Information Theory, and Lie Groups. Vol. 1. Birkhäuser; Boston: 2009.

[21] Chirikjian GS. Stochastic Models, Information Theory, and Lie Groups. Vol. 2. Birkhäuser; Boston: 2011.

[22] Chirikjian GS. Parts Entropy and the Principal Kinematic Formula. IEEE Conference on Automation Science and Engineering; Washington D.C.. August 2008.

[23] Chirikjian GS. The stochastic elastica and excluded-volume perturbations of DNA conformational ensembles. International Journal of Non-Linear Mechanics. 2008;43:1108–1120. [PMC free article] [PubMed]

[24] Cortez RA, Tanner HG, Lumia R. Distributed Robotic Radiation Mapping; ISER’08.

[25] Cover TM, Thomas JA. Elements of Information Theory. 2nd ed Wiley-Interscience; Hoboken, NJ: 2006.

[26] Csiszár I. *I*-Divergence Geometry of Probability Distributions and Minimization Problems. Ann. Prob. 1975;3:146–158.

[27] Dembo A. Information Inequalities and Concentration of Measure. Ann. Prob. 1997;25:527–539.

[28] Dembo A, Cover TM, Thomas JA. Information theoretic inequalities. IEEE Transactions On Information Theory. 1991;37:1501–1518.

[29] Duncan TE. An Estimation problem in compact Lie groups. Syst. Control Lett. 1998;10:257–263.

[30] Gelfand IM, Minlos RA, Ya. Shapiro Z. Representations of the Rotation and Lorentz Groups and Their Applications. Pergamon Press; New York: 1963.

[31] Grenander U. Probabilities on Algebraic Structures. Wiley; 1963. Dover edition: 2008.

[32] Gross L. Logarithmic Sobolev inequalities. Amer. J. Math. 1975;97:1061–1083.

[33] Gross L. Logarithmic Sobolev inequalities on Lie groups. Illinois J. Math. 1992;36:447–490.

[34] Gurarie D. Symmetry and Laplacians. Introduction to Harmonic Analysis, Group Representations and Applications. Elsevier Science Publisher; The Netherlands: 1992.

[35] Helgason S. Groups and Geometric Analysis. American Mathematical Society; Providence, RI: 2000.

[36] Heyer H. Probability Measures on Locally Compact Groups. Springer-Verlag; New York: 1977.

[37] Johnson O, Suhov Y. Entropy and convergence on compact groups. Journal of Theoretical Probability. 2000;13:843–857.

[38] Johnson O. Information Theory and the Central Limit Theorem. Imperial College Press; London: 2004.

[39] Jurdjevic V, Sussmann HJ. Control systems on Lie groups. Journal of Differential Equations. 1972;12:313–329.

[40] Kutzer MDM, Armand M, Lin E, Scheidt D, Chirikjian GS. Toward cooperative team-diagnosis in multi-robot systems. International Journal of Robotics Research. 2008;27:1069–1090.

[41] Kwon J, Choi M, Park FC, Chu C. Particle filtering on the Euclidean group: framework and applications. Robotica. 2007;25:725–737.

[42] Lawler GF. Introduction to Stochastic Processes. 2nd ed CRC Press; Boca Raton, FL: 2006.

[43] Ledoux M. Concentration of Measure and Logarithmic Sobolev Inequalities. Volume 1709. Springer; Berlin: 1999. (Lecture Notes in Mathematics).

[44] Ledoux M. The Concentration of Measure Phenomenon. Amer. Math. Soc.; Providence, RI: 2001. (Math. Surveys and Monographs 89).

[45] Lee K, Chirikjian GS. Robotic self-replication from low-complexity parts. IEEE Robotics and Automation Magazine. 2007;14:34–43.

[46] Lieb EH, Loss M. Analysis. 2nd ed American Mathematical Society; Providence, RI: 2001.

[47] Linnik YV. An information-theoretic proof of the Central Limit Theorem with the Lindeberg condition. Theory of Probability and its Applications. 1959;4:288–299.

[48] Mackey GW. Induced Representations of Groups and Quantum Mechanics. W. A. Benjamin, Inc.; New York and Amsterdam: 1968.

[49] Makadia A, Daniilidis K. Rotation estimation from spherical images. IEEE Trans. Pattern Anal. Mach. Intell. 2006;28:1170–1175. [PubMed]

[50] Maksimov VM. Necessary and sufficient statistics for the family of shifts of probability distributions on continuous bicompact groups. Theory Of Probability And Its Applications. 1967;12:267–280.

[51] Malis E, Hamel T, Mahony R, Morin P. Dynamic Estimation of Homography Transformations on the Special Linear Group for Visual Servo Control. IEEE International Conference on Robotics and Automation; Kobe, Japan. May, 2009.

[52] Manyika J, Durrant-Whyte H. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood; New York: 1994.

[53] Miller W. Lie Theory and Special Functions. Academic Press; New York: 1968.

[54] Mourikis A, Roumeliotis S. On the treatment of relative-pose measurements for mobile robot localization; ICRA’06.

[55] Murray R, Li Z, Sastry S. A Mathematical Introduction to Robotics. CRC Press; 1994.

[56] Otto F, Villani C. Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. J. Funct. Anal. 2000;173:361–400.

[57] Park W, Liu Y, Zhou Y, Moses M, Chirikjian GS. Kinematic state estimation and motion planning for stochastic nonholonomic systems using the exponential map. Robotica. 2008;26:419–434. [PMC free article] [PubMed]

[58] Park W, Wang Y, Chirikjian GS. The path-of-probability algorithm for steering and feedback control of flexible needles. International Journal of Robotics Research. 2010;29:813–830. [PMC free article] [PubMed]

[59] Pennec X. Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements. Journal of Mathematical Imaging and Vision. 2006;25:127–154.

[60] Porat B, Nehorai A. Localizing vapor-emitting sources by moving sensors. IEEE Trans. Signal Processing. 1996;44:1018–1021.

[61] Roy KK. Exponential families of densities on an analytic group and sufficient statistics. Sankhyā: The Indian Journal of Statistics, Series A. 1975;37:82–92.

[62] Russell RA. Odour Detection by Mobile Robots. World Scientific; Singapore: 1999.

[63] Sanderson AC. Part Entropy Method for Robotic Assembly Design. Proceedings of International Conference on Robotics; 1984.

[64] Shannon CE, Weaver W. The Mathematical Theory of Communication. University of Illinois Press; Urbana: 1949.

[65] Smith P, Drummond T, Roussopoulos K. Computing MAP trajectories by representing, propagating and combining PDFs over groups. Proceedings of the 9th IEEE International Conference on Computer Vision, 2; Nice, France. 2003. pp. 1275–1282.

[66] Stam AJ. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Information and Control. 1959;2:101–112.

[67] Steele JM. Fisher information and detection of a Euclidean perturbation of an independent stationary process. The Annals of Probability. 1986;14:326–335.

[68] Su S, Lee CSG. Manipulation and propagation of uncertainty and verification of applicability of actions in assembly tasks. IEEE Trans. Syst. Man Cybern. 1992;22:1376–1389.

[69] Sugiura M. Unitary Representations and Harmonic Analysis. 2nd ed. Elsevier Science Publishers; The Netherlands: 1990.

[70] Talagrand M. New concentration inequalities in product spaces. Invent. Math. 1996;126:505–563.

[71] Taylor ME. Noncommutative Harmonic Analysis. American Mathematical Society; Providence, RI: 1986. (Mathematical Surveys and Monographs).

[72] Thrun S, Burgard W, Fox D. Probabilistic Robotics. MIT Press; Cambridge, MA: 2005.

[73] Tzanos P, Zefran M, Nehorai A. Information based distributed control for biochemical source detection and localization; ICRA’05. pp. 4457–4462.

[74] Tzanos P, Zefran M. Stability analysis of information based control for biochemical source localization; ICRA’06. pp. 3116–3121.

[75] Tzanos P, Zefran M. Locating a circular biochemical source: Modeling and control. pp. 523–528.

[76] Varadarajan VS. An Introduction to Harmonic Analysis on Semisimple Lie Groups. Cambridge University Press; Cambridge, England: 1999.

[77] Vergassola M, Villermaux E, Shraiman BI. Infotaxis as a strategy for searching without gradients. Nature. 2007;445:406–409. [PubMed]

[78] Vilenkin NJ, Klimyk AU. Representation of Lie Groups and Special Functions. Vol. 1–3. Kluwer Academic Publishers; The Netherlands: 1991.

[79] Wang Y, Chirikjian GS. Nonparametric second-order theory of error propagation on the Euclidean group. International Journal of Robotics Research. 2008;27:1258–1273. [PMC free article] [PubMed]

[80] Wang Y, Chirikjian GS. Error propagation on the Euclidean group with applications to manipulator kinematics. IEEE Transactions on Robotics. 2006;22:591–602.

[81] Wang Y, Zhou Y, Maslen DK, Chirikjian GS. Solving the phase-noise Fokker-Planck equation using the motion-group Fourier transform. IEEE Transactions on Communications. 2006;54:868–877.

[82] Willsky AS. Ph.D. dissertation. Dept. Aeronautics and Astronautics, M.I.T; Cambridge, Mass.: Jun, 1973. Dynamical Systems Defined on Groups: Structural Properties and Estimation.

[83] Želobenko DP. Compact Lie Groups and their Representations. American Mathematical Society; Providence, RI: 1973.

[84] Zhou Y, Chirikjian GS. Probabilistic Models of Dead-Reckoning Error in Nonholonomic Mobile Robots; ICRA’03; Taipei, Taiwan. September, 2003.

[85] Zhou Y, Chirikjian GS. Conformational statistics of bent semiflexible polymers. Journal of Chemical Physics. 2003;119:4962–4970.

[86] Zhou Y, Chirikjian GS. Conformational statistics of semi-flexible macromolecular chains with internal joints. Macromolecules. 2006;39:1950–1960. [PMC free article] [PubMed]