Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2901903

Formats

Article sections

Authors

Related links

Proc IEEE Int Symp Biomed Imaging. Author manuscript; available in PMC 2010 July 12.

Published in final edited form as:

Proc IEEE Int Symp Biomed Imaging. 2009; 5193141: 690–693.

doi: 10.1109/ISBI.2009.5193141PMCID: PMC2901903

NIHMSID: NIHMS212408

See other articles in PMC that cite the published article.

Biological shape modeling is an essential task that is required for systems biology efforts to simulate complex cell behaviors. Statistical learning methods have been used to build generative shape models based on reconstructive shape parameters extracted from microscope image collections. However, such parametric modeling approaches are usually limited to simple shapes and easily-modeled parameter distributions. Moreover, to maximize the reconstruction accuracy, significant effort is required to design models for specific datasets or patterns. We have therefore developed an instance-based approach to model biological shapes within a shape space built upon diffeomorphic measurement. We also designed a recursive interpolation algorithm to probabilistically synthesize new shape instances using the shape space model and the original instances. The method is quite generalizable and therefore can be applied to most nuclear, cell and protein object shapes, in both 2D and 3D.

Proteins function in different cellular or subcellular compartments to form a living cell system. In systems biology, modeling this complex system from different aspects and at various levels is anticipated to contribute to a final understanding of cell mechanisms [1, 2]. Quantitative, predictive compartment models which capture the spatial properties of subcellular structures is a basic building block for modeling of complex cell behaviors. With high-throughput microscopes capable of generating large numbers of high-resolution images, automated learning methods will be needed to build predictive compartment shape models.

We have previously described approaches for learning generative models of important cell compartments and protein localization [3]. In this work, nuclei were parameterized into B-spline coefficients of a medial axis curve and width from the media axis, and cell boundaries were parameterized into a compressed vector of the ratio of cell boundary position to nuclear boundary position in a polar coordinate system center on the nucleus. Statistical learning was performed on sets of parameters extracted from HeLa cell images to fit proper distributions to them. Algorithms to synthesize new shapes from sampled parameters were described for different protein distributions.

In such parametric modeling methods, the shape space is represented by probabilistic distributions of parameters. Parametric methods usually induce very concise models, but they also have obvious shortages. First, parametric description converts the shape into finite set of parameters, which causes information loss during the simplification process. Second, shape parameter distributions are usually arbitrarily fitted with frequently-used distributions by experience or histogram inspection, which is often not accurate enough in the real shape space. Third, complex shapes (especially 3D shapes) are often not easy to parameterize.

As first described by Yang et al [4], non-rigid registration methods are a valuable alternative to parametric methods for analyzing nuclear shape. Rohde et al [5, 6] proposed a registration-based measurement of distance between deformable shapes using the large deformation diffeomorphic metric mapping (LDDMM) framework [7] combined with Multidimensional scaling (MDS) [8] to reconstruct the nuclear shape space. The work we describe here extends this approach to a generative framework. To overcome the drawbacks of parametric methods, we propose an instance-based approach to model the shape space using kernel density estimation and a method to synthesize new shapes from it. We illustrate the method using 2D nuclear shapes but it is equally applicable to higher dimensional images.

Under the framework of computational anatomy [9], a set of shapes can be related by a group of diffeomorphic transformations, one-to-one smooth differentiable invertible mappings. Let Φ represent a group of diffeomorphisms, over a bounded domain Ω. A deformed image can be expressed as *I*(*ϕ*(*x*)), *x* Ω, then a series of images *I*_{1}, *I*_{2}, … can be generated by a template *I* and a group of diffeomorphisms {*I*(*ϕ _{i}*)

$$d({I}_{1},{I}_{2})=\underset{v}{\text{inf}}{\int}_{0}^{1}\Vert Lv(\cdot ,t)\Vert dt$$

(1)

where ‖ · ‖ means one standard *L*_{2} norm on the velocity field *v*(*x, t*), and *L* just represents a linear differential operator, for example *L* = (^{2} + *λI*).

This distance can be computed by solving the following optimization problem:

$$\underset{v}{\text{inf}}{\int}_{0}^{1}{\Vert v(\cdot ,t)\Vert}_{V}^{2}\phantom{\rule{0.2em}{0ex}}dt$$

(2)

subject to: *I*_{1}(*ϕ*(*x*, 0)) = *I*_{1}(*x*) and *I*_{1}(*ϕ*(*x*, 1)) = *I*_{2}(*x*), *x* Ω

In this paper, however, we use the more computationally efficient greedy algorithm [10], which looks for the locally-in-time optima instead of the global optima. In this method, we assume the velocity is constant in each time step Δ*t* and look for the locally optimal velocity field by

$${L}^{\ast}L{v}_{t}+{b}_{t}=0;$$

(3)

with *b _{t}* = −(

After we compute the diffeomorphic distances between all image pairs, a mutual distance matrix can be generated by: *D _{m,n}* =

Shape coordinates reside in a high dimension with a complex density distribution. We therefore apply the non-parametric method kernel density estimation [13] to approximate the shape space. To simplify the problem, we use a spherical Gaussian kernel since MDS normalize the coordinates in multi-dimensional Cartesian space. The *pdf* is formulated as

$${\widehat{p}}_{h}(\mathbf{\text{x}})=\frac{1}{n}\text{\u2211}_{j=1}^{n}\frac{1}{{(\sqrt{2\pi}h)}^{d}}\phantom{\rule{0.2em}{0ex}}\text{exp}\phantom{\rule{0.2em}{0ex}}\left(\text{\u2211}_{k=1}^{d}\frac{{({x}_{k}-{x}_{k}^{j})}^{2}}{2{h}^{2}}\right)$$

(4)

Optimal bandwidth is selected to minimize the Kullback-Leibler divergence [13] between approximate density * _{h}*(

$${D}_{KL}(p,{\widehat{p}}_{h})\phantom{\rule{0.2em}{0ex}}=\phantom{\rule{0.2em}{0ex}}\int p(\mathbf{\text{x}})log\frac{p(\mathbf{\text{x}})}{{\widehat{p}}_{h}(\mathbf{\text{x}})}d\mathbf{\text{x}}$$

(5)

which is equivalent to maximizing

$$\int log[{\widehat{p}}_{h}(\mathbf{\text{x}})]p(\mathbf{\text{x}})d\mathbf{\text{x}}\phantom{\rule{0.2em}{0ex}}=\phantom{\rule{0.2em}{0ex}}Elog[{\widehat{p}}_{h}(\mathbf{\text{x}})]$$

(6)

The expectation of the logarithmic likelihood can be approximated by leave-one-out cross-validation for fixed bandwidth *h* [14].

$$L({\mathbf{\text{x}}}_{1},{\mathbf{\text{x}}}_{2},\cdots ,{\mathbf{\text{x}}}_{n}\mid h)\phantom{\rule{0.3em}{0ex}}=\phantom{\rule{0.3em}{0ex}}\text{\u2211}_{i=1}^{n}log{\widehat{p}}_{h,i}({\mathbf{\text{x}}}_{i})$$

(7)

where * _{h,i}* is the leave-one-out likelihood estimator

$${\widehat{p}}_{h,i}(\mathbf{\text{x}})=\frac{1}{n-1}\sum _{j\ne i}\frac{1}{{(\sqrt{2\pi}h)}^{d}}\phantom{\rule{0.2em}{0ex}}\text{exp}\phantom{\rule{0.2em}{0ex}}\left(\text{\u2211}_{k=1}^{d}\frac{{({x}_{k}^{i}-{x}_{k}^{j})}^{2}}{2{h}^{2}}\right)$$

(8)

Full optimal bandwidth matrix can be estimated using a method involving Markov chain Monte Carlo [14].

The purpose of triangulation is to partition the shape space into triangular mesh grids in order to interpolate un-sampled points in the shape space. Given a set of shape points with reduced dimension *d*, Delaunay triangulation [15] is used to triangulate the space, yielding a set of *d* dimensional triangles with vertices of *d* + 1 points from the original point set. A point is located in the triangulated space by searching the enclosing Delaunay triangle. Both triangulation and triangle search algorithms are implemented in the computational geometry toolbox *qhull* [16].

With the original shapes and probabilistic description of shape space we can generate new shapes which also reside in the true shape space and are statistically meaningful, by deforming existing shapes located nearby in the shape space. First, sample a point from the distribution learned in 2.2 in the shape space and locate it in the triangulated space by finding the *d* + 1 vertices of the *d*-dimensional triangle which encloses it. Since diffeomorphisms can be calculated only between pairs of shapes, recursive projection and deformation algorithm is then performed to reach the shape represented by the sampled point, according to the following algorithm.

- Start from dimension
*d*, in a*k*dimensional triangle with vertices*X*_{1}, …,*X*_{k}_{+1}, “project”*X*_{k}_{+1}to the plane defined by*X*_{1}, …,*X*via_{k}*C*, that is, find the intersection_{k}*C*_{k}_{−1}of radial $\overrightarrow{{X}_{k+1}{C}_{k}}$ and plane*X*_{1}, …,*X*. Define ratio parameter ${\lambda}_{k}:\overrightarrow{{C}_{k}{C}_{k-1}}={\lambda}_{k}\overrightarrow{{X}_{k+1}{C}_{k}}$ and_{k}*C*_{k}_{−1}that satisfies$$\left|{\mathbf{\text{c}}}_{k-1}-{\mathbf{\text{x}}}_{1},{\mathbf{\text{x}}}_{2}-{\mathbf{\text{x}}}_{1},\dots ,{\mathbf{\text{x}}}_{k}-{\mathbf{\text{x}}}_{1}\right|=0$$(9)The iterative projection is solved by$$\{\begin{array}{ll}{\lambda}_{k}\hfill & =\frac{\left|{\mathbf{\text{c}}}_{k}-{\mathbf{\text{x}}}_{1},{\mathbf{\text{x}}}_{2}-{\mathbf{\text{x}}}_{1},\dots ,{\mathbf{\text{x}}}_{k}-{\mathbf{\text{x}}}_{1}\right|}{\left|{\mathbf{\text{c}}}_{k}-{\mathbf{\text{x}}}_{k+1},{\mathbf{\text{x}}}_{2}-{\mathbf{\text{x}}}_{1},\dots ,{\mathbf{\text{x}}}_{k}-{\mathbf{\text{x}}}_{1}\right|}\hfill \\ {\mathbf{\text{c}}}_{k-1}\hfill & =(1+{\lambda}_{k}){\mathbf{\text{c}}}_{k}-{\lambda}_{k}{\mathbf{\text{x}}}_{k+1}\hfill \end{array}$$(10) - Remove the null dimension of set {
**c**_{k}_{−1},**x**_{1}, …,**x**} since they are coplanar in the_{k}*k*-dimensional space. This is achieved by a coordinate conversion: projecting all coordinates onto the first*k*− 1 principal components of the covariance matrix.$$[{\mathbf{\text{c}}}_{k-1},{\mathbf{\text{x}}}_{1},\dots ,{\mathbf{\text{x}}}_{k}]\leftarrow {[{\mathbf{\text{P}}}_{1},\dots ,{\mathbf{\text{P}}}_{k-1}]}^{T}[{\mathbf{\text{c}}}_{k-1},{\mathbf{\text{x}}}_{1},\dots ,{\mathbf{\text{x}}}_{k}]$$ - On
*k*− 1 dimensional coordinate set {**c**_{k}_{−1},**x**_{1}, …,**x**} repeat step 1 and 2. Each time a ratio parameter_{k}*λ*is calculated by equation (7) until in the end only_{k}*c*_{1},*x*_{1},*x*_{2}are left, which are all scalars, and*λ*_{1}= (*x*_{2}−*c*_{1})/(*c*_{1}−*x*_{1}). - Knowing
*λ*_{1}, …,*λ*, starting from_{k}*λ*_{1}and*I*(*X*_{1}),*I*(*X*_{2}), the shapes represented by*X*_{1},*X*_{2}, generate an intermediate shape*I*(*C*) which corresponds to the coordinate_{k}**c**in the shape space by deforming shape_{k}*I*(*X*_{k}_{+1}) to shape*I*(*C*_{k}_{−1}) and capturing the shape at 1/(1 +*λ*) along the deformation path (_{k}*C*_{0}=*X*_{1}). - Repeat step 4 until the destination shape
*I*(*C*) is generated._{d} - Output the shape
*I*(*C*) as the synthesized shape image._{d}

To build a complete instance-based generative shape model, we used previously acquired two dimensional images of Hela cell nuclei expressing lamin modifications(a total of *n* = 160 nuclei images with relatively complex shapes), and pre-process images by removing variation in overall orientation, position and size, as described in [5].

We assume shapes generated from our recursive interpolation algorithm still reside in the shape space. To validate this assumption, we use leave-one-out re-interpolation. After construction of the shape space, for every shape point which does not reside on the convex hull enclosing all points, we remove it and use the remaining *n* − 1 points to construct the triangulated mesh. We locate the excluded point in its enclosing triangular simplex and regenerate the shape it stands for using our recursive interpolation algorithm. Comparison between original shape and interpolated shape shows little difference, see Figure 2.

By registering pairs of binary nuclear images, we calculate the diffeomorphic distance matrix *D* (*n* × *n*) and perform MDS on it. Residue variances (defined to be 1 − *R*^{2}(*, D*), (*d* × *d*) – pairwise distance matrix in reduced dimension space, which is an approximation of *D; R* – correlation coefficient between the entries of both matrices) at each reconstruction level are calculated to reflect distance preserving extent after dimension reduction. The intrinsic dimensionality of the data is estimated by looking for the “elbows” at which the residual variance ceases to decrease significantly with added dimension [6, 17]. Figure 3 shows the residue variance as a function of dimension used in distance matrix reconstruction and we conclude that *d* = 6 is a sufficiently descriptive dimension of the space.

In fitting shape space distribution with kernel density estimation, we run cross-validation on bandwidth *h* varying from 10^{−2} to 1 by steps of 10^{−3}. Figure 4 shows the peak part of the cross-validation score *Ê* log[* _{h}*(

We sample random vectors according to distribution with optimal bandwidth. Figure 5 shows synthesized nuclear shapes using the recursive interpolation algorithm. Since shapes synthesized from grid interpolation only reside inside the convex hull enclosing all instances, while the “point cloud” like distribution covers infinite space, we re-sample if a point outside the convex hull is generated.

We propose an instance-based generative modeling method for biological shapes. The model contains three parts: original shape instances, shape coordinate representations in shape space and distribution function of shape space. Under this framework, synthesized shapes generated by deforming real shapes are reasonable instances of the family. More important, LDDMM is not restricted by shape dimension or complexity, which makes the model highly generalizable.

Our method of building generative models for biological shapes can be applied to both 2D and 3D images. Since there is no assumption on the shapes to be modeled, it allows very complicated shapes, such as shapes with many invaginations. Moreover, applications are not limited to static shape modeling. Given time series cell images, we can build non-parametric predictive cell deforming (crawling, spreading etc.) models for live cell simulation. The method can also model composite shapes (piecewise constant images), for example, nested shapes of cell membrane combined with nuclei, revealing the spatial relations of different shape components.

Both non-parametric density estimation and neighboring interpolation requires as many original data as possible. During our validation studies, we noticed that some regenerated shapes did not match the original image well. This is presumably because of an overly sparse distribution of shapes around it, in which case the interpolation is done using shapes faraway in shape space. Determining criteria for the number of images required to partition the space finely enough for a desired level of interpolation accuracy remains a theoretical problem to be investigated. Additionally, all shapes we are able to generate this way must be inside the convex combination (in the shape manifold) of existing points (instances). This limitation is also presumably addressed by collecting large numbers of images, a task greatly simplified by automated microscopes.

We thank Karl Rohr for helpful discussions. The research described here was supported in part by NSF grant EF-0331657 and NIH grant GM075205.

1. Ideker T, Galitski T, Hood L. A new approach to decoding life. Annu Rev Genomics Hum Genet. 2001;2:343–372. [PubMed]

2. Kitano H. Computational systems biology. Nature. 2002;420:206–210. [PubMed]

3. Zhao T, Murphy RF. Automated learning of generative models for subcellular location: building blocks for systems biology. Cytometry Part A. 2007;71A:978–990. [PubMed]

4. Yang S, Khler D, Teller K, Cremer T, Le Baccon P, Heard E, Eils R, Rohr K. LNCS. Vol. 4190. Springer; Berlin Heidelberg: 2006. Non-rigid registration of 3d multi-channel microscopy images of cell nuclei; pp. 907–914. [PubMed]

5. Rohde GK, Ribeiro AJS, Dahl KN, Murphy RF. Deformation-based nuclear morphometry: capturing nuclear shape variation in HeLa cells. Cytometry. 2008;73A:341–350. [PubMed]

6. Rohde GK, Wang W, Peng T, Murphy RF. Deformation-based nonlinear dimension reduction: applications to nuclear morphometry. Proc IEEE Int Symp Biomed Imaging; 2008. pp. 500–503.

7. Beg MF, Miller MI, Trouve A, Younes L. Computing large deformation metric mappings via geodesic flows of diffeomorphisms. Intern J Comp Vis. 2005;61(2):139–157.

8. Cox T, Cox M. Multidimensional Scaling. Chapman and Hall; 1994.

9. Grenander U, Miller MI. Computational anatomy: An emerging discipline. Quart Appl Math. 1998;56:617–694.

10. Christensen GE, Rabbitt RD, Miller MI. Deformable templates using large deformation kinematics. IEEE Trans Imag Proc. 1996;5:1435–1447. [PubMed]

11. Klassen E, Srivastava A, Mio W, Joshi S. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Trans Pattern Anal Mach Intell. 2004;26(3):372–383. [PubMed]

12. Saul L, Weinberger K, Sha F, Ham J, Lee D. Semisupervised Learning. MIT Press; 2006. Spectral methods for dimensionality reduction.

13. Wasserman L. All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. 2005

14. Zhang X, King ML, Hyndman RL. Bandwidth selection for multivariate kernel density estimation using mcmc. Comput Stat Dat An. 2006;50(11):3009–3031.

15. de Berg M, Cheong O, van Kreveld M, Over-mars M. Computational Geometry: Algorithms and Applications. Springer-Verlag; 2008.

16. Barber CB, Dobkin DP, Huhdanpaa H. The quickhull algorithm for convex hulls. ACM Transactions on Mathematical Software. 1996;22(4):469–483.

17. Tenembaum JB, de Silva V, Langford JC. A global geometric framework for nonlinear dimensionality reduction. Science. 2000;290:2319–2323. [PubMed]