Our world consists not only of objects and scenes but also of materials of various kinds. Being able to recognize the materials that surround us (e.g., plastic, glass, concrete) is important for humans as well as for computer vision systems. Unfortunately, materials have received little attention in the visual recognition literature, and very few computer vision systems have been designed specifically to recognize materials. In this paper, we present a system for recognizing material categories from single images. We propose a set of low- and mid-level image features that are based on studies of human material recognition, and we combine these features using an SVM classifier. Our system outperforms a state-of-the-art system [Varma and Zisserman, 2009] on a challenging database of real-world material categories [Sharan et al., 2009]. When the performance of our system is compared directly to that of human observers, humans outperform our system quite easily. However, when we account for the local nature of our image features and the surface properties they measure (e.g., color, texture, local shape), our system rivals human performance. We suggest that future progress in material recognition will come from: (1) a deeper understanding of the role of non-local surface properties (e.g., extended highlights, object identity); and (2) efforts to model such non-local surface properties in images.
doi:10.1007/s11263-013-0609-0
PMCID: PMC3728085
PMID: 23914070
material recognition; material classification; texture classification; Mechanical Turk; perception
Transportation-based metrics for comparing images have long been applied to analyze images, especially where one can interpret the pixel intensities (or derived quantities) as a distribution of ‘mass’ that can be transported without strict geometric constraints. Here we describe a new transportation-based framework for analyzing sets of images. More specifically, we describe a new transportation-related distance between pairs of images, which we denote as linear optimal transportation (LOT). The LOT can be used directly on pixel intensities and is based on a linearized version of the Kantorovich-Wasserstein metric (an optimal transportation distance, as is the earth mover’s distance). The new framework is especially well suited for efficiently computing all pairwise distances for a large database of images, and thus it can be used for pattern recognition in sets of images. In addition, the LOT framework allows for an isometric linear embedding, greatly facilitating the visualization of discriminant information in different classes of images. We demonstrate the application of the framework to several tasks, such as discriminating nuclear chromatin patterns in cancer cells and decoding differences in facial expressions, galaxy morphologies, and subcellular protein distributions.
doi:10.1007/s11263-012-0566-z
PMCID: PMC3667970
PMID: 23729991
Optimal transportation; linear embedding
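In one dimension the linearized transport construction is exact and easy to see. With equally weighted samples, optimal transport simply matches sorted values, and embedding each distribution by its displacement from a fixed reference makes Euclidean distances between embeddings equal to the 2-Wasserstein distance. The Python sketch below illustrates only this 1D special case under those assumptions. The function names `w2_1d`, `lot_embed_1d` and `euclidean` are hypothetical, and the paper's LOT for 2D images (where the linearization is an approximation) is not reproduced here.

```python
import math
from typing import List, Sequence


def w2_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """2-Wasserstein distance between two 1D empirical distributions.

    Assumes equal sample counts with equal weights, in which case the
    optimal transport plan matches the k-th smallest value of x with
    the k-th smallest value of y.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sketch assumes equal, non-zero sample counts")
    xs, ys = sorted(x), sorted(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xs, ys)) / len(xs))


def lot_embed_1d(x: Sequence[float], ref: Sequence[float]) -> List[float]:
    """Hypothetical 1D linear embedding relative to a reference sample.

    Each coordinate is the displacement of a sorted reference sample under
    the optimal map, scaled by 1/sqrt(n) so that Euclidean distances
    between embeddings equal w2_1d distances.
    """
    if len(x) != len(ref) or not x:
        raise ValueError("sketch assumes equal, non-zero sample counts")
    xs, rs = sorted(x), sorted(ref)
    scale = 1.0 / math.sqrt(len(xs))
    return [(a - r) * scale for a, r in zip(xs, rs)]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because every distribution is embedded once against the reference, all pairwise distances for N distributions need only N embeddings followed by cheap Euclidean comparisons, which is the efficiency argument made in the abstract.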
This paper proposes an original approach for the statistical analysis of longitudinal shape data. The proposed method allows the characterization of typical growth patterns and subject-specific shape changes in repeated time-series observations of several subjects. This can be seen as the extension of usual longitudinal statistics of scalar measurements to high-dimensional shape or image data.
The method is based on the estimation of continuous subject-specific growth trajectories and the comparison of such temporal shape changes across subjects. Differences between growth trajectories are decomposed into morphological deformations, which account for shape changes independent of time, and time warps, which account for different rates of shape changes over time.
Given a longitudinal shape data set, we estimate a mean growth scenario representative of the population, and the variations of this scenario both in terms of shape changes and in terms of change in growth speed. Then, intrinsic statistics are derived in the space of spatiotemporal deformations, which characterize the typical variations in shape and in growth speed within the studied population. They can be used to detect systematic developmental delays across subjects.
In the context of neuroscience, we apply this method to analyze the differences in the growth of the hippocampus in children diagnosed with autism or developmental delays and in controls. Results suggest that group differences may be better characterized by a different speed of maturation than by shape differences at a given age. In the context of anthropology, we assess the differences in the typical growth of the endocranium between chimpanzees and bonobos. We take advantage of this study to show the robustness of the method with respect to changes in parameters and perturbations of the age estimates.
doi:10.1007/s11263-012-0592-x
PMCID: PMC3744347
PMID: 23956495
longitudinal data; statistics; shape regression; growth; spatiotemporal registration; time warp
Pizer, Stephen M. | Fletcher, P. Thomas | Joshi, Sarang | Thall, Andrew | Chen, James Z. | Fridman, Yonatan | Fritsch, Daniel S. | Gash, Graham | Glotzer, John M. | Jiroutek, Michael R. | Lu, Conglin | Muller, Keith E. | Tracton, Gregg | Yushkevich, Paul | Chaney, Edward L.
M-reps (formerly called DSLs) are a multiscale medial means for modeling and rendering 3D solid geometry. They are especially well suited to model anatomic objects and, in particular, to capture prior geometric information effectively in deformable-model segmentation approaches. The representation is based on figural models, which define objects at coarse scale by a hierarchy of figures, each figure generally a slab representing a solid region and its boundary simultaneously. This paper focuses on the use of single-figure models to segment objects of relatively simple structure.
A single figure is a sheet of medial atoms, which is interpolated from the model formed by a net, i.e., a mesh or chain, of medial atoms (hence the name m-reps). Each atom models a solid region via not only a position and a width but also a local figural frame giving figural directions and an object angle between opposing, corresponding positions on the boundary implied by the m-rep. The special capability of an m-rep is to provide spatial and orientational correspondence between an object in two different states of deformation. This ability is central to effective measurement of both geometric typicality and geometry-to-image match, the two terms of the objective function optimized in segmentation by deformable models. The other property of m-reps central to effective segmentation is their support for segmentation at multiple levels of scale, with successively finer precision. Objects modeled by single figures are segmented first by a similarity transform augmented by object elongation, then by adjustment of each medial atom, and finally by displacing a dense sampling of the m-rep-implied boundary. While these models and approaches also exist in 2D, we focus on 3D objects.
The segmentation of the kidney from CT and the hippocampus from MRI serve as the major examples in this paper. The accuracy of segmentation as compared to manual, slice-by-slice segmentation is reported.
PMCID: PMC3697155
PMID: 23825898
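The medial atom described above can be made concrete. The sketch below assumes the common m-rep convention: an atom has position x, radius r, unit bisector b, unit normal n orthogonal to b, and object angle θ between b and each spoke, and it implies the two boundary points x + r(cos θ b ± sin θ n). The function name is hypothetical, and the paper's interpolation of the atom mesh is not shown.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def implied_boundary_points(
    x: Sequence[float],
    r: float,
    b: Sequence[float],
    n: Sequence[float],
    theta: float,
) -> Tuple[Vec3, Vec3]:
    """Two boundary points implied by a single medial atom (hypothetical helper).

    x: atom position, r: radius (width), b: unit bisector of the two spokes,
    n: unit normal orthogonal to b, theta: object angle between b and each spoke.
    Spokes are assumed to be r * (cos(theta) * b +/- sin(theta) * n).
    """
    c, s = math.cos(theta), math.sin(theta)
    plus = tuple(xi + r * (c * bi + s * ni) for xi, bi, ni in zip(x, b, n))
    minus = tuple(xi + r * (c * bi - s * ni) for xi, bi, ni in zip(x, b, n))
    return plus, minus
```

At θ = π/2 the two spokes are ±r·n, perpendicular to the medial sheet. For any θ, both implied points lie at distance r from the atom position.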
A new definition of affine invariant medial axis of planar closed curves is introduced. A point belongs to the affine medial axis if and only if it is equidistant from at least two points of the curve, with the distance being a minimum and given by the areas between the curve and its corresponding chords. The medial axis is robust, eliminating the need for curve denoising. In a dynamical interpretation of this affine medial axis, the medial axis points are the affine shock positions of the affine erosion of the curve. We propose a simple method to compute the medial axis and give examples. We also demonstrate how to use this method to detect affine skew symmetry in real images.
doi:10.1023/B:VISI.0000036835.28674.d0
PMCID: PMC3663081
PMID: 23710110
medial axis; affine invariant; symmetry; area; shape; pattern recognition
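The area-based "distance" in this definition can be illustrated with the shoelace formula. For a polygonal approximation of the curve, the area between an arc and its chord is the area of the polygon formed by the arc samples and closed by the chord. The Python sketch below uses hypothetical helper names and assumes a polygonal curve. It computes that area and checks that a shear of determinant 1 leaves it unchanged, which is the property that makes area, unlike Euclidean distance, usable for an affine invariant construction. It does not implement the full medial-axis computation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def chord_area(curve: Sequence[Point], i: int, j: int) -> float:
    """Area between the polygonal arc curve[i..j] and the chord curve[j] -> curve[i].

    Uses the shoelace formula on the closed polygon arc + chord and
    assumes that polygon is simple (non-self-intersecting).
    """
    if not 0 <= i < j < len(curve):
        raise ValueError("need 0 <= i < j < len(curve)")
    pts = curve[i : j + 1]
    twice_area = 0.0
    for k in range(len(pts)):
        x0, y0 = pts[k]
        x1, y1 = pts[(k + 1) % len(pts)]  # the last edge is the chord back to pts[0]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def apply_linear(curve: Sequence[Point], a: float, b: float, c: float, d: float) -> List[Point]:
    """Apply the linear map [[a, b], [c, d]] to every point of the curve."""
    return [(a * x + b * y, c * x + d * y) for x, y in curve]
```

A general affine map multiplies every such area by the same factor |det A|, so equalities between areas, and hence the equidistance condition in the definition, are preserved.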
Inspired by the work of Gomes et al., we describe and analyze a vector distance function approach for the implicit evolution of closed curves of codimension larger than one. The approach is set up in complete generality and then applied to the evolution of dynamic geometric active contours in ℝ⁴ (the codimension-three case). To carry this out, one needs an explicit expression for the zero level set, for which we propose a discrete connectivity method. This leads us to make connections with the new theory of cubical homology. We provide some explicit simulation results to illustrate the methodology.
doi:10.1007/s11263-005-3849-9
PMCID: PMC3659211
PMID: 23700357
vector distance function; level set methods; dynamic active contours
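For intuition about the vector distance function, the sketch below computes it for a curve in ℝ⁴ approximated by a polyline. Here u(x) is x minus its closest point on the curve, so |u(x)| is the Euclidean distance and the curve is exactly the zero level set {u = 0}. This is a brute-force illustration with hypothetical function names, not the implicit evolution scheme or the discrete connectivity method of the paper.

```python
import math
from typing import List, Sequence


def closest_point_on_segment(
    x: Sequence[float], a: Sequence[float], b: Sequence[float]
) -> List[float]:
    """Orthogonal projection of x onto segment [a, b], clamped to the endpoints."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    denom = sum(v * v for v in ab)
    if denom == 0.0:  # degenerate segment: both endpoints coincide
        return list(a)
    t = sum((xi - ai) * v for xi, ai, v in zip(x, a, ab)) / denom
    t = max(0.0, min(1.0, t))
    return [ai + t * v for ai, v in zip(a, ab)]


def vector_distance(x: Sequence[float], polyline: Sequence[Sequence[float]]) -> List[float]:
    """u(x) = x - (closest point of the polyline to x), in any dimension."""
    if len(polyline) < 2:
        raise ValueError("polyline needs at least two vertices")
    best, best_d2 = None, math.inf
    for a, b in zip(polyline, polyline[1:]):
        p = closest_point_on_segment(x, a, b)
        d2 = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        if d2 < best_d2:
            best, best_d2 = p, d2
    return [xi - pi for xi, pi in zip(x, best)]
```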
We present an information-theoretic approach that formulates the problem of structure from motion (SfM) as a blind source separation problem. Given that for almost all practical joint densities of shape points the marginal densities are non-Gaussian, we show how higher-order statistics can be used to improve shape estimates over factorization via Singular Value Decomposition (SVD), bundle adjustment, and Bayesian approaches. Previous techniques have, either explicitly or implicitly, used only second-order statistics in models of shape or noise. A further advantage of viewing SfM as a blind source separation problem is that it easily allows for the inclusion of noise and shape models, resulting in Maximum Likelihood (ML) or Maximum a Posteriori (MAP) shape and motion estimates. A key result is that the blind source separation approach can recover the motion and shape matrices without explicit knowledge of the motion or shape pdf. We demonstrate that it suffices to know whether the pdf is sub- or super-Gaussian (i.e., semi-parametric estimation), and we derive a simple formulation to determine this from the data. We provide extensive experimental results on synthetic and real tracked points to quantify the improvement obtained from this technique.
doi:10.1007/s11263-009-0313-2
PMCID: PMC3653339
PMID: 23682206
Structure from motion; Bundle adjustment; Blind source separation; Subspace analysis; Bayesian analysis
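A standard way to decide whether a marginal distribution is sub- or super-Gaussian is the sign of its excess kurtosis: negative for sub-Gaussian (e.g. uniform), positive for super-Gaussian (e.g. Laplacian). The sketch below uses that classical criterion as an assumption. The paper derives its own formulation, which may differ, and the function names are hypothetical.

```python
from typing import Sequence


def excess_kurtosis(samples: Sequence[float]) -> float:
    """Sample excess kurtosis m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    if m2 == 0.0:
        raise ValueError("samples have zero variance")
    return m4 / (m2 * m2) - 3.0


def gaussianity_class(samples: Sequence[float]) -> str:
    """Label a marginal as sub- or super-Gaussian by the sign of its excess kurtosis."""
    k = excess_kurtosis(samples)
    if k < 0.0:
        return "sub-Gaussian"
    if k > 0.0:
        return "super-Gaussian"
    return "mesokurtic"
```

A two-valued (binary) source is the extreme sub-Gaussian case, while a mostly-zero source with rare large values is super-Gaussian.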
This paper addresses the problem of non-rigid video registration, i.e., the computation of optical flow from a reference frame to each of the subsequent images in a sequence when the camera views deformable objects. We exploit the high correlation between 2D trajectories of different points on the same non-rigid surface by assuming that the displacement of any point throughout the sequence can be expressed compactly as a linear combination of a low-rank motion basis. This subspace constraint effectively acts as a trajectory regularization term, leading to temporally consistent optical flow. We formulate it as a robust soft constraint within a variational framework by penalizing flow fields that lie outside the low-rank manifold. The resulting energy functional can be decoupled into the optimization of the brightness constancy and spatial regularization terms, leading to an efficient optimization scheme. Additionally, we propose a novel optimization scheme for vector-valued images, based on the dualization of the data term. This allows us to extend our approach to colour images, which significantly improves the registration results. Finally, we provide a new benchmark dataset, based on motion capture data of a flag waving in the wind, with dense ground-truth optical flow for evaluating multi-frame optical flow algorithms on non-rigid surfaces. Our experiments show that our proposed approach outperforms state-of-the-art optical flow and dense non-rigid registration algorithms.
doi:10.1007/s11263-012-0607-7
PMCID: PMC3724559
PMID: 23908564
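The subspace constraint can be read as a penalty on the component of a trajectory that lies outside a low-rank basis. The sketch below assumes two things: a point's trajectory is stacked into one vector (its displacements over all frames), and a basis of K trajectory vectors is given. It orthonormalizes the basis with Gram-Schmidt and returns ‖u - QQᵀu‖². Function names are hypothetical. The robust penalty and variational solver described in the abstract are not shown.

```python
import math
from typing import List, Sequence


def orthonormalize(basis: Sequence[Sequence[float]], tol: float = 1e-12) -> List[List[float]]:
    """Classical Gram-Schmidt; drops vectors that are (numerically) dependent."""
    q: List[List[float]] = []
    for v in basis:
        w = list(v)
        for e in q:
            c = sum(wi * ei for wi, ei in zip(w, e))
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            q.append([wi / norm for wi in w])
    return q


def subspace_penalty(traj: Sequence[float], basis: Sequence[Sequence[float]]) -> float:
    """Squared distance from a stacked trajectory to span(basis): ||u - Q Q^T u||^2."""
    residual = list(traj)
    for e in orthonormalize(basis):
        # Subtracting projections one orthonormal vector at a time equals u - Q Q^T u.
        c = sum(ri * ei for ri, ei in zip(residual, e))
        residual = [ri - c * ei for ri, ei in zip(residual, e)]
    return sum(r * r for r in residual)
```

Trajectories that the low-rank basis explains exactly incur zero penalty, so the term only pushes the flow toward temporally consistent motion.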
We develop a computational model of shape that extends existing Riemannian models of curves to multidimensional objects of general topological type. We construct shape spaces equipped with geodesic metrics that measure how costly it is to interpolate two shapes through elastic deformations. The model employs a representation of shape based on the discrete exterior derivative of parametrizations over a finite simplicial complex. We develop algorithms to calculate geodesics and geodesic distances, as well as tools to quantify local shape similarities and contrasts, thus obtaining a formulation that accounts for regional differences and integrates them into a global measure of dissimilarity. The Riemannian shape spaces provide a common framework to treat numerous problems such as the statistical modeling of shapes, the comparison of shapes associated with different individuals or groups, and modeling and simulation of shape dynamics. We give multiple examples of geodesic interpolations and illustrations of the use of the models in brain mapping, particularly, the analysis of anatomical variation based on neuroimaging data.
doi:10.1007/s11263-010-0323-0
PMCID: PMC2971560
PMID: 21057668
Multidimensional shape; Shape of surfaces; Elastic shapes
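The representation mentioned above starts from the discrete exterior derivative of a map defined on the vertices of a simplicial complex. For a vertex map φ (a 0-form), dφ assigns to each oriented edge (i, j) the difference φ(j) - φ(i). The sketch below computes it for a triangle mesh with hypothetical helper names, and it also illustrates why the representation ignores translations. The Riemannian metric and geodesic computations of the paper are not reproduced.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def oriented_edges(triangles: Sequence[Tuple[int, int, int]]) -> List[Edge]:
    """Unique edges of a triangle mesh, each oriented from lower to higher vertex index."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def discrete_d(
    phi: Sequence[Sequence[float]], edges: Sequence[Edge]
) -> Dict[Edge, Tuple[float, ...]]:
    """Discrete exterior derivative of a vector-valued 0-form.

    phi[v] is the image of vertex v; edge (i, j) is mapped to phi[j] - phi[i].
    Adding the same vector to every phi[v] leaves the result unchanged.
    """
    return {(i, j): tuple(pj - pi for pi, pj in zip(phi[i], phi[j])) for i, j in edges}
```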
We present a framework for incorporating prior information about high-probability shapes in the process of contour extraction and object recognition in images. Here one studies shapes as elements of an infinite-dimensional, non-linear quotient space, and statistics of shapes are defined and computed intrinsically using differential geometry of this shape space. Prior models on shapes are constructed using probability distributions on tangent bundles of shape spaces. Similar to the past work on active contours, where curves are driven by vector fields based on image gradients and roughness penalties, we incorporate the prior shape knowledge in the form of vector fields on curves. Through experimental results, we demonstrate the use of prior shape models in the estimation of object boundaries, and their success in handling partial obscuration and missing data. Furthermore, we describe the use of this framework in shape-based object recognition or classification.
doi:10.1007/s11263-008-0179-8
PMCID: PMC2980332
PMID: 21076692
Shape extraction; Segmentation; Bayesian shape extraction; Tangent PCA; Intrinsic shape analysis; Elastic shapes; Riemannian metric
In this paper we present a new approach for the non-rigid registration of multi-modality images. Our approach is based on an information-theoretic measure called the cumulative residual entropy (CRE), which is a measure of entropy defined using cumulative distributions. The cross-CRE (CCRE) between the two images to be registered is defined and maximized over the space of smooth and unknown non-rigid transformations. For efficient and robust computation of the non-rigid deformations, a tri-cubic B-spline based representation of the deformation function is used. The key strength of combining CCRE with the tri-cubic B-spline representation is that we achieve not only the robustness afforded by the CCRE measure but also computational efficiency in estimating the non-rigid registration. The salient features of our algorithm are: (i) it accommodates images of varying contrast and brightness; (ii) it converges faster than other information-theory-based measures used for non-rigid registration in the literature; (iii) it computes the gradient of CCRE with respect to the non-rigid registration parameters analytically, yielding efficient and accurate registration; and (iv) it is well suited for situations where the source and target images have fields of view with large non-overlapping regions. We demonstrate these strengths via experiments on synthesized and real image data.
doi:10.1007/s11263-006-0011-2
PMCID: PMC2921662
PMID: 20717477
information theory; Shannon entropy; multi-modal non-rigid registration; B-splines
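The cumulative residual entropy replaces the density in Shannon's formula with the survival function: CRE(X) = -∫₀^∞ P(|X| > λ) log P(|X| > λ) dλ. For an empirical sample the survival function is piecewise constant, so the integral reduces to a finite sum. The sketch below uses a hypothetical function name and assumes the natural logarithm. It computes this single-variable quantity only; the cross-CRE registration measure and the B-spline deformation model are not shown.

```python
import math
from typing import Sequence


def cumulative_residual_entropy(samples: Sequence[float]) -> float:
    """Empirical CRE: -integral over [0, inf) of S(l) * ln S(l), with S(l) = P(|X| > l)."""
    vals = sorted(abs(s) for s in samples)
    n = len(vals)
    if n == 0:
        raise ValueError("need at least one sample")
    total, prev = 0.0, 0.0
    for k, v in enumerate(vals):
        # On [prev, v) exactly the n - k largest values exceed l.
        surv = (n - k) / n
        if 0.0 < surv < 1.0:  # S = 1 or S = 0 contributes nothing
            total -= surv * math.log(surv) * (v - prev)
        prev = v
    return total  # beyond max |X| the survival function is 0
```

Because it integrates the survival function rather than a density, this estimate needs no histogram binning, which is part of the robustness argument for CRE-based measures.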
We propose an integrated registration and clustering algorithm, called “consistency clustering”, that automatically constructs a probabilistic white-matter atlas from a set of multi-subject diffusion weighted MR images. We formulate the atlas creation as a maximum likelihood problem, which the proposed method solves using a generalized Expectation Maximization (EM) framework. Additionally, the algorithm employs an outlier rejection and denoising strategy to produce sharp probabilistic maps of certain bundles of interest. We test this algorithm on synthetic and real data, and evaluate its stability against initialization. We demonstrate labeling a novel subject using the resulting spatial atlas and evaluate the accuracy of this labeling. Consistency clustering is a viable tool for completely automatic white-matter atlas construction for sub-populations, and the resulting atlas is potentially useful for making diffusion measurements in a common coordinate system to identify pathology-related changes or developmental trends.
doi:10.1007/s11263-009-0217-1
PMCID: PMC2862392
PMID: 20442792
DTI; Anatomical atlas; Clustering; Segmentation; Tractography; Diffusion imaging; White matter atlas
We present a matching criterion for curves and integrate it into the large deformation diffeomorphic metric mapping (LDDMM) scheme for computing an optimal transformation between two curves embedded in Euclidean space ℝᵈ. Curves are first represented as vector-valued measures, which incorporate both the location and the first-order geometric structure of the curves. Then, a Hilbert space structure is imposed on the measures to build the norm for quantifying the closeness between two curves. We describe a discretized version of this, in which discrete sequences of points along the curve are represented by vector-valued functionals. This gives a convenient and practical way to define a matching functional for curves. We derive and implement the curve matching in the large deformation framework and demonstrate mapping results of curves in ℝ² and ℝ³. The behavior of the curve mapping is discussed using 2D curves. The application to shape classification is shown, and experiments with 3D curves extracted from brain cortical surfaces are presented.
doi:10.1007/s11263-008-0141-9
PMCID: PMC2858418
PMID: 20419045
Large deformation; Diffeomorphisms; Vector-valued measure; Curve matching
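The vector-valued measure representation can be sketched for polygonal curves. Each segment contributes a Dirac at its midpoint weighted by its tangent vector, and the squared distance between curves C and S is ‖C‖² - 2⟨C, S⟩ + ‖S‖², with ⟨C, S⟩ = Σᵢ Σⱼ k(cᵢ, sⱼ) τᵢ·ηⱼ. The Gaussian kernel k(x, y) = exp(-|x - y|²/σ²) and the midpoint discretization are assumptions made for this illustration, and the function names are hypothetical. The LDDMM optimization itself is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]


def segments(curve: Sequence[Sequence[float]]) -> Tuple[List[Vec], List[Vec]]:
    """Midpoints and (unnormalized) tangent vectors of a polyline's segments."""
    centers, tangents = [], []
    for p, q in zip(curve, curve[1:]):
        centers.append(tuple((a + b) / 2.0 for a, b in zip(p, q)))
        tangents.append(tuple(b - a for a, b in zip(p, q)))
    return centers, tangents


def gauss(x: Vec, y: Vec, sigma: float) -> float:
    """Assumed Gaussian kernel exp(-|x - y|^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma**2)


def measure_inner(c1, c2, sigma: float) -> float:
    """Kernel inner product between the vector-valued measures of two curves."""
    x1, t1 = segments(c1)
    x2, t2 = segments(c2)
    return sum(
        gauss(a, b, sigma) * sum(u * v for u, v in zip(ta, tb))
        for a, ta in zip(x1, t1)
        for b, tb in zip(x2, t2)
    )


def curve_distance_sq(c1, c2, sigma: float) -> float:
    """||C1 - C2||^2 in the kernel-induced norm; no point correspondence is needed."""
    return (
        measure_inner(c1, c1, sigma)
        - 2.0 * measure_inner(c1, c2, sigma)
        + measure_inner(c2, c2, sigma)
    )
```

Because the tangent weights are signed, the representation is orientation-aware: a curve and its reversal are at positive distance.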
This paper presents a novel and robust technique for group-wise registration of point sets with unknown correspondence. We begin by defining a Havrda-Charvát (HC) entropy valid for cumulative distribution functions (CDFs), which we dub the HC Cumulative Residual Entropy (HC-CRE). Based on this definition, we propose a new measure called the CDF-HC divergence, which is used to quantify the dissimilarity between CDFs estimated from each point set in the given population of point sets. This CDF-HC divergence generalizes the CDF-based Jensen-Shannon (CDF-JS) divergence introduced earlier in the literature, but is much simpler to implement and computationally more efficient.
A closed-form formula for the analytic gradient of the cost function with respect to the non-rigid registration parameters has been derived, which is conducive to efficient quasi-Newton optimization. Our CDF-HC algorithm is especially useful for unbiased point-set atlas construction and can do so without the need to establish correspondences. Mathematical analysis and experimental results indicate that this CDF-HC registration algorithm outperforms the previous group-wise point-set registration algorithms in terms of efficiency, accuracy and robustness.
doi:10.1007/s11263-009-0261-x
PMCID: PMC2835416
PMID: 20221321
We present a variational method for unfolding of the cortex based on a user-chosen point of view as an alternative to more traditional global flattening methods, which incur more distortion around the region of interest. Our approach involves three novel contributions. The first is an energy function and its corresponding gradient flow to measure the average visibility of a region of interest of a surface with respect to a given viewpoint. The second is an additional energy function and flow designed to preserve the 3D topology of the evolving surface. The third is a method that dramatically improves the computational speed of the 3D topology preservation approach by creating a tree structure of the 3D surface and using a recursion technique. Experimental results show that the proposed approach can successfully unfold highly convoluted surfaces such as the cortex while preserving their topology.
doi:10.1007/s11263-009-0214-4
PMCID: PMC2786089
PMID: 19960105
Geometric deformable models based on the level set method have become very popular in the last decade. To overcome an inherent limitation in accuracy while maintaining computational efficiency, adaptive grid techniques using local grid refinement have been developed for use with these models. This strategy, however, requires a very complex data structure, yields large numbers of contour points, and is inconsistent with the implementation of topology-preserving geometric deformable models (TGDMs). In this paper, we investigate the use of an alternative adaptive grid technique called the moving grid method with geometric deformable models. In addition to the development of a consistent moving grid geometric deformable model framework, our main contributions include the introduction of a new grid nondegeneracy constraint, the design of a new grid adaptation criterion, and the development of novel numerical methods and an efficient implementation scheme. The overall method is simpler to implement than using grid refinement, requiring no large, complex, hierarchical data structures. It also offers an extra benefit of automatically reducing the number of contour vertices in the final results. After presenting the algorithm, we demonstrate its performance using both simulated and real images.
doi:10.1007/s11263-009-0231-3
PMCID: PMC2784682
PMID: 19946381
Adaptive grid method; Geometric deformable model; Deformation moving grid; Topology preservation; Level set method
Active Appearance Models (AAMs) are generative, parametric models that have been successfully used in the past to model deformable objects such as human faces. The original AAM formulation was 2D, but AAMs have recently been extended to include a 3D shape model. A variety of single-view algorithms exist for fitting and constructing 3D AAMs, but one area that has not been studied is multi-view algorithms. In this paper we present multi-view algorithms for both fitting and constructing 3D AAMs.
Fitting an AAM to an image consists of minimizing the error between the input image and the closest model instance; i.e. solving a nonlinear optimization problem. In the first part of the paper we describe an algorithm for fitting a single AAM to multiple images, captured simultaneously by cameras with arbitrary locations, rotations, and response functions. This algorithm uses the scaled orthographic imaging model used by previous authors, and in the process of fitting computes, or calibrates, the scaled orthographic camera matrices. In the second part of the paper we describe an extension of this algorithm to calibrate weak perspective (or full perspective) camera models for each of the cameras. In essence, we use the human face as a (non-rigid) calibration grid. We demonstrate that the performance of this algorithm is roughly comparable to a standard algorithm using a calibration grid. In the third part of the paper, we show how camera calibration improves the performance of AAM fitting.
A variety of non-rigid structure-from-motion algorithms, both single-view and multi-view, have been proposed that can be used to construct the corresponding 3D non-rigid shape models of a 2D AAM. In the final part of the paper, we show that constructing a 3D face model using non-rigid structure-from-motion suffers from the Bas-Relief ambiguity and may result in a “scaled” (stretched/compressed) model. We outline a robust non-rigid motion-stereo algorithm for calibrated multi-view 3D AAM construction and show how using calibrated multi-view motion-stereo can eliminate the Bas-Relief ambiguity and yield face models with higher 3D fidelity.
doi:10.1007/s11263-007-0050-3
PMCID: PMC2762225
PMID: 19838316
Active appearance models; Multi-view 3D face model construction; Multi-view AAM fitting; Non-rigid structure-from-motion; Motion-stereo; Camera calibration
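The scaled orthographic camera used in the first part of the paper maps a 3D point X to s·R₁₂X + t. Here R₁₂ holds the first two rows of a 3×3 rotation, s is a scale, and t is a 2D translation. In the multi-view setting each camera has its own (s, R, t) while the 3D shape is shared. The sketch below shows only this projection, with hypothetical names. The AAM fitting algorithm that estimates these parameters is not shown.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def scaled_orthographic(
    points: Sequence[Sequence[float]],
    scale: float,
    rotation: Matrix3,
    translation: Tuple[float, float],
) -> List[Tuple[float, float]]:
    """Project 3D points with a scaled orthographic camera.

    Only the first two rows of the 3x3 rotation are used, so the depth
    component (third row) is discarded: (u, v) = scale * R[:2] @ X + t.
    """
    r1, r2 = rotation[0], rotation[1]
    projected = []
    for p in points:
        u = scale * sum(a * b for a, b in zip(r1, p)) + translation[0]
        v = scale * sum(a * b for a, b in zip(r2, p)) + translation[1]
        projected.append((u, v))
    return projected
```

Because depth is discarded, two points differing only along the viewing direction project to the same image point. Seeing the same shape from several calibrated views is what helps resolve such depth ambiguities.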
Face datasets are considered a primary tool for evaluating the efficacy of face recognition methods. Here we show that in many of the commonly used face datasets, face images can be recognized accurately at a rate significantly higher than random even when no face, hair, or clothes features appear in the image. The experiments were done by cutting a small background area from each face image, so that each face dataset provided a new image dataset which included only seemingly blank images. Then, an image classification method was used to check the classification accuracy. Experimental results show that the classification accuracy ranged from 13.5% (color FERET) to 99% (YaleB). These results indicate that the performance of face recognition methods measured using face image datasets may be biased. Compilable source code used for this experiment is freely available for download via the internet.
doi:10.1007/s11263-008-0143-7
PMCID: PMC2529479
PMID: 18776952
Face recognition; biometrics; FERET