Results 1-10 (10)

1.  Multiview stereo and silhouette fusion via minimizing generalized reprojection error
Accurate reconstruction of 3D geometrical shape from a set of calibrated 2D multiview images is an active yet challenging task in computer vision. The existing multiview stereo methods usually perform poorly in recovering deeply concave and thinly protruding structures, and suffer from several common problems such as slow convergence, sensitivity to initial conditions, and high memory requirements. To address these issues, we propose a two-phase optimization method for generalized reprojection error minimization (TwGREM), where a generalized framework of reprojection error is proposed to integrate stereo and silhouette cues into a unified energy function. For the minimization of the function, we first introduce a convex relaxation on 3D volumetric grids which can be efficiently solved using variable splitting and Chambolle projection. Then, the resulting surface is parameterized as a triangle mesh and refined using surface evolution to obtain a high-quality 3D reconstruction. Our comparative experiments with several state-of-the-art methods show that the performance of TwGREM-based 3D reconstruction is among the highest with respect to accuracy and efficiency, especially for data with smooth texture and sparsely sampled viewpoints.
PMCID: PMC4281271  PMID: 25558120
Multiview stereo; 3D reconstruction; Silhouette fusion; Convex relaxation; Reprojection error
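The Chambolle projection mentioned in this abstract can be illustrated on the simplest convex problem it applies to, total-variation (ROF) denoising of a 2D image. The sketch below is an illustrative instance of the projection iteration under standard forward-difference discretization, not the paper's volumetric multiview solver:

```python
import numpy as np

def gradient(u):
    # forward differences with Neumann boundary (zero at last row/col)
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def divergence(px, py):
    # negative adjoint of the forward-difference gradient
    div = np.zeros_like(px)
    div[0, :] = px[0, :]
    div[1:-1, :] = px[1:-1, :] - px[:-2, :]
    div[-1, :] = -px[-2, :]
    div[:, 0] += py[:, 0]
    div[:, 1:-1] += py[:, 1:-1] - py[:, :-2]
    div[:, -1] += -py[:, -2]
    return div

def chambolle_tv_denoise(f, lam=0.2, tau=0.125, n_iter=80):
    """Minimize TV(u) + ||u - f||^2 / (2*lam) via Chambolle's dual
    projection iteration (tau <= 1/8 for convergence)."""
    px = np.zeros_like(f); py = np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = gradient(divergence(px, py) - f / lam)
        norm = 1.0 + tau * np.sqrt(gx**2 + gy**2)
        px = (px + tau * gx) / norm
        py = (py + tau * gy) / norm
    return f - lam * divergence(px, py)
```

The same dual-projection structure carries over when the unknown lives on a 3D volumetric grid instead of a 2D image.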
2.  Nonverbal Social Withdrawal in Depression: Evidence from manual and automatic analysis 
Image and vision computing  2014;32(10):641-647.
The relationship between nonverbal behavior and severity of depression was investigated by following depressed participants over the course of treatment and video recording a series of clinical interviews. Facial expressions and head pose were analyzed from video using manual and automatic systems. Both systems were highly consistent for FACS action units (AUs) and showed similar effects for change over time in depression severity. When symptom severity was high, participants made fewer affiliative facial expressions (AUs 12 and 15) and more non-affiliative facial expressions (AU 14). Participants also exhibited diminished head motion (i.e., amplitude and velocity) when symptom severity was high. These results are consistent with the Social Withdrawal hypothesis: that depressed individuals use nonverbal behavior to maintain or increase interpersonal distance. As individuals recover, they send more signals indicating a willingness to affiliate. The finding that automatic facial expression analysis was both consistent with manual coding and revealed the same pattern of findings suggests that automatic facial expression analysis may be ready to relieve the burden of manual coding in behavioral and clinical science.
PMCID: PMC4217695  PMID: 25378765
Depression; Multimodal; FACS; Facial Expression; Head Motion
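The head-motion measures discussed above (amplitude and velocity) reduce to simple statistics of a per-frame pose series. The sketch below assumes a single head-pose angle sampled at a fixed frame rate; it is illustrative, not the paper's exact measurement pipeline:

```python
import numpy as np

def head_motion_stats(angles, fps=30.0):
    """Amplitude (range of motion) and mean angular velocity of a
    head-pose angle series, one value per video frame."""
    angles = np.asarray(angles, dtype=float)
    velocity = np.abs(np.diff(angles)) * fps  # degrees per second
    amplitude = angles.max() - angles.min()
    return amplitude, velocity.mean()
```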
3.  Non-rigid Face Tracking with Local Appearance Consistency Constraint 
Image and vision computing  2010;28(5):781-789.
In this paper we present a new discriminative approach to achieve consistent and efficient tracking of non-rigid object motion, such as facial expressions. By utilizing both spatial and temporal appearance coherence at the patch level, the proposed approach can reduce ambiguity and increase accuracy. Recent research demonstrates that feature-based approaches, such as constrained local models (CLMs), can achieve good performance in non-rigid object alignment/tracking using local region descriptors and a non-rigid shape prior. However, the matching performance of the learned generic patch experts is susceptible to local appearance ambiguity. Since there is no motion continuity constraint between neighboring frames of the same sequence, the resultant object alignment might not be consistent from frame to frame and the motion field is not temporally smooth. In this paper, we extend the CLM method into the spatio-temporal domain by enforcing the appearance consistency constraint of each local patch between neighboring frames. More importantly, we show that the global warp update can be optimized jointly in an efficient manner using convex quadratic fitting. Finally, we demonstrate that our approach achieves improved performance for the task of non-rigid facial motion tracking on the videos of clinical patients.
PMCID: PMC4167730  PMID: 25242852
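The convex quadratic fitting step used for the warp update can be sketched for a single patch: fit a convex quadratic to the negative log of a patch-expert response map, then read off the minimizing sub-pixel displacement. This is a minimal single-patch illustration, not the paper's joint spatio-temporal optimization:

```python
import numpy as np

def quadratic_fit_displacement(resp):
    """Fit cost(x, y) ~ a*x^2 + b*y^2 + c*x*y + d*x + e*y + f to the
    negative log of a response map and return the minimizing sub-pixel
    displacement (dx, dy) relative to the patch center."""
    h, w = resp.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs = (xs - w // 2).ravel().astype(float)
    ys = (ys - h // 2).ravel().astype(float)
    cost = -np.log(resp.ravel() + 1e-12)
    X = np.stack([xs**2, ys**2, xs * ys, xs, ys, np.ones_like(xs)], axis=1)
    a, b, c, d, e, _ = np.linalg.lstsq(X, cost, rcond=None)[0]
    # gradient of the quadratic is zero at the minimum:
    A = np.array([[2 * a, c], [c, 2 * b]])
    return np.linalg.solve(A, -np.array([d, e]))
```

For a Gaussian-shaped response the negative log is exactly quadratic, so the fit recovers the response peak exactly.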
4.  The Painful Face – Pain Expression Recognition Using Active Appearance Models 
Image and vision computing  2009;27(12):1788-1796.
Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or in some circumstances (i.e., young children and the severely ill) not even possible. To circumvent these problems behavioral scientists have identified reliable and valid facial indicators of pain. Hitherto, these methods have required manual measurement by highly skilled human observers. In this paper we explore an approach for automatically recognizing acute pain without the need for human observers. Specifically, our study was restricted to automatically detecting pain in adult patients with rotator cuff injuries. The system employed video input of the patients as they moved their affected and unaffected shoulder. Two types of ground truth were considered. Sequence-level ground truth consisted of Likert-type ratings by skilled observers. Frame-level ground truth was calculated from presence/absence and intensity of facial actions previously associated with pain. Active appearance models (AAM) were used to decouple shape and appearance in the digitized face images. Support vector machines (SVM) were compared for several representations from the AAM and of ground truth of varying granularity. We explored two questions pertinent to the construction, design and development of automatic pain detection systems. First, at what level (i.e., sequence- or frame-level) should datasets be labeled in order to obtain satisfactory automatic pain detection performance? Second, how important is it, at both levels of labeling, that we non-rigidly register the face?
PMCID: PMC3402903  PMID: 22837587
active appearance models; support vector machines; pain; facial expression; automatic facial image analysis; FACS
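Frame-level ground truth of the kind described above can be sketched as a score over FACS AU intensities. The AU set and combination rule below are illustrative assumptions in the spirit of AU-based pain measures, not the exact formula used in the paper:

```python
def frame_pain_score(au):
    """Toy frame-level pain score from a dict of FACS AU intensities
    (AU number -> intensity 0..5). Illustrative only."""
    brow = au.get(4, 0)                       # brow lowerer
    orbit = max(au.get(6, 0), au.get(7, 0))   # cheek raiser / lid tightener
    nose = max(au.get(9, 0), au.get(10, 0))   # nose wrinkler / upper lip raiser
    eyes = 1 if au.get(43, 0) > 0 else 0      # eye closure, presence only
    return brow + orbit + nose + eyes
```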
5.  A Portable Stereo Vision System for Whole Body Surface Imaging 
Image and vision computing  2010;28(4):605-613.
This paper presents a whole body surface imaging system based on stereo vision technology. We have adopted a compact and economical configuration which involves only four stereo units to image the frontal and rear sides of the body. The success of the system depends on a stereo matching process that can effectively segment the body from the background in addition to recovering sufficient geometric details. For this purpose, we have developed a novel sub-pixel, dense stereo matching algorithm which includes two major phases. In the first phase, the foreground is accurately segmented with the help of a predefined virtual interface in the disparity space image, and a coarse disparity map is generated with block matching. In the second phase, local least squares matching is performed in combination with global optimization within a regularization framework, so as to ensure both accuracy and reliability. Our experimental results show that the system can realistically capture smooth and natural whole body shapes with high accuracy.
PMCID: PMC2811888  PMID: 20161620
Whole body scanner; 3D surface imaging; stereo vision; stereo matching; disparity
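The first-phase coarse disparity map described above comes from block matching. The sketch below is a minimal sum-of-absolute-differences (SAD) block matcher over rectified grayscale images, without the paper's foreground segmentation or second-phase refinement:

```python
import numpy as np

def block_match(left, right, block=3, max_disp=8):
    """Coarse integer disparity by SAD block matching on rectified
    images: each left patch is compared against right patches shifted
    0..max_disp-1 pixels to the left."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

Sub-pixel accuracy, as in the paper, would then come from local least-squares matching around these integer estimates.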
6.  Modelling and Recognition of the Linguistic Components in American Sign Language 
Image and vision computing  2009;27(12):1826-1844.
The manual signs in sign languages are generated and interpreted using three basic building blocks: handshape, motion, and place of articulation. When combined, these three components (together with palm orientation) uniquely determine the meaning of the manual sign. This means that the use of pattern recognition techniques that only employ a subset of these components is inappropriate for interpreting the sign or for building automatic recognizers of the language. In this paper, we define an algorithm to model these three basic components from a single video sequence of two-dimensional pictures of a sign. The recognition results for these three components are then combined to determine the class of the signs in the videos. Experiments are performed on a database of (isolated) American Sign Language (ASL) signs. The results demonstrate that, using semi-automatic detection, all three components can be reliably recovered from two-dimensional video sequences, allowing for an accurate representation and recognition of the signs.
PMCID: PMC2757299  PMID: 20161003
American Sign Language; handshape; motion reconstruction; multiple cue recognition; computer vision
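Combining the recognition of the three components into a single sign decision can be sketched as posterior fusion. The sketch below assumes each cue classifier outputs class posteriors and fuses them under a naive conditional-independence assumption; the paper's actual combination rule may differ:

```python
import numpy as np

def combine_cues(p_handshape, p_motion, p_location):
    """Fuse per-cue class posteriors by elementwise product
    (conditional independence of cues given the sign class),
    then renormalize."""
    p = (np.asarray(p_handshape, dtype=float)
         * np.asarray(p_motion, dtype=float)
         * np.asarray(p_location, dtype=float))
    return p / p.sum()
```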
8.  Figure-Ground Segmentation Using Factor Graphs 
Image and vision computing  2009;27(7):854-863.
Foreground-background segmentation has recently been applied [26,12] to the detection and segmentation of specific objects or structures of interest from the background as an efficient alternative to techniques such as deformable templates [27]. We introduce a graphical model (i.e. Markov random field)-based formulation of structure-specific figure-ground segmentation based on simple geometric features extracted from an image, such as local configurations of linear features, that are characteristic of the desired figure structure. Our formulation is novel in that it is based on factor graphs, which are graphical models that encode interactions among arbitrary numbers of random variables. The ability of factor graphs to express interactions higher than pairwise order (the highest order encountered in most graphical models used in computer vision) is useful for modeling a variety of pattern recognition problems. In particular, we show how this property makes factor graphs a natural framework for performing grouping and segmentation, and demonstrate that the factor graph framework emerges naturally from a simple maximum entropy model of figure-ground segmentation.
We cast our approach in a learning framework, in which the contributions of multiple grouping cues are learned from training data, and apply our framework to the problem of finding printed text in natural scenes. Experimental results are described, including a performance analysis that demonstrates the feasibility of the approach.
PMCID: PMC2755638  PMID: 20160994
figure-ground segmentation; belief propagation; factor graphs; text detection
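The defining property above, factors over more than two variables, can be shown on a tiny binary model. For graphs small enough to enumerate, exact marginals can be computed by brute force (a stand-in for the belief propagation the paper uses on larger graphs); the model below is an illustrative toy, not the paper's text-detection model:

```python
import itertools
import numpy as np

def marginals(factors, n_vars):
    """Exact marginals of binary variables in a factor graph by
    brute-force enumeration.
    factors: list of (variable_indices, potential_function), where a
    potential may touch any number of variables (higher than pairwise)."""
    Z = 0.0
    marg = np.zeros((n_vars, 2))
    for assign in itertools.product((0, 1), repeat=n_vars):
        w = 1.0
        for idx, phi in factors:
            w *= phi(*(assign[i] for i in idx))
        Z += w
        for i, v in enumerate(assign):
            marg[i, v] += w
    return marg / Z
```

A third-order factor, e.g. one rewarding three variables for agreeing, is exactly the kind of interaction pairwise MRFs cannot express directly.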
9.  Efficient Constrained Local Model Fitting for Non-Rigid Face Alignment 
Image and vision computing  2009;27(12):1804-1813.
Active appearance models (AAMs) have demonstrated great utility when being employed for non-rigid face alignment/tracking. The “simultaneous” algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real time performance (2-3 fps). The “project-out” algorithm for fitting an AAM achieves faster than real time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method is able to achieve superior performance to the “simultaneous” AAM algorithm along with real time fitting speeds (35 fps). We improve upon the canonical CLM formulation, to gain this performance, in a number of ways by employing: (i) linear SVMs as patch-experts, (ii) a simplified optimization criterion, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criterion for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization stems from its ability to be parallelized, a step which we also theoretically explore in this paper. We refer to our approach for fitting the CLM as the “exhaustive local search” (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database.
PMCID: PMC2799037  PMID: 20046797
Constrained Local Models; Non-Rigid Face Alignment; Active Appearance Models
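The weighted least-squares step that turns N simple per-landmark displacements into one global warp update can be sketched directly. The Jacobian shapes and weighting below are illustrative assumptions about the warp parameterization, not the paper's exact formulation:

```python
import numpy as np

def global_warp_update(jacobians, displacements, weights):
    """Combine N simple 2D landmark displacements d_i into one global
    warp-parameter update by weighted least squares:
        dp = argmin_p  sum_i w_i * || J_i p - d_i ||^2
    jacobians: list of (2, n_params) arrays (landmark Jacobians w.r.t.
    the warp parameters); displacements: list of (2,) arrays."""
    H = sum(w * (J.T @ J) for J, w in zip(jacobians, weights))
    g = sum(w * (J.T @ d) for J, d, w in zip(jacobians, displacements, weights))
    return np.linalg.solve(H, g)
```

Because each J_i.T @ J_i and J_i.T @ d_i term is independent, the accumulation parallelizes trivially across landmarks, which is the property the abstract highlights.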
10.  The role of image registration in brain mapping 
Image and vision computing  2001;19(1-2):3-24.
Image registration is a key step in a great variety of biomedical imaging applications. It provides the ability to geometrically align one dataset with another, and is a prerequisite for all imaging applications that compare datasets across subjects, imaging modalities, or across time. Registration algorithms also enable the pooling and comparison of experimental findings across laboratories, the construction of population-based brain atlases, and the creation of systems to detect group patterns in structural and functional imaging data. We review the major types of registration approaches used in brain imaging today. We focus on their conceptual basis, the underlying mathematics, and their strengths and weaknesses in different contexts. We describe the major goals of registration, including data fusion, quantification of change, automated image segmentation and labeling, shape measurement, and pathology detection. We indicate that registration algorithms have great potential when used in conjunction with a digital brain atlas, which acts as a reference system in which brain images can be compared for statistical analysis. The resulting armory of registration approaches is fundamental to medical image analysis, and in a brain mapping context provides a means to elucidate clinical, demographic, or functional trends in the anatomy or physiology of the brain.
PMCID: PMC2771890  PMID: 19890483
Brain mapping; Image registration; Brain atlas
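The simplest registration the review covers, rigid alignment of corresponding point sets, has a closed-form least-squares solution (Procrustes/Kabsch). The sketch below is a minimal instance for 3D points with known correspondences, far simpler than the intensity-based and deformable methods the review surveys:

```python
import numpy as np

def rigid_register(src, dst):
    """Least-squares rigid registration of corresponding 3D point sets:
    find rotation R and translation t with dst ~ src @ R.T + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t
```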
