Accurate reconstruction of 3D geometrical shape from a set of calibrated 2D
multiview images is an active yet challenging task in computer vision. The existing
multiview stereo methods usually perform poorly in recovering deeply concave and thinly
protruding structures, and suffer from several common problems like slow convergence,
sensitivity to initial conditions, and high memory requirements. To address these issues,
we propose a two-phase optimization method for generalized reprojection error minimization
(TwGREM), where a generalized framework of reprojection error is proposed to integrate
stereo and silhouette cues into a unified energy function. For the minimization of the
function, we first introduce a convex relaxation on 3D volumetric grids which can be
efficiently solved using variable splitting and Chambolle projection. Then, the resulting
surface is parameterized as a triangle mesh and refined using surface evolution to obtain
a high-quality 3D reconstruction. Our comparative experiments with several
state-of-the-art methods show that the performance of TwGREM based 3D reconstruction is
among the highest with respect to accuracy and efficiency, especially for data with smooth
texture and sparsely sampled viewpoints.
Multiview stereo; 3D reconstruction; Silhouette fusion; Convex relaxation; Reprojection error
The relationship between nonverbal behavior and severity of depression was investigated by following depressed participants over the course of treatment and video recording a series of clinical interviews. Facial expressions and head pose were analyzed from video using manual and automatic systems. Both systems were highly consistent for FACS action units (AUs) and showed similar effects for change over time in depression severity. When symptom severity was high, participants made fewer affiliative facial expressions (AUs 12 and 15) and more non-affiliative facial expressions (AU 14). Participants also exhibited diminished head motion (i.e., amplitude and velocity) when symptom severity was high. These results are consistent with the Social Withdrawal hypothesis: that depressed individuals use nonverbal behavior to maintain or increase interpersonal distance. As individuals recover, they send more signals indicating a willingness to affiliate. The finding that automatic facial expression analysis was both consistent with manual coding and revealed the same pattern of findings suggests that automatic facial expression analysis may be ready to relieve the burden of manual coding in behavioral and clinical science.
Depression; Multimodal; FACS; Facial Expression; Head Motion
In this paper we present a new discriminative approach to achieve consistent and efficient tracking of non-rigid object motion, such as facial expressions. By utilizing both spatial and temporal appearance coherence at the patch level, the proposed approach can reduce ambiguity and increase accuracy. Recent research demonstrates that feature based approaches, such as constrained local models (CLMs), can achieve good performance in non-rigid object alignment/tracking using local region descriptors and a non-rigid shape prior. However, the matching performance of the learned generic patch experts is susceptible to local appearance ambiguity. Since there is no motion continuity constraint between neighboring frames of the same sequence, the resultant object alignment might not be consistent from frame to frame and the motion field is not temporally smooth. In this paper, we extend the CLM method into the spatio-temporal domain by enforcing the appearance consistency constraint of each local patch between neighboring frames. More importantly, we show that the global warp update can be optimized jointly in an efficient manner using convex quadratic fitting. Finally, we demonstrate that our approach receives improved performance for the task of non-rigid facial motion tracking on the videos of clinical patients.
Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or in some circumstances (i.e., young children and the severely ill) not even possible. To circumvent these problems behavioral scientists have identified reliable and valid facial indicators of pain. Hitherto, these methods have required manual measurement by highly skilled human observers. In this paper we explore an approach for automatically recognizing acute pain without the need for human observers. Specifically, our study was restricted to automatically detecting pain in adult patients with rotator cuff injuries. The system employed video input of the patients as they moved their affected and unaffected shoulder. Two types of ground truth were considered. Sequence-level ground truth consisted of Likert-type ratings by skilled observers. Frame-level ground truth was calculated from presence/absence and intensity of facial actions previously associated with pain. Active appearance models (AAM) were used to decouple shape and appearance in the digitized face images. Support vector machines (SVM) were compared for several representations from the AAM and of ground truth of varying granularity. We explored two questions pertinent to the construction, design and development of automatic pain detection systems. First, at what level (i.e., sequence- or frame-level) should datasets be labeled in order to obtain satisfactory automatic pain detection performance? Second, how important is it, at both levels of labeling, that we non-rigidly register the face?
active appearance models; support vector machines; pain; facial expression; automatic facial image analysis; FACS
This paper presents a whole body surface imaging system based on stereo vision technology. We have adopted a compact and economical configuration which involves only four stereo units to image the frontal and rear sides of the body. The success of the system depends on a stereo matching process that can effectively segment the body from the background in addition to recovering sufficient geometric details. For this purpose, we have developed a novel sub-pixel, dense stereo matching algorithm which includes two major phases. In the first phase, the foreground is accurately segmented with the help of a predefined virtual interface in the disparity space image, and a coarse disparity map is generated with block matching. In the second phase, local least squares matching is performed in combination with global optimization within a regularization framework, so as to ensure both accuracy and reliability. Our experimental results show that the system can realistically capture smooth and natural whole body shapes with high accuracy.
Whole body scanner; 3D surface imaging; stereo vision; stereo matching; disparity
The manual signs in sign languages are generated and interpreted using three basic building blocks: handshape, motion, and place of articulation. When combined, these three components (together with palm orientation) uniquely determine the meaning of the manual sign. This means that the use of pattern recognition techniques that only employ a subset of these components is inappropriate for interpreting the sign or to build automatic recognizers of the language. In this paper, we define an algorithm to model these three basic components form a single video sequence of two-dimensional pictures of a sign. Recognition of these three components are then combined to determine the class of the signs in the videos. Experiments are performed on a database of (isolated) American Sign Language (ASL) signs. The results demonstrate that, using semi-automatic detection, all three components can be reliably recovered from two-dimensional video sequences, allowing for an accurate representation and recognition of the signs.
American Sign Language; handshape; motion reconstruction; multiple cue recognition; computer vision
Foreground-background segmentation has recently been applied [26,12] to the detection and segmentation of specific objects or structures of interest from the background as an efficient alternative to techniques such as deformable templates . We introduce a graphical model (i.e. Markov random field)-based formulation of structure-specific figure-ground segmentation based on simple geometric features extracted from an image, such as local configurations of linear features, that are characteristic of the desired figure structure. Our formulation is novel in that it is based on factor graphs, which are graphical models that encode interactions among arbitrary numbers of random variables. The ability of factor graphs to express interactions higher than pairwise order (the highest order encountered in most graphical models used in computer vision) is useful for modeling a variety of pattern recognition problems. In particular, we show how this property makes factor graphs a natural framework for performing grouping and segmentation, and demonstrate that the factor graph framework emerges naturally from a simple maximum entropy model of figure-ground segmentation.
We cast our approach in a learning framework, in which the contributions of multiple grouping cues are learned from training data, and apply our framework to the problem of finding printed text in natural scenes. Experimental results are described, including a performance analysis that demonstrates the feasibility of the approach.
figure-ground segmentation; belief propagation; factor graphs; text detection
Active appearance models (AAMs) have demonstrated great utility when being employed for non-rigid face alignment/tracking. The “simultaneous” algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real time performance (2-3 fps). The “project-out” algorithm for fitting an AAM achieves faster than real time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method is able to achieve superior performance to the “simultaneous” AAM algorithm along with real time fitting speeds (35 fps). We improve upon the canonical CLM formulation, to gain this performance, in a number of ways by employing: (i) linear SVMs as patch-experts, (ii) a simplified optimization criteria, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criteria for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization lends from its ability to be parallelized, a step which we also theoretically explore in this paper. We refer to our approach for fitting the CLM as the “exhaustive local search” (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database.
Constrained Local Models; Non-Rigid Face Alignment; Active Appearance Models
Image registration is a key step in a great variety of biomedical imaging applications. It provides the ability to geometrically align one dataset with another, and is a prerequisite for all imaging applications that compare datasets across subjects, imaging modalities, or across time. Registration algorithms also enable the pooling and comparison of experimental findings across laboratories, the construction of population-based brain atlases, and the creation of systems to detect group patterns in structural and functional imaging data. We review the major types of registration approaches used in brain imaging today. We focus on their conceptual basis, the underlying mathematics, and their strengths and weaknesses in different contexts. We describe the major goals of registration, including data fusion, quantification of change, automated image segmentation and labeling, shape measurement, and pathology detection. We indicate that registration algorithms have great potential when used in conjunction with a digital brain atlas, which acts as a reference system in which brain images can be compared for statistical analysis. The resulting armory of registration approaches is fundamental to medical image analysis, and in a brain mapping context provides a means to elucidate clinical, demographic, or functional trends in the anatomy or physiology of the brain.
Brain mapping; Image registration; Brain atlas