There is growing concern about chronic diseases and other health problems related to diet, including obesity and cancer. The need to accurately measure diet (what foods a person consumes) has therefore become imperative. Dietary intake provides valuable insights for mounting intervention programs for the prevention of chronic diseases. Measuring accurate dietary intake is considered to be an open research problem in the nutrition and health fields. In this paper, we describe a novel mobile telephone food record that provides an accurate account of daily food and nutrient intake. Our approach includes the use of image analysis tools for the identification and quantification of food consumed at a meal. Images obtained before and after foods are eaten are used to estimate the amount and type of food consumed. The mobile device provides a unique vehicle for collecting dietary information and reduces the respondent burden associated with more classical approaches to dietary assessment. We describe our approach to image analysis, including the segmentation of food items, the features used to identify foods, a method for automatic portion estimation, and our overall system architecture for collecting food intake information.
There is growing concern about chronic diseases and other health problems related to diet, including obesity and cancer. Dietary intake, the process of determining what someone eats during the course of a day, provides valuable insights for mounting intervention programs for the prevention of many chronic diseases. Measuring accurate dietary intake is considered to be an open research problem in the nutrition and health fields. The increasing prevalence of obesity among youth is of great concern and has been linked to an increase in type 2 diabetes mellitus. Accurate methods and tools to assess food and nutrient intake are essential in monitoring the nutritional status of this age group for epidemiological and clinical research on the association between diet and health.
The collection of food intake and dietary information provides some of the most valuable insights into the occurrence of disease and subsequent approaches for mounting intervention programs for prevention. The assessment of food intake in adolescents has been evaluated in the past using a food record (FR), the 24-hour dietary recall (24 HR), and a food frequency questionnaire (FFQ), with external validation by doubly-labeled water (DLW) and urinary nitrogen. Currently, there are too few validation studies in children to justify one particular method over another for any given study design.
The accurate assessment of diet is problematic, especially in adolescents. The availability of “smart” mobile telephones with higher-resolution imaging capability, improved memory capacity, network connectivity, and faster processors allows these devices to be used in health care applications. Mobile telephones can provide a unique mechanism for collecting dietary information that reduces the burden on record keepers. A dietary assessment application for a mobile telephone would be of value to practicing dietitians and researchers. Previous results among adolescents showed that dietary assessment methods using a technology-based approach, e.g., a personal digital assistant with or without a camera, or a disposable camera, were preferred over the traditional paper food record. This suggests that for adolescents, dietary assessment methods that incorporate new mobile technology may improve cooperation and accuracy. To address these challenges, we describe a mobile telephone food record that we have developed using a mobile device (e.g., a mobile telephone or PDA-like device) to provide an accurate account of daily food and nutrient intake. Fig. 1 shows the overall architecture of our proposed system, which we describe in detail in Section V. Our goal is to use a mobile device with a built-in camera, network connectivity, and integrated image analysis and visualization tools with a nutrient database to allow a user to discreetly record foods eaten. Images acquired before and after foods are eaten can be used to estimate the amount of food and nutrients consumed. We have deployed a prototype system on an iPhone. This prototype is available only for testing, not for commercial distribution, and it is currently being used by dietitians and nutritionists in the Department of Foods and Nutrition at Purdue University for various adolescent and adult controlled diet studies.
The paper is organized as follows. Section II reviews current dietary assessment methods. Section III describes the image analysis methods used in our system. Section IV presents methods for automatic portion estimation and visual refinement. Section V describes the deployment of our dietary assessment system on a mobile device. In Section VI, we present experimental results. We conclude with a discussion of our system and future work in Section VII.
A review of some of the most popular dietary assessment methods is provided in this section. The objective here is to analyze the advantages and major drawbacks of these methods. This will demonstrate the significance of our mobile telephone food record, which can be used in population-based and clinical studies to improve the understanding of diet among adolescents.
The 24-hour dietary recall (24 HR) consists of a listing of foods and beverages consumed the previous day or the 24 hours prior to the recall interview. Foods and amounts are recalled from memory with the aid of an interviewer who has been trained in methods for soliciting dietary information. A brief activity history may be incorporated into the interview to facilitate probing (i.e., asking questions) for foods and beverages consumed. The Food Surveys Research Group (FSRG) of the United States Department of Agriculture (USDA) has devoted considerable effort to improving the accuracy of this method.
The major drawback of the 24 HR is the issue of under-reporting of the food consumed. Factors such as obesity, gender, social desirability, restrained eating and hunger, education, literacy, perceived health status, age, and race/ethnicity have been shown to be related to under-reporting. Youth, in particular, are limited in their abilities to estimate portion sizes accurately. The most common method of evaluating the accuracy of the 24 HR with children is through observation of school lunch and/or school breakfast and comparing foods recalled with foods either observed as eaten or foods actually weighed. These recalls have demonstrated both under-reporting and over-reporting, as well as incorrect identification of foods.
The 24 HR is useful in population-based studies; however, the preferred dietary assessment method for clinical studies is the food record. Depending on the primary nutrients or foods of interest, the minimum number of food records needed is rarely less than two days. Training the subjects, telephoning with reminders for recording, reviewing the records for discrepancies, and entering the dietary information into a nutrient database can take a large amount of time and requires trained individuals.
The food record is especially vulnerable to under-reporting due to the complexity of recording food. A study among 10–12 year old children found significant under-reporting of total energy intake (TEI) when the intake was compared against an external marker, doubly-labeled water (DLW). Because adolescents snack frequently, have unstructured eating patterns, and consume greater amounts of food away from home, their recording burden is much greater than that of adults. It has been suggested that these factors, along with a combination of forgetfulness, irritation, and boredom caused by having to record intake frequently, may contribute to the under-reporting in this age group. Dietary assessment methods perceived as less burdensome and time-consuming may improve compliance.
Portion size estimation may be one contributor to under-reporting. In , it was found that 45 minutes of training in portion-size estimation among 9–10 year olds significantly improved estimates for solid foods which were measured by dimensions or cups, and liquids estimated by cups. Amorphous foods were estimated least accurately even after training and some foods still exhibited an error rate of over 100%. Thus, training can improve portion size estimation; however, more than one session may be needed and accuracy may be unattainable.
The number of days needed to estimate a particular nutrient depends on the variability of the nutrient being assessed and the degree of accuracy desired for the research question. Most nutrients require more than four days for a reliable estimate. However, most individuals grow weary of keeping records beyond four days, which may decrease the quality of the records.
Another challenge in evaluating dietary assessment methods is comparing the results of the dietary assessment method to some measure of “truth.” This is best achieved by identifying a biomarker of a nutrient or dietary factor. The underlying assumption of a biomarker is that it responds to intake in a dose-dependent relationship. The two methods that have the widest consensus as valid biomarkers are DLW for energy and 24-hour urinary nitrogen for protein intake. A biomarker does not rely on a self-report of food intake; thus, theoretically, the measurement errors of the biomarker are not likely to be correlated with those of the dietary assessment method. Other biomarkers collected from urine samples include potassium and sodium. Plasma or serum biomarkers that have been explored include levels of ascorbic acid for vitamin C intake and β-carotene for fruits and vegetables or antioxidants. These latter markers are strongly influenced by factors such as other metabolic pathways, smoking status, and supplement use; thus, their interpretation in terms of absolute intake is limited.
As one can see from the above discussion, measuring accurate dietary intake is considered to be an open research problem in the nutrition and health fields. There is a tremendous need for new methods for collecting dietary information. Preliminary studies have indicated that the use of a mobile device with a camera to obtain images of the food consumed may provide a more accurate method for dietary assessment. This is the goal of the mobile telephone food record described in the following sections.
There has been previous work reported on automatic recognition of some types of food items. Jimenez et al. described an automatic fruit recognition system that recognized spherical fruits under conditions such as shadows, bright areas, occlusions, and overlapping fruits. A three-dimensional scanner was used to scan the scene and generate five images representing the azimuth and elevation angles, range, attenuation, and reflectance. The positions of the fruits were obtained by thresholding and clustering, and the Circular Hough Transform was used to identify the center and radius of each fruit. A robust method to segment food items from the background of color images was proposed in . A color image was converted to a high-contrast grayscale image using an optimal linear combination of the RGB color components. The image was then segmented using a global threshold estimated by a statistical approach that minimizes the intraclass variance. The segmented regions were subjected to morphological processing to remove small objects, close the binary image by dilation followed by erosion, and fill the holes in the segmented regions.
We have developed methods to automatically estimate the food consumed at a meal from images acquired using a mobile device. Our goal is to identify food items using a single image acquired from the mobile device. The system must be easy to use and must not burden the user with taking multiple images, carrying another device, or attaching other sensors to the mobile device. Our approach is shown in Fig. 2. Each food item is segmented and identified, and its volume is estimated. “Before” meal and “after” meal images can be used to estimate the food intake. From this information, the energy and nutrients consumed can be determined. In this section, we describe our methods, some of which were presented earlier in , .
Our goal is to automatically determine the regions in an image where a particular food is located (segmentation) and correctly identify the food type based on its features (classification or food labeling). Automatic identification of food items in an image is not an easy problem. We fully understand that we will not be able to recognize every food. Some food items look very similar, e.g., margarine and butter. In other cases, the packaging or the way the food is served will present problems for automatic recognition. For example, if the food is in an opaque container, then we will not be able to identify it.
In some cases, if a food is not correctly identified or its volume is incorrectly estimated, it may not make much difference with respect to the energy or nutrients consumed. For example, if our system identifies a “brownie” as “chocolate cake,” there is not a significant difference in energy or nutrient content. Similarly, if we incorrectly estimate the amount of lettuce consumed, this will also have little impact on the estimate of the energy or nutrients consumed in the meal, due to the low energy content of lettuce. Again, we emphasize that our goal is to provide professional dietitians and researchers with a tool for better assessment of dietary intake than is currently available using existing methods.
Our system uses various approaches to segment the food items in an image. In particular, we use connected component analysis, active contours, and normalized cuts. Since we are interested in measuring the amount of food in the image, we have developed a very simple protocol for users of our system. This protocol involves the use of a calibrated fiducial marker, a color checkerboard, that is placed in the field of view of the camera. This allows us to apply geometric and color corrections to the images so that the amount of food present can be estimated.
We have investigated a two-step approach to segmenting food items using connected components. In the first step, the color image is converted to grayscale and thresholded to form a binary image. Our goal here is to separate the plate from the tablecloth. The plate is found empirically by assuming it is brighter than the tablecloth (a similar process can be used if the plate is darker than the tablecloth). To segment the food items on the plate, the binary image is searched using eight-connected neighborhoods for low-intensity pixels (i.e., value 0) in the thresholded image. Since we use a fixed threshold, pixels corresponding to food items might be labeled as plate, so the estimates of the food locations need to be refined. Next, the RGB image is converted to the YCbCr color space. Using the chrominance components, Cb and Cr, the mean value of the histogram corresponding to the plate is found. Pixel locations that were not segmented during the first step are compared with this mean chrominance value to identify potential food items. These pixels are given a different label from that of the plate, and the eight-connected neighborhoods of the labeled pixels are then searched to segment the food items.
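The two-step procedure above can be sketched as follows. This is an illustration, not our deployed implementation: the grayscale threshold, the chrominance tolerance, and the assumption of a neutral-colored tablecloth are all simplifications introduced for the example.

```python
import numpy as np
from scipy import ndimage

def segment_food_regions(rgb, plate_threshold=180.0, chroma_tol=10.0):
    """Two-step sketch of the connected-component segmentation.
    The threshold and chrominance tolerance are illustrative values."""
    # Step 1: grayscale threshold, assuming the plate is brighter
    # than the tablecloth.
    gray = rgb.mean(axis=2)
    plate_mask = gray > plate_threshold

    # RGB -> Cb/Cr chrominance components (ITU-R BT.601 conversion).
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b

    # Step 2: pixels NOT labeled as plate in step 1 are compared with the
    # plate's mean chrominance; strong deviations become candidate food.
    cb0, cr0 = cb[plate_mask].mean(), cr[plate_mask].mean()
    chroma_dist = np.hypot(cb - cb0, cr - cr0)
    food_mask = (~plate_mask) & (chroma_dist > chroma_tol)

    # Label the food items with 8-connected component analysis.
    labels, n_items = ndimage.label(food_mask, structure=np.ones((3, 3)))
    return labels, n_items
```

In the real system the candidate pixels are restricted to the plate region; a strongly colored tablecloth would otherwise also pass the chrominance test in this simplified sketch.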
Active contours are used to detect objects in an image using techniques of curve evolution. The basic idea is to deform an initial curve toward the boundary of the object, under some constraints from the image. The use of active contours to segment food images is described in , where a snake model, a controlled continuity spline, is described. Energy functionals are needed to make snakes useful for image analysis problems. Three different energy functionals are used to detect features such as lines, edges, and terminations. The edge functional is used in  to segment food items such as a pear and a potato. Similar approaches described in  and  also use the gradients of the image to locate edges. These methods are suitable for images with strong object boundaries, but are generally sensitive to the initialization of the active contour. Therefore, we prefer the region-based models, which identify each region of interest by using a region descriptor to guide the motion of the active contour. These methods are less sensitive to the initialization of the active contour, but tend to rely on intensity homogeneity in each of the regions to be segmented. In particular, we employed the approach described in  to partition an image into foreground and background regions. Let ui,0 be the i-th channel of an image with i = 1, …, N and let C be the evolving curve. Let c+ = (c1+, …, cN+) and c− = (c1−, …, cN−) be two unknown constant vectors representing the average channel values inside and outside C. The goal is to minimize the following energy function

E(c+, c−, C) = μ · Length(C) + (1/N) Σi=1..N ∫inside(C) λi+ |ui,0(x, y) − ci+|² dx dy + (1/N) Σi=1..N ∫outside(C) λi− |ui,0(x, y) − ci−|² dx dy

where μ > 0 and λi+, λi− > 0 are parameters for each channel. In our implementation, we used the RGB color components of the image.
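To make the role of this energy concrete, the sketch below evaluates its data term for a candidate partition, with the length term omitted and all λ weights set to 1. This is an illustration of the piecewise-constant model, not our actual curve-evolution implementation.

```python
import numpy as np

def chan_vese_data_energy(channels, inside):
    """Data term of the multichannel piecewise-constant energy for a
    candidate foreground mask `inside` (boolean array). The optimal
    constants c+ and c- are simply the channel means of each region."""
    energy = 0.0
    for u in channels:                 # one 2-D array per channel
        c_in = u[inside].mean()        # best constant inside the curve
        c_out = u[~inside].mean()      # best constant outside the curve
        energy += ((u[inside] - c_in) ** 2).sum() \
                + ((u[~inside] - c_out) ** 2).sum()
    return energy / len(channels)
```

A curve that matches the true object boundary yields a lower energy than a misplaced one, which is what drives the contour evolution toward the food item's boundary.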
The active contour model works well when the food items are separated from each other; however, it sometimes fails to distinguish multiple food items that are touching. We use this approach in some of the controlled diet studies conducted by nutritionists, in which simple foods are given to test subjects for evaluation.
Normalized cut is a graph partition method first proposed by Shi and Malik . This method treats an image pixel as a node of a graph and considers segmentation as a graph partitioning problem. In this method, the image is modeled as a weighted, undirected graph. Each pixel is a node in the graph, and an edge is formed between every pair of pixels. The weight of an edge is a measure of the similarity between the pixels. The image is partitioned into disjoint sets (segments) by removing the edges connecting the segments. The optimal partitioning of the graph is the one that minimizes the weights of the edges that were removed (the cut). Shi's technique seeks to minimize the normalized cut, which is the ratio of the cut to all of the edges in the set. The technique uses a graph-theoretic criterion for measuring the “goodness” of an image partition, where both the total dissimilarity between the different groups as well as the total similarity within the groups are measured. The minimization of this criterion can be formulated as a generalized eigenvalue problem.
Various image features such as intensity, color, texture, contour continuity, and motion can be treated in one uniform framework. Let X(i) be the spatial location of node i, i.e., its coordinates in the original image I, and let F(i) be a feature vector. The graph edge weight connecting two nodes i and j can then be defined as

w(i, j) = exp(−‖F(i) − F(j)‖²/σF²) · exp(−‖X(i) − X(j)‖²/σX²) if ‖X(i) − X(j)‖ < r, and w(i, j) = 0 otherwise

We used intensity and color as the image features when applying the normalized cut to food images.
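A minimal sketch of a single normalized-cut bipartition is given below, using a dense affinity matrix (feature similarity gated by spatial proximity) and the generalized eigenproblem (D − W)y = λDy. The parameter values are illustrative, and the dense n² matrix limits this sketch to small images; practical implementations use sparse matrices.

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(image, sigma_f=0.5, sigma_x=4.0, r=5.0):
    """Bipartition a small grayscale image with one normalized cut.
    The second-smallest generalized eigenvector gives the partition."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.column_stack([ys.ravel(), xs.ravel()]).astype(float)
    F = image.ravel().astype(float)

    # Edge weights: feature similarity gated by spatial proximity.
    d_x = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_f = np.abs(F[:, None] - F[None, :])
    W = np.exp(-(d_f / sigma_f) ** 2) * np.exp(-(d_x / sigma_x) ** 2)
    W[d_x >= r] = 0.0                  # only connect nearby pixels

    # Generalized eigenproblem (D - W) y = lambda * D * y.
    D = np.diag(W.sum(axis=1))
    _, vecs = eigh(D - W, D)
    return (vecs[:, 1] > 0).reshape(h, w)   # threshold the Fiedler-like vector
```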
Two types of features are extracted for each segmented food region: color features and texture features. As noted above, as part of the protocol for obtaining food images, subjects are asked to include a calibrated fiducial marker, a color checkerboard, in the camera's field of view. This allows us to correct for color imbalance in the mobile device's camera. For the color features, the average pixel intensity (i.e., the grayscale value) along with two color components are used. The color components are obtained by first converting the image to the CIELAB color space. The L* component is known as the luminance, and a* and b* are the two chrominance components. For the texture features, we use Gabor filters to measure local texture properties in the frequency domain. Gabor filters describe properties related to the local power spectrum of a signal and have been used for texture analysis . A Gabor impulse response in the spatial domain consists of a sinusoidal plane wave of some orientation and frequency, modulated by a two-dimensional Gaussian envelope, and is given by

h(x, y) = (1/(2π σx σy)) exp[−(1/2)(x²/σx² + y²/σy²)] cos(2πUx)  (3)

where U is the frequency of the sinusoidal plane wave along the x-axis.
In our work, we use the Gabor filter-bank proposed in . It is highly suitable for our use: the texture features are obtained by subjecting each image (or, in our case, each block) to a Gabor filtering operation in a window around each pixel and then estimating the mean and the standard deviation of the energy of the filtered image. A Gabor filter-bank consists of Gabor filters with Gaussians of several sizes modulated by sinusoidal plane waves of different orientations, all derived from the same Gabor root filter defined in (3); it can be represented as

hmn(x, y) = a⁻ᵐ h(x̃, ỹ)

where x̃ = a⁻ᵐ(x cosθ + y sinθ), ỹ = a⁻ᵐ(−x sinθ + y cosθ), θ = nπ/K (K = total number of orientations, n = 0, 1, …, K − 1, and m = 0, 1, …, S − 1), and h(·,·) is defined in (3). Given an image IE(r, c) of size H × W, the discrete Gabor filtered output is given by the 2-D convolution

IG(r, c; m, n) = Σu Σv IE(r − u, c − v) hmn(u, v)
As a result of this convolution, the energy of the filtered image is obtained, and then the mean and standard deviation are estimated and used as features. In our implementation, we divide each segmented food item into N × N non-overlapping blocks and apply the Gabor filters to each block. We use the following Gabor parameters: four scales (S = 4) and six orientations (K = 6).
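The texture-feature computation can be sketched as follows. The dyadic scale spacing, base frequency, kernel support, and isotropic Gaussian are illustrative assumptions, not the exact filter-bank parameters of our system; the sketch does reproduce the 2·S·K = 48 feature count for S = 4 and K = 6.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(sigma, freq, theta):
    """Real Gabor kernel: 2-D Gaussian modulated by a cosine plane wave."""
    half = int(3 * sigma)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotated coordinates
    g = np.exp(-0.5 * (x ** 2 + y ** 2) / sigma ** 2)
    return g * np.cos(2 * np.pi * freq * xr) / (2 * np.pi * sigma ** 2)

def gabor_features(block, S=4, K=6):
    """Mean and std of the filter-response energy per scale/orientation:
    2*S*K = 48 texture features for S=4, K=6."""
    feats = []
    for m in range(S):
        a = 2.0 ** m                              # assumed dyadic scale a^m
        for n in range(K):
            theta = n * np.pi / K                 # orientation n*pi/K
            kern = gabor_kernel(2.0 * a, 0.25 / a, theta)
            energy = np.abs(fftconvolve(block, kern, mode="same"))
            feats.extend([energy.mean(), energy.std()])
    return np.array(feats)
```

Concatenating these 48 values with the three color features yields the 51-dimensional vector used for classification.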
Once the food items are segmented and their features are extracted, the next step is to identify the food items using statistical pattern recognition techniques , . For classification of the food item, we use a support vector machine (SVM) –. A classification task usually involves training and testing data. Each element in the training set contains one class label and several “attributes” (features). The feature vectors used for our system contain 51 values, 48 texture features and three color features. The feature vectors for the training images (which contain only one food item in the image) are extracted and a training model is generated using the SVM. We use LIBSVM , a library for support vector machines.
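The training/testing pipeline can be sketched with scikit-learn's SVC, which is built on LIBSVM. The 51-dimensional feature vectors below are synthetic stand-ins for two hypothetical food classes, not real food data, and the split mimics the 25%/75% experiment described later.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, dim = 40, 51            # 48 texture + 3 color features

# Two synthetic "food classes" with well-separated feature distributions.
centers = rng.normal(scale=3.0, size=(2, dim))
X = np.vstack([c + rng.normal(scale=0.5, size=(n_per_class, dim))
               for c in centers])
y = np.repeat([0, 1], n_per_class)

# Shuffle, then split into 25% training and 75% testing.
order = rng.permutation(len(y))
X, y = X[order], y[order]
n_train = len(y) // 4
clf = SVC(kernel="rbf", gamma="scale")   # SVC wraps the LIBSVM solver
clf.fit(X[:n_train], y[:n_train])
accuracy = clf.score(X[n_train:], y[n_train:])
```

With clearly separated synthetic classes the classifier is near-perfect; the real difficulty reported in our experiments comes from foods that overlap in this feature space.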
The labeled food type along with the segmented image are sent to the automatic portion estimation module where camera parameter estimation and model reconstruction are utilized to determine the volume of food.
One of the challenging problems of image-based dietary assessment is the accurate estimation of food portion size from a single image. As we have indicated above, this is done to minimize the burden on the user. We have developed a method to automatically estimate portion size of a variety of foods through volume estimation. These “portion volumes” utilize camera parameter estimation and model reconstruction to determine the volume of food items, from which nutritional content is then determined.
Our volume estimation consists of camera calibration and 3-D volume reconstruction. Fig. 3 illustrates this process. Two images are used as inputs: one is the food image taken by the user, and the other is the segmented image described in the previous section. The camera calibration step estimates the camera parameters, comprising intrinsic parameters (distortion, the principal point, and focal length) and extrinsic parameters (camera translation and orientation). We use the fiducial marker discussed above as a reference for the scale and pose of the identified food item. The fiducial marker is detected in the image and its pose is estimated. The system for volume estimation partitions the space of objects into “geometric classes,” each with its own set of parameters. Feature points are extracted from the segmented region image and unprojected into 3-D space. A 3-D volume is reconstructed from the unprojected points based on the parameters of the geometric class. Once the volume estimate for a food item is obtained, the nutrient intake is derived from the estimate based on the USDA Food and Nutrient Database for Dietary Studies (FNDDS) . Next, we summarize the methods we have developed. A complete description of our volume estimation methods is presented in .
Both a spherical approximation model and a prismatic approximation model have been used to perform 3-D volume reconstruction in our work. Our spherical approximation model is inspired by Dandelin spheres to recover the radius and position of a sphere from a single view . One key to recovering the sphere's parameters is that the sphere is tangent to the ground plane. The method for fixing the position makes use of a particular arrangement of two spheres, a cone, and a plane, known as Dandelin spheres . The intersection of a plane and a cone forms an elliptical conic section. To estimate the sphere's position, feature points from the elliptical region in screen space are projected onto the table plane. We reorient the resulting points onto a two-dimensional plane to find the ellipse parameters. This is achieved by first applying the translation vector and then the inverse of the rotation matrix, which yields coordinate triples with negligible z-values. The ellipse of the shadow area is usually more elongated than that of the apparent contour. The ellipse parameters for the shadow area are recovered by estimating the ellipse that best fits the contour points in a least-squares sense. Under perspective projection, the circumference of the apparent contour of a sphere is smaller than that of the sphere's great circle (the circle that cuts the sphere into two equal halves and shares its center). Thus, it is somewhat more difficult to estimate a radius with a perspective camera than under orthogonal projection. We use Heron's formula , which gives the area of a triangle from the lengths of its sides and its semiperimeter, to obtain the radius of the circle inscribed in this triangle.
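The last step rests on a classical identity: by Heron's formula, the inradius of a triangle is its area divided by its semiperimeter. The small sketch below shows only this geometric building block, not the full pose-estimation pipeline.

```python
import math

def inscribed_circle_radius(a, b, c):
    """Radius of the circle inscribed in a triangle with sides a, b, c.
    Heron's formula gives the area; the inradius is area / semiperimeter."""
    s = 0.5 * (a + b + c)                                # semiperimeter
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))    # Heron's formula
    return area / s

def sphere_volume_from_radius(r):
    """Volume of the recovered sphere."""
    return (4.0 / 3.0) * math.pi * r ** 3
```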
To support general shapes of food items, we also developed a prismatic approximation model. We assume that the segmented region representing the food item corresponds exactly to the physical area over which the food contacts the plate surface. This assumption is valid when the image is taken at a high angle, such that there are no self-occluding boundaries. Alternatively, the assumption can be preserved for images acquired at a shallow (oblique) angle by manually supplying surface contact information or by automatic computation from symmetry cues. For each pixel on the boundary of a given segmented region, a vertex in world space may be calculated as the intersection of back-projected screen rays with the table surface. Fig. 4 illustrates the 3-D volume construction of scrambled eggs using our prismatic approximation model. We obtain feature points on the boundary of a segmented region obtained from image segmentation. Fig. 4(b) shows the extracted feature points on the boundary of the scrambled eggs. Since the planar shape constructed from the extracted points is not always convex, we triangulate the planar polygon using Delaunay triangulation  and sum the areas of the triangles to obtain the area of the planar polygon. Finally, this area is extruded along the direction normal to the table surface to produce the volume of the food item.
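The triangulate-sum-extrude step can be sketched as below. Note that scipy's Delaunay triangulates the convex hull of the points, so this simplified version would over-estimate the area of a concave food outline; handling concavity, as our system must, requires constrained triangulation or filtering of hull triangles.

```python
import numpy as np
from scipy.spatial import Delaunay

def prismatic_volume(boundary_pts, height):
    """Prismatic approximation: triangulate the planar contact region,
    sum the triangle areas, and extrude by the estimated height."""
    tri = Delaunay(boundary_pts)
    area = 0.0
    for i0, i1, i2 in tri.simplices:
        p0, p1, p2 = boundary_pts[i0], boundary_pts[i1], boundary_pts[i2]
        # Triangle area from the 2-D cross product of two edge vectors.
        area += 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                          - (p1[1] - p0[1]) * (p2[0] - p0[0]))
    return area * height
```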
Interactive parameter adjustment enables the user to supply information that may be absent from the two-dimensional scene image, using the implicit knowledge they possess of the scene, as well as to correct estimation errors in our reconstruction algorithm. Our visual refinement allows the user to reposition the spherical estimator volume at any point tangent to the table surface and to adjust its radius, as shown in Fig. 5. The height of the prismatic estimator can be interactively adjusted with real-time feedback (this is described in more detail in the next section).
We have developed two different configurations for our dietary assessment system: a standalone configuration and a client-server configuration. Each approach has potential benefits depending on the operational scenario.
The client-server configuration is shown in Fig. 1. In most applications this will be the default mode of operation. The process starts with the user sending the image and metadata (e.g., date, time, and perhaps GPS location information) to the server over the network (step 1) for food identification and volume estimation (steps 2 and 3). The results of steps 2 and 3 are sent back to the client, where the user can confirm and/or adjust this information if necessary (step 4) and return the confirmation to the server (step 5). Once the server obtains the user confirmation, the food consumption information is stored in another database at the server and is used to find nutrient information using the FNDDS database  (step 6); the FNDDS database contains the most common foods consumed in the U.S., their nutrient values, and weights for typical food portions. Finally, these results can be sent to dietitians and nutritionists in the research community, or to the user, for further analysis (step 7). A prototype system has been deployed on the Apple iPhone as the client, and we have verified its functionality with various combinations of foods. A prototype of the client software has also been deployed on the Nokia N810 Internet Tablet.
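To illustrate the kind of metadata exchanged in step 1, the sketch below serializes a hypothetical upload payload as JSON. The field names, values, and structure are assumptions made for this example; the actual wire protocol of the deployed system is not specified here.

```python
import json
from datetime import datetime

# Hypothetical step-1 request body: image plus metadata for the server.
payload = {
    "user_id": "participant-042",                      # assumed identifier
    "timestamp": datetime(2010, 5, 17, 12, 30).isoformat(),
    "gps": {"lat": 40.4237, "lon": -86.9212},          # optional metadata
    "occasion": "before",                              # before/after meal image
    "image_jpeg_b64": "<base64-encoded image bytes>",  # placeholder
}
body = json.dumps(payload)       # serialized request body sent over the network
restored = json.loads(body)      # what the server would decode in step 2
```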
It is important to note that our system has two modes of user input. In the “automatic mode,” the label of the food item, the segmented image, and the volume estimate can be adjusted/resized by the user after automatic analysis, using the touch screen on the mobile device. These corrections are then used for nutrient estimation using the FNDDS.
The other mode addresses the problem when no image is available. For some scenarios, it might be impossible for users to take meal images. For example, the user may not have their mobile telephone with them or may have forgotten to take meal images. To address these situations, we developed an Alternative Method in our system that is based on user interaction and food search using the FNDDS database . With the help of experts from the Foods and Nutrition Department at Purdue University, the Alternative Method captures sufficient information for a dietitian to perform food and nutrient analysis, including date and time, food name, measure description, and the amount of intake. For a more detailed description of the Alternative Method, please refer to .
In the standalone configuration, the idea is to perform all the image analysis and volume estimation on the mobile device. By doing the image analysis on the device, the user does not need to rely on network connectivity. One of the main disadvantages of this approach is higher battery consumption on the mobile device. Optimization of the image analysis techniques is therefore one of our priorities when designing the system. We are also exploring strategies that perform some parts of the image analysis on the mobile device and others on the server. Having a standalone configuration allows us to determine how each part of the process affects power consumption, processor utilization, and device memory. It also helps us identify the most resource-demanding tasks so that we can implement them on the server.
Several controlled diet studies were conducted by the Department of Foods and Nutrition at Purdue University, in which participants were asked to take pictures of their food before and after meals . These meal images were used in our experiments. To date, we have collected more than 3000 food images. To assess the accuracy of our various methods, it is important to develop ground truth data for the images. For each image, we manually extracted each food item in the scene using a Cintiq Interactive Pen LCD Display and Adobe Photoshop. Given a meal image, we traced the contour of each food item and generated corresponding mask images along with the correct food labels. As a control, different individuals were asked to ground-truth the same images, and the results were shown to graduate students in the Department of Foods and Nutrition at Purdue University for evaluation. Since these were controlled studies, the correct nutrient information was also available.
For our classification tests, we considered 19 food items from three different meal events (a total of 63 images). All images were acquired in the same room under the same lighting conditions. Three experiments were conducted, varying the proportions of images used for training and testing. For these experiments, we used the ground truth segmentation data described above to evaluate the performance of the classification. In the first experiment, we used 10% of the images for training and the remaining 90% for testing; in the second, 25% for training and 75% for testing; and in the third, 50% for training and 50% for testing. Table I presents the results of the three experiments in terms of the average correct classification rate over all food items. Tenfold cross-validation was performed to obtain the mean and variance of the classification results.
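The repeated random-split evaluation described above can be sketched as follows. The `repeated_split_eval` helper and the stand-in majority-label classifier are hypothetical illustrations of the protocol, not the classifier or features used in the system.

```python
import random
import statistics

def repeated_split_eval(samples, train_frac, classify, n_repeats=10, seed=0):
    """Randomly split `samples` into train/test at the given fraction,
    evaluate `classify(train, test) -> accuracy`, and repeat to obtain
    the mean and variance of the accuracy."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        k = max(1, int(len(shuffled) * train_frac))
        accuracies.append(classify(shuffled[:k], shuffled[k:]))
    return statistics.mean(accuracies), statistics.pvariance(accuracies)

# Stand-in classifier: predicts the most common label seen in training.
def majority_classifier(train, test):
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == guess) / len(test)

# 63 toy (image_id, label) pairs, mirroring the 63-image data set size.
data = [(i, "apple" if i % 3 else "bread") for i in range(63)]
mean_acc, var_acc = repeated_split_eval(data, 0.25, majority_classifier)
```

Repeating the split, rather than using a single partition, is what yields the mean and variance reported in Table I.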
Examples of correctly classified and misclassified objects are shown in Fig. 9. Training and testing data were selected randomly; therefore, when only 10% of the data is used for training, each data item has a large influence on the classifier's performance. Some foods are inherently difficult to classify because of their similarity in the feature space we use; examples of such errors are scrambled eggs misclassified as margarine and Catalina dressing misclassified as ketchup. Our experiments also showed that the performance of the image segmentation plays a crucial role in achieving correct classification results.
To measure the accuracy of our volume estimation for both spherical and prismatic objects, we used seven food items (five spherical and two prismatic) in the experiment. No manual refinement of the spherical objects was performed. The average error rates are summarized in Table II. Estimated radii were in good agreement with measured radii in the spherical trials, falling within 0.07 inches for every fruit but one: the nectarine (considered the least spherical of the fruits) was overestimated by 0.14 inches. Nectarine radii derived from cross-sectional areas were within 0.01 inches of the direct radius measurement, so the estimation error may be due in part to the choice of the cross-section used. The average volume error for the nectarine relative to the water displacement ground truth method (0.51%) was the smallest of all the fruits; relative to the radius method (7.17%), however, it was surpassed only by that of the plum (14.6%), for which there was fairly high disagreement between the two ground truth methods of obtaining volume. The average volume error for oranges was smaller than for the other spherical fruits, as oranges are very nearly spherical. As the experiment shows, fruit often deviates from the ideal sphere, so an ellipsoid-based approximation would provide a better estimate. However, it is very challenging to extract the major and minor diameters of a nearly spherical object from a single-view image in perspective projection. In contrast, volume estimation results on synthetic spheres were highly accurate, since their ground truth volumes are known exactly.
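The relationship between the cross-sectional area, the derived radius, and the spherical volume estimate can be sketched as follows. This is a minimal illustration of the ideal-sphere assumption (r = sqrt(A/π), V = (4/3)πr³); the function names and the sample radius are ours, not from the paper.

```python
import math

def sphere_volume_from_area(cross_section_area):
    """Estimate the volume of an (assumed spherical) fruit from the
    area of its segmented cross-section: r = sqrt(A / pi), V = (4/3) pi r^3."""
    r = math.sqrt(cross_section_area / math.pi)
    return (4.0 / 3.0) * math.pi * r ** 3

def percent_error(estimate, ground_truth):
    """Percentage error against a ground truth volume
    (e.g., from water displacement)."""
    return 100.0 * abs(estimate - ground_truth) / ground_truth

# For a perfect sphere of radius 1.5 in, the estimate is exact:
area = math.pi * 1.5 ** 2            # cross-sectional area of the sphere
vol = sphere_volume_from_area(area)  # equals (4/3) * pi * 1.5**3
```

For a real fruit, the estimate degrades exactly as the fruit deviates from this ideal sphere, which is why the nearly spherical oranges fared best and the nectarine worst.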
For prismatic objects, as shown in Table II, the estimates of base area and height were less accurate, with a 10% volume error in the worst trial, although the Jell-O estimates fared well. The brownie volume errors ranged from 6% to 14% compared to the nominal volume, higher than those of the Jell-O. We attribute this to the image segmentation, since the boundaries of the segmented regions for the brownies were not smooth. We also analyzed the accuracy of the gram weights derived from the volume estimates produced by our volume estimation process. We chose two food items, garlic bread and yellow cake; Table III shows the error rates between the estimated and measured gram weights.
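A minimal sketch of the prismatic volume and gram weight computation, assuming the volume is the segmented base area times the estimated height and that a per-food density factor converts volume to grams. All numbers and names below are illustrative, not values from the study or the FNDDS database.

```python
def prismatic_volume(base_area_in2, height_in):
    """Volume of a prismatic food item (e.g., brownie, Jell-O):
    segmented base area times estimated height, in cubic inches."""
    return base_area_in2 * height_in

def grams_from_volume(volume_in3, grams_per_in3):
    """Convert an estimated volume to a gram weight using a per-food
    density factor (the factor here is illustrative only)."""
    return volume_in3 * grams_per_in3

# Hypothetical brownie: 2 in x 2 in base, 1 in tall, density 9 g/in^3.
vol = prismatic_volume(2 * 2, 1.0)   # 4.0 cubic inches
grams = grams_from_volume(vol, 9.0)  # 36.0 g
```

Because the gram weight is a product of the two estimates, a rough segmentation boundary (as with the brownies) propagates directly into the weight error reported in Table III.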
Nutrient information and meal images were collected from the controlled studies, in which a total of 78 participants (26 males, 52 females) ages 11 to 18 years used our system. The energy intake measured from the known food items for each meal was used to validate the performance of our system. For each amount of training data, we estimated the mean percentage error of our automatic methods relative to the nutrient data collected in the studies. With 10% training data, the automatic method was within a 10% margin of the correct nutrient information; with 25% training data, this improved to within a 3% margin; and with 50% training data, to within a 1% margin. Our experimental results indicate that using a mobile device with a camera to obtain images of the food consumed is a valid and accurate tool for dietary assessment.
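The mean percentage error used above can be sketched as follows, assuming it is the mean absolute percentage error over meals; the kcal values are illustrative only, not data from the study.

```python
def mean_percent_error(estimated, reference):
    """Mean absolute percentage error between estimated energy intake
    and the known per-meal values from a controlled study."""
    assert len(estimated) == len(reference) and reference
    errors = [100.0 * abs(e, ) if False else 100.0 * abs(e - r) / r
              for e, r in zip(estimated, reference)]
    return sum(errors) / len(errors)

# Illustrative per-meal energy values (kcal).
known = [650.0, 720.0, 500.0]
estimated = [640.0, 700.0, 520.0]
mpe = mean_percent_error(estimated, known)  # roughly 2.8%
```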
In this paper, we described the development of a dietary assessment system using mobile devices. As we indicated, measuring dietary intake accurately is considered an open research problem in the nutrition and health fields. We believe we have developed a tool that can usefully replace the traditional food record methods currently in use. We are continuing to refine and develop the system to increase its accuracy and usability.
The authors would like to thank their colleagues and collaborators, T. Schap and B. Six of the Department of Foods and Nutrition and J. Chae and K. Ostmo of the School of Electrical and Computer Engineering at Purdue University for their help in collecting and processing the images used in their studies. More information about their project can be found at www.tadaproject.org.
This work was supported by the National Institutes of Health under Grants NIDDK 1R01DK073711-01A1 and NCI 1U01CA130784-01. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Vikram Krishnamurthy.
Fengqing Zhu (S'05) received the B.S. and M.S. degrees in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2004 and 2006, respectively, where she is currently pursuing the Ph.D. degree in the area of communications, networking, and signal and image processing.
During the summer of 2007, she was a Student Intern at the Sharp Laboratories of America, Camas, WA. Her research interests include video compression, image/video processing, image analysis, and computer vision.
Marc Bosch (S'05) received the degree in electrical engineering from the Technical University of Catalonia (UPC), Barcelona, Spain, in 2007, and the M.S. degree in electrical and computer engineering from Purdue University, West Lafayette, IN, in 2009, where he is currently pursuing the Ph.D. degree.
His research interests include image/video processing, video compression, image analysis, and computer vision.
Mr. Bosch received the Archimedes Award for the best undergraduate engineering thesis from the Science and Education Ministry of Spain.
Insoo Woo received the B.S. degree in computer engineering from Dong-A University, Busan, Korea, in 1998. He is currently pursuing the Ph.D. degree in the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN.
He was a Software Engineer from 1997 to 2006. He is a Research Assistant in the Purdue University Rendering and Perception Lab. His research interests include GPU-aided techniques for computer graphics and visualization.
SungYe Kim received the B.S. and M.S. degrees from Chung-Ang University, Seoul, Korea, in 1998 and 2000, respectively, both in computer science and engineering. She is currently pursuing the Ph.D. degree in electrical and computer engineering at Purdue University, West Lafayette, IN.
She is a Research Assistant in the Purdue University Rendering and Perception Lab. From 2000 to 2006, she was a Research Engineer on a computer graphics team in the Electronics Telecommunications Research Institute in Korea. Her research interests include computer graphics, rendering techniques, illustrative visualization, mobile graphics, and mobile visual analytics.
Carol J. Boushey received the B.Sc. degree from the University of Washington, Seattle, the M.P.H. degree from the University of Hawaii, Manoa, and the Ph.D. degree from the University of Washington through the interdisciplinary nutrition program and the epidemiology program.
She is an Associate Professor at Purdue University, West Lafayette, IN. She is a member of the National Community Engagement Steering Committee of the Clinical and Translational Science Awards (CTSA) and the executive committee of the Community Health Engagement Program of Indiana's Clinical and Translational Sciences Institute. Her research interests include dietary assessment methods, adolescent dietary behaviors, school-based interventions, food insecurity, and applications of quantitative methods. Her career as a practicing dietitian included working for the Washington State Health Department, the Waianae Coast Comprehensive Health Center on Oahu, and the University of Hawaii. As a university faculty member, she has directed two multi-site randomized school trials, “No Bones About It!” and “Eat Move Learn”, and the statewide Safe Food for the Hungry program in Indiana. She serves on the Board of Editors of the Journal of the American Dietetic Association. She is the coeditor of the second edition of the Elsevier publication Nutrition in the Treatment and Prevention of Disease, released in the spring of 2008. Her published research has appeared in book chapters and journals such as Pediatrics, the Journal of Nutrition, and JAMA. She has presented on numerous occasions at regional, statewide, national, and international meetings. While pursuing the M.P.H. degree, she was a Research Assistant with the Epidemiology Branch of the Cancer Research Center of Hawaii. Currently, she is a Registered Dietitian with the Commission on Dietetic Registration.
David S. Ebert (S'87–M'87–SM'04–F'09) is the Silicon Valley Professor of Electrical and Computer Engineering at Purdue University, West Lafayette, IN, a University Faculty Scholar, and Director of the Visual Analytics for Command Control and Interoperability Center (VACCINE), the Visualization Science team of the Department of Homeland Security's Command Control and Interoperability Center of Excellence. He performs research in novel visualization techniques, visual analytics, volume rendering, information visualization, perceptually based visualization, illustrative visualization, mobile graphics and visualization, and procedural abstraction of complex, massive data.
Prof. Ebert has been very active in the visualization community, teaching courses, presenting papers, cochairing many conference program committees, serving on the ACM SIGGRAPH Executive Committee, serving as Editor-in-Chief of the IEEE Transactions on Visualization and Computer Graphics, serving as a member of the IEEE Computer Society's Publications Board, serving on the IEEE Computer Society Board of Governors, and successfully managing a large program of external funding to develop more effective methods for visually communicating information.
Edward J. Delp (S'70–M'79–SM'86–F'97) was born in Cincinnati, OH. He received the B.S.E.E. (cum laude) and M.S. degrees from the University of Cincinnati and the Ph.D. degree from Purdue University, West Lafayette, IN. In May 2002, he received an Honorary Doctor of Technology from Tampere University of Technology, Tampere, Finland.
From 1980 to 1984, he was with the Department of Electrical and Computer Engineering, University of Michigan, Ann Arbor. Since August 1984, he has been with the School of Electrical and Computer Engineering and the School of Biomedical Engineering, Purdue University. In 2002, he received a Chaired Professorship and is currently the Silicon Valley Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering. His research interests include image and video compression, multimedia security, medical imaging, multimedia systems, communication, and information theory.
Dr. Delp is a Fellow of SPIE, the Society for Imaging Science and Technology (IS&T), and the American Institute of Medical and Biological Engineering. He is Co-Chair of the SPIE/IS&T Conference Security, Steganography, and Watermarking of Multimedia Contents that has been held since January 1999. He was the Program Co-Chair of the IEEE International Conference Image Processing that was held in Barcelona in 2003. In 2000, he was selected a Distinguished Lecturer of the IEEE Signal Processing Society. He received the Honeywell Award in 1990, the D. D. Ewing Award in 1992, and the Wilfred Hesselberth Award in 2004, all for excellence in teaching. In 2001, he received the Raymond C. Bowman Award for fostering education in imaging science from the Society for Imaging Science and Technology (IS&T). In 2004, he received the Technical Achievement Award from the IEEE Signal Processing Society. In 2002 and 2006, he was awarded Nokia Fellowships for his work in video processing and multimedia security.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Fengqing Zhu, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)
Marc Bosch, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)
Insoo Woo, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)
SungYe Kim, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)
Carol J. Boushey, Department of Foods and Nutrition, Purdue University, West Lafayette, IN 47907 USA.
David S. Ebert, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)
Edward J. Delp, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (Email: ace@ecn.purdue.edu)