To examine five available software packages for the assessment of abdominal adipose tissue with magnetic resonance imaging, compare their features and assess the reliability of measurement results.
Feature evaluation and test–retest reliability of software packages (NIHImage, SliceOmatic, Analyze, HippoFat and EasyVision) used in manual, semi-automated or automated segmentation of abdominal adipose tissue.
A random sample of 15 obese adults with type 2 diabetes.
Axial T1-weighted spin echo images centered at vertebral bodies of L2–L3 were acquired at 1.5 T. Five software packages were evaluated (NIHImage, SliceOmatic, Analyze, HippoFat and EasyVision), comparing manual, semi-automated and automated segmentation approaches. Images were segmented into cross-sectional area (CSA), and the areas of visceral (VAT) and subcutaneous adipose tissue (SAT). Ease of learning and use and the design of the graphical user interface (GUI) were rated. Intra-observer accuracy and agreement between the software packages were calculated using intra-class correlation. Intra-class correlation coefficient was used to obtain test–retest reliability.
Three of the five evaluated programs offered a semi-automated technique to segment the images based on histogram values or a user-defined threshold. One software package allowed manual delineation only. One fully automated program demonstrated the drawbacks of uncritical automated processing. The semi-automated approaches reduced variability and measurement error, and improved reproducibility. There was no significant difference in the intra-observer agreement in SAT and CSA. The VAT measurements showed significantly lower test–retest reliability. There were some differences between the software packages in qualitative aspects, such as user friendliness.
Four out of five packages provided essentially the same results with respect to the inter- and intra-rater reproducibility. Our results using SliceOmatic, Analyze or NIHImage were comparable and could be used interchangeably. Newly developed fully automated approaches should be compared to one of the examined software packages.
Increased central (abdominal) adiposity, and in particular, increased visceral adipose tissue (VAT), is related to numerous metabolic aberrations (including insulin resistance, type 2 diabetes mellitus, hepatic steatosis, hypertension and hyperlipidemia).1–7 Many direct and indirect methods have been used to quantify body fat, such as anthropometric measurements8 (for example, waist circumference), underwater weighing,8,9 bioelectric impedance,10 dual-energy X-ray absorptiometry,11,12 computed tomography (CT)13–16 and magnetic resonance imaging (MRI). However, at present, most published research endorses either CT or MRI as the best in vivo methods for accurate and reliable data about the amount and distribution of abdominal adipose tissue. Of these, MRI has the advantage of not exposing the subject to ionizing radiation. This is especially useful in research projects that require serial whole-body measurements to examine the response to diet or exercise.7,17
MRI assessment of adipose tissue has been validated in phantoms,18,19 animals14,20,21 and human cadavers.22,23 Abdominal adipose tissue measurement by in vivo MRI has been shown to be in good agreement with values produced by dissection and chemical analysis.17,20,22,24 Whole-body MRI data make it possible to assess the influence of weight loss and other interventions on regional body composition.25–28
Identifying and segmenting MR images accurately, and measuring abdominal VAT, subcutaneous adipose tissue (SAT) and total adipose tissue (TAT) precisely are of key importance. In the literature, MRI data appear as either area values (cm²) obtained from a single image or multiple images, or as volume values (cm³) derived using the tissue area measurements from multiple images. Recent studies suggest using single-slice images in the upper abdomen (L1–L3) to minimize scanning and data analysis time and cost. The results show that single images are a good surrogate (P<0.01 to P<0.05)5,6,29–33 for volume measurements of VAT or TAT.
A wide variety of software packages and methodologies has been described in the published literature on MR body fat measurements.14,17,34–39 Most approaches are composed of two steps: in the first step, a threshold between adipose and lean tissues is determined manually or automatically from the image histogram and, in the second step, the areas of SAT and VAT are delineated. The segmentation routine can also include data pre-processing steps such as de-noising and smoothing of the MR images to facilitate segmentation.40
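As an illustration of the first step, a threshold can be derived automatically from the gray-level histogram. The sketch below uses Otsu's method, which is one common choice and not necessarily what any of the packages discussed here implement; the function name `otsu_threshold` and the assumption of integer intensities in the range 0 to `levels - 1` are ours.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Assumes integer intensities in 0..levels-1. Pixels with intensity > t
    would be labeled adipose (bright on T1-weighted images); pixels <= t
    would be labeled lean tissue or background.
    """
    # Build the gray-level histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # summed intensity of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so the split is undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor of 1/total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

For a clearly bimodal histogram the returned level falls in the gap between the dark and bright modes. On real abdominal images, where liver and bone marrow overlap the fat intensities, the result would still need visual checking by the operator.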
Semi-automated approaches support the user by determining a histogram-based threshold,30,41–46 boundary enhancement,47 histogram-based region growing41,47 or clustering.40 Hybrid algorithms, which incorporate anatomical knowledge with segmentation algorithms,32,33,38,39 have been successfully used to segment high contrast, artifact-free images. Unfortunately, the intensity variation at tissue boundaries in abdominal MR images is gradual rather than sharp, causing poor differentiation of tissue interfaces. The complex visceral anatomy of the abdomen still requires substantial operator involvement for the delineation of VAT.
Recently, two fully automated approaches for the segmentation of abdominal adipose and lean tissues have been published.34–37 Positano et al.34–36 proposed a fuzzy c-means approach. An active contour algorithm was applied to identify and estimate the volumes of VAT and SAT. Liou et al.37 determined a threshold slice by slice based on the average signal intensity of the imaged slice. The segmentation consisted of a four-step procedure in which masks for the body contour, non-fat tissue, TAT, VAT and SAT were applied to determine the tissue boundaries. The results were visually examined by a physician and compared to a manual segmentation approach.
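To make the clustering idea concrete, the following is a minimal sketch of fuzzy c-means applied to scalar pixel intensities. It is a textbook version written for illustration only, not the implementation of Positano et al.; the function name, the evenly spaced initial centers, the fuzzifier `m = 2` and the fixed iteration count are all assumptions.

```python
def fuzzy_c_means_1d(values, n_clusters=2, m=2.0, n_iter=50):
    """Cluster scalar intensities with fuzzy c-means.

    Returns (centers, memberships), where memberships[i][k] is the degree
    (between 0 and 1) to which values[i] belongs to cluster k. Each row of
    memberships sums to 1. The memberships are those computed in the final
    iteration, just before the last center update.
    """
    lo, hi = min(values), max(values)
    # Assumed initialization: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    memberships = []
    exponent = -2.0 / (m - 1.0)

    for _ in range(n_iter):
        # Membership update: inverse-distance weighting to every center.
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if 0.0 in dists:
                # The value coincides with one or more centers.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** exponent for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])

        # Center update: mean of the values weighted by membership**m.
        for k in range(n_clusters):
            weights = [row[k] ** m for row in memberships]
            denom = sum(weights)
            if denom > 0.0:
                centers[k] = sum(w * v for w, v in zip(weights, values)) / denom
    return centers, memberships
```

With two clusters, the cluster with the higher center would correspond to adipose tissue on T1-weighted images, and a pixel could be assigned to the class with the larger membership. Unlike a hard threshold, the memberships expose how ambiguous each pixel is, which matters at partial-volume boundaries.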
To our knowledge, no study has systematically determined the difference between software packages that are commonly used to measure abdominal adipose tissue. Being able to apply a fast, reliable and accurate routine to segment abdominal adipose tissue is a key element in obtaining reliable data. This study examines the software packages that were cited most frequently for the estimation of abdominal adipose tissue area. The goal is to compare and contrast their features and to compare the reproducibility and reliability of the quantitative measurement results. The focus is on the software packages; the study will not compare various MR methodologies (for example, extrapolation of single-slice versus multi-slice MRI acquisitions), MRI sequences or examine the influence of the measurement site (L1–L2, L2–L3, L4–L5, umbilicus), subsampling or the position of the volunteer with regard to the isocenter of the scanner during the data acquisition.
A random sample of 15 obese adults with type 2 diabetes from the Fatty Liver Ancillary Study to the Action For Health in Diabetes (Look AHEAD) trial was selected. The sample consisted of 10 men and 5 women; 5 African-American (2 men) and 10 Caucasian (3 men) subjects, aged 53–76 years (mean age 65 years), body mass index (BMI) 25.8–45.5 kg/m² (mean BMI 34.5 kg/m²), waist circumference 93.9–131 cm (mean 111 cm). Look AHEAD is a multicenter randomized clinical trial to examine the effects of a lifestyle intervention designed to achieve and maintain weight loss over the long term on the incidence of cardiovascular disease (CVD) events and mortality. The design of the trial has been previously published.48 The single-center Fatty Liver Ancillary study looked at non-alcoholic fatty liver disease within this group of subjects. All volunteers provided written, informed consent to participate. Both the ancillary study and the Look AHEAD trial were approved by the Western Institutional Review Board.
Axial T1-weighted spin echo images (8 slices, 10 mm thickness, 1 mm interslice distance, repetition time/echo time = 150/4.6 ms, acquisition matrix 256 × 256, field of view 53.0 cm × 53.0 cm) centered at vertebral bodies of L2–L3 were acquired with a Philips (Philips Medical Systems, Best, The Netherlands) Gyroscan NT Intera 1.5 T scanner using the body coil with the patient in supine position. A breath-hold technique was applied to avoid breathing-induced artifacts. An example of the anatomic position of the axial slices on a mid-sagittal image is shown in Figure 1. The observers rated the quality of all MR images on a scale of 1–6 (6 being optimal signal-to-noise ratio, 1 being unreadable). The average MR image quality was 4.6 (range 3 to 6).
Software packages were identified by using Internet search engines (for example, Google, Yahoo, AltaVista), as well as a review of the published MRI studies and reference sections. Keywords were used to limit the search to imaging software used in the evaluation of abdominal adipose tissue in MRI. In addition, MRI researchers were consulted to identify any additional MRI analysis software packages. The final selection was based on frequency of citation in the reviewed literature, availability and cost.
First, we made note of the URL of each software package in our survey. The year each package was first released and the version used were included in our study as an index of the program’s stability, popularity and maturity. We noted whether a package is freeware, including in-house software that was made available upon request, shareware (software which is made available with a request to donate money to the authors) or commercially available. The software and operating system requirements for each software package were noted.
Next, we evaluated each package on nine key features. The completeness of instructions for installation, download and setup was rated, and the installation medium was noted. The source of instructions and documentation for each program package was noted. In our ratings, we differentiated between the completeness and the helpfulness of the user’s manual. We also distinguished between learning to use the packages (including understanding how routines are performed) and operating them. Completeness of the software manual, the helpfulness of the manual, the ease of installation, the ease of learning how to use the software, the ease of using the software and the graphical user interface (GUI) design were rated on a five-point scale, where 1 was the most difficult or worst performance and 5 was very easy or very good performance. Two observers independently evaluated each package and the average rating for each feature was calculated. Ratings were also given for features tested and the MR image quality.
All packages are GUI based, and we rated the design and user friendliness of the GUI on a five-point scale. Image display features and image manipulation tools as well as the ability to store the segmented images and region of interest (ROI) measurements were documented. We examined segmentation routines, and the average time an observer needed to segment 10 MR images was measured.
Finally, the intra- and inter-rater reproducibility of the cross-sectional area (CSA), SAT, VAT and TAT measurements were examined. The results obtained using the software packages were compared with each other.
The image data were transferred to personal computers in Digital Imaging and Communication in Medicine (DICOM) format. The platform for analysis depended upon the software package used. SliceOmatic, HippoFat and Analyze were installed on a personal computer equipped with an Intel Pentium 4 CPU (2.4 GHz, 1 GB of RAM); the Philips EasyVision was installed on a Sparc Sun Ultra2 workstation; and NIHImage was used on a Mac Powerbook G4 (1.5 GHz, 512 MB of RAM).
The three central slices out of eight acquired MR images were selected from each subject’s data set to define SAT, VAT, TAT and CSA. Given the high grayscale variation within the image regions representing fat tissue and the grayscale values of some other tissues like liver, the determination of an adequate threshold of the signal intensity was performed in each patient individually to distinguish between adipose and lean tissues on MR images. The images were then segmented into adipose and lean tissues. ROIs describing the whole CSA, the SAT and the VAT were created, and the respective areas were measured. CSA was defined as the sum of all pixels inside a line tracing the skin. Abdominal SAT area was defined as the area of adipose tissue between the skin and the outermost aspect of the abdominal muscle wall. All adipose tissue pixels within the intra-abdominal cavity at the innermost aspect of the abdominal and oblique wall musculature and the anterior aspect of the vertebral body were considered VAT. Retroperitoneal adipose tissue compartments were excluded from the VAT measurement, using anatomical points such as ascending and descending colon, aorta and inferior vena cava. For images in which the kidneys appeared, an oblique line was drawn from the aorta and inferior vena cava to the anterior border of the kidney and continued to the abdominal wall. High-intensity non-fat pixels arising from bone marrow, liver and fatty intestinal contents within the VAT were avoided or manually removed when possible. TAT was calculated by adding SAT and VAT (TAT = SAT+VAT); therefore, some small intramuscular deposits of adipose tissue were not accounted for. Similar to other published reports,5,7,18,22,25,29,33–37,40–45,47,49–63 VAT and SAT were not further subdivided.
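Once each pixel has been labeled, the areas follow from the pixel count and the pixel size. The sketch below assumes square pixels whose side equals the field of view divided by the matrix size (53.0 cm / 256 for the protocol described here); in practice the reconstructed pixel spacing stored in the image header may differ and should be used instead. The label codes `'S'` and `'V'` and both function names are hypothetical.

```python
def pixel_area_cm2(fov_cm, matrix):
    """Area of one square pixel in cm^2, assuming an isotropic in-plane grid."""
    side = fov_cm / matrix
    return side * side


def tissue_areas(labels, fov_cm=53.0, matrix=256):
    """Convert a label image into SAT, VAT and TAT areas in cm^2.

    labels is a 2D list of rows whose entries are 'S' (subcutaneous fat),
    'V' (visceral fat) or anything else (lean tissue or background).
    TAT is computed as SAT + VAT, as in the text, so intramuscular fat
    that carries neither label is not counted.
    """
    area = pixel_area_cm2(fov_cm, matrix)
    n_sat = sum(row.count('S') for row in labels)
    n_vat = sum(row.count('V') for row in labels)
    sat = n_sat * area
    vat = n_vat * area
    return {"SAT": sat, "VAT": vat, "TAT": sat + vat}
```

Under these assumptions one pixel covers roughly 0.043 cm², so a difference of a few hundred misclassified pixels between observers translates into a difference of roughly 10 cm² or more in the reported area.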
To assess the reproducibility of this procedure, images of all volunteers were analyzed by two observers independently. In addition, one observer performed the segmentations on two separate occasions 3 months apart. Both observers were blinded to the results obtained earlier. Next, all software packages were compared against the NIHImage software package, which was chosen as the standard to compare against because it is free and widely available. Each observer used and rated the software packages in the same, pre-set order.
A series of tests was conducted to assess the reliability of the five MRI software packages among the 15 selected subjects. Intra- and inter-rater reliability were assessed for each software package individually using the intra-class correlation coefficient (ICC).64 In the intra-observer analyses, two measurements of the primary rater were compared. In the inter-observer analyses, the average of the primary rater’s results was compared to the results of a second rater. For the inter-test comparison, the average of the primary rater’s measurements was compared among each of the programs. For this analysis, one-way analysis of variance was also used to calculate ICC. The coefficient of variability (CV), another index of reliability, was also calculated to assess the variation between replicate measurements. A low CV (for example, <10%) corresponds to a high ICC, indicating good precision in the measurement of each respective software package.64
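For readers who want to reproduce this kind of analysis, the sketch below computes a one-way random-effects ICC from the between-subject and within-subject mean squares, together with a mean within-subject CV. The exact ICC form and CV definition used in Stata for this study are not stated, so both formulas here are assumptions (the ICC is the standard one-way ANOVA estimator; the CV is the per-subject standard deviation divided by the per-subject mean, averaged over subjects). Function names are ours.

```python
from statistics import mean, stdev


def icc_oneway(data):
    """One-way random-effects ICC for single measurements.

    data is a list with one row per subject; each row holds the k replicate
    measurements (for example, two sessions or two raters) for that subject.
    ICC = (MSB - MSW) / (MSB + (k - 1) * MSW)
    """
    n = len(data)
    k = len(data[0])
    grand = mean([x for row in data for x in row])
    row_means = [mean(row) for row in data]
    # Between-subject mean square.
    msb = k * sum((rm - grand) ** 2 for rm in row_means) / (n - 1)
    # Within-subject mean square.
    msw = sum((x - rm) ** 2 for row, rm in zip(data, row_means) for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def mean_cv_percent(data):
    """Average within-subject coefficient of variation, in percent."""
    return mean([100.0 * stdev(row) / mean(row) for row in data])
```

As a usage example, hypothetical VAT areas of `[[100, 102], [150, 149], [200, 203]]` (three subjects, two sessions) give an ICC above 0.99 and a mean CV well below 10%, which is the pattern described above for good test–retest precision.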
Finally, graphical assessment against a line of perfect agreement (that is, the 45° line) was used across all three series of analyses. All statistical analysis was performed using Stata version 8.2 (Stata Corp LP, College Station, TX, USA).
The selection process is reflected in Figure 2. The original search identified 16 programs. Seven in-house programs were excluded because they were not available2,7,18,33,53,65–69 or had been discontinued. One commercial program was not available to us because it was part of an MR scanner system’s software. Two cited commercial packages were outdated.70,71 Finally, one commercial package was too costly ($4800) and was outside the budget of this project. Thus, a total of five software packages were examined as part of this study.
Table 1 presents an overview of the evaluation of the five software packages included in this study. NIHImage was the first software package each observer used, followed by SliceOmatic, Analyze, HippoFat and the Philips EasyVision workstation.
Analyze and HippoFat support multiple platforms, whereas NIHImage only runs on Mac OS 9.2.2 or OS X and SliceOmatic runs solely on Windows. The 126.96.36.199 version of Philips EasyVision used in this study was installed on a Sparc Sun Ultra 2 workstation, whereas newer versions of the Philips software are Windows based. HippoFat was the only software that required the IDL Virtual Machine (ITT Visual Information Solutions, Boulder, CO, USA; http://www.rsinc.com/idlvm/index.asp). IDL is a higher level commercial programming package for data analysis, visualization and cross-platform application development.
Analyze, HippoFat, NIHImage and SliceOmatic provide instructions for installing the software. There was some variability in the ease of installation for these programs, with SliceOmatic getting the lowest rating on this feature (3 of 5). Since the EasyVision workstation was pre-installed, no installation instructions were provided for this system.
All packages provide a user’s manual; the HippoFat manual was very short and concise. NIHImage also offers online discussion groups. NIHImage and SliceOmatic provide web sites that answer frequently asked questions. Analyze, HippoFat and SliceOmatic offer email support. The commercial packages Analyze and SliceOmatic also offer user training workshops for a fee.
The ease of learning to use each software package ranged from 2 to 4, with EasyVision getting the highest rating. All other packages were considered harder to learn since they offered more options and had longer, more complex manuals. For ease of use, the scores ranged from 1 to 4. Both observers rated HippoFat the lowest because the fully automated segmentation did not provide acceptable results. VAT and the surrounding non-fat tissue in particular were often misclassified. The subsequent manual correction of the segmentation was very tedious and time consuming. Finally, the GUI ratings ranged from 2 to 4. Examples of the highest and lowest rated GUI are shown in Figures 3a and b.
None of the software packages required pre-processing of the data to reduce noise levels. All packages could open and visualize the DICOM image data, except NIHImage, which did not recognize our specific data set and the data had to be imported in a ‘raw’ format. All packages displayed the DICOM header information as well as the image.
The software packages automatically picked window and brightness settings that were appropriate for the data, but these settings could also be modified manually. All program packages offered zooming, rotating and scaling of the image, except HippoFat, which had only limited options. NIHImage, Analyze and SliceOmatic could save the measurements in either square centimeters or square millimeters, as well as the segmented image in various formats. HippoFat only stored the measurement, while EasyVision could not save results in an appropriate format (text file or segmented image) at all and the results had to be copied manually.
The segmentation procedure for each software package varies slightly. We tried to use a simple, fast and accurate approach for each package. NIHImage provides several segmentation approaches. We chose to separate SAT and VAT areas manually and then set a threshold value for separation of lean and fat tissues to the lowest point of bright pixels corresponding to adipose tissue. In the second step, ROIs were drawn using the tracing tool. The whole abdominal circumference and the outer margin of the abdominal cavity were determined to measure CSA and SAT. VAT was manually selected within the abdominal cavity. SliceOmatic is a package with numerous ROI routines. We selected a region-growing routine to measure SAT and VAT. The CSA was selected using a ‘snake’ (a rough manual outline will ‘fit’ the body contour based on the difference between pixel values of the body versus the background). Analyze provides a large variety of ROI-drawing routines. We manually separated SAT from VAT and used the ‘Auto Trace’ routine to select adipose tissue based on the pixel values. HippoFat is fully automated. It uses a fuzzy c-means clustering approach. The automated segmentation was rarely accurate and a manual correction was necessary for every one of our images. The software only measures adipose tissue; the area of lean tissue and, therefore, CSA could not be evaluated. The Philips EasyVision workstation does not provide many options, so we chose to manually circumscribe SAT, VAT and CSA to measure these areas.
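The region-growing idea mentioned above can be sketched in a few lines. This is a generic, threshold-based version with 4-connectivity, written for illustration; it is not SliceOmatic's actual routine, whose internals are not described here, and the function name and parameters are hypothetical.

```python
from collections import deque


def region_grow(image, seed, threshold):
    """Collect the pixels connected to seed whose intensity is >= threshold.

    image is a 2D list of intensities, seed is a (row, col) tuple chosen by
    the operator inside the fat compartment of interest. Neighbors are
    4-connected. Returns a set of (row, col) tuples; the set is empty if
    the seed itself is below the threshold.
    """
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    if image[r0][c0] < threshold:
        return set()

    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region and image[nr][nc] >= threshold:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region
```

Because growth stops at any pixel below the threshold, a seed placed in SAT does not leak into VAT as long as the muscle wall forms a closed dark boundary; where the boundary is broken, a manual separation line of the kind described above is still needed.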
The average processing time was about 135 min to segment 10 images, with time decreasing as the observer became more familiar with the utilization of the software interface. As a rule, it took the novice user longer to learn and use the software packages, and there was substantially more variability in the time to analyze 10 images. For the more experienced user, there was little variability in time to analyze 10 images. This analysis time did seem to be related to the order in which the software packages were evaluated.
The intra-rater reliability for each software package is shown in Table 2 and Figure 4a. Overall, the reliability was excellent, ranging from a low of 0.934 for VAT intra-rater reproducibility of HippoFat to a high of 0.999 for CSA intra-rater reproducibility of NIHImage and SliceOmatic. VAT and TAT measurements showed highest agreement using SliceOmatic, while HippoFat had the highest correlation for SAT.
The inter-observer reproducibility (Table 3 and Figure 4b) was highest for SliceOmatic for TAT and VAT. NIHImage was best with regard to CSA measurements. The agreement between the two observers was highest when using Analyze with regard to SAT measurements. In general, intra- and inter-rater reliability were lowest for VAT across all programs.
The results of the intra-class correlation comparing NIHImage to the other software packages are displayed in Table 4 and Figure 5. The values displayed are the results of a comparison of the average over two measurements of observer 1. These comparisons varied more widely. Analyze and NIHImage are closest with respect to SAT, while SliceOmatic correlates best to NIHImage for TAT and VAT measurements.
All the GUI-based packages we examined required minimal prerequisite computer skills and were perceived as very user friendly. We consider SliceOmatic a very good choice for segmentation of abdominal adipose tissue on MR images. It performed very well for all observed areas and the processing time was low for both observers. Since the agreement of the experts on the segmentation results proved to be equivalent for all examined software packages, we find it hard to recommend one software package solely based on the reproducibility. Analyze was rated lower because the multitude of options this package provides made it hard to determine an optimal segmentation routine. SliceOmatic offers a number of additional features, but the segmentation routine was easily determined and performed. HippoFat was rated slightly lower: although its basic use is easy and straightforward, the manual correction of mislabeled areas was very tedious.
The time it took to segment the MR images varied between the two observers and, again, NIHImage performed more poorly than the other software packages, most likely because it was the first package we examined. Both observers had the lowest processing times using SliceOmatic. Although EasyVision required a manual segmentation, the processing time was equal to the time required to perform the semi-automated segmentation with Analyze. Using Analyze or NIHImage was perceived as more complicated and time consuming than segmenting images with SliceOmatic.
The amount of effort and time laboratory members are willing to commit to understanding and choosing between particular segmentation algorithms and software packages is an important consideration. The trade-off between manual interaction and performance is a significant factor. Manual interaction can improve accuracy by incorporating prior anatomical knowledge of the operator. The type of interaction can vary from complete manual delineation of adipose tissue to thresholding or selection of a seed point for region growing. In this study, the Philips EasyVision workstation was the only software package that did not offer any of these approaches, but the manual segmentation was straightforward and did not require more time than some of the computer-aided approaches, although the reproducibility of the VAT was not as good as that of Analyze or SliceOmatic. Analyze, NIHImage and SliceOmatic required some user interaction for specifying initial parameters, like a threshold or the manual separation of SAT and VAT. The values of these parameters can significantly affect performance and results. An automated and unsupervised approach would allow data sets to be processed in a short time without any user interaction and would be inherently free from any variability introduced by the human observer. Positano et al.36 reported a high correlation (SAT: r=0.9917, P<0.0001; VAT: r=0.9601, P=0.0001) between the unsupervised method they proposed and manual segmentation of 20 patients (32 images each) using their software package HippoFat. They reported that the unsupervised method slightly overestimated the SAT and underestimated the VAT. Using the same software package for the automated segmentation of the MR images examined in this study resulted in unsatisfactory segmentation.
This was most likely due to the complex structure of the viscera, patient movement, partial volume averaging, tissue inhomogeneity, poor contrast and poor signal-to-noise ratio or the image acquisition protocol, although the image quality did not affect any other software package to this extent. The results show that it requires substantial operator involvement for the re-delineation of VAT. Sometimes the SAT selection had to be adjusted as well. While the unsupervised algorithm itself was fairly quick, both observers perceived the manual corrections as time consuming and tedious.
Financial constraints might also affect the decision. NIHImage and HippoFat were the two free software packages. Many NIHImage users have added plug-ins, which are also free and can be downloaded. The upside of acquiring commercial software is better support and a very comprehensive manual, both of which Analyze and SliceOmatic offer.
Nonetheless, the intra- and inter-rater reproducibility were very good for all packages. We decided to use NIHImage as the standard for our comparisons. We report intra-class correlation comparing NIHImage to all other packages, since it is a freely available program package, which has been frequently used to measure adipose tissue compartments. Using the pairwise correlations (data not shown) among all programs, we obtained similar results. The reproducibility of our measurements was affected by the quality of the image. Because of a smaller partial volume effect, the SAT could be best distinguished from the lean tissue and the inter-observer reproducibility was highest. The intra- and inter-rater reproducibility of VAT measurements were slightly lower. This might be due to the fact that signal of intestinal content with short T1 and bone marrow could be interpreted as VAT. Another issue is that the small amount of intramuscular and paravertebral adipose tissues may influence the accuracy of the measurement of VAT. However, it is still controversial whether such compartments of adipose tissue should be included in VAT. We tried to exclude these adipose tissues from VAT. Since our aim was the analysis of large amounts of image data such as in epidemiological studies, a simplified adipose tissue classification and ignoring small numbers of high-intensity non-fat pixels could be justified. When comparing the results of the five packages, we noted that the slightly inferior results for NIHImage, especially the low VAT reproducibility, might be due to the fact that it was the first software package the observers used. It was, therefore, found more difficult to understand and it took longer to establish a routine, whereas the other packages benefited from the knowledge we had acquired using NIHImage. This can account for a larger number of misclassified pixels and, as a result, a higher variability of measurement results.
Limitations to this study must be acknowledged. The reliability and reproducibility of MR assessment of adipose tissue might vary across degrees of adiposity, but only obese individuals were included in this study. However, we are reassured that in cadaver and animal studies of lean and obese subjects, the MRI areas of adipose tissue correlated well with dissection results.20,22 We evaluated only one fully automated program, one free software package and two commercially available programs, whereas a number of other programs are available. Most in-house packages had been developed specifically to process MR images and estimate adipose tissue; none of the free software packages or the commercial packages were similarly specialized. Since we did not expect to find large differences between the commercial image processing packages, we selected the commercially available programs based on the cost and frequency of citations they received in other publications assessing abdominal adipose tissue. At the time this project started, only a few fully automated approaches had been published and had been made available. Recently published algorithms, like the one developed by Liou et al.,37 may perform as well or better and should be validated. Furthermore, we did not validate the software packages through use of a phantom, animal or cadaver model, but relied on previously published studies.17,20,22,24 Instead we compared the segmentation of semi-automated, automated and manual delineation approaches, none of which guarantees a perfect true model.
The fact that the results obtained with NIHImage correlate highly with the results obtained with each of the four other software packages supports our opinion that each of the examined software packages will allow for rapid, reliable and accurate measurements of abdominal SAT and VAT. We conclude that rather than trying to fully automate the segmentation process, the aim should be to support the medical expert with an image-based, objective segmentation tool, but without giving up the visual control of the expert and the expert’s possibility to interact.
In summary, four packages provided essentially the same results with respect to the inter- and intra-rater reproducibility; only the fully automated approach (HippoFat) proved slightly inferior. The choice of which package to adopt would depend on the financial resources of the users as well as the users’ computer skills. NIHImage is free and can essentially perform most routines that Analyze and SliceOmatic offer. SliceOmatic has been widely used for MR image segmentation and is quick and reliable. Analyze requires a little more computer knowledge, but it is a very powerful software package that can be used for a large variety of image analysis purposes. The use of the EasyVision software is limited to a certain group of users since it is associated with a Philips scanner. To be able to compare measurements in longitudinal studies, researchers should consider reporting the reliability of their data, especially if more than one observer is evaluating the MR images. Although it may be beneficial to use the same software package for repeated measurements, our results using SliceOmatic, Analyze or NIHImage were comparable and could be used interchangeably. New fully automated approaches, like the one published by Liou et al.,37 should be compared to the semi-automated measurements, since they could shorten the time and effort dramatically and perhaps provide even better reproducibility of the results.
The study was supported in part by NIH/NIDDK Grants R01-DK060427 and U01-DK57149. We thank Charlette Diggs, Project Coordinator for the Fatty Liver Ancillary Study, Kathleen Kahl and Terry Brawner, MR technicians, and the participants for their time and commitment.