We developed a semi-automated method based on a graph-cuts algorithm for segmentation and volumetric measurements of the cartilage from high-resolution knee magnetic resonance (MR) images from the Osteoarthritis Initiative (OAI) database and assessed the intra- and inter-observer reproducibility of measurements obtained via this method.
MR image sets from 20 subjects of varying Kellgren–Lawrence (KL) grades (from 0 to IV) on fixed flexion knee radiographs were selected from the baseline double-echo and steady-state (DESS) knee MR images in the OAI database (0.B.1 Imaging Data set). Two trained radiologists independently performed the segmentation of knee cartilage twice using the semi-automated method. The volumes of segmented cartilage were computed and compared. The intra- and inter-observer reproducibility were determined by means of the coefficient of variation (CV%) of repeated cartilage segmented volume measurements. The subjects were also divided into the low- (0, I or II) and high-KL (III or IV) groups. The differences in cartilage volume measurements and CV% within and between the observers were tested with t tests.
The mean (±SD) intra-observer CV% for the 20 cases was 1.29 (±1.05)% for observer 1 and 1.67 (±1.14)% for observer 2, while the mean (±SD) inter-observer CV% was 1.31 (±1.26)% for session 1 and 1.79 (±1.72)% for session 2. There was no significant difference between the two intra-observer CV%’s (P = 0.272) or between the two inter-observer CV%’s (P = 0.353). The mean intra-observer CV% of the low-KL group was significantly smaller than that for the high-KL group for observer 1 (0.83 vs 1.86%: P = 0.025). The segmentation processing times used by the two observers were significantly different (observer 1 vs 2): (mean 49 ± 12 vs 33 ± 6 min) for session 1 and (49 ± 8 vs 32 ± 8 min) for session 2.
The semi-automated graph-cuts method allowed us to segment and measure cartilage from high-resolution 3 T MR images of the knee with high intra- and inter-observer reproducibility in subjects with varying severity of OA.
Recent advances in magnetic resonance (MR) imaging and quantitative image analysis have increased our understanding of the pathophysiologic changes in articular cartilage and bone in osteoarthritis (OA), including the serial monitoring of changes over time1. Moreover, recognizing the necessity of new imaging and biochemical markers for OA, the Osteoarthritis Initiative (OAI) was recently launched (www.niams.nih.gov/ne/oai). The OAI cohort consists of almost 4800 participants at four sites who undergo detailed clinical assessment annually that includes high-resolution, state-of-the-art 3 Tesla (3 T) MR imaging of both knees over 4 years.
A critical and time-consuming process in the quantitative measurement of cartilage from MR images is the segmentation of the cartilage region from the surrounding tissue. The task of cartilage segmentation is not trivial, however, because of the complexity of cartilage geometry, its small size, and the inherently low-contrast between the cartilage and surrounding tissues. The most laborious method, which is the technique most widely used in clinical studies, is the manual outlining of cartilage regions on each MR image by an analyst2–4. Although the process of manual segmentation is straightforward, it is resource intensive, time-consuming (requiring several hours to segment the cartilage in each knee) and is subject to analyst bias and error. To overcome these limitations, development of an automated approach is highly desirable5. To date, however, a fully automated technique for cartilage segmentation has not been established due to cartilage inhomogeneities, low tissue-contrast and shape irregularity, particularly in advanced OA cartilage1.
At present, semi-automated segmentation methods that combine the perceptual recognition of a human expert and the reliability of a computer seem the most appropriate and practical approach for cartilage segmentation. Although several semi-automated segmentation methods have been developed, they are still very time-consuming. An improved semi-automated segmentation method is critically needed to process a large quantity of MR imaging data such as that within the OAI data set. Recently, graph-cuts algorithms have been suggested as solutions to a wide range of image processing problems including segmentation6–8. Although graph-cuts algorithms appear well suited to segmenting knee cartilage from MR images, to our knowledge no one has reported their use for this application. Thus, the purpose of this study is to propose a semi-automated method that is based on a graph-cuts algorithm for segmentation and volumetric measurements of the cartilage from high-resolution MR images and to assess the intra- and inter-observer reproducibility of the cartilage volume measurements obtained via this method from high-resolution OAI knee MR images in subjects with varying severity of OA.
An overview of the segmentation process using the graph-cuts algorithm is described in the flowchart shown in Fig. 1.
The observer reviews and scrolls through consecutive MR images and initializes the segmentation process by manually placing seeds that indicate cartilage vs non-cartilage (i.e., menisci, synovium, joint fluid) regions [Fig. 2(b)]. The initial seeding process typically starts in the mid-slice of the MR image data set. Algorithmically speaking, these seeds correspond to hard constraints that guide the computation of segmentation and they are particularly crucial for the segmentation of low-contrast, difficult-to-segment regions. In the current implementation, seeds consist of curvilinear lines of different color manually drawn by a radiologist using an electronic pen on a tablet pad. The exact position of the seeds does not affect the correct performance of the algorithm as long as the seeds are placed totally within the region to be segmented. It is not necessary to place seeds on every slice. While seeds are placed on every fifth to tenth slice in regions with relatively constant morphology and high tissue-contrast, they may need to be placed more frequently in order to demarcate and differentiate regions of relatively low tissue-contrast or in areas that are more difficult to segment. Intrinsically high-contrast interfaces such as the cartilage and bone marrow interface on fat-saturation MR images may not necessitate the placement of seeds for segmentation.
The seeds placed by the observer over one slice are propagated by the computer program backward and forward to neighboring slices. This process eliminates the requirement of manually placing seeds in every slice and substantially improves the efficiency of the segmentation. The placed seeds are copied and pasted automatically to the same location in the neighboring images. The copied seeds are thinned by one-pixel width as they propagate and are pasted to each subsequent neighboring image. The rationale for using this pixel-thinning (“decremental thickness”) scheme with the propagated seeds is to reflect the fact that the reliability of the initial seeding reduces with increased anatomical variation as images are further away from the initial seeding image. The propagation of seeds continues to subsequent images until the seeds become thinned out and disappear.
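The propagation scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "thinning by one-pixel width" can be modeled as a binary erosion with a 4-neighbor cross, and the function names `erode_once` and `propagate_seeds` are hypothetical.

```python
import numpy as np

def erode_once(mask: np.ndarray) -> np.ndarray:
    """Strip one pixel layer: keep a pixel only if it and its 4 neighbours are set.

    Assumption: one-pixel thinning is modeled as a cross-shaped binary erosion.
    """
    padded = np.pad(mask, 1, constant_values=False)
    return (
        padded[1:-1, 1:-1]
        & padded[:-2, 1:-1] & padded[2:, 1:-1]
        & padded[1:-1, :-2] & padded[1:-1, 2:]
    )

def propagate_seeds(seed_slice: np.ndarray, start: int, n_slices: int) -> np.ndarray:
    """Copy a 2-D seed mask to neighbouring slices, thinning it at every step."""
    volume = np.zeros((n_slices,) + seed_slice.shape, dtype=bool)
    volume[start] = seed_slice
    for step in (-1, 1):              # propagate backward, then forward
        current, z = seed_slice, start + step
        while 0 <= z < n_slices:
            current = erode_once(current)
            if not current.any():     # seeds have thinned out and disappeared
                break
            volume[z] = current
            z += step
    return volume
```

With this model, a seed stroke that is w pixels wide reaches roughly (w − 1)/2 slices in each direction before it vanishes, so thicker strokes express more confidence that the anatomy stays constant across slices.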
Using the seeds as hard constraints that narrow the search space for possible segmentations, the cartilage region is automatically segmented by means of a graph-cuts algorithm9 [Fig. 2(c)]. The optimal segmentation is defined as the segmentation that satisfies the global optimum criteria described mathematically in the Appendix.
While reviewing the initial segmentation result, the observer can revise structures that are deemed to be incompletely or erroneously segmented by placing additional or revised seeds and re-running the graph-cuts computation [Fig. 2(d)]. This iterative reseeding-computation process can be repeated multiple times until the segmented outcome is satisfactory [Fig. 2(e)].
To evaluate the intra- and inter-observer reproducibility of the segmentation method, MR image sets from 20 subjects were selected from the baseline double-echo and steady-state (DESS) MR images in the OAI database (0.1.1 Clinical Data set and 0.B.1 Imaging Data set). These subjects were chosen arbitrarily but with consideration of representing OA of varying severity as determined by Kellgren–Lawrence (KL) grade in Table I. The KL grade (0–IV) was determined by a central reading of each subject’s fixed flexion posterior–anterior knee radiograph: KL grade 0 = normal with absence of osteophyte or joint space narrowing, while KL grade IV = most severe OA. The number of the subjects in each group of KL grade 0, I, II, III, and IV was 2, 3, 6, 6, and 3, respectively. For the analysis, we divided the subjects into low-KL (0, I or II) and high-KL (III or IV) groups. MR imaging was performed with a 3 T whole-body magnet (Trio; Siemens Medical Systems, Erlangen, Germany) with a quadrature transmit-receive knee coil (USA Instruments, Aurora, OH). Each MR image set is a three-dimensional (3-D) sagittal DESS MR image data set with 140 × 140 mm² field of view (FOV), 384 × 384 in-plane matrix size, 160 slices, and 0.36 × 0.36 × 0.70 mm³ voxel resolution that provides full coverage of the knee. Other MR acquisition parameters are repetition time (TR)/echo time (TE) = 16.32/4.71 ms, bandwidth = 185 Hz/pixel, flip angle = 25°, and number of averages = 1.
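With the voxel size quoted above, converting a binary segmentation mask into a volume amounts to a voxel count multiplied by the voxel volume. The following sketch shows that arithmetic; the function name `mask_volume_cc` and the mask layout are illustrative assumptions, not part of the study's software.

```python
import numpy as np

# Voxel dimensions in mm, taken from the acquisition parameters in the text.
VOXEL_MM = (0.36, 0.36, 0.70)

def mask_volume_cc(mask: np.ndarray, voxel_mm=VOXEL_MM) -> float:
    """Volume of a boolean segmentation mask in cubic centimetres (1 cc = 1000 mm^3)."""
    voxel_mm3 = voxel_mm[0] * voxel_mm[1] * voxel_mm[2]
    return int(np.count_nonzero(mask)) * voxel_mm3 / 1000.0
```

Each voxel contributes about 0.0907 mm³, so a whole-knee cartilage volume of about 25 cc corresponds to roughly 275,000 segmented voxels.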
Two trained radiologists (CT, SC) independently performed the segmentation of knee cartilage twice using the semi-automated method. To reduce the memory effect, there was a minimum of a 4-week interval between the initial and repeat segmentation measurements. In addition, we documented the image analysis and processing time required for the segmentation of each case, including both the time spent placing seeds and the computation time with the graph-cuts algorithm. The segmented cartilage of each knee was further divided into three compartments [femur, tibia, and patella; Fig. 3(c)].
For each of the 20 cases, cartilage was segmented twice by the two observers at two separate sessions. The volumes of segmented whole-knee and compartmentalized cartilage were computed. The intra- and inter-observer reproducibility were determined by means of the coefficient of variation (CV%) of repeated cartilage segmented volume measurements. The CV% for repeated measurements (i.e., the SD of paired measurements divided by the mean value computed for each subject, then averaged over subjects and multiplied by 100) is commonly used to describe precision errors in segmented cartilage volume measurements10,11. For each of the 20 cases, four CV%’s were calculated and compared: (1) the two repeated measurements by observer 1; (2) the two repeated measurements by observer 2; (3) the first measurements of observer 1 and 2; and (4) the second measurements of observer 1 and 2.
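The CV% definition given above can be written out as a short sketch. It assumes the sample standard deviation (`ddof=1`) is used for each pair, which the text does not specify; the function names are hypothetical.

```python
import numpy as np

def cv_percent(first, second, ddof=1):
    """Per-subject CV% for paired measurements: 100 * SD(pair) / mean(pair).

    Assumption: sample SD (ddof=1); for a pair this equals |a - b| / sqrt(2).
    """
    pairs = np.stack([np.asarray(first, float), np.asarray(second, float)], axis=1)
    return 100.0 * pairs.std(axis=1, ddof=ddof) / pairs.mean(axis=1)

def mean_cv_percent(first, second, ddof=1):
    """CV% averaged over subjects, as reported in the reproducibility tables."""
    return float(cv_percent(first, second, ddof).mean())
```

For the four comparisons listed above, `first` and `second` would be the 20 volumes from, for example, observer 1 at session 1 and observer 1 at session 2 (intra-observer), or observer 1 and observer 2 at the same session (inter-observer).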
The differences between the repeated cartilage volume measurements within and between the observers were calculated and tested for statistical significance with a paired t test. The differences in CV% within and between the two observers were also tested with a paired t test. Furthermore, the 20 cases were divided into two groups according to the KL values, KL 0–II (low-KL group) and KL III–IV (high-KL group). The differences in the intra- and inter-observer volumes and CV% between these two groups were also tested for statistical significance with an unpaired t test. The statistical analysis was performed with SPSS Statistical Software (Version 16.0, SAS Institute, Inc., Cary, NC). The level of significance was set at 0.05.
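For readers who want to see what the paired comparison computes, the following sketch derives the paired t statistic from the per-subject differences using only the Python standard library. It is an illustration of the test, not the SPSS procedure used in the study, and the function name `paired_t` is hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two matched samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample SD of the differences
    t = mean_d / (sd_d / math.sqrt(n))      # mean difference over its standard error
    return t, n - 1
```

The P-value then follows from the t distribution with n − 1 degrees of freedom (19 for the 20 cases here); SciPy's `scipy.stats.ttest_rel` returns both the statistic and the two-sided P-value if that library is available.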
Two examples of MR images with cartilage segmentation are shown in Fig. 2 (a subject with KL grade II) and Fig. 3 (a subject with KL grade IV). Successful segmentation of cartilage was performed in all 20 cases.
The cartilage volume measurements of whole knee by the two observers at two different sessions in the 20 cases are summarized in Table I. The mean ± SD volume measurements of the segmented whole-knee cartilage were (24.85 ± 5.76, 25.26 ± 5.88 cc) for (observer 1, 2) at session 1 and (25.18 ± 5.87, 24.86 ± 5.81 cc) for (observer 1, 2) at session 2. The mean and percent intra-observer differences between the two sessions (i.e., session 1 minus session 2) were −0.33 cc and −1.30% for observer 1 and +0.40 cc and +1.60% for observer 2. These small differences between the sessions were significant in both observers (P = 0.014 for observer 1 and P = 0.008 for observer 2) and the differences were in opposite directions. In addition, the mean and percent inter-observer differences between the two observers (i.e., observer 1 minus observer 2) were −0.41 cc and −1.62% for session 1 and +0.32 cc and +1.29% for session 2. These differences between the observers, although small, were also significant in both sessions (P = 0.004 for session 1 and P = 0.021 for session 2).
When the low- (0, I or II) and high-KL (III or IV) groups were compared, the mean whole-knee cartilage volumes of the low-KL group were consistently lower than those of the high-KL group in all comparisons (Table I). However, none of these differences reached statistical significance (P = 0.054–0.082). By compartment, the same trend of lower mean cartilage volumes in the low-KL group than in the high-KL group was observed in all comparisons. Statistical significance was observed in the femoral compartment (P = 0.034–0.050) but not in the tibial (P = 0.358–0.509) or patellar (P = 0.121–0.245) compartments.
The intra- and inter-observer CV%’s of the whole knee cartilage volume measurements for the 20 cases are summarized in Table II. The mean (±SD) intra-observer CV% for the 20 cases was 1.29(±1.05)% for observer 1 and 1.67(±1.14)% for observer 2. These relatively small CV% values indicate that cartilage volumes measured with the semi-automated method were highly reproducible within each observer. The reproducibility for observer 1 was slightly higher than that of observer 2 but this difference was not statistically significant (P = 0.272). The mean (±SD) inter-observer CV% for the 20 cases was 1.31(±1.26)% for session 1 and 1.79(±1.72)% for session 2. This indicates that the whole knee cartilage volume measurements were highly reproducible between the two observers with the semi-automated method. The differences between the two sessions were not statistically significant (P = 0.353).
By compartment, the mean (±SD) intra- and inter-observer CV% values for each compartmental cartilage volume measurement are summarized in Table III. All mean values were within 5%. The maximum mean intra-observer CV% was 3.63% for the high-KL patella compartment, while the maximum mean inter-observer CV% was 4.7% for the same group. No statistically significant reproducibility differences or trends for each of the three compartments were detected between the two observers or between the sessions.
When the low- and high-KL groups were compared for the whole knee cartilage volume measurements, the mean intra-observer CV% of the low-KL group tended to be smaller than that for the high-KL group for both observers (low- vs high-KL): (0.83 vs 1.86%) for observer 1 and (1.43 vs 1.95%) for observer 2. This was statistically significant only for observer 1 (P = 0.025) but not for observer 2 (P = 0.320). Similar to the intra-observer CV%, the mean inter-observer CV% of the low-KL group tended to be smaller than that for the high-KL group for both sessions (low- vs high-KL): (1.20 vs 1.45%) for session 1 and (1.30 vs 2.39%) for session 2. However, neither the session 1 (P = 0.676) nor session 2 (P = 0.164) differences in inter-observer CV% for low-KL vs high-KL grade cases were statistically significant. The comparisons by compartment between the low- and high-KL groups are shown in Table III. The mean intra-observer CV% of the low-KL group was smaller than that for the high-KL group in all but one (in the tibial compartment by observer 1). Statistically significant differences were noted in two comparisons: the femoral compartment by observer 1 (P = 0.013) and the patellar compartment by observer 2 (P = 0.017). For the inter-observer CV%, the mean of the low-KL group was consistently smaller than that for the high-KL group without an exception. However, statistically significant difference was observed only in the patellar compartment in session 2 (P = 0.017).
When the intra- and inter-observer CV% were compared for the whole knee cartilage volume measurements within the low-KL or high-KL group, none of the comparisons were statistically significant. The P-values for the intra-observer CV% comparison between observers 1 and 2 were 0.154 for the low-KL and 0.872 for the high-KL group, while those for the inter-observer CV% comparison between sessions 1 and 2 were 0.835 for the low-KL and 0.362 for the high-KL group.
The mean ± SD segmentation processing times spent on the 20 cases by the two observers at two sessions were (observer 1 vs 2) (49.1 ± 11.7 vs 33.5 ± 5.9 min) for session 1 and (49.3 ± 8.2 vs 31.8 ± 7.9 min) for session 2. In all but one case, observer 1 spent more time than observer 2 in segmentation (P < 0.001). The mean ± SD segmentation processing times broken down between the low- and high-KL groups were (low-KL vs high-KL, P-value) (50.2 ± 14.0 vs 47.8 ± 8.6 min, P = 0.660) for observer 1 at session 1; (48.4 ± 8.9 vs 50.3 ± 7.5 min, P = 0.604) for observer 1 at session 2; (30.6 ± 4.8 vs 37.0 ± 5.3 min, P = 0.011) for observer 2 at session 1; and (31.5 ± 8.5 vs 32.1 ± 7.7 min, P = 0.860) for observer 2 at session 2. Thus, a significant difference in the segmentation time between the low- and high-KL groups was noted only for observer 2 at session 1.
Since only 1–2 min was required for the computation of the segmentation by the graph-cuts algorithm in each case, more than 90% of the semi-automated segmentation processing time was used by the observers to place seeds. The number of iterations for reseeding and computation was tracked for each semi-automated segmentation process; it varied according to the observer and the complexity of the knee anatomy of the test cases. The mean number (±SD, min–max) of iterations for reseeding and computation was 4.7 (±0.66, 2–5) for observer 1 and 3.5 (±0.85, 2–5) for observer 2.
Degradation and eventual loss of articular cartilage are believed to be crucial elements in the pathophysiology of OA12. Morphometric assessment of cartilage structure such as volume, thickness, and area using MR images has been shown to provide an accurate and precise measure for OA progression1. To derive these measurements, cartilage needs to be accurately and precisely segmented from surrounding tissues. The task of cartilage segmentation is not trivial because of the complexity of cartilage geometry, small size, and the inherently low-contrast between the cartilage and surrounding tissues.
The most laborious method is the manual outlining of cartilage regions on each MR image by an analyst2–4, which is the technique most widely used in clinical studies. Although the process of manual segmentation is straightforward, it is resource intensive, time-consuming (requiring several hours to segment the cartilage in each knee from high-resolution MR images) and is subject to analyst bias and error. To overcome these limitations, development of an automated approach is highly desirable5. To date, however, a fully automated technique for cartilage segmentation has not been established due to cartilage inhomogeneities, low tissue-contrast and shape irregularity, particularly in advanced OA cartilage1.
At present, semi-automated segmentation methods that combine the perceptual recognition of a human expert and the reliability of a computer seem the most appropriate and practical approach for cartilage segmentation. An optimally implemented semi-automated segmentation method will yield accurate and precise segmentation results in a reasonable amount of time with minimal variability. A number of semi-automated segmentation techniques have been proposed and published, including edge detection5, region growing4,13–17, active shape models16, b-spline snakes18–21, live wires22, and multispectral analysis23. An extensive review of these methods is provided in Eckstein et al.10. The success of these techniques has been limited, in part because they were developed prior to the advent of high-resolution (3 T) MR imaging equipped with newer imaging sequences and superior image processing algorithms. As a result, despite a plethora of semi-automated methods, most large-scale studies have still relied on manual segmentation of cartilage structures. Some investigators10 found that correction of the results of these semi-automated methods often took longer and was no more reliable than manual segmentation by skilled experts.
An improved semi-automated segmentation method is critically needed to process a large quantity of MR imaging data such as that within the OAI data set and other longitudinal studies. Recently, graph-cuts algorithms have been suggested as solutions to a wide range of image processing problems including segmentation6–8. Although graph-cuts algorithms appear well suited to segmenting knee cartilage from MR images, to our knowledge no one has reported their use for this application. Thus, based on the graph-cuts framework, we have specifically developed and implemented a semi-automated method for the segmentation of knee cartilage. In this method, a user provides a simple manual initialization of the bone, cartilage and non-cartilage regions on sampled MR image sections. Cartilage is then automatically segmented by the graph-cuts algorithm on the whole volume of sections without requiring further user input. The segmented cartilage boundaries are then reviewed and revised as necessary by an expert observer. This interactive semi-automated approach, requiring only simple initiation and minimal interaction by the user, improves the efficiency and reproducibility of the cartilage segmentation process.
The results of our study demonstrate that knee cartilage can be segmented efficiently and reproducibly from 3 T high-resolution MR images using the semi-automated graph-cuts segmentation method that we have developed. The cartilage volume measurements were highly correlated and were closely matched within and between the readers. Nevertheless, there were small but statistically significant differences in the volume measurements between the observers and between the sessions. This shows the importance of measuring serial cartilage volumes by a single observer in a single session to reduce any precision errors and biases, particularly when the changes in cartilage volumes are small1. With our segmentation method, however, the differences were only on the order of 1.3–1.6% (in actual estimated volume) whether comparing two repeat segmentations of knee cartilage from the same MR image or comparing two readers segmenting the same MR image.
The cartilage volume measurements in our study were highly consistent and reproducible. The mean intra- and inter-observer CV%’s for the whole knee cartilage volume measurements were better than those for the compartmentalized (femur, tibia, patella) cartilage volume measurements. This is not surprising because additional variations are likely introduced during the cartilage compartmentalization process. When we compare the reproducibility of the three compartmentalized cartilage volume measurements in our study with the reported CV%’s (range of 2–8%) in various studies of knee cartilage segmentation and volumetric measurement from knee MR images4,10,11,20,24, our results for the mean intra- (1.18–3.57%) and inter-observer CV%’s (0.94–3.32%) are comparable to or better than the published results.
The mean image processing time of 30–50 min for the segmentation of the cartilage from the high-resolution MR image set of 160 slices is much shorter than that required for manual delineation of cartilage on each of the 160 slices. Although the comparison of the segmentation processing time with other semi-automated segmentation techniques is difficult because very few studies report this variable, our method appears to be roughly twice as fast as the 75.4 min per knee reported by Duryea et al.24 and approximately 10 times faster than the time reported (57–78 min for a single patella or medial tibial plateau) by McWalter et al.25. In addition, because over 90% of the semi-automated segmentation processing time was used by the observer to place seeds, the processing time may be shortened further in the future with refinement of the graph-cuts algorithm. Seed placement can be improved and automated with the use of pattern recognition techniques (e.g., use of a priori information on shape, location and signal intensity of the cartilage).
Although the mean intra-observer CV% of observer 1 was lower than that for observer 2, no statistically significant differences were found in CV% within and between the observers. This suggests that the cartilage volume can be measured reproducibly using the semi-automated method without being significantly affected by the performance of individual trained observers. The intra-observer CV% may be reduced with more meticulous placement of seeding as evidenced by longer processing times spent by observer 1 (mean 49 min) than observer 2 (33 min). Such a trade-off between efficiency and reproducibility is not surprising but was not investigated in the current study.
The intra- and inter-observer CV%’s for the low-KL group were consistently lower than those for the high-KL group in both whole-knee cartilage and compartmentalized cartilage volume measurements. This finding may reflect that the segmentation of cartilage in advanced knee OA is less reproducible and more difficult than that for less advanced knee OA. No definite trend for the low- and high-KL groups was observed in the image processing time.
The intra- and inter-observer CV%’s of the semi-automated method were not 0%. In the manual segmentation method, each manually drawn boundary corresponds to the final segmentation outcome. Any variations in manual boundary delineation will likely result in reduced reproducibility and affect the precise quantitative measurement of knee cartilage. On the other hand, in our method, while a large portion of the segmentation operation was performed algorithmically by the computer based on the graph-cuts algorithm, the placement of seeds was determined subjectively by an observer. This subjective interpretation of what is considered cartilage vs background, particularly in the regions of low tissue-signal, would affect intra- and inter-observer reproducibility.
In conclusion, the semi-automated graph-cuts method allows us to segment and measure cartilage from high-resolution 3 T MR images of the knee with high intra- and inter-observer reproducibility in subjects with varying severity of OA.
This study was supported by a Research Grant from the Western Pennsylvania Chapter of the Arthritis Foundation.
A segmentation of a 3-D volume image V can be represented as a vector X = (x1,…, xv,…, xV), where v represents a voxel within V and each binary component xv is assigned to one of two values, either “cartilage” or “background”. Then, the optimal segmentation is the one that minimizes a cost function which is commonly described for the graph-cuts image segmentation as follows:
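In the standard graph-cuts formulation, which the two terms defined in this appendix match, the cost function can be written as shown here. The relative weight λ is an assumption, since the text does not state one; setting λ = 1 gives the unweighted sum.

```latex
E(X) \;=\; \sum_{v \in V} R_v(x_v) \;+\; \lambda \sum_{(u,v) \in N} B_{uv}(x_u, x_v),
\qquad
X_{\mathrm{opt}} \;=\; \arg\min_{X} E(X)
```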
The first term Rv(xv) is called the data-dependent term or the regional cost and represents the individual cost that is incurred when each voxel v is assigned to the “cartilage” or the “background”, i.e., Rv(“cartilage”) or Rv(“background”). These regional costs are typically expressed by the negative logarithm of the likelihood of voxel v being either “cartilage” or “background”, which is the conditional probability that voxel v has an intensity value of Iv, given that voxel v belongs to either “cartilage” or “background”, i.e., Rv(“cartilage”) = –ln P(Iv | “cartilage”) and Rv(“background”) = –ln P(Iv | “background”).
These two conditional probabilities are computed using the two histograms that have been acquired from intensity values of the two types of seed voxels (“cartilage” and “background”)26.
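The histogram-based regional costs can be sketched as follows. The bin count, intensity range, and the small smoothing constant `eps` (used to avoid taking the logarithm of zero for intensities never seen among the seeds) are assumptions made for illustration, as is the function name `regional_costs`.

```python
import numpy as np

def regional_costs(intensities, cartilage_seeds, background_seeds,
                   bins=32, value_range=(0, 256), eps=1e-6):
    """Return (R_v("cartilage"), R_v("background")) for every voxel intensity.

    Each cost is -ln P(I_v | label), with P estimated from a histogram of the
    seed intensities for that label. bins, value_range and eps are assumptions.
    """
    intensities = np.asarray(intensities, dtype=float)

    def neg_log_likelihood(seed_values):
        hist, edges = np.histogram(seed_values, bins=bins, range=value_range)
        prob = (hist + eps) / (hist.sum() + eps * bins)   # smoothed, sums to 1
        # Interior edges map each intensity to its histogram bin index.
        idx = np.clip(np.digitize(intensities, edges[1:-1]), 0, bins - 1)
        return -np.log(prob[idx])

    return neg_log_likelihood(cartilage_seeds), neg_log_likelihood(background_seeds)
```

A voxel whose intensity is common among the cartilage seeds but absent from the background seeds receives a low Rv(“cartilage”) and a high Rv(“background”), which pushes the optimal segmentation toward labeling it as cartilage.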
The smoothness term or the boundary cost term, Buv(xu,xv), represents the cost associated with a discontinuity between neighboring voxels u and v as defined within a neighborhood system N (i.e., the 18-connectivity in the current implementation) and is described by the following ad hoc function6:
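A commonly used boundary cost from the interactive graph-cuts literature that matches the properties stated in this appendix is given here. The exact expression, including the factor of 2 in the exponent, is an assumption.

```latex
B_{uv}(x_u, x_v) \;=\;
\begin{cases}
\exp\!\left(-\dfrac{(I_u - I_v)^2}{2\sigma^2}\right) \cdot \dfrac{1}{\mathrm{dist}(u,v)}, & x_u \neq x_v, \\[1.5ex]
0, & x_u = x_v,
\end{cases}
\qquad (u,v) \in N
```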
This cost function assigns a high value to a discontinuity between voxels of similar intensities (|Iu – Iv| < σ) but a low value to a discontinuity between voxels of dissimilar intensities (|Iu – Iv| > σ). Consequently, the optimal segmentation Xopt is designed to cut through edges whose voxels are of different intensity values. The σ represents the standard deviation of the noise calculated from the air region outside of the body. In addition, the boundary cost function is formulated to be inversely proportional to the spatial distance between the voxels, dist(u,v).
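To make the roles of the regional term, the boundary term, and the seeds concrete, the following toy sketch segments a one-dimensional intensity profile by computing a minimum cut. It is not the authors' implementation, which operates on 3-D volumes with 18-connectivity: the Edmonds-Karp max-flow routine, the Gaussian boundary weight, the function names, and the `hard` constant standing in for the hard constraints are all illustrative assumptions.

```python
import math
from collections import deque

def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max-flow; returns the nodes on the source side of a minimum cut."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0.0)   # reverse edges
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:                  # BFS for a shortest path
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)                               # reachable set = source side
        bottleneck, v = math.inf, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:                         # push flow along the path
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u

def segment_1d(intensities, r_cart, r_bg, sigma, cart_seeds=(), bg_seeds=(), hard=1e6):
    """Label each sample True ("cartilage") or False ("background")."""
    n = len(intensities)
    cap = {"s": {}, "t": {}}
    for v in range(n):
        # Cutting s->v assigns v to background, so that link carries R_v("background").
        cap["s"][v] = r_bg[v]
        cap[v] = {"t": r_cart[v]}
    for v in cart_seeds:                                     # hard constraints
        cap["s"][v], cap[v]["t"] = hard, 0.0
    for v in bg_seeds:
        cap["s"][v], cap[v]["t"] = 0.0, hard
    for u in range(n - 1):                                   # neighbours at dist = 1
        diff = intensities[u] - intensities[u + 1]
        w = math.exp(-diff * diff / (2 * sigma * sigma))
        cap[u][u + 1] = w
        cap[u + 1][u] = w
    source_side = min_cut_source_side(cap, "s", "t")
    return [v in source_side for v in range(n)]
```

For the profile `[10, 12, 11, 50, 52, 51]` with a background seed at index 0 and a cartilage seed at index 5, the cheapest cut falls between indices 2 and 3, where the intensity jump makes the boundary cost small.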
Conflict of interest: The authors have no conflict of interest. One author (CKK) received a research grant from AstraZeneca.