To determine whether computer-based analysis can detect features predictive of osteoarthritis (OA) development in radiographically normal knees.
A systematic computer-aided image analysis method (wnd-charm) was used to analyze pairs of weight-bearing knee X-rays. All initial X-rays were scored as normal (Kellgren-Lawrence (KL) grade 0); at follow-up approximately 20 years later, the knees had either developed OA (defined as KL grade ≥2) or remained normal.
The computer-aided method predicted whether a knee would change from KL grade 0 to grade 3 with 72% accuracy (P<0.00001), and to grade 2 with 62% accuracy (P<0.01). Although a large part of the predictive signal comes from the image tiles that contained the joint, the region adjacent to the tibial spines provided the strongest predictive signal.
Radiographic features detectable using a computer-aided image analysis method can predict the future development of radiographic knee OA.
Due to the increasing prevalence of knee osteoarthritis (OA) and related effects on functional limitation, reduced health-related quality of life, health care utilization and total joint arthroplasty, there is a growing need for clinical and scientific tools that can reliably detect knee OA early in its development. Conventional radiographic images remain the “gold standard” for the diagnosis of knee OA, but lack sensitivity for the detection of early disease. Classification systems have been developed that reliably quantify the presence and severity of radiographic features of knee OA (Kellgren & Lawrence, 1957; Kellgren, Jeffrey & Ball, 1963). However, these are based on the presence or size of osteophytes and the degree of joint space narrowing as assessed by human readers.
Computer-aided image analysis methods have the advantages of detecting subtle differences in textures and intensity variations within an image, without clinical bias. Such methods have been used to analyze changes in the radiographic texture of the bone in knee and hip OA (Boniatis et al., 2006, 2007; Lynch, Hawkes & Buckland-Wright, 1991; Messent et al., 2005; Podsiadlo, Wolski & Stachowiak, 2008; Podsiadlo et al., 2008; Shamir et al., 2008a).
We previously published the application and validation of a computer-aided image analysis method on a set of knee radiographs, using Kellgren & Lawrence grades assigned by two independent readers as the gold standard (Shamir et al., 2008a). In that study, the method detected radiographs with KL grades 3 and 2 with 91.5% and 80.4% accuracy, respectively. Here we apply this method to test the hypothesis that radiographic features predictive of subsequent radiographic knee OA development can be detected years before radiographic classification by human readers.
Automatic image analysis usually applies a first step of computing image features that numerically reflect the content of the image in a fashion that can be handled by pattern recognition tools. Here we use the WND-CHARM algorithm (Shamir et al., 2008b; Orlov et al., 2008), which first extracts a generic set of image features that covers a broad range of image characteristics such as high-contrast features (e.g., object statistics), textures (e.g., Haralick, Tamura), statistical distribution of the pixel values (e.g., multi-scale histogram, first four moments), and factors from polynomial decomposition of the image. The feature extraction algorithms are described more thoroughly in (Shamir et al., 2008b; Orlov et al., 2008).
In order to extract more image descriptors, the algorithms are applied not only on the raw pixels, but also on several transforms of the image and transforms of transforms. The image transforms are FFT, Wavelet (Symlet 5, level 1) two-dimensional decomposition of the image, and Chebyshev transform. Another transform that is used is Edge Transform, which is simply the magnitude component of the Prewitt gradient. The various combinations of the compound image transforms are described in Figure 1. The entire set of image features extracted from all image transforms described in Figure 1 consists of a total of 2633 numeric image descriptors. Detailed description and source code of the image features are available at (Shamir et al., 2008b; Orlov et al., 2008).
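As an illustration of the Edge Transform described above, the following is a minimal sketch of a Prewitt gradient magnitude computed with NumPy. The function name `prewitt_magnitude` and the edge-replicating border handling are assumptions made for this example; the published source code may treat image borders differently.

```python
import numpy as np


def prewitt_magnitude(image: np.ndarray) -> np.ndarray:
    """Return the Prewitt gradient magnitude of a 2-D grayscale image.

    Borders are handled by replicating edge pixels (an assumption made
    for this sketch).
    """
    img = np.pad(image.astype(np.float64), 1, mode="edge")
    # Horizontal derivative: right neighbours minus left neighbours,
    # summed over the three rows of each 3x3 neighbourhood.
    gx = (img[:-2, 2:] + img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - img[1:-1, :-2] - img[2:, :-2])
    # Vertical derivative: lower neighbours minus upper neighbours,
    # summed over the three columns of each 3x3 neighbourhood.
    gy = (img[2:, :-2] + img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)
```

The output has the same shape as the input, so the same feature extraction algorithms could then be applied to it just as they are applied to the raw pixels.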
While this set of image features provides a numeric description of the image content, not all image features are assumed to be equally informative, and some of these features are expected to represent noise. In order to select the most informative features while rejecting noisy features, each image feature is assigned a simple Fisher score (Bishop, 2006), and the 85% of the features with the lowest Fisher scores are rejected and do not affect the classification. The Fisher score can be conceptualized as the ratio of the variance of the class means from the pooled mean to the mean of the within-class variances, and therefore reflects the discriminative power of the feature. The feature vectors can then be classified by a simple weighted nearest neighbor rule, such that the feature weights are the Fisher scores.
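The feature selection and classification steps described above can be sketched as follows. This is a minimal illustration, not the WND-CHARM source code: the function names, the use of a Fisher-weighted squared Euclidean distance, and the assumption that the features are already on comparable scales are all choices made for this example.

```python
import numpy as np


def fisher_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature Fisher score: variance of the class means around the
    pooled mean, divided by the mean of the within-class variances."""
    classes = np.unique(y)
    pooled_mean = X.mean(axis=0)
    class_means = np.array([X[y == c].mean(axis=0) for c in classes])
    class_vars = np.array([X[y == c].var(axis=0) for c in classes])
    between = ((class_means - pooled_mean) ** 2).mean(axis=0)
    within = class_vars.mean(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_features(scores: np.ndarray, keep_fraction: float = 0.15) -> np.ndarray:
    """Indices of the highest-scoring features (the lowest 85% are rejected)."""
    n_keep = max(1, int(round(keep_fraction * scores.size)))
    return np.argsort(scores)[::-1][:n_keep]


def weighted_nn_predict(X_train, y_train, x, scores, kept):
    """Label of the training sample nearest to x, with each kept feature
    weighted by its Fisher score (the distance form is an assumption)."""
    diff = X_train[:, kept] - x[kept]
    dist = (scores[kept] * diff ** 2).sum(axis=1)
    return y_train[np.argmin(dist)]
```

Here `X` is a samples-by-features matrix and `y` holds the class labels; the Fisher scores would be computed on the training samples only, so that the test images do not influence the feature weights.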
The data used for this study are fully extended weight bearing radiographs taken during a regularly scheduled study visit of the Baltimore Longitudinal Study of Aging (BLSA) (Shock et al., 1984). The dataset included longitudinal conventional film-screen radiographs of the knees, AP, standing, with resolution of approximately 8–10 line pairs/mm. All images were taken at a standard distance and would have had the same magnification. Although the sizes of the actual knees can be different, no attempt was made to normalize for the size in order to preserve the signal of the low-level image features. The X-ray films were digitized using a UMAX PowerLook 1100 scanner at 300 DPI, and saved as 2550×3000 16-bit (14-bit effective) lossless TIFF files. All images were then normalized to a fixed mean and standard deviation. Each knee image was independently assigned a KL grade (0–4) by two trained readers as described by Hochberg, Lethbridge-Cejku & Tobin (2004), with discordant grades adjudicated by a third reader. It is important to note that all X-rays were taken from participants in the BLSA study without any specific selection criteria, and not necessarily from people who reported pain symptoms or were diagnosed as having OA in one of their other joints. This policy provided a uniform and unbiased representation across the study population.
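The intensity normalization mentioned above can be sketched as a simple linear rescaling. The target mean and standard deviation are not stated in the text, so the defaults below are placeholders, and `normalize_intensity` is a hypothetical name.

```python
import numpy as np


def normalize_intensity(image: np.ndarray,
                        target_mean: float = 0.0,
                        target_std: float = 1.0) -> np.ndarray:
    """Linearly rescale pixel values to a fixed mean and standard deviation.

    The default targets are placeholders, not the values used in the study.
    """
    img = image.astype(np.float64)
    std = img.std()
    if std == 0:
        # A constant image has no contrast to rescale.
        return np.full_like(img, target_mean)
    return (img - img.mean()) / std * target_std + target_mean
```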
In previous work (Shamir et al., 2008a) we used WND-CHARM to diagnose existing OA. Here we use a similar approach to analyzing scanned film X-rays to detect early signs of developing OA, or predict risk of developing OA in the future. As before, the method first detects the center of the knee joint and extracts 700×500 pixels around it to form a sub-image of a centered joint. Figure 4 is an example of a 700×500 joint center image used for the computer-aided image analysis. Finding the joint in a given knee X-ray image is performed by first downscaling the image by 0.1, and then scanning the image with a 15×15 shifted window. For each position, the Euclidean distances between the 15×15 pixels of the shifted window and 20 15×15 pre-defined joint center images are computed using Equation 1,
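$$d_{I,W} = \sqrt{\sum_{x=1}^{15} \sum_{y=1}^{15} \left(I_{x,y} - W_{x,y}\right)^2} \qquad (1)$$
where $W_{x,y}$ is the intensity of pixel $(x, y)$ in the shifted window $W$, $I_{x,y}$ is the intensity of pixel $(x, y)$ in the joint image $I$, and $d_{I,W}$ is the Euclidean distance between the joint image $I$ and the 15×15 shifted window $W$.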
Since the proposed implementation uses 20 joint images, 20 different distances are computed for each possible position of the shifted window, but only the shortest of the 20 distances is recorded. After scanning the entire (width/10 − 15) × (height/10 − 15) possible positions, the window that recorded the smallest Euclidean distance is determined as the center of the joint, and the 700×500 pixels around this center form an image that is used for the automated analysis. Since each image contains exactly one joint, and since the rotational variance of the knees is fairly minimal, this simple and fast method was able to successfully find the joint center in all images in the dataset.
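The joint detection procedure described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: block averaging is used for the 0.1 downscaling, the 700×500 region is taken to be 700 pixels wide and 500 pixels high, and the function names are hypothetical. `templates` stands for the stack of pre-defined 15×15 joint center images.

```python
import numpy as np


def downscale(image: np.ndarray, factor: int = 10) -> np.ndarray:
    """Downscale by block averaging (an assumed resampling method)."""
    h = image.shape[0] // factor * factor
    w = image.shape[1] // factor * factor
    blocks = image[:h, :w].astype(np.float64).reshape(
        h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def find_joint_center(image, templates, factor=10, win=15):
    """Return (row, col) of the joint center in full-resolution pixels."""
    small = downscale(image, factor)
    best_dist, best_pos = np.inf, (0, 0)
    for r in range(small.shape[0] - win + 1):
        for c in range(small.shape[1] - win + 1):
            window = small[r:r + win, c:c + win]
            # Euclidean distance to every template; keep only the shortest.
            d = np.sqrt(((templates - window) ** 2).sum(axis=(1, 2))).min()
            if d < best_dist:
                best_dist, best_pos = d, (r, c)
    # Map the center of the best window back to full resolution.
    cy = (best_pos[0] + win // 2) * factor + factor // 2
    cx = (best_pos[1] + win // 2) * factor + factor // 2
    return cy, cx


def crop_joint(image, center, height=500, width=700):
    """Crop a width x height region around the center, clipped to the image."""
    cy, cx = center
    top = int(np.clip(cy - height // 2, 0, image.shape[0] - height))
    left = int(np.clip(cx - width // 2, 0, image.shape[1] - width))
    return image[top:top + height, left:left + width]
```

An exhaustive scan is affordable here because the search runs on the downscaled image, which for a 2550×3000 scan is only about 255×300 pixels.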
Figure 2 summarizes the entire process of the data handling, from the acquisition of the film X-rays to the image classification.
The computer-aided OA detection method (Shamir et al., 2008a) was used to classify two sets of X-rays. The first set contained 39 pairs of X-rays wherein the first was classified by expert radiologists as KL grade 0 (normal), and the second, obtained ~20 years later, was classified as KL grade 3 (moderate OA). The second set contained 84 pairs of knee X-rays wherein the knee was classified by expert radiologists as KL grade 0 (normal) at both time points. The mean difference between the date of the initial classification and the follow-up was 21.4 years, with a standard deviation of 1.35 years. The mean age of the participants in both sets was matched at 52 years by removing 22 images from the set of the knees that remained healthy (originally there were 106 images). Male/female ratio in the two sets was ~0.64 and ~0.61, respectively. Table 1 summarizes the distribution of the data by KL grade.
The automatic image classifier was trained using 30 images of future KL grade 3 and 30 images of future KL grade 0, and was tested using nine images from each of the two classes. The experiment was repeated 100 times, such that in each run different images of each class were randomly assigned for training and testing.
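The repeated random-split protocol described above can be sketched as follows. The function `repeated_split_accuracy` and the stand-in `nearest_mean_classifier` are hypothetical names introduced for this example; in the study the classifier is WND-CHARM, and any callable with the same signature could be substituted.

```python
import numpy as np


def nearest_mean_classifier(X_train, y_train, X_test):
    """Stand-in classifier for illustration only (not WND-CHARM)."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]


def repeated_split_accuracy(X, y, classify, n_train=30, n_test=9,
                            n_runs=100, seed=0):
    """Mean test accuracy over repeated random train/test splits, drawing
    n_train training and n_test test samples from each class per run."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_runs):
        train_idx, test_idx = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train_idx.extend(idx[:n_train])
            test_idx.extend(idx[n_train:n_train + n_test])
        train_idx, test_idx = np.array(train_idx), np.array(test_idx)
        predicted = classify(X[train_idx], y[train_idx], X[test_idx])
        accuracies.append(np.mean(predicted == y[test_idx]))
    return float(np.mean(accuracies))
```

Drawing the same number of test images from each class keeps every test set balanced, so that 50% corresponds to chance performance.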
Experimental results show that the accuracy of predicting the development of moderate OA (KL 3) in a normal knee in the following ~20 years is 72% (P<0.00001), with sensitivity of 74% and specificity of 70%. In a similar experiment, the 39 future moderate OA X-rays were replaced with 25 X-rays of knees that later progressed to KL grade 2 (mild definite OA), such that 20 images were used for training, and five for testing. Experimental results show that the development of mild OA in the next ~20 years can be predicted with an accuracy of 62% (P<0.01). For KL grade 1 (doubtful OA), no prediction better than random was achieved (30 images for training, 9 images for testing). Table 2 summarizes the experimental result figures.
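As a consistency check on the KL grade 3 figures: because each test set contains the same number of images from both classes (nine each), the overall accuracy equals the mean of the sensitivity and the specificity.

```latex
\text{accuracy} = \frac{\text{sensitivity} + \text{specificity}}{2}
                = \frac{0.74 + 0.70}{2} = 0.72
```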
As described in Section 2, the early detection of moderate OA is determined by the system based on the similarity of a given X-ray image to known samples of KL-0 knees that later developed into KL grade 3. The raw image similarity values to each class of images are then normalized to the interval [0, 1], and represent the likelihoods of the image to belong in the different classes, as explained in (Shamir et al., 2008b). Figure 3 shows the ROC that is produced by changing the threshold similarity value for detecting a future KL-3 OA.
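The construction of an ROC curve by sweeping the similarity threshold can be sketched as follows. This is a generic illustration; `roc_points` and `auc` are hypothetical names, and `scores` stands for the normalized similarity of each test image to the future KL grade 3 class.

```python
import numpy as np


def roc_points(scores, labels):
    """Return an array of (false positive rate, true positive rate) pairs,
    one per threshold, from the strictest threshold to the most lenient."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    # Start above every score so that the curve begins at (0, 0).
    thresholds = np.concatenate(([np.inf], np.sort(np.unique(scores))[::-1]))
    points = []
    for t in thresholds:
        predicted = scores >= t
        tpr = np.sum(predicted & labels) / labels.sum()
        fpr = np.sum(predicted & ~labels) / (~labels).sum()
        points.append((fpr, tpr))
    return np.array(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = points[:, 0], points[:, 1]
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2))
```

Each point on the curve corresponds to one choice of threshold, so a single operating point such as the reported sensitivity and specificity is one position along it.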
To determine the areas of the joint that provide the strongest predictive signal for the development of moderate OA, each X-ray image was split into 25 rectangular, equal-sized tiles. This provided 25 different datasets, where each dataset consisted of all tiles at a specific location in the image. Figure 4 shows the prediction accuracy of moderate OA development in the different areas of the knee joint.
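The tiling step can be sketched as follows, assuming the 25 tiles form a 5×5 grid (the text states only that there are 25 equal-sized rectangular tiles). `split_into_tiles` is a hypothetical name.

```python
import numpy as np


def split_into_tiles(image: np.ndarray, rows: int = 5, cols: int = 5) -> dict:
    """Split a 2-D image into rows x cols equal-sized tiles, keyed by
    (row, col) grid position. The 5x5 layout is an assumption."""
    h, w = image.shape
    th, tw = h // rows, w // cols
    return {(r, c): image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)}
```

Collecting the tile at a given grid position from every image yields one dataset per position, and each such dataset can then be trained and tested separately to measure the predictive signal at that location.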
As expected, a large part of the predictive signal comes from areas containing the joint. Interestingly, a substantial and even slightly higher signal comes from a part of the tibia just beneath the joint. This shows that the structure of the tibia is not only informative for the detection of present OA (Podsiadlo, Wolski & Stachowiak, 2008; Lankester et al., 2008), but also for early detection many years before radiographic signs of OA can be noticed in the X-ray. It also shows that alterations of the bone structure associated with the early stages of OA (Bolbos et al., 2008) start before cartilage degeneration is noticeable using plain radiography, and can be detected at a much earlier stage. Other areas of the tibia and the femur that are more distant from the joint center were also tested, but provided no predictive signal.
The P values are computed automatically by the software used for the experiments (Shamir et al., 2008b), and simply reflect the probability of a binary distribution of the two sets that is equal or better than the classification accuracy. For instance, the P value for the detection of future KL grade 3 is the probability that the set of X-rays used for that experiment (future KL grade 0 X-rays and future KL grade 3 X-rays) can be randomly divided into two sets such that 72% or more of the future KL grade 3 X-rays fall within one set, and 72% or more of the future KL grade 0 fall within the other.
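One simple way to approximate this kind of chance probability is a one-sided binomial tail, sketched below. This is an assumption about the general form of the calculation and is not taken from the software's source code; the number of trials it uses is not stated in the text, so the sketch is not expected to reproduce the reported P values. `binomial_tail` is a hypothetical name.

```python
from math import comb


def binomial_tail(n_correct: int, n_total: int, p_chance: float = 0.5) -> float:
    """Probability of at least n_correct successes in n_total independent
    trials when each trial succeeds with probability p_chance."""
    return sum(
        comb(n_total, k) * p_chance ** k * (1 - p_chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )
```

For a fixed accuracy the tail probability shrinks rapidly as the number of classified images grows, which is why the same percentage can be unremarkable on a handful of images and highly significant on a large number of them.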
As discussed in Section 2, not all image features are expected to be equally informative. Figure 5 shows the Fisher scores (sum of the Fisher scores of all bins of a feature group) of the different features extracted from the different image transforms.
As the graph shows, the most informative image features are computed using the image transforms, such as the Zernike features computed from the Wavelet transform and the first four moment features computed from the Fourier transform of the image. These combinations are highly non-intuitive to human perception. Informative image features computed using the raw pixels, such as Zernike and Chebyshev statistics, are based on the polynomial decomposition of the image and are also non-intuitive to the human eye. The informativeness of these features can be used by an algorithm such as WND-CHARM, which applies a systematic search among a large set of image features for the most informative content descriptors, and is not biased by a hypothesis or by visual features that are easier for the human eye to sense.
Here we showed that predictive radiographic features of OA development can be detected by computer-aided analytic methods years before radiographic OA is noticeable by the unaided eye. The prediction accuracy of 72% is significantly higher than chance (P<0.00001), and shows that radiographic evidence of OA can be detected using X-rays at the time when the joint is considered normal using the standard KL classification system. Furthermore, the results of this study suggest that the regions of the knee image that contain the joint and the tibia just beneath the joint space provide the most predictive information.
We acknowledge the following study limitations. First, this study comprises a limited sample of images. Second, the X-rays available for use were obtained in the fully upright position with weight-bearing, and were not enhanced by use of the semi-flexed or fluoroscopically-aligned images. Third, this study focuses on radiographic OA development without regard to symptom presence or severity, both of which are important clinical outcomes to include in future studies. Finally, because follow-up images were obtained during routine visits, we cannot determine when knees transitioned from normal to a higher KL grade.
As discussed in Section 3, the association between the mathematically expressed image content descriptors and their corresponding features of OA progression is not trivial, and it is difficult to link each specific image feature to its corresponding structural alteration of the joint. However, as shown by Figure 5, the most informative image content descriptors are the low-level image features such as Zernike polynomials and Chebyshev statistics, which measure small variations in the pixel values that can be difficult to sense by the unaided eye. As discussed by Boniatis et al. (2006), the low-level pixel intensity variations reflect anatomical structures in the joint (Bocchi et al., 1997), and correlate with biochemical, biomechanical and structural alterations of the articular cartilage and the subchondral bone tissues (Aigner & McKenna, 2002; Martel-Pelletier & Pelletier, 2003). These processes have been associated with cartilage degeneration in OA (Buckwalter & Mankin, 1997; Rodenacker & Bengtsson, 2003), and are therefore expected to affect the joint tissues in a fashion that can be sensed by the low-level texture and pixel variations of the radiograph.
Despite the aforementioned limitations, this study uniquely utilizes existing images that were routinely obtained in all willing participants at two points in time, and is therefore independent of the bias introduced by clinical indications (i.e., presence of painful symptoms). Similarly, this method of automated image analysis is not subject to the bias inherent in clinical X-ray readings. The results imply that this data-driven analysis method can utilize radiographic images in their entirety to detect features present in normal X-rays that are predictive of OA development, and therefore could be applicable to large-scale observational studies and evaluated for risk stratification in interventional studies.
This research was supported entirely by the Intramural Research Program of the NIH, National Institute on Aging.