Two scoring systems are widely used for knee MRI in osteoarthritis (OA), but the strengths and weaknesses of each, in terms of ease of use and association with known risk factors and outcomes, are unknown.
To compare the WORMS and BLOKS scales using longitudinal MRI and x-ray data.
In the Osteoarthritis Initiative (OAI), knee radiographs, long limb films for alignment and MRIs were acquired between baseline and 24-month follow-up. OAI MRIs from baseline and 24 months were read separately using the BLOKS and WORMS scales. X-rays were scored semiquantitatively for joint space loss, and long limb films were measured for alignment angle. We evaluated which of the WORMS or BLOKS cartilage loss scores correlated best with joint space loss on x-ray and which was best predicted by varus malalignment on long limb film. To examine the validity of the bone marrow lesion (BML) and meniscal scales, we tested which of the WORMS or BLOKS baseline BML and meniscal scores best predicted cartilage loss from baseline to 24 months. We also critically evaluated the strengths and weaknesses of each scoring system.
Of 113 knees read longitudinally, 33 showed any cartilage loss using BLOKS and 30 using WORMS, with high agreement between the scales. In the medial compartment, BLOKS and WORMS each detected only 42% of the knees with x-ray joint space loss, with similar specificity (88 vs. 86%). Varus malalignment was a stronger risk factor for medial cartilage loss scored with BLOKS (adjusted OR 5.9; 95% CI 1.5, 24.0) than with WORMS (adjusted OR 2.1; 95% CI 0.7, 6.3). WORMS BML scores predicted cartilage loss more strongly than any BLOKS BML variable, and some BLOKS BML measures were not associated with risk of cartilage loss at all. However, across the range of scores, BLOKS meniscal tear scores predicted cartilage loss better for each abnormality than did WORMS meniscal tear scores, and meniscal signal abnormality, which is scored in BLOKS but not in WORMS, predicted cartilage loss. BLOKS took longer and was more difficult to score longitudinally, especially for bone marrow lesions.
In this comparison, which was limited by the small number of knees, the BLOKS meniscal score was preferable to the WORMS meniscal scale in predicting cartilage loss, most likely because it includes potentially important pathology missed by WORMS. On the other hand, WORMS BML scoring was preferable in that it better predicted later cartilage loss, was easier to score and did not include potentially extraneous measures. Neither method was definitively better for cartilage scoring.
The Osteoarthritis Initiative (OAI) and other longitudinal studies pose a great challenge for the semi-quantitative reading of MRIs. Existing semi-quantitative systems were developed to assess MRIs at a single time point, whereas in OAI knee MRIs were acquired repeatedly to track disease progression. Prominent and important prognostic features such as bone marrow lesions (BMLs) and meniscal tears are currently evaluated using semi-quantitative readings, and a number of scales using different approaches have been proposed for reading them. Semi-quantitative reading of cartilage is also widely used. Thus, at least for hyaline cartilage, meniscal features and BMLs, it is necessary to choose among the proposed semi-quantitative approaches.
To arrive at an optimal semi-quantitative MRI scoring approach for longitudinal studies such as OAI, we compared two recently proposed and widely used instruments for reading knee MRIs, BLOKS (1) and WORMS (2). The first (Part I) paper described how these two scoring systems differ in assessing cartilage loss, meniscal damage and BMLs, and reported on agreement between BLOKS and WORMS for these scales (3). In the current paper, using evidence on predictive validity and feedback from the musculoskeletal (MSK) radiologists who performed the comparative exercise, our goal was to choose the best scoring system for each feature. For example, if BLOKS meniscal scoring was superior to that of WORMS in predictive ability and comprehensiveness, we would choose the BLOKS approach to meniscal scoring, but not necessarily BLOKS for other features. The choice for each other feature would depend similarly on how the BLOKS approach performed compared with WORMS. We did not necessarily intend to choose the best overall instrument.
To make these selections, we carried out longitudinal readings of MRIs of 115 OAI knees at two time points, asking two expert MSK radiologists experienced in both methods to read the MRIs using BLOKS and WORMS. Using these data, we addressed the predictive validity of WORMS and BLOKS longitudinal readings. Lastly, we evaluated the feasibility of each scoring system in terms of ease of use, time efficiency and any awkwardness in cross-sectional or longitudinal scoring.
The selection of knees read in this study is described in the Part I paper by Lynch et al (3). Briefly, we selected one knee per person from OAI knees that had baseline and 24-month MRIs and a heightened risk of cartilage loss based on knee risk factors. The 115 knees studied had a baseline Kellgren and Lawrence (K&L) grade of 2-3 with OARSI grade 1 or 2 joint space narrowing (JSN) from an OAI central reading or osteophytes and JSN from the OAI clinic reading. In addition, they had one or more of the following: progression of JSN from baseline to 12 months, varus or valgus malalignment of >=2 degrees on a full limb radiograph, or a large baseline BML.
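As an illustration only, the selection rule can be written as a predicate over one knee's data. This is a minimal sketch, not the OAI selection code: the `KneeRecord` fields are hypothetical names, "large baseline BML" is left as an undefined flag, and grouping the structural criterion as (K&L 2-3 with OARSI JSN grade 1-2 on the central reading) or (osteophytes and JSN on the clinic reading) is one reading of the sentence; the exact rule is given in the Part I paper (3).

```python
from dataclasses import dataclass


@dataclass
class KneeRecord:
    # Hypothetical field names for illustration; not OAI variable names.
    kl_grade: int              # baseline K&L grade (central reading)
    central_jsn_grade: int     # baseline OARSI JSN grade (central reading)
    clinic_osteophytes: bool   # osteophytes present on the clinic reading
    clinic_jsn: bool           # JSN present on the clinic reading
    jsn_progressed_12m: bool   # JSN progression from baseline to 12 months
    hka_deviation_deg: float   # deviation from neutral alignment (degrees)
    large_baseline_bml: bool   # large BML at baseline (definition not given here)


def is_eligible(k: KneeRecord) -> bool:
    """Return True if a knee meets the (assumed) selection rule."""
    # Structural OA: one reading of the criterion, treating the
    # central-reading and clinic-reading conditions as alternatives.
    central = 2 <= k.kl_grade <= 3 and k.central_jsn_grade in (1, 2)
    clinic = k.clinic_osteophytes and k.clinic_jsn
    # Enrichment for cartilage loss: at least one risk factor present.
    at_risk = (
        k.jsn_progressed_12m
        or abs(k.hka_deviation_deg) >= 2.0
        or k.large_baseline_bml
    )
    return (central or clinic) and at_risk


# Example: K&L 2, JSN grade 1, 3 degrees of malalignment -> eligible
print(is_eligible(KneeRecord(2, 1, False, False, False, -3.0, False)))  # True
```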
The protocol for reading MRIs is detailed in the Part I paper by Lynch et al (3). Briefly, two experienced MSK radiologists (AG, FWR) read the baseline and follow-up MRIs of each knee as a pair, blinded both to time-point order and to subjects' clinical characteristics. Readers were not blinded to co-existing pathology on the MRIs. Each pair of knee MRIs was scored by the same reader using both WORMS and BLOKS, in reading sessions separated by at least two weeks.
When assessing change between time points in cartilage lesions and BMLs with WORMS, readers scored changes of a full grade.
X-ray joint space loss was determined semiquantitatively (4) from within-grade changes on serial PA fixed-flexion films, each read blinded to sequence by both of two experienced readers. Disagreements about joint space loss were adjudicated by a panel of three readers. Using full limb films obtained at the 12-month examination, we measured the hip-knee-ankle (HKA) angle, also known as the mechanical axis, and used commonly applied cutpoints to characterize limbs as varus, valgus or neutral (5).
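The alignment classification can be sketched as a simple threshold rule. The cutpoints actually used are those of reference (5) and are not reproduced here, so the ±2 degree neutral band below (chosen only to echo the >=2 degree malalignment criterion used for knee selection) and the convention that the HKA angle is measured medially, with 180 degrees as neutral and smaller values as varus, are both assumptions.

```python
def classify_alignment(hka_deg: float, neutral_band_deg: float = 2.0) -> str:
    """Classify a limb as varus, valgus or neutral from its HKA angle.

    Assumptions (not from the source): hka_deg is the medial HKA angle,
    so 180 degrees is neutral, smaller values are varus and larger
    values are valgus; neutral_band_deg is an illustrative cutpoint.
    """
    deviation = hka_deg - 180.0
    if deviation <= -neutral_band_deg:
        return "varus"
    if deviation >= neutral_band_deg:
        return "valgus"
    return "neutral"


# Binary exposure for a varus vs. neutral/valgus comparison
# (valgus limbs grouped with neutral ones as the reference category).
limbs = [176.5, 179.0, 181.0, 183.5]
is_varus = [classify_alignment(a) == "varus" for a in limbs]
print(is_varus)  # [True, False, False, False]
```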
We compared the WORMS and BLOKS scales on their predictive validity. For the WORMS and BLOKS cartilage scores, this meant the relation of change in cartilage score to joint space loss on PA x-ray in the same compartment. For concurrent JSN progression, we used baseline to 24-month changes if available and otherwise baseline to 12-month changes (OAI knee x-ray semi-quantitative reading data sets, versions 0.2, 1.2 and 2.2).
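The precedence rule for the x-ray comparator can be made explicit in a few lines. This is a minimal sketch with hypothetical argument names, where `None` stands for a missing reading: the 24-month result is used whenever it exists, even if the 12-month result differs.

```python
from typing import Optional


def jsn_progression(change_0_24: Optional[bool],
                    change_0_12: Optional[bool]) -> Optional[bool]:
    """Pick the x-ray JSN progression indicator for one knee.

    Uses the baseline-to-24-month change when it is available and falls
    back to the baseline-to-12-month change otherwise. Returns None when
    neither reading exists. Argument names are hypothetical.
    """
    if change_0_24 is not None:
        return change_0_24
    return change_0_12


print(jsn_progression(None, True))   # True: falls back to 12 months
print(jsn_progression(False, True))  # False: 24-month reading takes precedence
```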
We recognize that MRI is more sensitive to cartilage loss than x-ray joint space loss (6), but joint space loss provided an independent surrogate measure of cartilage loss, and the association of BLOKS- and WORMS-defined cartilage loss with x-ray joint space loss offered one means of comparing the two systems. For BLOKS, we classified a compartment or knee as showing cartilage loss if there was an increase in the extent of either partial or full thickness loss. For WORMS, we classified any increase in grade as cartilage loss. We focused on medial compartment cartilage loss and on varus malalignment vs. neutral/valgus alignment, because of the small number of limbs in the valgus category.
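The two cartilage loss definitions can be sketched as follows, assuming a hypothetical data layout in which each compartment maps subregion labels to scores: for BLOKS, a pair of extent grades (partial thickness, full thickness); for WORMS, a single grade. The subregion labels are placeholders, and the sketch ignores the separate problem of scores already at the top of the scale at baseline.

```python
from typing import Dict, Tuple

# Hypothetical layout: subregion label -> score(s) for one compartment.
# BLOKS: (extent grade of partial thickness loss, extent grade of full thickness loss)
BloksScores = Dict[str, Tuple[int, int]]
# WORMS: a single cartilage grade per subregion
WormsScores = Dict[str, float]


def bloks_cartilage_loss(baseline: BloksScores, followup: BloksScores) -> bool:
    """Loss if any subregion's partial or full thickness extent increases."""
    return any(
        followup[sr][0] > baseline[sr][0] or followup[sr][1] > baseline[sr][1]
        for sr in baseline
    )


def worms_cartilage_loss(baseline: WormsScores, followup: WormsScores) -> bool:
    """Loss if any subregion's WORMS grade increases."""
    return any(followup[sr] > baseline[sr] for sr in baseline)


b0 = {"A": (1, 0), "B": (2, 1)}
b1 = {"A": (1, 0), "B": (2, 2)}   # full thickness extent grows in "B"
w0 = {"A": 2, "B": 3}
w1 = {"A": 2, "B": 3}             # no grade change
print(bloks_cartilage_loss(b0, b1), worms_cartilage_loss(w0, w1))  # True False
```

The example shows how the two definitions can disagree for the same compartment, which is the kind of discordance counted in the agreement analysis.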
For BMLs and meniscal lesions, we compared baseline BLOKS and WORMS scales on their ability to predict subsequent cartilage loss, defined by either BLOKS or WORMS cartilage score changes. We primarily used odds ratios to quantify associations with cartilage loss, and used a compartment-specific approach, testing bone marrow and meniscal lesions for their relation to cartilage loss in the same compartment. Because previous studies (7;8) have suggested that the largest bone marrow lesion has a similar or slightly greater effect on cartilage loss than the sum of lesion scores in a knee or compartment, we focused on the largest lesions.
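The compartment-specific exposure and a resulting odds ratio can be sketched as below. This is illustrative only: the function names, the exposure threshold and the inputs are hypothetical, and the odds ratio computed here is the crude cross-product ratio from a 2x2 table, whereas the reported estimates come from logistic regression adjusted for age, sex and BMI.

```python
from typing import Dict, Sequence


def max_lesion_score(subregion_scores: Dict[str, int]) -> int:
    """Compartment exposure = score of the largest lesion in its subregions."""
    return max(subregion_scores.values(), default=0)


def crude_odds_ratio(exposed: Sequence[bool], loss: Sequence[bool]) -> float:
    """Cross-product odds ratio (a*d)/(b*c) from paired exposure/outcome flags."""
    pairs = list(zip(exposed, loss))
    a = sum(1 for e, y in pairs if e and y)          # exposed, loss
    b = sum(1 for e, y in pairs if e and not y)      # exposed, no loss
    c = sum(1 for e, y in pairs if not e and y)      # unexposed, loss
    d = sum(1 for e, y in pairs if not e and not y)  # unexposed, no loss
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: an off-diagonal cell is empty")
    return (a * d) / (b * c)


# Hypothetical compartments; exposure = largest BML score >= 2 (illustrative cutoff)
bml = [{"s1": 2, "s2": 1}, {"s1": 3}, {"s1": 0, "s2": 2}, {"s1": 1}, {}, {"s1": 1}, {"s1": 0}]
loss = [True, True, False, True, False, False, False]
exposed = [max_lesion_score(c) >= 2 for c in bml]
print(crude_odds_ratio(exposed, loss))  # 6.0
```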
We used sensitivity and specificity to describe the agreement of longitudinal MRI cartilage loss with x-ray joint space loss, and calculated exact 95% confidence intervals based on binomial probabilities. To estimate odds ratios for the relation of malalignment, bone marrow lesions and meniscal pathology with cartilage loss, we used logistic regression with cartilage loss as the dependent variable. Multivariable models were adjusted for age, sex and body mass index (BMI).
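The exact binomial method is not named; treating it as the Clopper-Pearson interval, one standard choice, is an assumption. The sketch below computes that interval in pure Python by bisection on the binomial tail probabilities and applies it to sensitivity and specificity. The counts in the example call are placeholders, not study data.

```python
from math import comb
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g: Callable[[float], float], target: float, iters: int = 100) -> float:
    """Find p in [0, 1] with g(p) = target, for g increasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact (Clopper-Pearson) 1 - alpha confidence interval for x/n."""
    # Lower limit: P(X >= x | p) = alpha / 2; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper limit: P(X <= x | p) = alpha / 2, i.e. P(X > x | p) = 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


def sens_spec(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity and specificity, each paired with an exact 95% CI."""
    sens = (tp / (tp + fn), clopper_pearson(tp, tp + fn))
    spec = (tn / (tn + fp), clopper_pearson(tn, tn + fp))
    return sens, spec


# Placeholder counts (not study data)
(sens, sens_ci), (spec, spec_ci) = sens_spec(tp=8, fn=12, tn=70, fp=10)
print(f"sensitivity {sens:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity {spec:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
```

Exact intervals are wide with small denominators, which is worth keeping in mind when reading sensitivities based on only a few knees with x-ray joint space loss.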
Of the knees in the longitudinal reading study, 52% were drawn from male subjects and 54% of the subjects had BMI >=30. The knees often had lesions of interest at baseline: 85% had some degree of meniscal damage (by WORMS or BLOKS); 97% had at least a small BML and 99% had a cartilage lesion (WORMS >= grade 2) in at least one compartment.
Of the 113 knees that had readable MRIs for cartilage loss using both the WORMS and BLOKS scales, 100 showed agreement between the methods on cartilage loss in the whole knee (Table 1). Agreement was similar within each compartment: 8 compartments showed loss using BLOKS but no loss using WORMS, and 5 compartments showed the opposite. Notably, 4 medial compartments had BLOKS scores at the ceiling of the scale at baseline (2 of these were eligible for medial joint space loss on longitudinal x-ray), whereas no WORMS scores were at the top of the scale.
We focused first on the agreement between cartilage loss assessed on MRI and joint space loss assessed on x-ray (Table 2). Sensitivity was similarly poor for WORMS and BLOKS cartilage loss, whereas specificity was better for both instruments.
However, when we examined whether limb malalignment predicted cartilage loss on MRI (Table 3), varus malalignment, judged by odds ratios, was a stronger predictor of BLOKS than of WORMS cartilage loss. Varus knees had an adjusted odds ratio for cartilage loss of only 2.1 using the WORMS scale, but a substantially higher 5.9 using the BLOKS cartilage scale. The primary difference was in the number of neutral/valgus knees that showed worsening: 8 with WORMS and 4 with BLOKS.
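What "adjusted" means here can be illustrated with a small logistic regression fit. The sketch below fits a model by Newton-Raphson in NumPy to a simulated cohort and reads off the adjusted odds ratio for varus alignment as exp(coefficient), with a Wald 95% CI. The simulated data, effect sizes and variable coding are hypothetical, and the software used for the study's own models is not specified.

```python
import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, iters: int = 25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must include an intercept column. Returns the coefficients (log-odds
    scale) and their estimated covariance matrix (inverse information).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(info)


def simulate_cohort(n: int = 5000, seed: int = 0):
    """Hypothetical cohort with a true varus odds ratio of 3."""
    rng = np.random.default_rng(seed)
    varus = rng.integers(0, 2, n)
    age = rng.normal(62.0, 9.0, n)
    female = rng.integers(0, 2, n)
    bmi = rng.normal(30.0, 5.0, n)
    logit = (-1.5 + np.log(3.0) * varus + 0.02 * (age - 62.0)
             + 0.2 * female + 0.05 * (bmi - 30.0))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    X = np.column_stack([np.ones(n), varus, age, female, bmi])
    return X, y


X, y = simulate_cohort()
beta, cov = fit_logistic(X, y)
se = np.sqrt(np.diag(cov))
adj_or = np.exp(beta[1])                                 # varus, adjusted for age, sex, BMI
ci = np.exp(beta[1] + np.array([-1.96, 1.96]) * se[1])   # Wald 95% CI
print(f"adjusted OR for varus: {adj_or:.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

With only about 100 knees rather than 5000, the same model gives far wider intervals, as the 95% CI of 1.5 to 24.0 for the BLOKS estimate reflects.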
Next, we compared the BML scales of BLOKS and WORMS with respect to their association with WORMS-defined cartilage loss on MRI (Table 4a). The WORMS BML score, which focuses on BML volume, was a more powerful predictor of cartilage loss than any element of the BLOKS BML scale. The WORMS BML score was also a more powerful predictor of BLOKS-defined cartilage loss (Table 4b). Although numbers were small, there was little evidence of a relation between BML adjacency (the proportion of the BML adjacent to the subchondral plate) and subsequent cartilage loss (the highest adjusted odds ratio for cartilage loss was 0.3). Other ways of assessing BMLs with the BLOKS scale also predicted cartilage loss less well than the WORMS BML scale.
Next, we compared the meniscal scales of WORMS and BLOKS, examining compartment-specific cartilage loss in relation to the maximal meniscal lesion in the same compartment. Although we collapsed some score categories (Table 5), analyses of the individual scores did not give results different from those in the table. The WORMS maximal meniscal tear score showed a significant association with cartilage loss (adjusted odds ratio 3.9 for minor tears and 2.2 for displaced tears or maceration). However, judged by odds ratios, the associations were considerably stronger for the BLOKS scale. For example, signal abnormalities, which are not scored in WORMS at all, showed a high odds ratio for cartilage loss, a finding that would have been missed entirely had the WORMS scale been used. The absolute risk of cartilage loss among categories with meniscal tears appears similar for BLOKS and WORMS scores, suggesting that the main difference between the scales is the inclusion in BLOKS of information on signal abnormalities. For meniscal extrusion, WORMS and BLOKS, which take similar approaches to scoring extrusion, yielded similar odds ratios for association with cartilage loss.
We asked the two readers to identify challenges in using each instrument when reading longitudinally. Readers noted that BLOKS scoring was considerably more time consuming, estimating that reading one knee at two visits took 120 minutes with BLOKS vs. only 80 minutes with WORMS. Readers felt that the BLOKS meniscal scales made more sense because they assign different scores to clinically different types of meniscal tears, whereas the WORMS scale does not. However, they felt that the BLOKS BML scales were hard to use longitudinally, because each BML had to be scored for its size, which was confusing when BMLs merged or separated at follow-up, a common event. WORMS avoids this problem by simply summarizing the extent of each subregion occupied by BMLs, which is easier to score both longitudinally and cross-sectionally than the BLOKS approach.
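The readers' per-knee estimates translate directly into total reading effort. A small worked calculation, using only the 120- and 80-minute estimates above, for the 115 knees read in this study and, for scale, a hypothetical 500-knee study; BLOKS takes 1.5 times as long as WORMS at any sample size.

```python
def reader_hours(n_knees: int, minutes_per_knee: float) -> float:
    """Total reading time in hours for n_knees, each read at two visits."""
    return n_knees * minutes_per_knee / 60.0


BLOKS_MINUTES = 120  # readers' estimate per knee, two visits
WORMS_MINUTES = 80

for n in (115, 500):  # this study; a hypothetical larger study
    bloks = reader_hours(n, BLOKS_MINUTES)
    worms = reader_hours(n, WORMS_MINUTES)
    print(f"{n} knees: BLOKS {bloks:.0f} h, WORMS {worms:.0f} h, both {bloks + worms:.0f} h")
# 115 knees: BLOKS 230 h, WORMS 153 h, both 383 h
# 500 knees: BLOKS 1000 h, WORMS 667 h, both 1667 h
```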
Using a sample of knees in which both BLOKS and WORMS scales were scored separately by the same readers, we compared the validity of the BLOKS and WORMS scales and found modest differences in their performance. The clearest difference was that WORMS BML scoring, which is based solely on the size or volume of all BMLs in a given subregion, was consistently superior to the BLOKS approach to BML scoring. First, WORMS BML scores predicted cartilage loss better than the BLOKS size variable. Further, the BLOKS BML scale includes measures such as adjacency and cyst percentage, which were tedious to score and did not appear to provide any predictive value with respect to cartilage loss. Scoring the change in size of individual BMLs was especially challenging when, as often occurred, BMLs split or merged in follow-up images, a problem not encountered in WORMS scoring. Note that the WORMS cyst assessment, which is separate from the BML score, was not part of this analysis.
The other main finding was that BLOKS meniscal scoring appeared consistently superior to WORMS meniscal scoring in several ways. First, signal abnormality, which is scored in BLOKS but has no WORMS equivalent, predicted cartilage loss in the small amount of data we collected. This finding would be missed, and many menisci with this abnormality would be characterized as completely normal, if the WORMS meniscal scale were used. Second, BLOKS meniscal scores predicted cartilage loss better than WORMS scores. Third, BLOKS differentiates specific types of tears and WORMS does not. One deficiency of the BLOKS meniscal scale is that it does not differentiate between partial and full maceration (or partial vs. full meniscal loss). Also, BLOKS defines subregions for cartilage and BML scores that are not concordant with the location of meniscal damage, making it difficult to match a particular meniscal lesion geographically with a lesion in cartilage or bone. This latter deficiency can be rectified by using the WORMS regions to score cartilage and bone, and the former by adding a partial maceration (partial meniscal loss) category to the BLOKS meniscal scale.
We should note that our comparisons are based on small numbers. For malalignment and cartilage loss (where we suggested that BLOKS had an edge), a change in the reading of 4 knees (of over 100) would have negated our findings. A similarly small difference was seen for BMLs and cartilage loss, where we suggested that WORMS was preferable to BLOKS. Because BLOKS and WORMS yield very similar readings (see the Part I paper), clearly differentiating them would probably require reading 500 knees using our approach, rather than the just over 100 we read, an effort that would be highly expensive and time consuming. We note that we combined the data analysis with reader insights, which may be as valuable as the analysis itself.
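The fragility of such comparisons can be seen with a hypothetical 2x2 table of 100 knees: reclassifying just 4 unexposed knees from "no loss" to "loss" roughly halves the crude odds ratio. The counts below are invented for illustration and are not the study's data.

```python
from typing import Tuple

Table = Tuple[int, int, int, int]  # (a, b, c, d)


def odds_ratio(t: Table) -> float:
    """Crude odds ratio (a*d)/(b*c).

    a: exposed with loss     b: exposed without loss
    c: unexposed with loss   d: unexposed without loss
    """
    a, b, c, d = t
    return (a * d) / (b * c)


def reclassify_unexposed(t: Table, k: int) -> Table:
    """Move k unexposed knees from 'no loss' to 'loss'."""
    a, b, c, d = t
    return a, b, c + k, d - k


table = (7, 18, 5, 70)  # hypothetical counts, 100 knees
print(round(odds_ratio(table), 2))                            # 5.44
print(round(odds_ratio(reclassify_unexposed(table, 4)), 2))  # 2.85
```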
One other study has compared BLOKS and WORMS BML scoring (4) and reported that the BLOKS BML size score correlated better with pain than did the WORMS BML score. However, that study scored WORMS on non-fat-suppressed sagittal MRIs and BLOKS on fat-suppressed coronal MRIs from the same study (BOKS), which may have invalidated the comparison.
It is less clear which of the two scales is preferable for scoring cartilage loss. On the one hand, the BLOKS scale clearly performed better in detecting the effects of malalignment. On the other hand, the WORMS cartilage scale was slightly superior in agreeing with joint space loss. We note that the WORMS cartilage scale is clearly not linear. For example, going from 2 to 3 on the WORMS scale (we did not use 1, as it does not denote a loss of cartilage substance) is not at all the same as going from 5 to 6, whereas BLOKS scores are closer to a linear format. When we looked at cartilage loss as a consequence of meniscal lesions (Table 5) or BMLs (Table 4), the WORMS scale better detected the effects of these risk factors, one piece of evidence that WORMS, at least for cartilage loss, performed better than BLOKS.
We used x-ray joint space loss as the standard against which to assess MRI cartilage loss mostly because this was the standard for progression available to us. Since x-rays are insensitive to cartilage loss relative to MRI, and JSN may reflect factors other than cartilage loss, it could be argued that this was a poor choice (6). The poor sensitivity of the cartilage loss scales, however, probably does not reflect insensitivity of MRI in detecting cartilage loss, but rather issues in assessing progressive joint space narrowing on x-ray. Changes in knee positioning relative to the central ray of the x-ray beam between serial radiographs can change the appearance of the joint space, so that even after adjudication, films judged to show joint space loss may not reflect true loss. This was recently shown for joint space width assessed on serial OAI knee radiographs, which had inconsistent positioning over time despite standardized positioning techniques (9). Further, it has been shown that joint space loss can be due to meniscal change, such as extrusion, rather than cartilage loss (10;11). In addition, x-rays acquired under weight-bearing conditions and MRI acquired in the supine, non-weight-bearing position may give different results in terms of loss over time. Since MRI is more sensitive to cartilage loss than x-ray, the imperfect specificity of MRI may not represent false positives but rather true positives missed by the x-ray measure. Finally, even though JSN is an imperfect standard against which to compare MRI, the BLOKS and WORMS cartilage scores were both assessed against the same JSN measures.
One important limitation of our study was the absence of quantitative longitudinal data on cartilage loss, that is, data derived from segmenting cartilage and reporting its thickness and volume. (Currently, quantitative measures of bone marrow lesions and meniscal damage are not widely available.) There is controversy as to whether quantitative cartilage measures provide superior information to semiquantitative data, especially in early or mild osteoarthritis (9;10;12), but it was not within the scope of this study to examine this question.
We selected knees likely to progress, and as a consequence most of these knees had meniscal pathology and bone marrow lesions. Comparisons between WORMS and BLOKS might have been more effective with greater variability in knee findings.
The wide range of affected area covered by the frequently used moderate-size categories of cartilage loss (e.g., BLOKS: 10-75%) may limit the sensitivity of both methods to detect longitudinal change, especially BLOKS, which has much larger tibiofemoral subregions than WORMS. This is also true for BML scores, where grade 2 covers an even larger range. To increase sensitivity for scoring progression in longitudinal studies, allowing for “within-grade” changes or using categories with a smaller range of affected area may be useful.
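The point about coarse categories can be illustrated with a short sketch. It assumes BLOKS-style extent grades of 0 (none), 1 (<10%), 2 (10-75%) and 3 (>75%), with 75% itself placed in grade 2, and a hypothetical finer scale of 10-percentage-point bins; neither the bin width nor the boundary handling comes from the source. A lesion that grows from 15% to 60% of a subregion registers no change on the coarse scale but a clear change on the finer one.

```python
def coarse_extent_grade(percent_area: float) -> int:
    """BLOKS-style extent grade (assumed categories: 0, <10%, 10-75%, >75%)."""
    if percent_area <= 0:
        return 0
    if percent_area < 10:
        return 1
    if percent_area <= 75:
        return 2
    return 3


def fine_extent_grade(percent_area: float, bin_width: float = 10.0) -> int:
    """Hypothetical finer scale: index of the bin_width-wide bin."""
    return int(percent_area // bin_width)


before, after = 15.0, 60.0  # percent of subregion affected at two visits
print(coarse_extent_grade(before), coarse_extent_grade(after))  # 2 2
print(fine_extent_grade(before), fine_extent_grade(after))      # 1 6
```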
In summary, we recommend an amalgamated MRI reading system for OAI, with elements from both WORMS and BLOKS. For menisci, it is our view that the BLOKS system is superior, and for BMLs, that WORMS is better. We further recommend that the WORMS regions be used, so that investigators can determine whether certain lesions lie in the same small regions as other lesions. For cartilage scoring, the two systems produced comparable results, and the choice between them may be based on other considerations, including ease of scoring and the psychometric properties of the scales.
The OAI is a public-private partnership comprising five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories; Novartis Pharmaceuticals Corporation; GlaxoSmithKline; and Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health. This manuscript has received the approval of the OAI Publications Committee based on a review of its scientific content and data interpretation. We appreciate the time and effort of OAI clinic staff and study participants in making this investigation possible. Additional support was provided by NIH AR47785 and NIH AR051568. The funders had no role in this publication.
Competing Interest Statement: Drs. Guermazi and Roemer have stock in a company whose service is to read MRIs semiquantitatively, but this company reads using both WORMS and BLOKS scales. There are no other potential competing interests.
Contributions: Conception and design: Drs. Felson and Nevitt
Collection and assembly of data: Drs. Lynch, Guermazi, Roemer, McAlindon
Statistical Analysis: Dr. Niu
Interpretation of the data: Drs. Niu, Roemer, Guermazi, Lynch, Felson, Nevitt, McAlindon
Drafting of article: Drs. Felson and Nevitt
Critical revision: Drs. Niu, Roemer, Guermazi, Lynch, McAlindon
Final Approval: all authors
Obtaining Funding: Drs. Nevitt and Felson.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.