In our study, the operating characteristics of the ORAI and SCORE instruments differed significantly from previous reports. The differences observed in sensitivity and specificity are not necessarily unexpected: initial estimates of the operating characteristics of many diagnostic tests tend to change when the tests are reevaluated in different settings. However, the magnitude of the differences reported here is notable. Although the AUCs for the 2 instruments were similar in this study, both were significantly lower than previously reported. 17, 20–24 The lower AUCs also suggest that both instruments may have less discriminating power than previously assumed.
Using published cut points, the SCORE instrument tended to be slightly more accurate than the ORAI, but the ORAI was more consistent across racial/ethnic groups. The racial/ethnic differences were most apparent in African-American women. The SCORE instrument achieved superior accuracy by avoiding unnecessary DXA scans (predominantly in African-American women), but it failed to identify the majority (70%) of African-American women with osteoporosis. Both instruments maintained relatively high negative predictive values, overall and across racial/ethnic groups; however, in this setting, where the prevalence of osteoporosis was 10.8%, even a nondiscriminating test would have a high negative predictive value.
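The point about negative predictive value can be checked with a short calculation. The following is a minimal illustrative sketch, not part of the study analysis: it applies the standard definition of negative predictive value to a hypothetical test that performs no better than chance (sensitivity and specificity both 0.5, an assumption made only for illustration) at the 10.8% prevalence observed in this setting. The function name is ours.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """NPV = true negatives / all negative test results."""
    true_neg = specificity * (1 - prevalence)    # disease-free, test negative
    false_neg = (1 - sensitivity) * prevalence   # diseased, test negative
    return true_neg / (true_neg + false_neg)

# Hypothetical chance-level test (sensitivity = specificity = 0.5)
# evaluated at the prevalence observed in this study (10.8%).
print(round(negative_predictive_value(0.5, 0.5, 0.108), 3))  # 0.892
```

For any test whose sensitivity and specificity sum to 1, the negative predictive value reduces to 1 minus the prevalence, so a value near 89% in this population says little about an instrument's ability to discriminate.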
The intent of clinical risk stratification or clinically based prescreening is to strike a balance between minimizing testing to save health care dollars and not missing anyone who might benefit from diagnosis and treatment. The prevalence of disease, burden of illness, sensitivity and specificity, and accuracy of the prescreening algorithm determine this balance. The magnitude of the differences in operating characteristics reported in this study calls into question the usefulness of the instruments studied. 17, 20–24 For example, consider a hypothetical cohort of postmenopausal women, 45 years of age and older, that reflects the prevalence of osteoporosis reported in NHANES III 27–29 and consists of 60% non-Hispanic white, 25% African-American, and 15% Hispanic women. Compared with universal screening, the ORAI, based on our findings, would reduce screening by 55%; however, it would miss 32% of women with osteoporosis. Similarly, the SCORE instrument would reduce screening by 64% but would miss 46% of women with osteoporosis. These false-negative rates may not be clinically acceptable, despite the considerable reduction in rates of screening.
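The arithmetic behind a cohort projection of this kind can be sketched as follows. This is an illustrative outline under stated assumptions, not the calculation used in the study. The group shares (60%, 25%, 15%) come from the hypothetical cohort above, but every prevalence, sensitivity, and specificity value in the example is a placeholder and must be replaced with group-specific estimates. The names `Stratum` and `cohort_tradeoff` are ours.

```python
from dataclasses import dataclass

@dataclass
class Stratum:
    share: float        # fraction of the cohort in this racial/ethnic group
    prevalence: float   # osteoporosis prevalence within the group
    sensitivity: float  # instrument sensitivity within the group
    specificity: float  # instrument specificity within the group

def cohort_tradeoff(strata):
    """Return (reduction in DXA referrals, fraction of cases missed)."""
    # Fraction of the whole cohort that has osteoporosis.
    cases = sum(s.share * s.prevalence for s in strata)
    # Fraction referred for DXA: true positives plus false positives.
    referred = sum(
        s.share * (s.prevalence * s.sensitivity
                   + (1 - s.prevalence) * (1 - s.specificity))
        for s in strata
    )
    # Fraction of the cohort with osteoporosis but a negative prescreen.
    missed = sum(s.share * s.prevalence * (1 - s.sensitivity) for s in strata)
    return 1 - referred, missed / cases

# PLACEHOLDER inputs for illustration only; these are NOT study results.
example = [
    Stratum(share=0.60, prevalence=0.20, sensitivity=0.70, specificity=0.60),
    Stratum(share=0.25, prevalence=0.20, sensitivity=0.70, specificity=0.60),
    Stratum(share=0.15, prevalence=0.20, sensitivity=0.70, specificity=0.60),
]
reduction, missed_fraction = cohort_tradeoff(example)
print(f"screening reduced by {reduction:.0%}, cases missed {missed_fraction:.0%}")
```

Weighting each group by its share and its prevalence shows why an instrument that performs poorly in one group can still appear acceptable overall when that group contributes few cases to the cohort.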
Several reasons may explain the differences we observed. First, the SCORE instrument may underestimate the risk of osteoporosis in African-American women. Second, the operating characteristics of the instruments may vary according to the anatomic site of osteoporosis. In our study, the anatomic site of osteoporosis differed across racial/ethnic groups. In particular, osteoporosis was limited to the lumbar spine in the African-American women. This observation is consistent with other reports 30–32 showing that, in African-American women, the hip is less likely than the lumbar spine to be involved with osteoporosis. The SCORE instrument was developed in reference to the hip, but in Cadarette's study 24 it was applied to both the hip and the lumbar spine and was associated with similar sensitivity but lower specificity. Finally, a potential source of inaccuracy for both instruments is the way in which weight is modeled. Both instruments attribute an increasing "protective" effect as weight increases. The women in our sample were, on average, 30 pounds heavier than the women in the ORAI and SCORE development studies, and African-American women were the heaviest group in our study. Therefore, in our study population, both instruments would yield lower scores, especially in African-American women, which could have resulted in lower sensitivities.
The prevalence of osteoporosis also differed from what was expected. The observed prevalence in non-Hispanic white women was surprisingly low. This finding may be explained partially by the fact that non-Hispanic white women were more likely to have used estrogen/progesterone therapies, which could have lowered the prevalence of osteoporosis in this group. They also tended to weigh more, which could have enhanced BMD, especially in the hip. Finally, a large proportion of the non-Hispanic white women were excluded based on a previous diagnosis of osteoporosis or current treatment with bone-active medications. This leads us to suspect that a clinical bias in screening, operating before the initiation of the study, favored DXA screening for non-Hispanic white women. When the excluded cases and the average age of the sample are taken into account, the number of women with osteoporosis more closely approximates the expected racial/ethnic distribution for osteoporosis.
Our study had several limitations. First, the limited number of participants in our sample yielded wide confidence intervals for the sensitivities and specificities of the instruments for each racial/ethnic group. However, the overall sensitivities and specificities were clearly different from other published reports. The small sample size may also have contributed to the differences in prevalence of osteoporosis. Second, we did not extensively confirm the self-reported data contained in the 2 instruments. This was particularly problematic for subjects with rheumatoid arthritis, for whom we did confirm the diagnosis in the medical record. On further review, we determined that the problem in assessing rheumatoid arthritis occurred primarily in the Hispanic population and was likely due to the translation used in the Spanish version. Finally, a preexisting clinical bias that favored previous DXA screening for non-Hispanic white women may have influenced the prevalence of osteoporosis in that group. This bias suggests that African-American and Hispanic women may not have been referred for DXA screening as frequently as non-Hispanic white women. This observation is consistent with other reports. 15, 33
The ORAI requires only a simple checklist. The SCORE instrument is much more cumbersome and requires mathematical manipulations and truncations. Moreover, the inclusion of rheumatoid arthritis and history of non-traumatic fractures adds another dimension to the SCORE instrument that goes beyond primary screening and risk stratification. Rheumatoid arthritis or a history of non-traumatic fractures probably justifies DXA scanning as a diagnostic test rather than a primary screening test.
Considering the overall performance of both instruments, the ease of use in the clinical setting, and the more consistent performance of the ORAI across racial/ethnic groups, we believe that the ORAI, which nearly replicates the 2002 recommendations of the USPSTF, is the better instrument for identifying women from an ethnically diverse population who should be referred for DXA scans. However, the poorer performance of the ORAI in our sample, compared with previous reports, precludes recommending widespread use of the instrument until more research is conducted in other diverse populations. The USPSTF recommendations offer an alternative prescreening strategy, but they were not developed in clinical studies and, to our knowledge, have not been validated in clinical studies.
Additional studies, conducted in larger populations that reflect the racial/ethnic distribution of other primary care populations and, in particular, include Asian women, are needed to compare and evaluate the utility of clinical risk assessment instruments and guidelines for osteoporosis screening. Similarly, the concept of clinical risk stratification for osteoporosis should be expanded to include men. Finally, clinical trials are needed to determine whether screening algorithms, including the recommendations of the USPSTF, affect health-related quality of life and the morbidity and mortality associated with osteoporosis and related fractures.