Two radiologists evaluated images of the spine from computed tomography (CT) scans on two occasions to diagnose vertebral fracture in 100 individuals. Agreement was fair to good for mild fractures, and agreement was good to excellent for more severe fractures. CT scout views are useful to assess vertebral fracture.
We investigated inter-reader agreement between two radiologists and intra-reader agreement between duplicate readings for each radiologist, in assessment of vertebral fracture using a semi-quantitative method from lateral scout views obtained by CT.
Participants included 50 women and 50 men (age 50-87 years, mean 70 years) in the Framingham Study. T4-L4 vertebrae were assessed independently by two radiologists on two occasions using a semi-quantitative scale as normal, mild, moderate, or severe fracture.
Vertebra-specific prevalence of grade ≥1 (mild) fracture ranged from 3% to 5%. We found fair (κ=56-59%) inter-reader agreement for grade ≥1 vertebral fractures and good (κ=68-72%) inter-reader agreement for grade ≥2 fractures. Intra-reader agreement for grade ≥1 vertebral fracture was fair (κ=55%) for one reader and excellent for another reader (κ=77%), whereas intra-reader agreement for grade ≥2 vertebral fracture was excellent for both readers (κ=76% and 98%). Thoracic vertebrae were more difficult to evaluate than the lumbar region, and agreement was lowest (inter-reader κ=43%) for fracture at the upper (T4-T9) thoracic levels and highest (inter-reader κ=76-78%) for the lumbar spine (L1-L4).
Based on a semi-quantitative method to classify vertebral fractures using CT scout views, agreement within and between readers was fair to good, with the greatest source of variation occurring for fractures of mild severity and for the upper thoracic region. Agreement was good to excellent for fractures of at least moderate severity. Lateral CT scout views can be useful in clinical research settings to assess vertebral fracture.
Vertebral fractures are the hallmark of osteoporosis and the most common osteoporotic fracture, affecting as many as half of US white women and men in their 80s [1–3]. In Europe, one in eight individuals older than 50 years has vertebral fractures, and worldwide, the estimated number of clinical vertebral fractures for the year 2000 was 1.4 million. In 2005, incident vertebral fractures were responsible for more than 1.0 billion dollars in direct healthcare costs in the US. Vertebral fractures cause pain, disfigurement, depression, and functional impairment [7–9]. Moreover, vertebral fractures are harbingers of future disabling fractures as well as death [10–12]. Despite the impact of vertebral fracture, less than one third of individuals with vertebral fractures receive medical attention and even fewer are treated [13–15].
Numerous methods have been developed to assess vertebral fracture from conventional lateral radiographs [16, 17]. More recently, these methods for detecting vertebral fracture have been applied to lateral spine images from dual-energy X-ray absorptiometry (DXA) vertebral fracture assessment (VFA) [18–20] with the intention of improving diagnosis and treatment without increasing cost and exposure to radiation. Increasingly, older individuals are undergoing thoracoabdominal computed tomography (CT) examinations that include acquisition of posterior-anterior and lateral scout view images, which are 2D digital radiographs used to localize the area to be imaged. Although resolution of a scout view is lower than for a conventional radiograph, there is increasing interest in use of CT scout view images for evaluation of vertebral fracture [21, 22]. Whereas reliability of vertebral fracture assessment using radiographs and VFA has been well studied [16, 17, 20, 23–28], there is only a single prior study, conducted more than a decade ago, that reported reliability of vertebral fracture assessment from CT lateral scout views. Since that study, there have been significant advances in CT technology, allowing for faster imaging and better resolution.
As part of a large observational study of osteoporosis, we are evaluating CT image data of the spine acquired for participants in the Framingham Study who underwent cardiac imaging studies. Our protocol calls for vertebral fracture assessment from lateral CT scout views using a semi-quantitative method, and we aimed to evaluate reliability of our measurements. Therefore, we conducted a study to determine inter- and intra-reader reliability for vertebral fracture assessment using lateral spine scout views obtained as part of CT examinations for a sample of 100 participants.
Participants for the current study were selected from the Framingham Heart Study Offspring and Third Generation Multi-detector Computed Tomography (MDCT) Study. Members of the Offspring and Third Generation Cohorts include second- and third-generation offspring (and their spouses) of the original cohort that was established in 1948 [30–32]. In 2002-2005, QCT scans of the chest and abdomen were acquired in 3,529 participants in the MDCT Study (age 40-90 years, mean 51 years) for assessment of coronary and aortic calcium. Exclusion criteria for the MDCT Study included pregnancy or age less than 40 years for women, age less than 35 years for men, and weight greater than 320 pounds.
We selected 100 subjects to conduct a reliability study of vertebral fracture identification. To ensure an adequate number of individuals with vertebral fractures in our reliability study, a clinical investigator experienced in evaluating lateral spine images reviewed the scout images of persons age 70 years and older to identify 16 individuals (eight women and eight men) with suspected vertebral fracture. We then selected a convenience sample of 84 additional persons so that our study group included 100 individuals (50 women and 50 men) in age groups representing the distribution of age in the parent study while giving more weight to the older groups with greater prevalence of fracture, as follows: seven persons age 50-54 years, seven persons age 55-59 years, 14 persons age 60-64 years, 16 persons age 65-69 years, 19 persons age 70-74 years, and 37 persons age 75+ years.
An 8-slice multidetector computed tomography scanner (Lightspeed Ultra, General Electric Medical Systems, Milwaukee, WI, USA) was used for cardiac imaging of calcification [29, 35]. The scout views consisted of frontal and lateral low energy 2D scanograms extending from the upper thoracic (T4) to sacral (S1) vertebral levels.
The lateral scout images for the 100 subjects were evaluated twice (“Time 1” and “Time 2”), several weeks apart, by each of two independent readers (“Reader A” and “Reader B”). Thus, 4 evaluations were independently performed for each subject’s scout view image. The scans for the 100 subjects in the reliability study were randomly ordered within five larger sets of scans for evaluation in the parent study. Details of the reliability study were not revealed to readers and images were stripped of demographic and any other identifying information. In addition, we provided different identification numbers associated with the 100 subjects for each reading set so that each batch appeared to the readers as a unique set of subjects. Also, the order of the subject scans was changed for each evaluation.
The two readers were experienced radiologists who underwent training with a musculoskeletal radiologist who developed a semi-quantitative method to assess vertebral fracture. Prior to this reliability study, CT scout views for 45 subjects randomly selected from the parent study were used as a training set for readers along with a radiographic atlas. We provided readers with an automated system and instruction manual to directly enter findings into an electronic database.
Readers evaluated 13 vertebral levels from T4 to L4 and visually classified vertebrae as normal (grade 0), mildly deformed (grade 1, ~20–25% reduction in anterior, middle, and/or posterior height and a 10–20% reduction in area), moderately deformed (grade 2, ~25–40% reduction in any height and a 20–40% reduction in area), or severely deformed (grade 3, ~40% reduction in any height and area). Scan quality was recorded as “good” or “poor”. Vertebral levels that could not be adequately visualized were classified as “unreadable”. Vertebral deformities due to causes other than fracture, such as spondylosis and Scheuermann’s disease, were classified as non-osteoporotic deformities. These were considered to be non-fractured in the analysis.
Prevalent vertebral fracture was defined as a vertebral body graded at least mildly deformed (grades 1–3). We also used a more restrictive classification of vertebral fracture, defined as moderately to severely deformed (at least ~25–40% reduction in any vertebral height or grades 2–3).
We calculated the frequency of unreadable vertebrae by spinal location for each reader at each of the two time points. Prevalence of fracture, per person and per vertebra, was calculated using two dichotomized definitions of fracture: (1) grades 1-3 (mild, moderate, and severe) versus grade 0 (normal), and (2) grades 2-3 (moderate and severe) versus grades 0-1 (normal and mild). In calculating prevalence, vertebrae that were unreadable were classified as normal.
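The two dichotomized definitions can be illustrated with a minimal Python sketch. It is not the study's analysis code: the function names, the use of `None` to mark an unreadable vertebra, and the toy grades are all assumptions made for illustration.

```python
from typing import List, Optional

Grade = Optional[int]  # 0-3 semi-quantitative grade; None = unreadable (assumed encoding)

def dichotomize(grade: Grade, threshold: int) -> int:
    """Return 1 if the vertebra counts as fractured at this threshold, else 0.

    Unreadable vertebrae (None) are treated as normal, as described in the text.
    """
    if grade is None:
        return 0
    return int(grade >= threshold)

def per_vertebra_prevalence(people: List[List[Grade]], threshold: int) -> float:
    """Fraction of all vertebrae graded at or above the threshold."""
    flags = [dichotomize(g, threshold) for person in people for g in person]
    return sum(flags) / len(flags)

def per_person_prevalence(people: List[List[Grade]], threshold: int) -> float:
    """Fraction of persons with at least one vertebra at or above the threshold."""
    has_fracture = [any(dichotomize(g, threshold) for g in person) for person in people]
    return sum(has_fracture) / len(has_fracture)

# Toy data: four hypothetical persons with four vertebrae each (not study data).
toy_people = [
    [0, 0, 1, None],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
    [3, 1, 0, None],
]

prev_vertebra_g1 = per_vertebra_prevalence(toy_people, threshold=1)  # definition (1)
prev_vertebra_g2 = per_vertebra_prevalence(toy_people, threshold=2)  # definition (2)
prev_person_g1 = per_person_prevalence(toy_people, threshold=1)
prev_person_g2 = per_person_prevalence(toy_people, threshold=2)
```

In the study itself, each person contributes 13 vertebral levels (T4 to L4), so the per-vertebra denominator is 1,300.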
The number and severity of fractured vertebrae are given for each of the two readers at each of the two time points. To calculate severity of vertebral fracture per person, individuals with more than one fracture were classified according to their most severe fracture grade (normal, mild, moderate, or severe).
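The per-person severity rule amounts to taking the maximum grade across a person's vertebrae. The short Python sketch below shows this under the same assumed encoding (integer grades 0-3, `None` for unreadable levels, which are ignored here on the assumption that they count as normal); the names are hypothetical.

```python
from typing import List, Optional

SEVERITY_LABELS = ["normal", "mild", "moderate", "severe"]  # grades 0, 1, 2, 3

def person_severity(grades: List[Optional[int]]) -> str:
    """Classify a person by the most severe grade among their vertebrae.

    Unreadable vertebrae (None) are skipped; a person with no readable
    fractured vertebra is classified as normal.
    """
    readable = [g for g in grades if g is not None]
    worst = max(readable, default=0)
    return SEVERITY_LABELS[worst]

# Toy example: one mild and one severe fracture gives a "severe" classification.
example_label = person_severity([0, 1, 3, None, 0])
```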
We estimated agreement for fracture prevalence per vertebra, corrected for chance, using a simple kappa statistic and associated 95% confidence intervals (CI). We provide kappa for intra-reader agreement of fracture prevalence per vertebra between time 1 and time 2, separately for each of the two readers, and kappa for inter-reader agreement of fracture prevalence per vertebra between reader A and reader B, separately for each of the two time points.
We conducted stratified analysis to evaluate potential differences in agreement of fracture prevalence per vertebra by spinal region, categorized as T4-T9, T10-T12, and L1-L4, so that we could compare results to others and have adequate numbers of fractures within each region for meaningful statistical analysis. We calculated kappa (κ) to estimate inter- and intra-reader agreement for fracture prevalence per vertebra beyond that expected by chance, and considered kappa >0.75 as excellent agreement, 0.40-0.75 as fair to good agreement, and <0.40 as poor agreement beyond that expected by chance, as characterized by Fleiss. Finally, we repeated our analysis including only those scans indicated as good quality by readers.
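To make the agreement statistic concrete, the following Python sketch computes a simple (Cohen's) kappa for two sets of dichotomized readings, a 95% CI, and the Fleiss category. It is an illustration under stated assumptions rather than the study's code: the standard error uses a common large-sample approximation, vertebrae are treated as independent observations, and the function names and toy counts are hypothetical.

```python
import math
from typing import Sequence, Tuple

def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> Tuple[float, Tuple[float, float]]:
    """Kappa and approximate 95% CI for two binary (0/1) rating vectors."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("ratings must be non-empty and of equal length")
    p_obs = sum(x == y for x, y in zip(a, b)) / n    # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                  # each rating's fracture rate
    p_exp = pa * pb + (1 - pa) * (1 - pb)            # agreement expected by chance
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # Simple large-sample standard error (an assumption; the source does not
    # state which variance formula was used).
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, (max(-1.0, kappa - 1.96 * se), min(1.0, kappa + 1.96 * se))

def fleiss_category(kappa: float) -> str:
    """Map kappa to the Fleiss categories used in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.40:
        return "fair to good"
    return "poor"

# Toy 2x2 table for 100 vertebrae (hypothetical counts, not study data):
# both fractured = 20, A only = 5, B only = 5, both normal = 70.
reader_a = [1] * 20 + [1] * 5 + [0] * 5 + [0] * 70
reader_b = [1] * 20 + [0] * 5 + [1] * 5 + [0] * 70

toy_kappa, toy_ci = cohen_kappa(reader_a, reader_b)
toy_category = fleiss_category(toy_kappa)
```

For the toy table, observed agreement is 0.90 and chance agreement is 0.625, so kappa is about 0.73, which falls in the fair to good range. The same function could be applied within each spinal region (T4-T9, T10-T12, L1-L4) for the stratified analysis.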
Mean age of participants was 70 years (SD=9) and ranged from 50 to 87 years. The study group of 50 women and 50 men included predominantly members of the Offspring Cohort (n=92) and few members of the younger, Third Generation Cohort (n=8). Mean weight was 70 (±17) kg in women and 85 (±14) kg in men. Mean body mass index was 28 (±6) kg/m2 (Table 1).
The number of unreadable vertebrae was greatest for the upper thoracic spine (Fig. 1). Reader A classified a greater number of vertebrae as unreadable compared to reader B. For example, according to reader A, the number of unreadable vertebrae was 112 (9%) at time 1, which was four times greater than the number according to reader B, who classified 30 (2%) vertebrae as unreadable.
Each reader identified more fractured (grades 1–3) vertebrae at time 2 than time 1 (Fig. 2). Total number of vertebrae classified as fractured by reader A was 40 at time 1 and 52 at time 2. Number of vertebrae classified as fractured by reader B was 64 at time 1 and 66 at time 2. A bimodal distribution of prevalent vertebral fracture was observed with peak frequencies occurring at (and near) T8 and T12–L1 (Fig. 2). For example, prevalence (per person) was 11-12% at T8 and 7-8% at T12 (for three of the four evaluations).
Reader A identified 23-24 persons (23-24%) with at least one fracture, whereas reader B classified 35 persons (35%) with fracture (Table 2). Both raters classified fewer individuals with 1 vertebral fracture (and more individuals with two fractures) at the second reading compared to the first reading. The number of participants with three or more fractures doubled to eight persons at time 2 from four persons at time 1 for reader A, although reader B identified 1 fewer person with three or more fractures at the second evaluation (nine versus eight persons).
Reader A identified fewer mild (grade 1) fractures at time 1 than at time 2, and fewer mild fractures than reader B at either evaluation (Table 3). For example, the number of vertebrae judged by reader A as mild fractures increased from the first to the second evaluation, from 25/1300 (2%) to 38/1300 (3%), whereas reader B classified 41-44 (3%) vertebral levels as mild fractures. The number of persons identified with mild fractures was consistent between time 1 and time 2 evaluations for each reader. However, reader A judged fewer individuals to have mild fracture (13%) than reader B (22-23%). Frequency of moderate to severe fractures was low, with 14-15 vertebrae classified as ≥grade 2 by reader A and 22-23 vertebrae classified as ≥grade 2 by reader B. Per person prevalence of moderate to severe fractures was 10-11% according to reader A and 12-13% for reader B.
Intra-reader agreement for per vertebra prevalence of mild fracture (grade ≥1) was fair to good for reader A, κ=0.55, 95% CI: 0.43-0.67, and excellent for reader B κ= 0.77, 95% CI: 0.69-0.85 (Table 4). Agreement for per vertebra prevalence of mild fracture between readers was good, and agreement was similar at time 1 (κ=0.56, 95% CI: 0.44-0.68) and time 2 (κ=0.59, 95% CI: 0.48-0.70). When we considered moderate-severe vertebral fractures (grades ≥2), intra-reader agreement was excellent for both readers, 0.76 (95% CI: 0.58-0.93) for reader A and 0.98 for reader B (95% CI: 0.93-1.00). Agreement between readers reached 68% at time 1 and 72% at time 2 for moderate-severe fracture.
Intra-reader and inter-reader agreement for per vertebra prevalence of fractures grade ≥1 increased with descending vertebral level for each of the four evaluations (Table 5). For example, inter-reader agreement at time 2 was 0.43 for T4-T9, 0.72 for T10-T12, and 0.78 for L1-L4, and the pattern was similar for the other evaluations.
Restricting the analysis to scans indicated as good image quality produced little or no improvement in either intra- or inter-reader agreement.
We determined reliability of a semi-quantitative method to evaluate prevalent vertebral fracture from CT lateral scout views. Four evaluations (by two readers at two time points) were independently performed to classify 1,300 vertebrae in 100 participants in a community-based cohort. We found fair (κ=56-59%) inter-reader agreement for ≥grade 1 vertebral fracture and good (κ=68-72%) inter-reader agreement for ≥grade 2 fracture. Intra-reader agreement for ≥grade 1 vertebral fracture was fair (κ=55%) for one reader and excellent for another reader (κ=77%), whereas intra-reader agreement for ≥grade 2 vertebral fracture was excellent for both readers (κ=76% and 98%). Upper thoracic vertebrae were more likely to be unreadable than the lower thoracic and lumbar regions, and agreement was lowest (inter-reader κ=43%) for fracture at the upper (T4-T9) thoracic levels evaluated and highest (inter-reader κ=76-78%) for the lumbar spine.
Takada and colleagues used methods similar to ours and reported higher inter- and intra-reader reliability than the estimates found in the current study. Agreement for ≥grade 1 fracture between readers (at time 1) was kappa=0.68, compared to our inter-reader reliability estimates of 0.56 for time 1 and 0.59 for time 2, and intra-reader agreement was 0.79 and 0.87 compared to our estimates of 0.55 and 0.77. Prevalence of fracture was 40% lower in our study, which may explain, at least in part, the lower estimates of agreement. The subjects in the study of Takada et al. were women with mean age 56 years, and more than half were part of a clinical trial that required T-scores at least as low as −2.0. In contrast, participants in our study had a mean age of 70 years and included equal numbers of women and men. While information on fracture severity was not provided in the study of Takada et al., a higher frequency of more severe fractures is expected in this group, which may also help explain the higher reliability estimates compared to our results. In fact, agreement between readers for ≥grade 2 fracture in our study was 68-72%, comparable to Takada’s findings, and intra-reader agreement for ≥grade 2 fracture was 76-98%, somewhat higher than reported for ≥grade 1 fracture by Takada et al. However, prevalence of ≥grade 2 fracture was low (1-2%) in our population-based study.
We found that frequency of unreadable vertebrae was highest and intra-reader agreement was lowest for the upper thoracic spine. This was consistent with Takada et al. and others who used different imaging modalities [20, 39, 40] and fracture classification methods. Reproducibility of readings in the upper thoracic spine may be worse than in other regions due to poorer image quality, in part attributable to the projection of other structures (pulmonary hilus and scapula) on the vertebral body. The lower levels of the spine have larger vertebrae and less obliquity and obscuring lung markings than the upper vertebrae. In contrast, Olmez et al. reported, in a study of 67 postmenopausal women, little difference in agreement by vertebral level, assessed using the Kleerekoper method.
Intra-reader agreement for semi-quantitative vertebral fracture assessment using CT scout images in our study is lower than estimates reported for lateral spine images using DXA or conventional radiography. In one study that examined reliability of DXA for fracture assessment in 203 postmenopausal women (mean age 68 years), intra-reader agreement, as measured by kappa, was 64-77%. Intra-reader agreement is highest using lateral radiographs of the spine, considered the gold standard for vertebral fracture diagnosis, with a kappa estimate given at 89%. Although image quality is lower for CT scout views than conventional radiography, CT scout views offer some advantages. Specifically, CT fan-beam images do not suffer from the parallax distortion present in cone-beam imaging geometry as in conventional radiographs. Since a lateral scout view is acquired as part of conducting a CT study for purposes other than identification of spinal fracture, assessment of vertebral fracture using this method exposes individuals to lower radiation levels than conventional radiography, while at the same time identifying candidates for treatment who might not otherwise have been evaluated.
This study is limited by the relatively small number of fractures; however, prevalence is lower for community-based populations as in the current study than for subjects recruited from clinical settings.
Another limitation is that we did not perform sagittal midline reformations to improve the sensitivity of fracture detection. The scope of this study was to evaluate in 100 subjects reader reliability of methods used to assess vertebral fracture in participants in the parent study, the Framingham Heart Study Offspring and Third Generation MDCT Study. Thus, although sagittal spine images will identify vertebral fractures not evident on the transverse axial images [43–46], use of reformations was beyond the scope of the current study.
This study is also limited by the lower spatial resolution and higher photon noise of CT scout views compared to other image modalities intended for evaluation of vertebral fracture, such as conventional radiography. In contrast to radiographs, however, CT scout views are not affected by parallax distortion which can reduce accuracy in fracture diagnosis. Additionally, information on reliability of the method used in this study is needed since these images allow identification of fracture in patients undergoing CT examinations for other purposes without additional procedures or radiation exposure. Given the high risk of fracture (and death) in older adults with prevalent vertebral fracture, identification of vertebral fracture from CT scout views provides opportunity for diagnosis and treatment to prevent pain, loss of function, and medical costs associated with subsequent fracture.
In conclusion, in this community-based sample of women and men, agreement between readers was fair to good for assessment of vertebral fractures using lateral CT scout views (56-59% for mild fractures and 68-72% for moderate to severe fractures). Agreement for mild to severe fracture within readers was good to excellent (55-77%), whereas within-reader agreement for moderate to severe fracture was excellent (76-98%). Evaluation of lateral CT scout views using a semi-quantitative method is a reliable approach to identifying prevalent vertebral fractures. Lateral CT scout views can be useful in clinical settings to assess vertebral fracture.
This work was supported by NIH R01AR053986, R01AR/AG041398, K01 AR053118, T32 AG023480, and by the National Heart, Lung, and Blood Institute (NHLBI) Framingham Heart Study (NIH/NHLBI Contract N01-HC-25195).
Conflict of interest: None.
E. J. Samelson, Hebrew SeniorLife, Institute for Aging Research Boston, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA.
B. A. Christiansen, Beth Israel Deaconess Medical Center, Center for Advanced Orthopedic Studies, Boston, MA, USA; Department of Orthopedic Surgery, Harvard Medical School, Boston, MA, USA; Department of Orthopaedics, University of California Davis, Sacramento, CA, USA; Email: bchristiansen/at/ucdavis.edu.
S. Demissie, Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; Email: demissie/at/bu.edu.
K. E. Broe, Hebrew SeniorLife, Institute for Aging Research Boston, Boston, MA, USA; Email: broe/at/hrca.harvard.edu.
Y. Zhou, Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA; Email: zyanhua/at/bu.edu.
C. A. Meng, Hebrew SeniorLife, Institute for Aging Research Boston, Boston, MA, USA; Email: Ching-AnMeng/at/hrca.harvard.edu.
W. Yu, Radiology, Peking Union Medical College Hospital, Beijing, China; Email: weiyu5508/at/yahoo.com.
X. Cheng, Radiology, Beijing Ji Shui Tan hospital, Beijing, China; Email: xiao65/at/263.net.
C. J. O’Donnell, Department of Medicine, Harvard Medical School, Boston, MA, USA; National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, USA; Email: codonnell/at/nih.gov.
U. Hoffmann, Department of Medicine, Harvard Medical School, Boston, MA, USA.
D. P. Kiel, Hebrew SeniorLife, Institute for Aging Research Boston, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA; Email: kiel/at/hrca.harvard.edu.
M. L. Bouxsein, Beth Israel Deaconess Medical Center, Center for Advanced Orthopedic Studies, Boston, MA, USA; Department of Orthopedic Surgery, Harvard Medical School, Boston, MA, USA; Email: mbouxsei/at/bidmc.harvard.edu.