|Home | About | Journals | Submit | Contact Us | Français|
The distribution of cortical bone in the proximal femur is believed to be a critical component in determining fracture resistance. Current CT technology is limited in its ability to measure cortical thickness, especially in the sub-millimetre range which lies within the point spread function of today’s clinical scanners. In this paper, we present a novel technique that is capable of producing unbiased thickness estimates down to 0.3 mm. The technique relies on a mathematical model of the anatomy and the imaging system, which is fitted to the data at a large number of sites around the proximal femur, producing around 17,000 independent thickness estimates per specimen. In a series of experiments on 16 cadaveric femurs, estimation errors were measured as −0.01 ± 0.58 mm (mean ± 1 std.dev.) for cortical thicknesses in the range 0.3–4 mm. This compares with 0.25 ± 0.69 mm for simple thresholding and 0.90 ± 0.92 mm for a variant of the 50% relative threshold method. In the clinically relevant sub-millimetre range, thresholding increasingly fails to detect the cortex at all, whereas the new technique continues to perform well. The many cortical thickness estimates can be displayed as a colour map painted onto the femoral surface. Computation of the surfaces and colour maps is largely automatic, requiring around 15 min on a modest laptop computer.
The exponential increase in hip fractures with ageing imposes a substantial and increasing health burden. They are the most common reason for acute orthopaedic admission in older people (Parker and Johansen, 2006) and their incidence is projected to rise worldwide from 1.7 million in 1990 to 6.3 million in 2050 (Sambrook and Cooper, 2006). Current imaging methods for predicting an individuals hip fracture risk use bone mineral density (BMD) estimates which are specific (Johnell et al., 2005; Kanis et al., 2008) but have low sensitivity (Kanis et al., 2008; Kaptoge et al., 2008; Sanders et al., 2006). Regional structural weakness in the aged hip may predispose to fracture (Mayhew et al., 2005; Poole et al., 2009; de Bakker et al., 2009) and there have been numerous calls to develop novel three-dimensional (3D) methods of assessing hip structure, most notably with multi-detector computed tomography, MDCT (Bouxsein and Delmas, 2008).
The anatomical distribution of cortical bone in the aged proximal femoral metaphysis is a critical component in determining its resistance to fracture, with trabecular bone playing a lesser role (Holzer et al., 2009; Verhulp et al., 2008). Compressive cracking of cortical bone of the superior parts of the neck and trochanter is often the first stage of fracture (de Bakker et al., 2009; Carpenter et al., 2005; Mayhew et al., 2005). Current MDCT studies are anatomically limited by set regions of interest and technically limited by thresholding errors caused by relatively low resolution data sets. Ideally, we would like to be able to map the cortical bone thickness to a high degree of accuracy, from a clinical resolution data set, over the entire proximal femur. Since bone anabolic drugs are capable of stimulating new bone formation at structurally important sites (Keaveny et al., 2008; Uusi-Rasi et al., 2005), such a technique might help not only with identification of weak areas but also in assessing treatment response.
Thin laminar structures such as the femoral cortex are not accurately depicted in clinical CT because of the images’ limited spatial resolution. This typically results in an underestimate of density and an overestimate of thickness. A number of studies (Dougherty and Newman, 1999; Hangartner and Gilsanz, 1996; Prevrhal et al., 1999) have investigated the influence of the imaging system’s in-plane point spread function (PSF), while Prevrhal et al. (2003) go further in quantifying the effects of the out-of-plane PSF as well. Such studies explain why straightforward thickness estimation techniques, such as those based on thresholding (Buie et al., 2007; Hangartner, 2007) or the 50% relative threshold method (Prevrhal et al., 1999, 2003, described in Section 2.2), are unreliable when the structure is thin compared with the imaging system’s PSF. The consensus is that, with typical PSFs from normal bore, clinical CT scanners, straightforward thickness measurements start to become inaccurate below around 2.5 mm (Dougherty and Newman, 1999; Hangartner and Gilsanz, 1996), with errors exceeding 100% for sub-millimetre cortices (Prevrhal et al., 2003).
More sophisticated thickness estimation techniques are capable of superior accuracy and precision. The classical approach to image deblurring is to assume models of both the object being scanned and the image formation process, then attempt to fit these models to the data. Often, this inverse problem is ill-posed, in that no unique set of model parameters explains the data, especially in the presence of imaging noise. In such circumstances, prior knowledge may be used to constrain some of the model parameters. Such an approach was used by Streekstra et al. (2007) to estimate the thickness of cartilage layers in the sub-millimetre range, well below the nominal width of the CT system’s PSF. In this instance, the model parameters relating to the PSF were fixed, using information obtained by scanning a suitable phantom.
In this paper, we describe a model-based approach for measuring femoral cortical bone thicknesses in the supra- to sub-millimetre range, using standard, normal bore, clinical CT. We employ the anatomical and imaging models described by Prevrhal et al. (2003), and explore the efficacy of constraining various parameters when fitting these models to the CT data. In contrast to the approach taken by Streekstra et al. (2007), we find that constraining the peak CT density, rather than the imaging system’s PSF, produces the best cortical thickness estimates. Our model-fitting process is guided by a semi-automatic segmentation of the outer bone surface. This leads to an illuminating way of visualizing the results: cortical thicknesses can be displayed, to high spatial resolution, as a colour map painted on the bone surface.
The paper is organised as follows. In Section 2, we recap the anatomical and imaging models of Prevrhal et al. (2003), describe our novel approach for fitting these models to low resolution CT data, and introduce our methodology for evaluating the results. In Section 3, we present results for 16 cadaveric human femurs, scanned using both high resolution, peripheral CT and conventional, low resolution CT. Cortical thicknesses measured by a 50% relative threshold technique on the high resolution data provide the ground truth against which estimates from the low resolution data are judged. By way of comparison, results are presented for the new method and also the thresholding and 50% relative threshold approaches. These results are discussed in Section 4, before we draw our conclusions and suggest some avenues for future research in Section 5.
We use as our raw data for each cortical thickness estimate the CT values along a line in the imaging plane normal to the cortical layer, and of sufficient length to pass through both the cortical layer itself and any blurred prolongation caused by the imaging process. A convenient method for locating such lines is discussed in Section 2.3. We assume that the cortical layer is locally flat and of uniform thickness, at least within the extent of the imaging system’s PSF. We can then model the CT values along the line as a convolution of the ‘real’ density with an in-plane and an out-of-plane PSF. The real density variation y over distance x is assumed to be piecewise constant, with different values in the surrounding tissue, the cortex and the trabecular bone. This can be expressed as the summation of two step functions:
where and are the CT values in the surrounding tissue, cortical and trabecular bone, respectively, and are the locations of the outer and inner cortical surfaces, and is a unit step function.
Following the approach taken by Prevrhal et al. (2003), we model the in-plane PSF as a normalised Gaussian function:
where σ represents the extent of the blur. The out-of-plane PSF is the result of the interaction between the CT slice thickness and the orientation of the cortical layer with respect to the imaging plane, as illustrated in Fig. 1. This can be expressed as a rectangular function:
where the extent of blur is calculated from s, the CT slice thickness, and a, the angle the cortical surface normal makes with the imaging plane:
Hence, the fully blurred CT values are given by1:
except for the special case , when:
The true cortical thickness is then:
where is the measured in-plane thickness (Fig. 1).
Fig. 2 shows for the case σ = 1 mm and r = 0, and with cortical thickness values of 5 mm, 1 mm and 0.5 mm. When the cortex is thin, as in Fig. 2c, thickness estimation using a set threshold, or the 50% relative threshold method, is highly error prone. Prevrhal et al. (2003) demonstrated that Eq. (5) is a good model for what is typically observed when estimating cortical thickness in practice. The real examples in Fig. 3 confirm that the imaging process for low resolution clinical data does indeed follow the model fairly closely. The equivalent high resolution data indicates that the ‘real’ density variation is reasonably well modeled by the piecewise constant function of Eq. (1), though clearly the CT values on either side of the cortex are not perfectly flat.
If we could fit the model of Eq. (5) to real CT data, then we could derive the cortical thickness using Eq. (7) and knowledge of the surface orientation. This potentially allows thickness measurement well below the blurring level σ of the imaging process. Fitted models are shown as dashed lines in Fig. 3c and d. For the high resolution data, the dashed lines follow the edges of the cortex well, despite the distracting density variations elsewhere. For the low resolution data, the dashed lines are nearly indistinguishable from the real data (solid lines), providing further evidence of the model’s validity.
In our proposed method, optimization of the model parameters is performed using the Levenberg-Marquardt method (More, 1977). The CT data is resampled at 100 points along the line in the image plane normal to the cortical surface, using a Mitchell-Netravali cubic spline (Burger and Burge, 2009) to interpolate between the original CT samples. Automatic positioning of this line is discussed in Section 2.3. The model is then fitted to the 100 new samples, with Eq. (5) evaluated numerically for each candidate set of parameters. Model-data discrepancies at one end of the line (that closest to the outer cortical surface) are penalised slightly more than discrepancies elsewhere. This encourages the optimization algorithm to fit the model to the actual cortex, and not to secondary CT peaks within the interior that are especially common in high resolution data.
There are seven parameters, , in the model. Of these, the out-of-plane blur r is assumed to be known from the surface orientation (see Section 2.3) and the slice thickness, following Eq. (4). The edges of the cortical layer are the values we are attempting to estimate. This leaves three parameters describing the CT values, and one, σ, describing the in-plane PSF. The CT values of the surrounding tissue and of the trabecular bone vary with location around the proximal femur, so we leave the model free to discover these parameters for each individual line. This requires that the line penetrates each region sufficiently: we therefore set the line length to at least three times the maximum cortical thickness we expect to encounter, and position it so that one-third lies outside the exterior cortical surface. For the experiments reported in Section 3, the line length was about 18 mm in all cases (6 mm outside, 12 mm inside).
The remaining two model parameters, , the cortical CT value, and σ, the in-plane blur, are more problematic. For thin cortical layers, it is possible to trade one off against the other to produce values of that are very similar. In other words, the problem is ill-posed. When both these parameters are left free, the Levenberg-Marquardt method frequently fails to converge, and there is much variability in the subsequent thickness estimates. To better condition the optimization problem, we could constrain either σ, as was done for the more complex cartilage model of Streekstra et al. (2007), or . Neither choice is ideal, since both σ and are expected to vary with location around the proximal femur. While it might seem more natural to constrain σ, it transpires that this barely improves the conditioning of the problem. In contrast, constraining is far more effective. The evidence for this assertion can be found in Section 3, where we subsequently assess two approaches for setting (a fixed value for all data, or automatically estimating a value for each femur), and establish how sensitive the thickness estimates are to errors in .
For comparison, we also tested thresholding, with levels as in Poole et al. (2008), and a method similar to the 50% relative threshold of Prevrhal et al. (1999). For a fair comparison at equivalent resolutions, all methods used the same 100 sampled CT points. Additionally, our implementations of the various methods all involve fitting the model to the data in one way or another: this ensures consistent discrimination between CT peaks considered to be in the cortex and those considered to be in the trabecular bone. In general, the low resolution data exhibits just a single peak along the length of the line, but the high resolution data resolves smaller features, including intra-cortical pores, leading to multiple peaks. In such circumstances, the model-fitting algorithm implicitly determines which peaks lie within the cortex, as illustrated in Fig. 3 for the new method and in Fig. 4 for the comparison methods. Although they share this same methodology for cortical peak identification, the various methods otherwise differ in many respects:
Threshold The sampled data is first converted to binary form using the threshold: values above the threshold are set to one, those below are set to zero. Both r and σ are fixed at zero, and is set to one. The model-fitting process then determines which peak or peaks are considered to be cortex: and are exactly aligned with 0–1 transitions in the binary data, as in Fig. 4, and these are taken to be the cortical edges. The chosen threshold was constant over all data sets scanned in the same way with the same scanner, corresponding to approximately 430 mg/cm2 areal BMD. This value is discussed further in Section 3.1.
Half-max is set to the maximum CT value along the line. The model-fitting process then provides a robust estimate of the exterior and trabecular CT values and also the location of the edges . The constraint tends to result in the edges being located on almost exactly half way between and , and and , as in Fig. 4. Since is the maximum observed CT value, this is very similar to the 50% relative threshold method.
New fixed The new model-fitting technique with an assumed cortical CT value fixed over all data sets scanned in the same way with the same scanner. For the experiments reported in Section 3, a value of 1750 HU was used throughout. 1750 HU is towards the middle of the range of values observed in the data (see Table 1).
New variable The new model-fitting technique with an assumed cortical CT value chosen individually for each data set. A CT slice is selected far enough down the femur for the cortex to be sufficiently thick, such that the blurred CT peak attains the same value as the unblurred peak (i.e. as in Fig. 2a, but not b and c). The model is then fitted at multiple locations on this slice with both and σ unconstrained. The average estimated is then used to fix the cortical CT value for that data set.
There are, of course, many ways that the threshold and half-max methods could have been implemented. We chose implementations that allow a direct comparison with the new technique, and are sufficiently robust to provide independent estimates at a large number of locations on the proximal femur. While different implementations will undoubtedly vary in their nuances, they will nevertheless share the same high-level characteristics. Thresholding will sometimes miss the cortical layer entirely, and techniques similar to our half-max implementation, including the 50% relative threshold method, will estimate the PSF rather than the cortical thickness when the latter is relatively thin (Newman et al., 1998).
So far, we have restricted our discussion to finding a cortical thickness estimate at one location in the data, given prior knowledge of both the approximate cortical surface location and orientation. However, to probe any relationship between cortical thickness and hip fracture, and to assess the various thickness estimation techniques, we would like to be able to map cortical thickness across the entire surface of the proximal femur.
An approximate exterior cortical surface can be extracted from the CT data by thresholding at an appropriate value. After clicking on one seed point inside the femur, a 3D flood fill algorithm is used to label all connected CT voxels above the threshold. This results in a set of labelled pixels in each CT image, from which contours can be extracted to sub-pixel precision by making use of the threshold. Labelled regions frequently contain holes where the CT value in the trabecular bone drops below the threshold, but these are readily eliminated by discarding all contours that are circumscribed by another. This fully automatic process results in one or more contours on each slice. There are roughly 150 slices for the MDCT data sets with 1 mm slice thickness, and 50 for the 3 mm examples. For the high resolution Xtreme pQCT data, where the slice thickness is 0.082 mm, contours were extracted on every fourth slice, giving roughly 450 slices in total.
After this automatic process, the contours are checked manually and edited where necessary. Fig. 5a and b show the two most common causes of error. The entire set of contours is then converted to a discrete 3D shape-based representation (Treece et al., 2000), and an iso-surface extracted at the original contour locations (Treece et al., 1999). This creates a detailed, triangulated surface of the proximal femur, with sub-pixel precision except at those locations where manual editing was necessary. The whole process takes approximately 15 min for typical low resolution data, producing a surface with typically up to 20,000 vertices. Surfaces extracted from the high resolution data are far more detailed, with over 200,000 vertices and a total processing time of about an hour. Nearly all of this time is spent manually editing the contours, the necessary amount depending strongly on the data. Contour extraction, editing, and surface creation are performed using Stradwin2 software.
The vertices are subsequently used as locations for the thickness estimates, with the corresponding surface normals defining the orientation of the cortical layer. When the cortical and CT planes are nearly parallel, a approaches 90° in Eq. (7) and becomes too large to measure correctly. Hence, vertices with surface normals that are approximately orthogonal to the nearest CT plane are ignored. Otherwise, the vertex and surface normal are projected back onto the nearest CT plane: one such plane is shown in Fig. 5c. The cyan lines show the projected vertices and normals. The red points are the cortical edges estimated at each vertex, using lines parallel to the projected normals and positioned such that each vertex lies exactly one-third of the way along the corresponding line.
It should be emphasised that the segmented surface is used only to position and orient the lines along which the thickness estimates are made, and to calculate the appropriate value of a for Eqs. (4) and (7). Actual cortical edges are found using the model-fitting process, and the exterior edges are generally not coincident with the projected vertices. The outcome is one thickness estimate per surface vertex. The thicknesses can be displayed collectively as a colour map painted onto the surface, as in Fig. 6a.
Sixteen cadaveric femurs were scanned in air using both high resolution Xtreme pQCT (82 μm/pixel, 82 μm slice thickness) and low resolution Siemens Somatom Sensation 64 MDCT (589 μm/pixel, 1 mm slice thickness). Cortical surfaces were extracted from these independently. Thickness estimates were mapped to the high resolution surface using the half-max method, since this is known to be a good, unbiased estimator, as long as the cortical thickness is sufficiently large compared with the imaging PSF (Prevrhal et al., 1999), which is almost certainly the case for the Xtreme data (we shall return to this point in Section 4). All four thickness estimation methods were used on each low resolution data set, and compared with the ‘ground truth’ high resolution measurements. To facilitate a meaningful comparison, the high resolution thickness estimates were smoothed over each surface using a filter weighted by the inverse of the residual model-fitting errors. The extent of the smoothing kernel was chosen to match the spatial resolution of the lower resolution data, as shown in Fig. 6b. In effect, we assume that the best we can do with the low resolution data is provide a thickness estimate that accurately represents the average thickness over the extent of the imaging PSF.
To compare high and low resolution thickness maps, the two corresponding surfaces were first brought into approximate alignment by hand, using interactive visualization software that allows the pose of one surface to be varied while the other remains stationary. The co-registration was then refined using a fairly standard variant of the iterative closest point (ICP) algorithm (Besl and McKay, 1992), as illustrated in Fig. 7. For each vertex on the low resolution surface, the closest vertex on the high resolution surface was found. Since the low and high resolution scans do not cover exactly the same areas of bone, some way of ignoring unmatched vertices is required: we chose to simply ignore any vertices on the low resolution surface that were more than 1 mm away from the closest vertex on the high resolution surface. An error metric was then defined as the sum of the squared distances from each remaining vertex on the low resolution surface to the closest vertex on the high resolution surface. This metric was minimized by adjusting the pose (six degrees of freedom, translation and rotation) and the scale of the low resolution surface. The scale parameter allows for any systematic discrepancy between the two surfaces caused by the segmentation technique.3 Optimization was performed using the Levenberg-Marquardt method (More, 1977). The ICP algorithm involves iterating this procedure, starting each iteration with a new search for the corresponding points, until the pose and scale parameters converge. For all 16 femurs, convergence was typically within 100 iterations, requiring around 15 s of computation on a 2.4 GHz Intel Core 2 Duo 6600 processor.
Having registered the surfaces, low resolution thickness estimates were compared to the geographically closest estimate on the corresponding high resolution surface. Where this closest estimate was located more than 1 mm away, the two estimates were ignored, since we do not wish alignment errors to unduly influence the results. Such misalignments occur when the high resolution surface follows small, superficial features that are not resolved in the low resolution data. In practice, the 1 mm tolerance excluded only a few percent of the low resolution thickness estimates. There remained, for each femur, up to 20,000 independent estimates, with corresponding high resolution ‘ground truth’, spanning the entire range of cortical thicknesses encountered on the proximal femur. From these, we calculated the mean (accuracy) and standard deviation (precision) of the error statistic , where is the low resolution thickness estimate and is the corresponding high resolution estimate.
The upper graph in Fig. 8 shows the average distribution of cortical thickness over the 16 proximal femurs, which is well populated in the 0.25–6 mm range. Since we are more interested in thinner cortical regions, we use 0–4 mm colour maps for surface renderings such as those in Fig. 6. The error distribution for the ‘new variable’ thickness estimation technique is shown in the lower graph of Fig. 8. While there is a narrow peak, outlying errors extend up to ±15 mm.
Many of these outliers can be attributed to the evaluation protocol as opposed to the thickness estimation technique. Fig. 9 shows two situations where the protocol leads to inappropriate comparisons between thickness estimates. Fig. 9d, in particular, demonstrates the difficulty in defining cortical thickness at two different resolutions when the cortex is porous. To eliminate such inappropriate comparisons from our results, we chose a threshold of 4 mm, above which we assume the error is more likely to be an artefact of the evaluation protocol than the thickness estimation technique. Clearly, this threshold might also remove genuine estimation errors, although Fig. 8b shows that these will be very few in number. It is also unlikely that an error exceeding 4 mm would be mistaken for a correct measurement in practice. Nevertheless, in Section 3 we present error statistics both with and without the 4 mm cut-off, so the reader can judge for themselves.
The new technique was also applied to two in vivo examinations of consented study participants. A 39-year old female was examined in a Siemens SOMATOM Sensation 64 machine (Siemens AG, Erlangen, Germany) and an 87-year old male with a recent femoral neck fracture was examined in a GE 64 Lightspeed PET/CT machine (GE Medical Systems, Waukesha, Wisconsin, USA). Proximal femurs were scanned from 2 cm proximal to the superior acetabular margin to 2 cm below the lesser trochanters. The study was approved by the Cambridgeshire Regional Ethics Committee and subjects were recruited according to the Declaration of Helsinki.
The thickness estimation methods described in Section 2.2 were first tested on simulated data, in order to investigate performance and parameter sensitivity in an ideal, noise-free environment. The model of Eq. (5) was used to simulate the CT values along lines passing through cortices of varying thickness in the range 0.2 mm to 5.5 mm, with an in-plane blur σ of 1.25 mm and no out-of-plane blur (i.e. the cortical layer was assumed to lie orthogonal to the CT imaging plane). These values were chosen to match approximately those of the low resolution cadaveric data sets. The half-max, threshold and new methods were then used to estimate the original, known, cortical thickness. The latter two methods were evaluated across a range of thresholds and assumed cortical CT values .
Fig. 10 shows results for simulated scans in air (a) and in vivo (b). As reported by Newman et al. (1998), the half-max method is accurate for large thicknesses but increasingly biased for smaller thicknesses. The thresholding technique is potentially more accurate below 3 mm, provided the threshold is chosen carefully. For the air simulation, a threshold of 600 HU (for the low resolution data, this corresponds to an areal BMD of approximately 430 mg/cm2) produces reasonable estimates of thickness above 1.3 mm. For the in vivo simulation, the ideal threshold is higher, at around 800 HU. Below 1.3 mm, thresholding starts to ‘miss’ the cortex and becomes increasingly unreliable, as previously noted by Poole et al. (2008). The new method generates accurate results down to the simulated thickness limit of 0.2 mm. This in itself is not surprising: we have, after all, used exactly the same model to simulate as to estimate thickness. More noteworthy is the low degree of sensitivity to the assumed cortical CT value, especially at thicknesses below 2 mm. Such thin cortices are precisely those likely to be of most clinical interest.
Summary statistics for the 16 cadaveric data sets can be found in Table 1. For the half-max and new methods, the percentage success rate indicates how frequently the model-fitting algorithm managed to converge and hence produce a result. For the threshold method, the number indicates how frequently there were any CT values above the threshold along the line used for thickness estimation. Neither convergence nor the existence of supra-threshold values implies anything about the accuracy of the resulting thickness estimate. The processing time was similar for all methods. Loading the CT data, estimating cortical thickness at approximately 17,000 locations and saving the results to disk (but not calculating or registering the surfaces) took around 5 min on a 2.16 GHz Intel Core2 CPU, with the software running single-threaded.
Table 2 shows estimation accuracy and precision for the entire range of errors, i.e. not making any allowance for likely evaluation artefacts like those in Fig. 9. While it is immediately apparent that both variants of the new method are less biased and more precise than either of the alternative methods, we must at this point take a brief diversion, to test the claim in Section 2.2 that constraining is the best strategy for conditioning the model-fitting process. To this end, we performed some supplementary experiments on femur 477, chosen since it is an average specimen in terms of the estimation errors in Table 2. In Section 2.2, we considered three possibilities: constraining the assumed cortical CT value ; constraining the in-plane blur σ; or using no constraints at all. With femur 477, no constraints resulted in an estimation error of , constraining σ (to a sensible median value obtained from the previous experiment) but not gave an error of , whereas fixing at 1750 HU produced by far the best results of . This is typical of our experience with all the data sets we have examined. There was very little difference in the percentage success rates for each of these variants of the new technique.
Table 3 presents the results after accounting for likely evaluation artefacts using the ±4 mm error cut-off suggested in Section 2.4. The order of merit of the various methods is the same as in Table 2, though clearly all the standard deviations are reduced. These results may better reflect the actual measurement error across the entire surface of the proximal femur. Finally, Table 4 presents results confined to areas where the cortex is relatively thin, in the range 0.3–2 mm. Such areas are likely to be the focus of interest when assessing fracture risk. For thin cortices, the differences between the methods are even more pronounced.
Finally, the results are presented as a function of location on the proximal femur, for the best and worst cases in Figs. 12 and 13 respectively. These images are stills from a user interface which displays the cortical thickness as a colour map, as described in Section 2.3. The surface can be examined from any viewpoint. The light grey colour, mostly evident for the threshold method in the second column, indicates regions in which no estimate is available. Away from these regions, the threshold method produces thickness maps that appear similar to the high resolution ‘ground truth’ in the first column. The half-max method (third column) generates few thickness estimates below 2 mm (green). The ‘new variable’ method (fourth column) produces thickness maps that are similar to the high resolution maps, with absolute errors generally less than 0.5 mm (pink in the error maps).
The various methods for cortical thickness estimation were assessed using cadaveric femurs scanned in air, since the human proximal femur cannot be imaged in vivo at the high resolutions required to establish the ‘ground truth’. In clinical CT scans, the femur is of course surrounded mostly by soft tissue, not air. While the new thickness estimation technique makes no assumption about the surrounding material, except that it is reasonably uniform,4 it is nevertheless reasonable to question how performance might vary between cadaveric and in vivo examples.
Fig. 11 expands the results in Table 3 to show performance as a function of true cortical thickness. Since the techniques produce so many independent thickness estimates in the range 0.3–4 mm (Fig. 8a), splitting the data for any one femur into bins 0.1 mm wide produces over 100 values per bin, and well over 1000 values when the data for all femurs is combined. This is sufficient to analyse performance as a function of thickness in the aforementioned range. The conglomerated results in Fig. 11c have more in common with the results for the best case (femur 473, Fig. 11a) than those for the worst case (femur 467, Fig. 11b), suggesting that the latter is the exception.
Fig. 11 also shows percentage success rates as a function of thickness. The threshold method starts to ‘miss’ estimates as the true cortical thickness drops below 1.5 mm, and is largely incapable of producing any reliable results below 0.75 mm. For the carefully chosen threshold, the estimates are reasonably unbiased down to 1 mm. The half-max and ‘new variable’ methods both generate cortical thickness estimates over the full 0.3–4 mm range. However, the half-max method overestimates the thickness below 3 mm, whereas the new method provides relatively unbiased estimates at all thicknesses. In general, the new method also compares favourably with the others in terms of precision, again across the entire range of thicknesses.
Fig. 14 shows cortical thickness maps produced using the ‘new variable’ method, working with in vivo MDCT data at 1.25 mm slice thickness. There is no ground truth for this data, and it is just a single example, so it cannot be used to draw any firm conclusions. However, there is reasonably good agreement between the independent maps on the fractured and contralateral hip; the new method produces estimates across most of the proximal femur; and the maps are not dissimilar to those in Figs. 12 and 13. The feasibility of applying this technique to fractured hips in vivo has potential clinical significance. Compared with intact femurs, the surface extraction stage was less straightforward (more manual editing was required), but the thickness estimation stage was no different, since all measurements are mutually independent.
Clinical data is, of course, typically acquired at slice thicknesses exceeding 1 mm. Fig. 15 shows cortical thickness maps estimated using the ‘new variable’ method, starting from the same in vivo scan but with different CT reconstructions at 1 mm and 3 mm slice thickness. Again, we can draw no firm conclusions from a single example, but the 0.01 ± 0.30 mm discrepancy between the two maps is remarkably low. This gives reasonable grounds for optimism regarding the applicability of the method to 3 mm slice CT data.
The light grey regions in Figs. 14 and 15 indicate areas where the new method failed to produce a thickness estimate. This was due to the close proximity of the acetabulum to the femoral head, preventing the successful estimation of . The cadaveric maps are free of such regions because, although the femurs were supported by a pelvis phantom, the acetabulum was not as close to the femoral head.
In general, the various thickness estimation techniques were observed to perform in accordance with the assumed model. However, close inspection of the graphs in Fig. 11c reveals two clear differences from the simulations in Fig. 10. While the half-max estimates do, as predicted, attain a plateau of around 2.5 mm (consistent with an estimated in-plane blur of ), at true cortical thicknesses below 1 mm they drop below this plateau, in a manner not explained by the model. Similarly, the threshold estimates inexplicably show a sharp increase at low cortical thicknesses. To understand these phenomena, we must first rule out any systematic inaccuracies in the ‘ground truth’ thicknesses. Recall, these were obtained by applying the half-max method to the Xtreme pQCT data, which is approximately seven times higher resolution than the Siemens MDCT data. Fig. 11c shows the half-max method starting to produce biased thickness estimates below 3 mm, this being with the low resolution data. It follows, therefore, that we should not trust our ‘ground truth’ much below . This is sufficiently close to the lower limit of 0.3 mm in Fig. 11c that it is unlikely to have a significant bearing on the results.
Instead, explanations for these two phenomena become apparent on close examination of the data where the cortex is very thin. For the half-max method, the cortical CT value becomes poorly defined and is often very similar to the trabecular value . Consequently, the inner cortical edge can be positioned almost arbitrarily, resulting in spurious thickness estimates that do not necessarily reflect the width of the imaging PSF. The strange behaviour of the threshold graph is driven by a relatively small number of thickness estimates: in the vast majority of cases, the imaged cortex does not exceed the threshold and the threshold method fails. However, occasionally the trabecular bone, at the end of the estimation line, has sufficiently high density to exceed the threshold. Thresholding then does produce a thickness estimate, but not one of the thin cortex, rather one of a thicker region of trabecular bone. This is why the threshold graph in Fig. 11c turns upwards.
The thickness maps in Figs. 12 and 13 show that the errors are not homogeneously distributed across the surface. Instead, they are clustered at small focal patches. We observed similar behaviour in all 16 data sets. The most likely cause of such errors is an ambiguous cortical-trabecular interface in these regions, suggesting different interpretations in the low and high resolution data. Certainly, the ‘worst’ data set (467) has a particularly dense matrix structure in the trabecular bone. The image in Fig. 9 is from this data set: at certain locations, such as around line 2, it is not at all clear where the cortex ends and the trabecular bone begins. While we attempted to eliminate such artefacts from our results by ignoring outlying errors, cortical ambiguity will not always result in a discrepancy exceeding the 4 mm cut-off, and the variances in Table 3 might therefore be seen as upper limits on the true estimation errors. Where the cortex is well defined, estimation errors are likely to be well within these bounds: witness the pink regions in the surface error maps (Figs. 12 and 13, (g) and (n)), where the error is everywhere less than 0.5 mm.
We have presented a novel method for estimating cortical thickness from clinical CT data. In a series of cadaveric experiments using standard MDCT (589 μm/pixel resolution, in-plane blur well approximated by a Gaussian PSF of standard deviation 1.25 mm), the method was able to produce unbiased estimates down to 0.3 mm. Errors were observed to be −0.01 ± 0.58 mm over the full range of thicknesses, and 0.01 ± 0.47 mm for thin cortices in the range 0.3–2 mm. This is a clear improvement over the alternative thresholding and 50% relative threshold techniques, which are particularly poor when the cortex is thin. Thin cortices are precisely those likely to be of most clinical interest.
The new method is highly automated and typically produces 17,000 independent thickness estimates over the surface of the proximal femur. These can be displayed as a colour map painted onto the surface, and visualized from any angle. Workflow is already in the realms of clinical practicality: around 10 min to segment the surface, then around 5 min to calculate the thicknesses, using just a modest laptop computer. Faster segmentation might be possible with the use of anatomically specific tools.
Future work will seek to further validate the new technique in vivo, using CT slice thicknesses of 3 mm and above, though Figs. 14 and 15 give grounds for optimism. In the longer term, it is anticipated that cortical thickness maps, like the ones in Figs. 12–15, will become a useful tool for both medical research and clinical practice. With inter-subject co-registration, statistical atlases could be constructed showing cortical thickness over the proximal femur for different groups: e.g. young vs. old, trochanteric vs. femoral neck fractures, cases vs. controls. By comparing such atlases, regions of interest might emerge to serve as biomarkers indicative of fracture risk. Cortical thickness maps might also inform longitudinal studies of patients undergoing anabolic therapy, by determining precisely where any thickening occurs. Finally, cortical thickness would be an important component of patient-specific models that might, one day, emerge as the clinical gold standard for assessing the mechanical competence of individual femurs.
K. Poole is supported by the Arthritis Research Campaign, the Evelyn Trust and the NIHR Cambridge Biomedical Research Centre. The cadaveric femurs were from the Melbourne Femur Collection Research Tissue Bank of the Victorian Institute of Forensic Medicine, with kind permission of Professor John Clement, Australia.
1These expressions are readily derived by differentiating and to obtain a sequence of δ-functions, convolving with , and integrating the result. can then be evaluated by numerical integration of Eq. (5) or (6), which is more computationally efficient than the alternative approach of direct numerical convolution. This is particularly advantageous when has to be evaluated a large number of times, as in the Levenberg-Marquardt procedure described in Section 2.2.
3Thresholding the blurred, low resolution CT data produces contours that are typically 1–2% larger than the corresponding high resolution contours. Thus, seven degree-of-freedom (DOF) registration gives a slightly better fit than six DOF registration. Exploratory investigations with six DOF registration showed that the subsequent thickness comparisons were not particularly sensitive to the nuances of the registration technique.
4It should be noted that the cadaveric femurs were not cleaned before scanning and there was some residual tissue attached to the outer cortical surface. The combination of residual tissue and air may in fact have made the estimation process more difficult, since the exterior CT value would have been estimated in an area not well characterized by a constant value.