|Home | About | Journals | Submit | Contact Us | Français|
In this article, the authors evaluate a merit function for 2D/3D registration called stochastic rank correlation (SRC). SRC is characterized by the fact that differences in image intensity do not influence the registration result; it therefore combines the numerical advantages of cross correlation (CC)-type merit functions with the flexibility of mutual-information-type merit functions. The basic idea is that registration is achieved on a random subset of the image, which allows for an efficient computation of Spearman’s rank correlation coefficient. This measure is, by nature, invariant to monotonic intensity transforms in the images under comparison, which renders it an ideal solution for intramodal images acquired at different energy levels as encountered in intrafractional kV imaging in image-guided radiotherapy. Initial evaluation was undertaken using a 2D/3D registration reference image dataset of a cadaver spine. Even with no radiometric calibration, SRC shows a significant improvement in robustness and stability compared to CC. Pattern intensity, another merit function that was evaluated for comparison, gave rather poor results due to its limited convergence range. The time required for SRC with 5% image content compares well to the other merit functions; increasing the image content does not significantly influence the algorithm accuracy. The authors conclude that SRC is a promising measure for 2D/3D registration in IGRT and image-guided therapy in general.
Exact estimation of patient and target position is a crucial issue in modern fractionated radiation therapy since accurate dose delivery over the entire period of treatment is mandatory for therapeutic success. The current interest in image-guided radiation therapy (IGRT) has been stimulated by the advent of novel radiation therapy facilities with integrated kilovoltage (kV)-imaging units1,2 and the broad availability of semiconductor electronic image detectors.3-5 In the context of daily patient position verification, 2D/3D registration6-8 was always considered as the technology of choice since its introduction to medical imaging. Here, a 2D projection image (usually an x ray) is iteratively compared to a perspective volume rendering derived from volumetric 3D imaging. In the case of patient positioning in radiation therapy, typically megavoltage (MV) electronic portal images (EPIs) or, more recently, a kV x ray and digitally rendered radiographs (DRRs) derived from computed tomography (CT) volume data are used. Additional information from other imaging sources can also be integrated in such a volume.9 The latest generation of linear accelerators (LINACs) provides x-ray imaging equipment mounted together with the treatment system for intrafractional planar x-ray and cone beam CT imaging.1
As far as 2D/3D registration in general is concerned, the main issues remain algorithm robustness and execution speed of the iterative registration process. A number of possible merit functions, rendering algorithms for the generation of DRRs and optimization strategies were proposed. These include the following.
The properties of merit functions need further discussion since this article deals with a novel type of similarity measure that has advantageous properties for the 2D/3D registration of images with substantial intensity discrepancies. In general, one can categorize merit functions as follows:
Usually, in 2D/3D registration, intramodal images are registered, although efforts to register magnetic resonance (MR) volume images and DRRs exist28 or are subject to research for slice-to-volume registration.16,17,29,30 With regard to the daily verification of patient position in IGRT, we are dealing with a quasi-intramodal matching problem; image intensity and histogram content are highly disparate due to the different energies and detector characteristics used for CT kV imaging from either the LINAC-integrated cone beam CT or a diagnostic CT image from another machine; both may provide the volume data for iterative DRR rendering in the 2D/3D registration process. The x-ray imaging from the LINAC-integrated imager in the treatment room provides the reference image. However, the physical process—the attenuation of ionizing radiation by tissue—is the same in both cases.
The robustness of the 2D/3D registration process is governed by the global shape of the merit function. A wide convergence radius (sometimes also referred to as “capture range” in literature) with a smooth concave shape is desirable for optimum convergence even in the presence of large initial misregistrations. Too many local or false global minima increase the risk of registration failure, whereas sharp, spike-shaped merit functions may give accurate results in the case of convergence but require the use of time-intensive global optimization algorithms.17
In a recent study on slice-to-volume registration of FluoroCT and CT imaging,17 we found CC to be an excellent merit function for 2D/3D matching if an exact linear relationship exists between pixels in the images under comparison. This was also confirmed by other recent studies10,16 but careful radiometric calibration of image histograms when rendering the DRR is necessary to achieve good results.10 The reason for this constraint of strict linearity is clear from the very definition of CC and is further addressed in Sec. II. MI, on the other hand, appears suitable for the problem of 2D/3D matching of imaging data obtained at different energies but was of questionable stability due to the sparse histogram population resulting in false global minima at the boundaries of the optimization range.7,17 In this article, we introduce stochastic rank correlation (SRC), a novel merit function for 2D/3D registration of images which exhibit a continuous monotonic relationship between pixel intensities. In particular, we have tested SRC on a reference dataset of a cadaver spine.27
The CC-merit function is, in fact, the Pearson product moment correlation coefficient that measures the level of linear dependence between two variables measured on the same subject. It is defined as
where Iij denotes the pixel grayscale values of the base image and the match image, respectively. is the average grayscale pixel value of the image. It is a well-known fact from basic statistics that the product moment correlation coefficient determines an exact linear relationship between the grayscale pixel values in IBase and IMatch. Another property of is evident from Eq. (1). Variances are derived as deviations from the average value , which is a measure prone to error if outliers are present. In the case of medical image registration, high intensity artifacts jeopardize registration success. Careful radiometric calibration, for instance, by means of histogram equalization, can somewhat compensate for this problem.10
For variables measured on an ordinal level, statistics provides Spearman’s rank correlation (RC) coefficient, a measure for assessing arbitrary monotonic functions. It does not require a linear relationship, and the variables need not be measured on a metric scale. For the purpose of medical image registration, a merit function based on RC can be defined as follows:
Note that is defined in such a manner that totally uncorrelated images would deliver a maximum value, and perfectly correlated images give a minimum so that a convex optimization problem is given. For obtaining a concave minimization problem, one may define
The advantages of using RC rather than CC are evident.
The disadvantage of RC is also obvious. The processes of raw score matrix sorting and comparison of ranks is of considerable complexity and requires a large amount of computing time.
The basic idea of SRC is to obtain an estimate of the raw score lists, RBase and RMatch, by sampling the two matrices at randomly selected but identical pixel positions in IBase and IMatch. In practice, one determines a mask of pixel positions prior to algorithm initialization for both images. This mask is computed from a uniform distribution using a random number generator. It is of utmost importance to emphasize that the random mask is constant during the registration process and identical for both images; hereby, the rank correlation coefficient is always computed from the image intensities at the same pixel location in RBase and RMatch. The effect of this operation is that the number of pixels N can be reduced significantly. The performance of such a merit function
Evaluation of the algorithm took place with x-ray/DRR datasets of an available reference dataset for 2D/3D registration of a spine segment.27 The phantom consists of a spine segment with five vertebral bodies. Image data are provided via http://www.isi.uu.nl/Research/Databases/GS/. For the evaluation, we used the CT data and the lateral fluoroscopic image from the example data. The CT was acquired using a 16-row multislice scanner (MX8000) at a tube voltage of 80 kV and the x ray was taken with a Integris BV5000 (both from Philips Medical Systems. Best, The Netherlands) at approximately 60 kV according to Ref. 27. Our main interest for applying SRC stems from image-guided radiotherapy with LINAC-mounted kV-imaging systems such as the Elekta Synergy system. Since the kV-imaging unit of such a system is rigidly attached to the LINAC, only one projection can be acquired at a time—therefore we confined ourselves to 2D/3D registration using only a single projection. The coordinate system used for registration in our implementation is fixed to the imaging plane, as opposed to the local coordinate system used in Ref. 27. For this reason, the gold standard position data were transformed to the convention as used in our setup. Furthermore, the CT scan was interpolated from its original resolution of 0.30664×0.30664×0.492 mm3 to a homogeneous resolution of 0.5 mm3 using ANALYZE AVW 9.0. The reference x ray was also interpolated from its original resolution of 0.6271 mm2 to a resolution of 0.5 mm2 and cropped to a 256×256 matrix. The distance of the x-ray focus and the imaging plane is 802.37 mm. Finally, both the x-ray image and the CT were saved in 8 bit depth, which is also the original depth of the provided CT scan and the x-ray image. Figure 2 shows the reference x ray together with six DRRs rendered from the CT data. The splat-rendering algorithm used here24 requires a minimum threshold for perspective volume rendering—six different intensity thresholds of 40, 50, 60, 70, 80, and 90 (out of 256) gray values were used for the evaluation of the algorithm with changing histogram content. Since the CT scan does not feature 12 bit depth, translating these thresholds to Hounsfield units (HU) is not possible; when transforming the 8 bit gray values to HU, one can estimate the minimum thresholds for DRR rendering to be in the range of −384–416 HU.
Using the known registration parameters, we evaluated the capture range and the accuracy achievable by performing 145 registrations with varying starting positions. Following a simple power analysis, the sample number should be sufficient for giving a significant result. The correct parameters were distorted by a random number. The range of displacements was chosen to be ±5° and ±10 mm. These displacements should also reflect the typical range of motion encountered in radiotherapy, for instance, in lung irradiation.5 After registration, the position of the maximum voxel coordinate using the found registration parameters was computed, and the result was compared to the position of that voxel as given by the gold standard parameters. The absolute difference was recorded as the target registration error (TRE). We did not select the results in terms of successful and unsuccessful registrations; only those results where the optimization did not converge after a fixed set of iterations were removed. This problem, however, only occurred for one of the merit functions under consideration.
The merit functions under comparison were CC, CC after histogram equalization of the DRR and the x ray, and SRC using 3%, 5%, and 10%. Furthermore, SRC was also computed for 50% and 66% of image content—these results are, however, to be considered out of competition since the time required for computing the merit function becomes unacceptable. The results were recorded, and a t-test for independent samples was carried out for the TRE using SPSS 17.0. The resulting p values are given in Sec. III.
PI was also included. PI is known to give excellent results for 2D/3D registration,7 but suffers from a very limited convergence range, which was also confirmed by our own experience.17,23,31,32 The merit function operates on a difference image Idiff and is given as
Since PI operates on a difference image, the threshold for DRR rendering also appears critical since the disorder of the difference image is evaluated as a registration measure. For our method of evaluation, we used PI together with histogram equalization. It turned out that the algorithm converged only for approximately 50% of all trials, with very varying accuracies when using internal parameters of σ =10 and r=3 or r=5. A detailed description of the algorithm can be found in Ref. 7.
MI, the other merit function that is of known usefulness when dealing with multimodal 3D/3D registration problems, is a natural choice for images with different histogram content. However, it is also known for its problematic behavior in the case of 2D/3D registration due to the sparse population of the joint histogram.7,17 MI is highly dependent on the chosen implementation.5 In our case, a simple approach where the probability distribution was derived from a histogram with 32 bins was also of limited success. Therefore, MI was finally excluded from the experiments, which are strictly aiming at quasi-intramodal 2D/3D registration. When registering x-ray data to volume data from another modality but CT, the case is of course different.
The algorithm was implemented using GCC 4.1.2 on Fedora 8 on a Dell Precision 490 personal computer. User interfaces were programed using the Qt 3.3 toolkit (Trolltech, Norway). Basic image processing functionality was added using the ANALYZEAVW 9.0 library (BIR, Mayo Clinic, Rochester/MN).33 In principle, the software setup was adopted from the setup used in Ref. 17. DRRs were generated using the wobbled splatting algorithm as described in Ref. 24. The Downhill–Simplex algorithm used for optimization in the iterative registration process used was taken from the GSL package (Free Software Foundation, Boston, MA). Registration takes place in such manner that after convergence of the algorithm, the result is used as a starting point for another trial. After three subsequent registrations, the result was considered to be final.
In a first trial, the stability and accuracy of the implementation was tested. For this purpose, a DRR was rendered at a threshold of 70, and the result was fed into the program as the x ray. The known initial position used for rendering the DRR was distorted by random positions of 5° and 10 mm in each coordinate. Fifteen registrations were carried out. Such an evaluation on a phantom image gives only little information on the actual performance of an algorithm, but it gives a figure of the quality of the implementation including the optimization algorithm. For this ideal configuration, no significant differences in the maximum TRE were found. All merit functions converged properly. PI suffers from its small range of convergence, which affects the overall algorithm accuracy. The detailed results can be found in Table I.
The crucial evaluation is the registration of the spine phantom dataset; the results can be found in Table II. Here, SRC using 3%–10% of image content always outperformed CC and CCHEq. For CC, results were really bad, especially when using another treshold than 70, which gave the most homogeneous results for CC, CCHEq, and SRC. Maximum TRE for CCHEq ranged from 14.3±6.8 mm when using DRR threshold 70 to 18.9±8.6 mm at threshold 90. The results for SRC (excluding the registration efforts when using 50% and 66% image content, which did not show any improvement compared to SRC using 5% or 10% image content) ranged from 12.6±5.7 mm (SRC 5%, DRR threshold 70) to 14.7±9.0 mm (SRC 3%, DRR threshold 40). For all trials, the best results were provided by SRC. It is the very nature of the volume-driven splat-rendering routine to perform better for higher thresholds,24 which results in faster registration if a higher threshold is chosen. Detailed results for the time requirements are given in Table II; however, SRC for 3% and 5% image contents required usually 20–120 s, which is similar to the performance of CC and CCHEq. SRC with 50% or 66% of image content took up to 10 min and is therefore only of academic interest.
PI showed only limited stability and was terminated in almost 50% of all trials. Beyond that, a tendency to converge to local minima was evident, which spoiled the registration results. Maximum TRE for those cases where convergence was achieved ranged from 15.0±7.3 mm (PI with r=3, σ=10 at DRR threshold 60 to 17.0±8.3 mm for the same parameters at DRR threshold 80. The results are given in Table III.
In this article we present a merit function suitable for 2D/3D registration of intramodal data sets acquired with x-ray-based equipment. The specific strength of SRC is the independence of pixel intensity differences in the image histograms as long as they can be modeled as monotonic transforms. Thus, radiometric calibration is unnecessary. However, SRC also has limitations by design. As it is the case with most intensity-based merit functions, depth of the images is crucial; if only a few actual values of possible gray values γ are available, the number of tied ranks will be considerable, resulting in an inconclusive minimum for the merit function. It is also necessary to use images with lots of content—an x ray showing a small structure surrounded by air will be definitely result in only a few random pixels with different ranks. Furthermore, noise may affect the performance of SRC, as it is the case with most other intensity-based merit functions. In this article, we have not examined the influence of noise on SRC. However, the evaluation was carried out with image data acquired on standard imaging equipment, and the splat-rendering routine used24 produces images stricken with Gaussian noise by definition, which is reduced by applying a spatial low pass filter prior to registration of the DRR to the x ray. We therefore assume that the noise level encountered here represents a real-life scenario.
Our results demonstrate that, in the given setup, SRC outperforms both classical CC and CC with histogram equalization of the images under comparison; in the case of SRC, any kind of fine tuning in histogram space is not necessary by definition. Therefore, SRC is ideally suited for registration tasks involving image data obtained at different energies, and the experimental data on the spine phantom support this conclusion. However, a patient study using actual IGRT equipment is a pending task. Furthermore, the robustness of SRC in the presence of high intensity artifacts is an open question. It is in the very nature of the rank correlation coefficient that it is more robust against outliers; the extent of this theoretical advantage is to be examined.
The use of stochastic sampled image information for merit function design is not new; in Ref. 21, a similar approach was used to derive a MI-based merit function while reducing time requirements for computing the DRR. We have not explored this approach since wobbled splatting,24 as a volume-driven method, does not necessarily benefit from rendering just a few pixels. On the other hand, it is more efficient compared to raycasting, and it offers huge potential for GPU-based implementation.22 A direct comparison of a partial raycasting approach as proposed in Ref. 21 and splat rendering is, however, also an open task.
One possibly astonishing fact is the range of error encountered in the experiments and the fact that PI does not perform as good as one might expect from the literature. The reason for this behavior lies in the fact that the initial misalignment in the experimental series is larger than the convergence range typically used in most evaluations. The very limited convergence range of PI, however, is also known and documented.23,31,17,37 In Ref. 27, a convergence range of only a few millimeters is considered to give safe results. With tumor motion tracking being our objective, it makes little sense to distinguish between failed and successful registrations—only those results where the optimization algorithm did not converge were excluded; the range of motions possible used in this evaluation is realistic, and beyond that, the TRE computed gives a worst case scenario. PI, a merit function that gives a very sharp minimum, here is a victim of this setup. It gives very good results for small displacements, but outside the convergence range, it does not even provide a gradient that might finally point in the appropriate direction. Future research work, for instance, on improved optimization schemes as proposed in Ref. 31, is, however, necessary to improve convergence behavior and algorithm accuracy. Since it is not possible to compute analytical derivatives of a rank correlation type of merit function, options for alternative optimization schemes are somewhat narrowed down.
Another problem we encountered is the little availability of reference datasets and reference algorithm implementations. The spine dataset27 is a good initial effort in this direction but suffers from limitations such as its reduced intensity range and the small size of the specimen; a pending task for the community is therefore the development of more representative phantoms, which are to be provided together with gold standard data and reference implementations of the registration algorithms. Given the methodological nature of this article, an exhaustive evaluation with more merit functions is therefore also a pending task.
The other question is the intended use of inter- and intrafractional imaging in combination with 2D/3D registration in IGRT. The most obvious application—exact positioning of the patient prior to treatment—is widely discussed and well known both in image-guided therapy6-8,20,21,25,16,17,28 and radiation oncology.2-5,19,10,26 For this purpose, several x-ray images from various positions or even a cone beam CT can be made, the latter providing the known advantages of a full 3D volume. However, numerous tumor locations require intrafractional position monitoring due to organ motion during irradiation. It appears as if the advent of on-board imaging units has further stimulated this field of research, which spans approaches from motion prediction34,35 to permanent x-ray monitoring of implanted fiducial markers36 and registration.37-41 2D/3D registration using only one projection42 as in our experimental series appears to be a promising method of choice to achieve this goal of minimizing dose delivered outside the target volume despite the additional irradiation from the low-energy x-ray system. With this perspective in mind, we have designed our experiments; here, the sensitivity and reliability of motion detection is of greater interest than the absolute accuracy (which can be achieved by performing a 3D/3D registration of the cone beam CT and the diagnostic CT prior to irradiation anyhow). The future will show whether image-based motion correction at a high rate can compete with motion sensing using miniature active43 and passive44 tracking solutions. To achieve this goal, a massive increase in registration speed, which may be achievable using GPU-based parallel approaches is necessary.
ANALYZEAVW and AVW were provided courtesy of Dr. R. A. Robb, BIR, Mayo Clinic, Rochester/MN. The authors are grateful to Mary Mc Allister, MD, Johns Hopkins University Hospital, Baltimore, for manuscript assistance. The spine dataset and ground truth for the 2D-3D registration used in this work was provided by the Image Sciences Institute, University Medical Center Utrecht, The Netherlands. This work was supported by the Eurasia-Pacific Uninet Foundation and the Austrian Science Foundation FWF Grant Nos. P19931 and L503.