Estimating organ residence times is an essential part of patient-specific dosimetry for radioimmunotherapy (RIT). Quantitative imaging methods for RIT are often evaluated using a single physical or simulated phantom but are intended to be applied clinically where there is variability in patient anatomy, biodistribution, and biokinetics. To provide a more relevant evaluation, the authors have thus developed a population of phantoms with realistic variations in these factors and applied it to the evaluation of quantitative imaging methods both to find the best method and to demonstrate the effects of these variations. Using whole body scans and SPECT/CT images, organ shapes and time-activity curves of 111In ibritumomab tiuxetan were measured in dosimetrically important organs in seven patients undergoing a high dose therapy regimen. Based on these measurements, we created a 3D NURBS-based cardiac-torso (NCAT)-based phantom population. SPECT and planar data at realistic count levels were then simulated using previously validated Monte Carlo simulation tools. The projections from the population were used to evaluate the accuracy and variation in accuracy of residence time estimation methods that used a time series of SPECT and planar scans. Quantitative SPECT (QSPECT) reconstruction methods were used that compensated for attenuation, scatter, and the collimator-detector response. Planar images were processed with a conventional (CPlanar) method that used geometric mean attenuation and triple-energy window scatter compensation and a quantitative planar (QPlanar) processing method that used model-based compensation for image degrading effects. Residence times were estimated from activity estimates made at each of five time points. The authors also evaluated hybrid methods that used CPlanar or QPlanar time-activity curves rescaled to the activity estimated from a single QSPECT image. 
The methods were evaluated in terms of mean relative error and standard deviation of the relative error in the residence time estimates taken over the phantom population. The mean errors in the residence time estimates over all the organs were <9.9% (pure QSPECT), <13.2% (pure QPlanar), <7.2% (hybrid QPlanar/QSPECT), <19.2% (hybrid CPlanar/QSPECT), and 7%–159% (pure CPlanar). The standard deviations of the errors for all the organs over all the phantoms were <9.9%, <11.9%, <10.8%, <22.0%, and <107.9% for the same methods, respectively. The processing methods differed both in terms of their average accuracy and the variation of the accuracy over the population of phantoms, thus demonstrating the importance of using a phantom population in evaluating quantitative imaging methods. Hybrid CPlanar/QSPECT provided improved accuracy compared to pure CPlanar and required the addition of only a single SPECT acquisition. The QPlanar or hybrid QPlanar/QSPECT methods had mean errors and standard deviations of errors that approached those of pure QSPECT while providing simplified image acquisition protocols, and thus may be more clinically practical.
In vivo organ absorbed dose estimation is an essential part of patient-specific radioimmunotherapy (RIT). There are currently no direct measurement methods feasible for routine clinical use. Instead, the conventional method used to estimate the organ absorbed dose is based on the MIRD methodology.1,2 This has been widely applied to clinical trials of RIT.3–7 The MIRD methodology has previously been implemented into software packages, such as OLINDA,8 and requires estimates of organ cumulated activities or residence times. Those parameters are usually estimated by integrating organ time-activity curves (TACs) estimated from measurements of organ activity at multiple time points. The organ activity measurements are usually made using quantitative imaging methods. It is important to validate these methods before relying on them for clinical studies.
Quantitative imaging methodologies are usually validated using physical or mathematical phantom studies. The validated methods are then applied to a population of humans who exhibit variability in terms of anatomy and biokinetics. In this work, we investigated the errors associated with applying an approach that has been validated for a single anatomical and biokinetic parameter set to a patient population exhibiting variability in anatomy and biokinetics.
While clinical studies would seem the obvious way to evaluate quantitative imaging methods, such studies can be quite challenging for a number of reasons. First, obtaining sufficient numbers of patients can be both expensive and time consuming. Second, it is very difficult to optimize factors such as collimator design and acquisition protocol using patients, as this would require multiple scans for each patient. Indirectly assessing the accuracy of methods by correlating estimated quantities with biological effects such as tumor response or organ toxicity is also not the optimal way to evaluate these methods. This is because, in addition to the differences in accuracy of the methods under investigation, the biological effects can be quite variable over patient populations. Thus, the efficacy of the quantitative methods can be obscured by other factors such as previous treatment history. Another alternative is the use of animal studies. However, these are expensive and sizes of the animals are such that scatter and attenuation effects will be different than in humans. In addition, the biokinetics of the organ uptake and the geometric size and spatial relationships of organs are different than in humans.
An alternative to clinical or animal studies and the one used in this work is simulation-based evaluation using a population of phantoms where the simulated anatomies and biokinetics match those of the imaging agent of interest. We have previously performed such simulated clinical trials in the context of myocardial perfusion defect detection and found that the results in terms of optimal parameters and the area under the receiver operating characteristic (ROC) curve were quite different than in previous studies using less realistic phantoms with less variability in anatomy and perfusion agent biodistribution.9–11
In this work, we evaluated quantitative residence time estimation methods12 using a population of phantoms based on the 3D NURBS-based cardiac-torso (NCAT) phantom13 with realistic variations in anatomy and biodistribution. The projection data were simulated using Monte Carlo simulation methods that accurately modeled the image formation process including attenuation and scatter in the patient and scatter, penetration, partial absorption, and finite energy and spatial resolution in the collimator-detector system. The evaluation was performed in terms of accuracy and variation in accuracy of residence time estimates obtained from the phantom populations. Note that the variation in accuracy is perhaps as important as the average accuracy. For example, if a quantitative imaging method has poor average accuracy but the magnitude and direction of the inaccuracy are consistent (i.e., there is no variation in accuracy), it will be a better predictor of the absorbed dose-response relationship than a method that has good average accuracy but with large variations in both magnitude and sign of the error for individual patients.
Using diagnostic CT, planar nuclear medicine, and SPECT/CT images, we measured the parameters describing the anatomy and in vivo biodistribution and biokinetics of patients participating in a clinical trial of high dose myeloablative 90Y-Zevalin therapy.14 All the patients had conjugate view whole body scans acquired at 0–1, 4–6, 24, 72, and 144 h postinjection of 111In-Zevalin. In addition, abdominal and thoracic SPECT/CT scans were performed immediately after the 24 h whole body scan. The anatomical parameters were measured for dosimetrically important organs that had high activity uptake, including bone marrow, heart, liver, lungs, spleen, and kidneys. The parameters measured for each organ included its length, width, and thickness. The measurements were performed on 3D organ VOIs which were defined slice by slice on registered SPECT and CT images. Table I shows the patient gender and measured organ volumes.
These values were used to rescale the size of the corresponding organ in the NCAT phantom by shrinking or stretching it so that it had the same size in these three dimensions. The result was a phantom where the values of these parameters were the same as for the patient, though the exact organ shapes, volumes, and geometric relationships were different. This process was repeated for seven consecutive patients to generate a population of phantom anatomies.
In addition to the anatomical parameters, we used quantitative imaging methods to estimate organ time-activity curves (TACs) for these organs.14 Organ activities were determined using conventional planar (CPlanar) processing methods at each time point and quantitative SPECT (QSPECT) processing methods of 24 h SPECT scans. The CPlanar method used was similar to the realistic CPlanar method discussed below. However, no explicit object thickness corrections were performed because no transmission data were acquired in the clinical trial. The TACs were then fitted with mono- or biexponentials in order to estimate the effective half-life and fraction of injected activity (FIA) at the time of injection for each organ and patient. Tables II and III show the measured organ activities, backextrapolated to time zero, and the organ activity half-lives, respectively.
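The monoexponential case of this fit can be sketched as follows. This is a minimal illustration, not the fitting code used in the study: it assumes an unweighted least-squares fit on the logarithm of the activities, and the function name, the nominal sampling times, and the sample values are hypothetical.

```python
import math

def fit_monoexponential(times_h, activities):
    """Fit A(t) = a0 * exp(-lam * t) by linear least squares on ln(A).

    Returns (a0, effective_half_life_h). Assumes every activity is positive.
    The unweighted log-linear fit is an assumption made for this sketch.
    """
    n = len(times_h)
    logs = [math.log(a) for a in activities]
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    slope = sxy / sxx                        # equals -lam
    a0 = math.exp(y_mean - slope * t_mean)   # activity back-extrapolated to t = 0
    return a0, math.log(2) / (-slope)

# Hypothetical organ: 10% of injected activity at t = 0, 60 h effective half-life.
times = [1, 5, 24, 72, 144]
true_fia, true_half_life = 0.10, 60.0
samples = [true_fia * math.exp(-math.log(2) * t / true_half_life) for t in times]
fia, half_life = fit_monoexponential(times, samples)
```

The intercept of the fit plays the role of the FIA at the time of injection, and the slope gives the effective half-life.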
We used each set of anatomic parameters with each set of organ biokinetic parameters to form a total of 49 sets of parameters. Below we describe how these sets of parameters were used to produce 49 different projection data sets.
We used a modified version of the SIMSET code and an angular response function (ARF) method to generate the projection data.15 The ARF method uses a set of precalculated tables based on full simulations of the interactions of photons in the collimator-detector system performed using a modified version of the Monte Carlo N-particle transport code (MCNP) code.16 It provides a fast method for accurately modeling the interactions in the collimator-detector system. This simulation method and its validation have previously been described in detail.17
All the simulation parameters were appropriate for a GE Discovery VH/Hawkeye SPECT/CT system using a 2.54 cm thick crystal and a medium energy general purpose (MEGP) collimator (hex holes 5.8 cm long with a flat-to-flat distance of 0.3 cm and a septal thickness of 0.105 cm). The simulation modeled a 9.5% energy resolution at 140.5 keV and a 4.5 mm intrinsic spatial resolution. Both 111In photopeaks (171 and 245 keV) were separately simulated with the appropriate abundances. Low-noise projection images were generated in 128 transaxial and 170 axial projection bins at 120 views over 360° using a 0.442 cm projection bin size and two 14% wide energy windows centered at 171 and 245 keV. Two additional energy windows, 145.9–158.1 and 184.5–225.5 keV, were also simulated for use in a triple-energy window (TEW) scatter compensation method.18 In this implementation of the TEW method, we assumed that the counts in an energy window above the 245 keV photopeak would be zero. This energy window setting was used because of limitations in the number of energy windows that can be used in our clinical GE Millennium VG SPECT/CT system. Thus, only three images from combinations of four energy windows were obtained for each simulation: one primary window image obtained by summing counts from two 14% wide photopeak energy windows and two scatter window images.
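One way the window configuration above could be turned into a per-pixel scatter estimate is sketched below. It assumes the usual trapezoidal TEW approximation applied to each photopeak, with the 184.5–225.5 keV window serving as the upper window for the 171 keV peak and the lower window for the 245 keV peak; the exact combination used in the study is not stated here, and the function names are hypothetical.

```python
def tew_scatter(c_lower, w_lower, c_upper, w_upper, w_peak):
    """Trapezoidal TEW estimate of the scatter counts inside a photopeak window."""
    return (c_lower / w_lower + c_upper / w_upper) * w_peak / 2.0

# Window widths in keV, taken from the acquisition settings in the text.
W_PEAK_171 = 0.14 * 171.0      # 14% window centered at 171 keV
W_PEAK_245 = 0.14 * 245.0      # 14% window centered at 245 keV
W_LOW = 158.1 - 145.9          # scatter window below the 171 keV peak
W_MID = 225.5 - 184.5          # scatter window between the two peaks

def tew_primary_estimate(c_peaks_sum, c_low, c_mid):
    """Scatter-corrected counts for the summed photopeak image (one pixel).

    Counts above the 245 keV peak are taken to be zero, as in the text.
    Assigning the middle window to both peaks is an assumption of this sketch.
    """
    s171 = tew_scatter(c_low, W_LOW, c_mid, W_MID, W_PEAK_171)
    s245 = tew_scatter(c_mid, W_MID, 0.0, 1.0, W_PEAK_245)
    return max(c_peaks_sum - s171 - s245, 0.0)

corrected = tew_primary_estimate(100.0, 12.2, 41.0)
```

The result is clamped at zero so that noisy scatter windows cannot produce negative primary counts.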
In order to reduce discretization effects, the activity and material maps were discretized with voxels 1/8 the volume (0.221 cm cubic voxels) of the simulated projection bins (0.442 cm bin size). Using the seven sets of NCAT phantoms and seven sets of organ activity uptakes and biokinetics listed in Tables I–III, we generated 49 sets of projection data corresponding to the realistic anatomy with variations in the biodistribution and biokinetics at five different time points (1, 5, 24, 72, and 144 h) using monoexponential TACs. For patients where the original TAC was a biexponential, we modeled only the washout phase and used the corresponding half-life. For each phantom anatomy, simulations were performed separately for each organ. The organs were then scaled and summed based on the whole-organ activity uptake and kinetics for a given patient to form a set of projection data corresponding to the same anatomy but with variations in the biodistribution and biokinetics. This method was used to reduce the number of Monte Carlo simulations required while still providing a large number of combinations of variations in both kinetics and anatomies.
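The scale-and-sum step can be illustrated with a small sketch. It assumes that each organ was simulated once per anatomy at unit activity; the dictionaries, organ names, and numerical values are hypothetical toy data, not values from Tables I–III.

```python
import math
from itertools import product

def organ_activity(fia, half_life_h, t_h):
    """Monoexponential organ activity, as a fraction of the injected activity."""
    return fia * math.exp(-math.log(2) * t_h / half_life_h)

def combine_projections(unit_projections, kinetics, t_h):
    """Scale each organ's unit-activity projection and sum pixel by pixel.

    unit_projections: {organ: [pixel values]}, simulated once per anatomy.
    kinetics: {organ: (fia, half_life_h)} for one biokinetic parameter set.
    """
    n_pixels = len(next(iter(unit_projections.values())))
    total = [0.0] * n_pixels
    for organ, projection in unit_projections.items():
        fia, half_life_h = kinetics[organ]
        scale = organ_activity(fia, half_life_h, t_h)
        for i, value in enumerate(projection):
            total[i] += scale * value
    return total

# Toy example: two organs, three projection pixels (hypothetical numbers).
unit = {"liver": [1.0, 2.0, 0.0], "spleen": [0.0, 1.0, 1.0]}
kin = {"liver": (0.2, 60.0), "spleen": (0.05, 50.0)}
proj_t0 = combine_projections(unit, kin, 0.0)

# Seven anatomies crossed with seven biokinetic sets give 49 phantoms.
pairs = list(product(range(7), range(7)))
```

Because the transport simulation is linear in activity, only the per-organ simulations for the seven anatomies need Monte Carlo runs; every biokinetic set and time point is then a cheap rescaling.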
For each set of projection data, the low-noise projections were scaled to count levels appropriate for SPECT or planar acquisitions at the corresponding injected activities and scan times. We simulated total acquisition times of 20 and 30 min for the planar and SPECT scans, respectively. The average over all slices and all patients of the total counts in the 24 h SPECT projection sinograms was 1.02×10^5. The corresponding average total counts in the 24 h anterior planar images was 2.3×10^6. Note that since the organ uptakes were different for each set of projection data, there was variation in the noise level, as one would expect in clinical studies. Poisson noise was then simulated using the scaled low-noise projections to generate projections with clinically realistic noise levels. Figure 1 shows samples of noisy anterior planar images at five different scan time points from two sample phantoms.
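The scaling and noise step can be sketched with NumPy as follows. The flat input image, the function name, and the fixed seed are hypothetical; in the study the target count level followed from the injected activity, organ uptake, and scan time rather than being specified directly.

```python
import numpy as np

def add_poisson_noise(low_noise_projection, target_total_counts, rng):
    """Scale a low-noise projection to a target count level, then draw Poisson counts."""
    proj = np.asarray(low_noise_projection, dtype=float)
    scaled = proj * (target_total_counts / proj.sum())
    return rng.poisson(scaled)

rng = np.random.default_rng(seed=0)
low_noise = np.ones((170, 128))   # hypothetical flat image: 170 axial x 128 transaxial bins
noisy = add_poisson_noise(low_noise, 2.3e6, rng)
```

Each pixel is sampled independently with its scaled value as the Poisson mean, so pixels with lower expected counts have relatively higher noise.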
For each phantom and at each time point, we applied three different organ activity quantification methods: the conventional planar (CPlanar), the quantitative planar (QPlanar),19 and the quantitative SPECT processing methods (QSPECT).16
In the CPlanar method, we first applied scatter compensation using the TEW method and then attenuation compensation using the geometric mean (GM) method20 with object thickness correction. To partially account for the effects of source thickness, we also used a calibration factor that converts GM counts to activity21 by assuming that the total activity in the 1 h planar scan was equal to the injected activity (i.e., patients had not voided prior to the imaging at 1 h). Three different overlap and background activity correction methods were applied. In the first method (referred to as CPlanar-None), we first computed the simple projections of the true 3D VOIs. The 2D organ ROIs were then defined to contain the nonzero pixels in these projections. In this case, there was no overlap correction, which represented the worst case in terms of organ overlap. The other extreme situation was ideal overlap correction (referred to as CPlanar-Ideal). Since each organ was separately simulated, we could quantify organ activity using each organ’s separate projection where there was no overlap. We also applied a realistic overlap correction between these two extreme cases (referred to as CPlanar-Realistic). In this case, manually defined 2D organ ROIs were drawn to avoid overlap with other regions and, to the extent possible, overlap with underlying organs. As shown in Fig. 1, the planar ROIs on the first column are the outlines of the 2D projections of the 3D true organ VOIs. These ROIs clearly demonstrate the problem of organ overlap in the 2D projection views. The ROIs on the second column are the manually defined ROIs used for the realistic overlap correction with the CPlanar processing method. No explicit correction was done for background activity or to compensate for missing portions of the organ or activity from other organs. 
These corrections were not applied because they are frequently not applied in clinical studies and there are no standardized methods for these corrections.
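The core of the geometric mean (conjugate view) estimate can be sketched as below. This is a simplified illustration: it takes scatter-corrected ROI counts as input, uses a single effective attenuation coefficient, and omits any explicit source thickness term, which the text handles through the calibration factor. The function name and all numerical values are hypothetical.

```python
import math

def geometric_mean_activity(anterior_counts, posterior_counts,
                            mu_per_cm, thickness_cm, calibration):
    """Conjugate-view activity estimate with object thickness correction.

    anterior_counts, posterior_counts: scatter-corrected ROI counts.
    mu_per_cm: effective linear attenuation coefficient (assumed value).
    thickness_cm: patient thickness along the projection ray.
    calibration: counts per unit activity in air (assumed known).
    """
    gm = math.sqrt(anterior_counts * posterior_counts)
    return gm * math.exp(mu_per_cm * thickness_cm / 2.0) / calibration

# Check with an ideal point source at depth d in a uniform body of thickness T.
A_true, C, mu, T, d = 50.0, 2.0, 0.12, 20.0, 7.0
anterior = C * A_true * math.exp(-mu * d)
posterior = C * A_true * math.exp(-mu * (T - d))
estimate = geometric_mean_activity(anterior, posterior, mu, T, C)
```

For a point source the depth cancels in the product of the two views, which is why only the total thickness is needed; extended sources and overlapping organs break this idealization.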
In the QPlanar method, the true 3D organ VOIs from the phantom configurations were used to overcome the overlap problem in the CPlanar method. Uniform activity distributions inside each organ were assumed. The projection of each organ’s VOI, scaled so that there was unit total activity, was calculated using a projector that models image-degrading effects including attenuation, scatter, and the full collimator-detector response. The average of the true attenuation maps for the 171 and 245 keV photons emitted by 111In was used for the attenuation modeling. Scatter was modeled using the effective source scatter estimation (ESSE) method.22 The collimator-detector response was estimated by Monte Carlo simulation of point sources at various distances from the face of the collimator; these simulations included propagation of photons in the collimator and detector.23 Organ activities were then estimated as the set of scale factors for each individual organ projection that gave the best match, in a maximum-likelihood sense, between the measured projections and the sum of the scaled organ VOI projections. More detailed information about the QPlanar method has previously been published.19
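The final estimation step can be illustrated with a small sketch. It assumes Poisson-distributed measurements and uses the standard ML-EM multiplicative update to find the scale factors; the optimizer actually used for QPlanar is described in Ref. 19 and may differ. The organ projections here are hypothetical four-pixel arrays standing in for the output of the full projector.

```python
def estimate_organ_activities(measured, organ_projections, n_iter=200):
    """ML-EM estimate of per-organ scale factors under a Poisson model.

    measured: list of measured counts per pixel.
    organ_projections: one list per organ, the projection of that organ's
        VOI for unit total activity (attenuation, scatter, and detector
        response already applied by the projector).
    """
    n_org = len(organ_projections)
    n_pix = len(measured)
    sensitivity = [sum(p) for p in organ_projections]
    activities = [sum(measured) / n_org] * n_org   # positive starting guess
    for _ in range(n_iter):
        model = [sum(activities[j] * organ_projections[j][i] for j in range(n_org))
                 for i in range(n_pix)]
        ratio = [measured[i] / model[i] if model[i] > 0 else 0.0
                 for i in range(n_pix)]
        activities = [
            activities[j] / sensitivity[j]
            * sum(organ_projections[j][i] * ratio[i] for i in range(n_pix))
            for j in range(n_org)
        ]
    return activities

# Two overlapping organs (pixel 1 sees both); noiseless data for activities 10 and 4.
organ_a = [0.5, 0.5, 0.0, 0.0]
organ_b = [0.0, 0.25, 0.5, 0.25]
counts = [5.0, 6.0, 2.0, 1.0]
est = estimate_organ_activities(counts, [organ_a, organ_b])
```

Because every organ contributes a whole template rather than a hand-drawn ROI, overlapping pixels are shared between organs by the fit instead of being excluded.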
In the QSPECT method, the SPECT projections were reconstructed using the iterative ordered-subsets expectation-maximization (OS-EM) algorithm24 with 24 subsets and 30 iterations. Attenuation, scatter, and the collimator-detector response function (CDRF) were modeled during the iterative process using the same rotation-based projector used in the QPlanar method. A perturbation-based geometric transfer matrix (pGTM) method was used for postreconstruction partial volume compensation.25 A detailed description and validation of the QSPECT methods can be found in Ref. 16.
Using the organ activity estimation methods described above, the organ time-activity curves (TACs) were estimated from a series of SPECT and planar scans. The organ TACs were then fitted with a monoexponential curve, which then was integrated to give the cumulated activity and, after dividing by the total activity at time zero, the residence time. Three classes of methods, pure planar, hybrid planar/QSPECT, and pure QSPECT, were used to estimate the residence times in this work.12 For the pure planar or QSPECT methods, the residence time was estimated using only the planar or SPECT scan data, respectively. For the hybrid method, one QSPECT image (from a SPECT study acquired 24 h postinjection) was used to rescale the TAC obtained from the pure planar method. Four different hybrid methods were investigated based on the three CPlanar methods and one QPlanar method. A detailed description of these methods can be found in Ref. 12. The three CPlanar methods differed in how they compensated for background and overlap, as described above.
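For a monoexponential TAC the integration has a closed form, and the hybrid rescaling reduces to a single multiplicative factor. The sketch below illustrates both; it assumes the rescaling factor is the ratio of the QSPECT activity to the value of the fitted planar curve at the SPECT time point, which is one reading of the procedure detailed in Ref. 12. The function names and numbers are hypothetical.

```python
import math

def residence_time_h(fia_t0, half_life_h):
    """Integral of fia_t0 * exp(-ln2 * t / T) from 0 to infinity, in hours."""
    return fia_t0 * half_life_h / math.log(2)

def hybrid_residence_time_h(planar_fia_t0, planar_half_life_h,
                            qspect_fia, t_spect_h=24.0):
    """Rescale the planar TAC so that it passes through the QSPECT estimate."""
    planar_at_spect = planar_fia_t0 * math.exp(
        -math.log(2) * t_spect_h / planar_half_life_h)
    scale = qspect_fia / planar_at_spect
    return scale * residence_time_h(planar_fia_t0, planar_half_life_h)

# Hypothetical organ: true FIA 0.10 and 60 h half-life; planar amplitude is 50% too high.
true_tau = residence_time_h(0.10, 60.0)
qspect_24h = 0.10 * math.exp(-math.log(2) * 24.0 / 60.0)
planar_tau = residence_time_h(0.15, 60.0)
hybrid_tau = hybrid_residence_time_h(0.15, 60.0, qspect_24h)
```

In this toy case the planar curve has the right shape but the wrong amplitude, so the hybrid estimate recovers the true residence time; errors in the planar half-life would not be corrected by the rescaling.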
A comparison of the accuracy of the estimated organ activities [(mean percentage error)±(standard deviation of percentage errors)] for the 24 h time point obtained using the three CPlanar, QPlanar, and QSPECT methods is shown in Table IV.
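The two summary statistics reported in the tables can be computed as in the following sketch. The use of the sample (n−1) standard deviation is an assumption, since the text does not state which form was used, and the example values are hypothetical.

```python
from statistics import mean, stdev

def percent_errors(estimates, truths):
    """Signed percentage error of each estimate relative to its true value."""
    return [100.0 * (e - t) / t for e, t in zip(estimates, truths)]

def summarize(estimates, truths):
    """Mean and sample standard deviation of the percentage errors."""
    errs = percent_errors(estimates, truths)
    return mean(errs), stdev(errs)

# Hypothetical estimates for one organ across three phantoms.
mean_err, sd_err = summarize([11.0, 18.0, 33.0], [10.0, 20.0, 30.0])
```

The mean captures systematic bias, while the standard deviation captures how much that bias varies from phantom to phantom.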
The CPlanar method with no overlap compensation (CPlanar-None) generally overestimated all the organ activities due to the overlap with other organs and background activity, as demonstrated in Fig. 1. Even with manually defined ROIs, TEW scatter, and object thickness compensation (CPlanar-Realistic), there was still significant overestimation for most organs. For a relatively isolated and large organ such as the liver, the CPlanar-Realistic method worked relatively well, as might be expected. For smaller organs such as the kidneys, overlapping organs and background activity were still important factors, even when using the smaller ROIs. With the ideal overlap and background activity compensations, the CPlanar-Ideal method still resulted in errors of up to 21% due to the inherent limitations of the CPlanar method. The activity estimates for the QPlanar method had errors of less than 7% except in the lungs, likely due to the nonuniform activity distribution and partial volume effects.19 The QSPECT method generally provided the most accurate activity estimates.
Note that the variations in accuracy of all the QSPECT and planar methods (as measured by the standard deviation of the percentage error) due to the anatomical variations were, in all cases, larger than the variations due to statistical noise in a single phantom.12 Statistical significance of the difference between planar and QSPECT activity estimates was determined by computing the ensemble mean and standard deviations of the differences in the activity estimates for the two methods. The null hypothesis was then tested using a two-tailed t test. The differences for each organ were found to be statistically significant with p values less than 0.001 for all methods except QPlanar and QSPECT where there was no statistically significant difference between these two methods for all organs except the lungs.
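The test statistic described above can be sketched as a paired t statistic on the per-phantom differences. The sketch stops at the statistic and its degrees of freedom; the two-tailed p value would then be read from a t distribution with n−1 degrees of freedom. The function name and sample data are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(estimates_a, estimates_b):
    """t statistic and degrees of freedom for the mean paired difference.

    Tests the null hypothesis that the mean difference between the two
    methods' estimates is zero.
    """
    diffs = [a - b for a, b in zip(estimates_a, estimates_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical activity estimates from two methods on the same five phantoms.
t_stat, dof = paired_t_statistic([2.0, 4.0, 6.0, 8.0, 10.0],
                                 [1.0, 2.0, 3.0, 4.0, 5.0])
```

Pairing by phantom removes the between-phantom variation in true activity from the comparison, leaving only the difference between methods.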
We also studied the accuracy and variation in accuracy of organ half-life estimates. These reflect the accuracy of the shape of the estimated TAC. Table V shows the mean errors and standard deviations of the errors in the estimates of the organ half-lives obtained using the CPlanar, QPlanar, and QSPECT methods (the half-lives for the hybrid methods were the same as for the planar methods).
From Table V we see that the half-life estimates from the QPlanar and QSPECT methods were within 3% of the true value for all organs. Errors for the CPlanar methods were somewhat larger, on the order of 8%–21% for most organs. The relative standard deviations of the percentage errors over the patient population were larger than the relative standard deviations over multiple noise realizations presented in Ref. 12 (less than 2%). The half-lives estimated with the CPlanar-None and CPlanar-Realistic methods differed significantly from those of the other three methods for all the organs (p<0.001), while there were no statistically significant differences among the CPlanar-Ideal, QPlanar, and QSPECT methods.
Table VI compares estimates of the organ residence times for different CPlanar, QPlanar, QSPECT, and hybrid methods. From Table VI, the CPlanar-None method provided the poorest accuracy and largest variation in accuracy. For the CPlanar-Realistic method, the residence time estimates were overestimated by at least 31% for all organs except the liver and spleen, where the activities were underestimated by 7%. For the ideal case, CPlanar-Ideal, the errors for most organs were 6%–16%. For the hybrid CPlanar and QSPECT methods, the errors were substantially less for most organs than for the corresponding pure CPlanar method. The quantitative accuracy for pure QPlanar (<5% error for all organs except the lungs) or QPlanar combined with a single QSPECT study (<7% error for all organs) was substantially improved compared to the use of pure CPlanar, with accuracy approaching that of pure QSPECT (<10% error for all organs). Again, for all these methods, the variations in accuracy due to the phantom variations were relatively large compared to those due to the noise variations (less than 2% for a single phantom as reported in Ref. 12). This again indicates the importance of using a population of phantoms instead of a single phantom and multiple noise realizations to validate quantitative imaging methods.
In this work, simulated experiments using a population of realistic anatomical phantoms based on 111In ibritumomab tiuxetan patient studies were used to evaluate various residence time estimation methods. The methods evaluated were based on either a series of conjugate view whole body scans without (pure planar) or with SPECT (hybrid planar) or a series of SPECT (pure SPECT) scans. Although the size of the phantom population (seven anatomies and seven biodistributions permuted to form a set of 49 phantoms) was limited, the results demonstrate the importance of using a phantom population instead of a single phantom. In previous work,12 we evaluated the accuracy and precision of different quantitative residence time estimation methods due to statistical noise. Compared with the results in Ref. 12, the variations of errors for each method calculated from the phantom population were much greater than the variations due to statistical noise. Different methods responded differently to the variations in phantom population. Since the variation in errors may be as important as the mean error in determining how well an absorbed dose estimate will correlate with a biological response, it is essential to evaluate or optimize a quantitative method using a population of phantoms. In this work, we focused on evaluating the errors and variations in errors due to patient variations. There are other substantial sources of errors and variations of errors such as organ VOI misdefinition and misregistration. We have performed some preliminary evaluations of the effects of these errors,26 and more complete studies are under way.
Three CPlanar methods without, with simple, and with ideal background and overlap compensations were evaluated in this work. While the CPlanar methods with the use of improved overlap and attenuation compensation and addition of corrections for source and object thicknesses performed better than without them, these require additional imaging time, careful calibration, and skill in defining the ROIs. The results will be somewhat subjective depending on the skill in drawing of organ and background ROIs, which may compound the already-large variation in errors seen for these methods. However, even with ideal overlap and background compensations, the CPlanar method still provided larger errors and variations in errors than either the QPlanar or QSPECT methods due to the approximations inherent in planar processing methods.
Combining the CPlanar method with a single SPECT study resulted in a large decrease in the mean error and the variation in the error for most organs. It thus seems that the use of hybrid SPECT/planar methods should become the minimum standard for clinical practice.
The QPlanar method, either alone or when combined with a single SPECT study, provided further reductions in errors and variations in errors; for most organs, the error levels with the hybrid QPlanar/QSPECT method approached those obtained with a QSPECT series. However, QPlanar processing requires registered 3D anatomical information which might limit its application in clinical studies. Compared with the CPlanar methods, the QPlanar method is more theoretically robust and less subjective since 2D ROIs do not need to be drawn to minimize overlap. Compared with the pure QSPECT method, QPlanar is less computationally intensive and does not require multiple SPECT acquisitions at each time point that, at least in current practice, use longer acquisition times than typical planar scans. As a result, it appears that, if the registration challenge is solved, QPlanar processing may be a viable replacement for CPlanar methods for organ dosimetry.
The use of a QSPECT series provided no statistically significant advantage over the QPlanar/QSPECT hybrid method and little or no advantage in variation of accuracy. However, while defining the 3D organ VOIs in each scan would be tedious, there is, in contrast to the QPlanar method, no need to register the VOIs from each scan. One limitation of the pure QSPECT method is the requirement to perform relatively computationally intensive QSPECT reconstructions at each time point. Nevertheless, given that the image acquisitions are spaced over a time that is large compared to the reconstruction time and given the rapid advance in computer power and reconstruction algorithms, this is likely not to be a major limitation.
In this work, we have developed a phantom population based on anatomical, biodistribution, and biokinetics data obtained from seven patient studies. Using this phantom population, we evaluated various residence time estimation methods. The results indicated that the processing methods differed both in terms of their average accuracy and the variation of the accuracy over the population of phantoms. The variations in accuracy were larger than the variations due to noise in a single phantom observed in previous studies. This demonstrates the importance of using phantom populations in evaluating quantitative imaging methods. In the study we observed that the addition of a QSPECT scan to a CPlanar series provided a marked reduction in the mean percentage error and standard deviation of the percentage error for most organs. This suggests that hybrid QSPECT/planar imaging should be the minimum standard for clinical practice. Among all the processing methods, the QPlanar or hybrid QPlanar/QSPECT method had mean errors and standard deviations of errors that approached those of pure QSPECT while providing simplified image acquisition protocols. Due to their simpler acquisition and processing protocols, the hybrid methods may be more clinically practical and thus the methods of choice for clinical practice.
The reconstruction code used in this work has been licensed to GE Healthcare for inclusion in a commercial product. Under separate licensing agreements between the General Electric Co. and the Johns Hopkins University and the University of North Carolina at Chapel Hill and GE Healthcare, Dr. Frey is entitled to a share of royalty received by the universities on sales of products described in this article. The terms of this arrangement are being managed by the Johns Hopkins University in accordance with its conflict of interest policies.