When applicable, it is generally preferred to evaluate positron emission tomography (PET) studies using a reference tissue-based approach, as that avoids the need for invasive arterial blood sampling. However, most reference tissue methods have been shown to have a bias that is dependent on the level of tracer binding, and the variability of parameter estimates may be substantially affected by noise level. In a study of serotonin transporter (SERT) binding in HIV dementia, it was determined that applying parameter coupling to the simplified reference tissue model (SRTM) reduced the variability of parameter estimates and yielded the strongest between-group significant differences in SERT binding. The use of parameter coupling makes the application of SRTM more consistent with conventional blood input models and reduces the total number of fitted parameters, and thus should yield more robust parameter estimates. Here, we provide a detailed evaluation of the application of parameter constraint and parameter coupling to [11C]DASB PET studies. Five quantitative methods, including three methods that constrain the reference tissue clearance ($k_2^r$) to a common value across regions, were applied to the clinical and simulated data to compare measurement of the tracer binding potential (BPND). Compared with standard SRTM, either coupling of $k_2^r$ across regions or constraining $k_2^r$ to a first-pass estimate improved the sensitivity of SRTM to measuring a significant difference in BPND between patients and controls. Parameter coupling was particularly effective in reducing the variance of parameter estimates, which was less than 50% of the variance obtained with standard SRTM. A linear approach was also improved when constraining $k_2^r$ to a first-pass estimate, although the SRTM-based methods yielded stronger significant differences when applied to the clinical study.
This work shows that parameter coupling reduces the variance of parameter estimates and may better discriminate between-group differences in specific binding.
For brain imaging studies with positron emission tomography (PET), it is practical to avoid arterial blood sampling, which is uncomfortable for the subject, increases the effort and expense of the study, and adds complexity to the quantitative analysis. Thus, methods that do not require blood sampling are often preferable and have seen frequent application to studies where a suitable reference time-activity curve can be defined. A common reference tissue-based method for brain PET studies is the simplified reference tissue model (SRTM) (Lammertsma and Hume 1996), which includes three fitted parameters: the relative tissue transport (R1), the tissue to blood clearance rate (k2), and the binding potential normalized to nondisplaceable uptake (BPND). Note that the nonspecific clearance rate from the reference region ($k_2^r$) is expressed as $k_2/R_1$, which by definition takes on a single value for any given study. However, as SRTM is often implemented, k2 and R1 may be fitted to multiple regions or even voxelwise to effectively produce multiple estimates of $k_2^r$. It can be shown that allowing $k_2^r$ to differ across regions violates the assumption of uniform nonspecific uptake across the brain. Thus it is appropriate to constrain $k_2^r$, both on the grounds that parameter reduction improves parameter identifiability and because it brings SRTM in better accord with the physiologic assumption of uniform nonspecific uptake (Wu and Carson 2002, Zhou et al 2007). Previously, a two-step voxel-based approach has been described for applying SRTM with constraint of $k_2^r$ that reduced the variability of BPND estimates while introducing little bias (Wu and Carson 2002). Another approach to constraining $k_2^r$ with SRTM is through the use of parameter coupling, which has been shown to reduce the variability of SRTM parameter estimates for quantification of [11C]PIB (Zhou et al 2007). Recently, we reported on the use of parameter coupling to constrain $k_2^r$ in a [11C]DASB study of HIV depression (Hammoud et al 2010).
In that study, although we had applied several methods to quantify [11C]DASB specific binding, we only reported the analysis using SRTM with parameter coupling (SRTMC) as that method yielded the most significant differences. The implication is that SRTMC had improved statistical power over other methods, which seemed plausible given that the application of parameter coupling to PET compartmental models has been shown in many instances to stabilize model fitting and to reduce the variance of parameter estimates (Buck et al 1996, Ginovart et al 2001, Millet et al 2006, Raylman et al 1994). However, the use of parameter coupling with SRTM has not been well documented. Thus, to better characterize the application of SRTMC to [11C]DASB, and most importantly, to determine if SRTMC truly has higher statistical power for detecting significant differences, we used detailed simulations to evaluate the performance of several quantitative methods for quantifying [11C]DASB specific binding at noise levels consistent with region of interest time-activity curves. In addition to SRTM-based methods, a linear approach (Zhou et al 2003) is examined both with and without constraint of the reference tissue clearance rate $k_2^r$. The methods are compared using data from the previously published DASB study of HIV depression as well as with simulations, including simulated mock patient studies.
DASB PET data were taken from a study reported recently to examine SERT density in HIV depression (Hammoud et al 2010). Briefly, the study consisted of 7 healthy controls and 18 HIV subjects. HIV subjects were further classified as depressed (n = 9) or non-depressed (n = 9); however, for the present evaluation the HIV subjects are treated as a single group of 18 subjects. Subjects were scanned for 90 min, with the PET acquisition binned into 22 frames. Twelve total regions, including cerebellum and 11 specific regions of interest, were delineated manually on individual MR scans and then transferred to the coregistered dynamic PET data to generate time-activity curves.
Five quantitative methods were applied to measure specific binding in a study of serotonin transporter (SERT) density in HIV. Method 1 was standard application of the SRTM (Lammertsma and Hume 1996), which was fitted to curves individually to estimate the relative blood to tissue transport (R1), reference tissue clearance rate ($k_2^r$), and binding potential normalized to the tissue nondisplaceable compartment (BPND). The operational equation for SRTM (Gunn et al 1997) is expressed typically in terms of R1, k2 and BPND (equation (1)). Upon substitution of $k_2 = R_1 k_2^r$ into equation (1), we obtain the desired form for SRTM (equation (2)) that allows us to couple or constrain $k_2^r$. Method 2 is SRTM2, which is a two-step SRTM procedure where the individual estimates of $k_2^r$ obtained using method 1 were averaged for each subject to obtain a mean value $\bar{k}_2^r$. The value of $k_2^r$ was then fixed to $\bar{k}_2^r$ for subsequent modeling with SRTM to estimate the two remaining parameters (R1, BPND). For human [11C]DASB studies about 1% of $k_2^r$ values were outliers; thus, to apply SRTM2 consistently, the highest and lowest estimates for each subject were discarded and the average of the nine remaining estimates was used to compute $\bar{k}_2^r$. SRTM2 has been described previously for application to parametric imaging (Wu and Carson 2002). Method 3 is application of SRTM with coupling of $k_2^r$ across regions (SRTMC). For SRTMC, all 11 specific binding regions were fitted simultaneously in a single regression step to yield 23 total fitted parameters per study, which consisted of 11 regional estimates of R1 and BPND, and a single global estimate of $k_2^r$. Method 4 was to apply the operational equation from Zhou et al (2003), which is a linear equation with three terms (equation (3)). For the Zhou method, the coefficient of the first linear term is equal to the distribution volume ratio (DVR), where BPND = DVR − 1.
Method 5 was a two-step modification of the Zhou method (Zhou2), where the initial fitting with the Zhou method was used to estimate $k_2^r$, which was then fixed to reduce a subsequent fitting to two parameters (equation (4)). Some other methods, including the graphical method of Logan (Logan et al 1996), and the relative equilibrium method of Zhou et al (2010), were examined initially but are not reported as they were found to give noticeably greater bias in BPND estimates, especially in high binding regions (Slifstein and Laruelle 2000), than the other methods reported here. For each method, a 2-tailed unpaired t-test was used to test for significant differences in regional [11C]DASB binding. For the SRTM-based methods, tissue activity was modeled by integrating equation (2) over each frame interval. Nonlinear curve fitting to the analytic model was performed using in-house software written in C++ that implemented Levenberg–Marquardt optimization. The linear methods (Zhou, Zhou2) were implemented using the left division operator in Matlab 7.5 (Mathworks, Natick, MA).
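The coupled form of SRTM described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' C++ implementation: it samples the curves on a uniform time grid instead of integrating over frame intervals, it evaluates the convolution with a simple recursive trapezoid rule, and it only defines the objective function that a Levenberg–Marquardt routine would minimize. The function names (`srtm_tac`, `unpack_coupled`, `coupled_sse`) are hypothetical.

```python
import math

def srtm_tac(times, ref, r1, k2r, bp):
    """Target-tissue curve from SRTM written in terms of k2r, using k2 = R1 * k2r.

    C_T(t) = R1*C_R(t) + R1*k2r*(1 - R1/(1+BP)) * [C_R conv exp(-R1*k2r*t/(1+BP))]

    Assumption for this sketch: `times` is uniformly spaced and the reference
    curve is zero before times[0].
    """
    dt = times[1] - times[0]
    rate = r1 * k2r / (1.0 + bp)                 # k2 / (1 + BPND)
    gain = r1 * k2r * (1.0 - r1 / (1.0 + bp))    # k2 - R1 * k2 / (1 + BPND)
    decay = math.exp(-rate * dt)
    conv = 0.0
    out = []
    for i in range(len(times)):
        if i > 0:
            # Recursive trapezoid update of the convolution integral.
            conv = conv * decay + 0.5 * dt * (ref[i - 1] * decay + ref[i])
        out.append(r1 * ref[i] + gain * conv)
    return out

def unpack_coupled(theta, n_regions):
    """Split theta into regional R1 values, regional BPND values, and one shared k2r."""
    r1 = theta[:n_regions]
    bp = theta[n_regions:2 * n_regions]
    k2r = theta[2 * n_regions]
    return r1, bp, k2r

def coupled_sse(theta, times, ref, tacs):
    """Sum of squared residuals over all regions, with a single k2r shared by all."""
    r1, bp, k2r = unpack_coupled(theta, len(tacs))
    sse = 0.0
    for region, tac in enumerate(tacs):
        model = srtm_tac(times, ref, r1[region], k2r, bp[region])
        sse += sum((m - y) ** 2 for m, y in zip(model, tac))
    return sse
```

With 11 regions the parameter vector has 2 × 11 + 1 = 23 entries, matching the count given for SRTMC. Fixing `k2r` to a first-pass mean and fitting each region separately with two parameters would correspond to the SRTM2 variant.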
To characterize parameter estimation with the different methods, 100 sets of time-activity curves (TACs) were simulated, where each set included five hypothetical regions of different SERT density using BPND = 0.25, 0.5, 1.0, 2.0, 4.0. These BPND values correspond to the range of parameter estimates obtained from clinical [11C]DASB studies. Using a two-tissue four-parameter model, the remaining parameters were K1 = 0.4, K1/k2 = 10, k3 = 2.0 min−1. Note that BPND is the only parameter that was varied for the entire set of simulations reported here. To make the curves more realistic, a 4% blood volume term was added to the simulated TACs. For blood volume, the plasma concentration was used in lieu of whole blood with the assumption that tracer equilibrates rapidly between plasma and blood cells. Both the input function and total blood activity for [11C]DASB were constructed by averaging the blood measurements from five control subjects. Noise was computed from , with α = 0.02 (low noise), α = 0.03 (medium noise), and α = 0.04 (high noise) chosen to achieve noise levels consistent with region of interest TACs. For the five BP levels, and three noise levels, 100 TACs were generated, giving a total of 1500 simulated control TACs. For the reference tissue, 100 sets of TACs were generated with a lower noise level of α = 0.01 as the reference region (cerebellum) is typically a larger region that results in a lower noise TAC. To evaluate quantitative methods for each method, each noise level, and each BPND, the percent bias (Bias%), coefficient of variation (COV), and the percent root mean square error (RMSE%) were computed as described previously (Zhou et al 2003), and were obtained from the following equations:
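A minimal sketch of how such summary statistics can be computed is given here. It assumes conventional definitions: bias and RMSE are expressed relative to the true parameter value, and COV is the sample standard deviation divided by the mean estimate. The exact normalization used by Zhou et al (2003) may differ, and the function name `summarize` is hypothetical.

```python
import math
import statistics

def summarize(estimates, true_value):
    """Return (Bias%, COV%, RMSE%) for repeated estimates of one parameter.

    Assumed definitions (conventional, not confirmed against the source):
      Bias% = 100 * (mean estimate - true) / true
      COV%  = 100 * sample SD / mean estimate
      RMSE% = 100 * sqrt(mean squared error about the true value) / true
    """
    mean = statistics.fmean(estimates)
    bias_pct = 100.0 * (mean - true_value) / true_value
    cov_pct = 100.0 * statistics.stdev(estimates) / mean
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    rmse_pct = 100.0 * math.sqrt(mse) / true_value
    return bias_pct, cov_pct, rmse_pct
```

In the setting described above, `estimates` would hold the 100 fitted BPND values for one method, one noise level, and one simulated binding level, and `true_value` would be the simulated BPND.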
To investigate the ability of quantitative methods to detect significant differences across subject groups, a mock study was generated where it was assumed that five tissue regions were being compared across two groups. Aside from BPND, the simulation conditions used for patients were identical to those used in control simulations, including the same total blood curve and input function. Patient groups were simulated by reducing one of the five regions by either 10%, 20%, or 40%, relative to controls. Thus, there were a total of 15 patient groups with n = 25 per group, with the only difference between simulated patient and control binding levels being a difference in mean BPND in a single region. For example, one patient group had a 10% reduction in the lowest binding region (BPND = 0.225, 0.5, 1.0, 2.0, 4.0) but otherwise had the same average SERT binding as the control group (BPND = 0.25, 0.5, 1.0, 2.0, 4.0). For each group, the 25 patients were simulated at all three noise levels. The within group COV of BPND in each region was between 32% and 38%, which is consistent with the variability in BPND estimates obtained from the patient studies. For example, for a 25 patient simulation of the region with mean BPND = 2.0, the individual BPND values were 1.39, 1.52, 2.36, 1.66, 2.85, 2.07, 2.65, 2.94, 2.74, 0.91, 1.90, 1.35, 1.07, 1.93, 2.00, 2.51, 3.39, 1.55, 1.54, 1.45, 2.65, 0.90, 2.61, 1.90, 2.17. A unique distribution of BPND was generated for each binding level, and was used to generate low (α = 0.02), medium (α = 0.03) and high noise (α = 0.04) TACs for each BPND value. Each of the 15 patient groups was compared to the control group using all five quantitative methods to test for significant differences in BPND for each region. The comparison was done using a 2-tailed unpaired t-test across all 25 subjects as well as with subsets of 10 or 15 subjects per group.
As the significance testing is dependent on which subgroup of subjects is selected, the significance for n = 15 was tested using 25 different combinations of 15 subjects per group. For example, controls 1–15 were compared to patients 1–15, controls 2–16 were compared to patients 2–16, and so on, with the final comparison being controls and patients 25 and 1–14. To summarize the significance testing for each simulated region, the average of the 25 significance p-values is reported. Similarly, for n = 10 per group the significance was tested with grouping of simulations 1–10, 2–11, …, 25 and 1–9, and the average p-values are reported. The total number of simulated patient and control TACs was 16 groups × 25 subjects per group × 3 noise levels = 1200 TACs.
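The mock study design and the wraparound subgroup selection can be sketched as follows. This is an illustration only: the function names are hypothetical, and the t-test itself is left to a caller-supplied `p_value_fn` (an assumption of this sketch) so that the example needs nothing beyond the standard library.

```python
def patient_group_designs(control_bp, reductions):
    """Mean BPND vector for every patient group: one region reduced, others unchanged."""
    designs = []
    for reduction in reductions:
        for region in range(len(control_bp)):
            bp = list(control_bp)
            bp[region] = control_bp[region] * (1.0 - reduction)
            designs.append(bp)
    return designs

def wraparound_windows(n_subjects, size):
    """1-based subject indices for each cyclic window: 1..size, 2..size+1, and so on."""
    return [[(start + k) % n_subjects + 1 for k in range(size)]
            for start in range(n_subjects)]

def mean_window_p(controls, patients, size, p_value_fn):
    """Average p-value over all cyclic subgroups of `size` subjects per group.

    `controls` and `patients` are equal-length lists of per-subject BPND
    estimates; `p_value_fn(a, b)` is a caller-supplied two-sample test.
    """
    p_values = []
    for window in wraparound_windows(len(controls), size):
        c = [controls[i - 1] for i in window]
        p = [patients[i - 1] for i in window]
        p_values.append(p_value_fn(c, p))
    return sum(p_values) / len(p_values)
```

For 25 subjects and a window of 15, the last window is subject 25 followed by subjects 1 through 14, as in the text. Five regions and three reduction levels give the 15 patient groups, and adding the control group gives the 16 groups used in the TAC count.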
A summary of the p-values obtained for each region using each method is shown in table 1. Overall, the best discrimination across regions was achieved using SRTMC with p < 0.05 in 8 of 11 regions. SRTM2 results were comparable to SRTMC with p < 0.1 in 7 of 11 regions, and with p < 0.05 in 3 of 11 regions. The Zhou and Zhou2 methods achieved p < 0.05 in a single region, and standard SRTM showed the poorest discrimination, with p > 0.1 for all regions.
At low noise, the RMSE% was similar for all methods (table 2), although the methods that held the reference tissue clearance $k_2^r$ to a single value (SRTM2, SRTMC, Zhou2) had smaller COV than the other methods (SRTM, Zhou). That distinction was more apparent at high noise levels, as SRTM2, SRTMC, and Zhou2 had smaller COV as well as much smaller RMSE% than either SRTM or Zhou. In comparison of SRTMC and SRTM2, although the RMSE% was quite similar, SRTMC showed consistently smaller COV whereas SRTM2 had smaller Bias%. At high noise level, Zhou2 gave a smaller COV and RMSE% than either SRTM2 or SRTMC. For the nonlinear methods (SRTM, SRTM2, SRTMC) the bias did not increase substantially at high noise level, whereas there was a large noise-dependent increase in Bias% for the Zhou method. RMSE% tended to be smaller for the two-step methods (SRTM2, Zhou2) as compared with their corresponding one-step methods (SRTM, Zhou), although Zhou2 did show a small increase in RMSE% for high binding regions (BPND = 1, 2, 4) in the medium noise simulations. For estimation of R1, the COV, Bias%, and RMSE% were generally small, with the notable exception of Zhou2, and to a lesser extent Zhou, at high noise level (table 3). The Zhou method also gave a very large Bias% in $k_2^r$ at all noise levels (table 4), whereas SRTM showed very little Bias%. SRTMC yielded greater Bias% in $k_2^r$ than was obtained with SRTM, but SRTMC also gave smaller COV.
For all quantitative methods, no significant differences (p > 0.2) were found at any noise level when evaluating regions where the simulated binding level was equal for controls and patients. There were also no significant differences found when evaluating groups that had a regional binding change of only 10%, although for the single case of medium noise looking at a 10% reduction in BPND = 4, the nonlinear methods (SRTM, SRTM2, SRTMC) gave p < 0.08, whereas the linear methods (Zhou, Zhou2) gave p > 0.3. In order to focus on the comparisons that were relatively near the common threshold of statistical significance (p < 0.05), only a subset of the mock study results will be presented in detail. Table 5 shows the significance p-values for measuring a 20% reduction in SERT with 25, 15, or 10 subjects per group. Table 6 shows the p-values for measuring a 40% reduction with 15 or 10 subjects per group. The p-values for measuring a 40% reduction with 25 subjects per group are not presented as the t-test in that case was highly significant for all methods. Tables 5 and 6 show that for low noise simulations, there is not much difference in the statistical power achieved by each method. At high noise level, the methods that constrain the reference tissue clearance $k_2^r$ to a single value (SRTM2, SRTMC, Zhou2) show similarly good discrimination (small p-values) for most comparisons. In particular, at high and medium noise the two-step methods (SRTM2, Zhou2) tended to show better discrimination than their corresponding one-step method (SRTM, Zhou).
The projections of midbrain serotonergic neurons are distributed throughout the brain, but the local serotonin transporter (SERT) concentration is quite diverse with a few areas of high concentration such as the midbrain, thalamus, and basal ganglia, with low to intermediate concentrations throughout the cortex (Varnas et al 2004). As a variety of regions may be of interest in any given study, it is important to quantify high and low binding regions reliably. In fact, recent studies have implicated cortical and limbic SERT measured by [11C]DASB in Parkinson’s disease and geriatric depression (Kish et al 2010, Smith et al 2008). However, quantification of regions with either high or low SERT density presents a challenge. In the case of low-density regions, the low specific binding signal may be difficult to distinguish from non-specific binding. For high-density regions, the tracer clearance is relatively slow which may be problematic for quantitative methods that assume some degree of equilibrium among compartments. TACs at all binding levels are generally well characterized when applying full compartmental modeling with a blood input function. However, it is particularly difficult to measure the input function for [11C]DASB due to an atypical metabolite profile that likely reflects transient uptake in lung (Parsey et al 2006b). That difficulty, along with the very low SERT binding in cerebellum, has further encouraged the use of reference tissue methods for quantifying [11C]DASB PET studies (Frankle et al 2006, Ichise et al 2003).
An important distinction between the application of blood input and reference tissue methods is that quantification of specific binding with a blood input method is typically a two-step process, whereas application of a reference tissue method is typically a one-step process. For example, when applying a two-tissue compartmental model, the first step in the quantification of tracer binding is to estimate the non-specific volume of distribution (VND) by modeling the cerebellar gray matter, which is nearly devoid of specific SERT binding and is thus considered to be a reference region (Parsey et al 2006a). The second step is to model specific binding regions to measure K1, k3, BPND, with the assumption that $K_1/k_2 = K_1^r/k_2^r$, where the ‘r’ superscript denotes the reference region. Alternatively, the total volume of distribution (VT) in a specific binding region can be measured with any suitable method, and then specific binding can be computed as BPND = VT/VND − 1. Note that since we have asserted that $K_1/k_2$ is constant, it can be seen that $k_2^r = k_2/R_1$ is constant. Therefore, setting $k_2^r$ to be constant across regions is tantamount to setting VND constant, as is required in order to adhere to the assumption of uniform nonspecific binding. In addition to making the application of SRTM consistent with physiological assumptions, constraint of $k_2^r$ adds stability to the analysis. Consider that for [11C]DASB, it has been shown that with compartmental modeling, constraint of VND either by fixing K1/k2 to a constant, or by coupling K1/k2 across TACs, gave better convergence and more consistent results than was obtained without constraint of K1/k2 (Ginovart et al 2001). From the relation $k_2^r = k_2/R_1 = K_1^r/(K_1/k_2)$, it is seen that setting $k_2^r$ to a fixed value for SRTM is the kinetic equivalent of setting K1/k2 to a fixed value for the two-tissue compartmental model. Similarly, coupling of $k_2^r$ across tissue regions is equivalent to coupling of K1/k2.
Therefore, just as fixing or coupling of K1/k2 improves parameter estimation when using the two-tissue blood input model, fixing (SRTM2) or coupling (SRTMC) of the reference tissue clearance $k_2^r$ should improve parameter estimation when applying SRTM. Thus, it is not surprising that SRTM2 and SRTMC reduced the RMSE% of BPND estimates relative to SRTM, and also yielded better significance in the mock study. The linear method that was examined (Zhou) was also improved by constraint of $k_2^r$ (Zhou2). In particular, Zhou2 had a smaller COV than the Zhou method, and tended to yield better significance, especially at high noise level. In the mock PET study, the Zhou2 method yielded p-values that were quite comparable to those of SRTM2 and SRTMC, although Zhou2 did not perform as well in our clinical study.
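The equivalence argued above can be written out in a few lines. This is a sketch that assumes the usual SRTM definition $R_1 = K_1/K_1^r$ (the ratio of target to reference delivery) and a nondisplaceable volume of distribution shared by target and reference tissue; the symbol $K_1^r$ for reference-region delivery is introduced here for illustration.

```latex
\begin{aligned}
V_{ND} &= \frac{K_1}{k_2} = \frac{K_1^r}{k_2^r}, \qquad R_1 = \frac{K_1}{K_1^r} \\[4pt]
\frac{k_2}{R_1} &= \frac{k_2\,K_1^r}{K_1} = \frac{K_1^r}{K_1/k_2} = \frac{K_1^r}{V_{ND}} = k_2^r
\end{aligned}
```

Because $K_1^r$ belongs to the single reference region, holding $k_2^r$ to one value across target regions is the same as holding $V_{ND} = K_1/k_2$ to one value across those regions.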
It is instructive to examine the effects on estimation of other parameters, namely R1 and $k_2^r$. The estimation of R1 showed small Bias% (<4%) and COV (<3%) with SRTM-based methods (table 3) at all noise levels. As was the case for BPND estimation, for R1 estimation SRTMC had a smaller COV than SRTM and SRTM2 as well as a somewhat larger bias. The Zhou and Zhou2 methods showed consistently larger COV and Bias% in R1 than the SRTM methods. In particular, Zhou2 showed a much larger Bias% in R1 at the highest noise level examined. For estimation of $k_2^r$, the Zhou method showed an extremely large Bias% that exceeded 100% even at low noise level (table 4). That finding is consistent with the smaller, but still substantial bias in R1 and k2 that was found previously with application of the Zhou method to simulations of [11C]Flumazenil (Zhou et al 2003). It is important to note that for the Zhou method $k_2^r$ is estimated from a ratio of parameters whereas BPND is estimated directly. That may account for why the Zhou method is able to achieve a rather good estimate of BPND despite yielding a rather poor estimate of $k_2^r$. In contrast, SRTM estimated $k_2^r$ with much lower bias than Zhou2, while having similar COV. As compared with SRTM, SRTMC yielded even smaller COV for $k_2^r$, especially at high noise level. However, SRTM showed a smaller Bias% in $k_2^r$ (<9%) relative to SRTMC (Bias% ~10–12%), but the Bias% and COV of $k_2^r$ measured with SRTMC appeared to be insensitive to the noise level. In fact, in some cases the Bias% of BPND was slightly higher at medium noise level than at high noise level (table 2), although COV increased as expected, which suggests a noise/bias tradeoff in parameter estimates. In general, SRTMC gave larger Bias% and smaller COV than did SRTM for both $k_2^r$ and BPND. That finding indicates that, at the cost of some bias, coupling of $k_2^r$ via SRTMC does reduce variability associated with estimation of $k_2^r$, which translates to reduced variability in the estimation of BPND.
Given the similarity in performance of SRTM2 and SRTMC when examining the RMSE% of control simulations (table 2) and the detection of differences in the mock patient study (tables 5 and 6), it appears that either method of constraint is suitable and definitely preferable to the standard SRTM approach for ROI analysis of [11C]DASB PET studies.
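One way to see why two methods with different bias and variability can have similar RMSE% is the standard decomposition of mean squared error. The identity below is a sketch that assumes Bias% and RMSE% are both normalized by the true value $p$, with $\hat{p}_i$ the $N$ individual estimates and $\bar{p}$ their mean; these symbols are introduced here for illustration.

```latex
\frac{1}{N}\sum_{i=1}^{N}\left(\hat{p}_i - p\right)^2
  = \left(\bar{p} - p\right)^2 + \frac{1}{N}\sum_{i=1}^{N}\left(\hat{p}_i - \bar{p}\right)^2
\quad\Longrightarrow\quad
\mathrm{RMSE\%}^2 = \mathrm{Bias\%}^2 + \left(100\,\frac{s_N}{p}\right)^2
```

Here $s_N$ is the standard deviation of the estimates computed with divisor $N$. Under these assumptions, a method with somewhat larger bias and smaller spread can reach about the same RMSE% as a method with smaller bias and larger spread.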
The linear methods tested here (Zhou, Zhou2) performed about as well as the nonlinear methods at low noise level. As compared with the one-step method (Zhou), the two-step method (Zhou2) gave better discrimination between groups at high noise level (tables 5 and 6), and showed less Bias% (table 2). On average, Zhou2 gave somewhat less power for significant difference detection than SRTMC and SRTM2, although the overall performance of the Zhou2 method in the mock study simulations was quite comparable.
In the original publication of our [11C]DASB study in HIV, SRTMC was selected for presentation because it yielded the highest significance. The present work provides some theoretical validation that SRTMC is indeed among the best current approaches for extracting a significant result given an underlying difference in BPND. It is important to note that although the mock patient simulations are useful for examining the relative performance of different quantitative methods, the simulations apply only to sets of parameters that are typical of [11C]DASB PET studies. However, some inferences on the application of SRTMC to other tracers can be drawn from other work. For example, faster equilibrating tracers such as [11C]carfentanil, [11C]raclopride, and [11C]flumazenil would probably not benefit as much from SRTMC, as SRTM already obtains reliable estimates when the approach to equilibrium is relatively rapid (Endres et al 2003). Similarly, in comparison of [11C]PIB binding between healthy controls and Alzheimer’s patients, the statistical significance values obtained with SRTMC were about the same as those obtained using a simple Logan plot (Zhou et al 2007). The SRTM2 results obtained here were generally consistent with those published previously for [11C]DASB (Ichise et al 2003). For example, it was found for SRTM2 that the bias of BPND was generally small, as was the bias and variability of R1, the latter of which showed subtle increases with noise. There were some differences; for example, the work of Ichise showed that bias with SRTM2 increases with BPND at high noise levels, which was not found here. A likely reason is that the present work targeted noise levels relevant to ROI analysis, whereas the Ichise paper included examination of voxel level noise. That work also presented a linear method known as the multilinear reference tissue model (MRTM), as well as a two-step procedure with constraint of the reference tissue clearance $k_2^r$ (MRTM2) (Ichise et al 2003).
In our experience, for parametric imaging of human brain studies with tracers such as [11C]DASB and [11C]PK11195, MRTM tends to show higher variability than the Zhou approach. However, there is apparently little distinction between the MRTM and Zhou methods at the lower ROI noise levels examined here. Consider that the Ichise study showed that BPND estimates obtained with MRTM2 had nearly identical variability to those obtained with SRTM2. The bias in BPND was also similar but was slightly higher for MRTM2. In this study, the BPND estimates obtained using Zhou2 were also similar to SRTM2, with Zhou2 showing somewhat smaller variance but also greater bias. The implication is that MRTM2 and Zhou2 give similar performance at ROI noise levels, which is not surprising given that they are both linear methods based on different permutations of the same equation. SRTM2, MRTM2, and Zhou2 are all applicable to voxelwise calculations, and one of them should be implemented when constraint of the reference tissue clearance $k_2^r$ is desired for voxelwise analysis. SRTMC is not practical for voxelwise analysis due to the large number of parameters that would need to be included in a single regression equation.
The Zhou method was developed originally as a parametric imaging approach where tissue data were smoothed using a linear ridge regression penalty function (Zhou et al 2003). Here, we are applying the method to region of interest TACs, where the tissue data are already quite smooth, and thus we did not further smooth the data. However, even in that case the limitation of the Zhou method when applied to a high binding region is apparent, and the performance of the method generally degrades at high noise levels. The poor performance of the Zhou method in that case is likely due to the increasing lack of equilibrium in regions of high SERT density. As compared with the Zhou method, Zhou2 gave a definite improvement in power in the mock study simulations. The control simulations showed that Zhou2 substantially reduced COV relative to Zhou, although RMSE% increased slightly for high BPND.
Overall, evaluation of the simulations as well as the clinical study supports SRTMC and SRTM2, and to a lesser extent Zhou2, as being better capable of detecting a significant difference between subject groups when performing ROI analyses of [11C]DASB PET studies. It is evident that parameter constraint, achieved either by fixing the reference tissue clearance $k_2^r$ to a first-pass estimate, or by coupling of $k_2^r$ across regions, reduces the variability of parameter estimates and achieves more robust parameter estimates in the presence of noise. Thus, the fact that SRTMC and SRTM2 gave the strongest significance in our clinical HIV study, with SRTM giving the weakest significance, is consistent with there being a real difference in SERT binding between control and HIV subjects that SRTMC and SRTM2 were better able to distinguish.
The authors would like to express their appreciation to Rena Geckle for coordinating the study, as well as the radiochemistry staff and PET technologists at the Johns Hopkins University PET center. They gratefully acknowledge Drs Yun Zhou and Gwenn Smith for helpful discussions. They also acknowledge NIH R21 MH076591.