Assessing the Measurement of Tumor Volume Change from Anatomic Imaging

Many parameters could be exploited to measure tumor change. There are physical parameters that already have either established or suggested relationships to cancer including density, diffusivity, and elastic moduli. In addition, there are shape and composition parameters including volume, spicularity (typically quantified as the ratio of surface area to volume), heterogeneity, and vascularity (typically quantified as number of vessels intersected per unit area in a histology section). The following sections describe methodologies only for assessing the accuracy of measuring tumor volume change as rendered in anatomic imaging. In the simplest case, the same techniques can be used for assessing accuracy of measuring other parameters, but should these parameters have interactions, the measurement methods will require the use of multiparametric estimators such as generalized linear models (GLMs) potentially including mixed effects models that are not addressed herein.

By way of introduction to the problem of measuring volume change of tumors, we describe two of possibly many methods of implementing such measurements. The purpose here is to show the generality of the possible solutions as well as to view the following discussion from a common viewpoint, that is, primarily that of the quantification of tumor volume *change*. Consider the following two of many possible methods for estimating tumor volume change:

- Currently, the standard method of measuring change is the sequential segmentation of the tumor in interval examinations followed by subtraction of the value of the tumor volume of the previous examination from that derived from the current examination. This double segmentation is an indirect method in that volume change is not measured directly and will depend on the accuracy or consistency of the segmentation and the change assessment paradigm [18,19].
- Registrations that map the same, possibly complex, tumor geometry between two different interval examinations can be performed with the volume change estimated directly.

In the following discussion of volume change, we are fundamentally addressing directly the problem of quantifying volume change. When we discuss random error variance or bias, we are not referring to just the segmentation problem that may or may not precede more sophisticated estimates of volume change but rather to the *entire* change analysis methodology.

Components of Error

In every problem, we face two basic components of error:

*Variance*, σ^{2}, is a quantitative estimate of the random variability of the data about its mean in repeated measurements associated with noise from various sources, for example, data sensors and subsequent measurement methods, and is estimated as shown here:

$$\sigma^{2} = E\left[\frac{1}{N-1}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}\right]$$

Here *x*_{i} represents one of the *N* discrete measurements we make to compute the quantity within the brackets as an unbiased estimate of the variable's variance, *x̄* is the estimate of the discrete data's mean, and *E* is the estimate of the quantity inside the brackets as *N* approaches infinity. The quantity (*x*_{i} − *x̄*) is the noise or error term from the expected mean estimate. Thus, we estimate variance by computing the terms inside the brackets, that is, the sample variance, for an *N* large enough to give us a sufficiently low noise estimate of the true variance for our purposes. *Precision*, a qualitative term frequently used in radiologic literature, is quantified by the measurement of SD, σ, also called standard uncertainty, which is the square root of variance. Precision improves as the SD and variance of the repeated measurements decrease.

For example, if we need to measure the length of an object with a cloth measuring tape, we can measure the object multiple times and calculate the variance of the measurement. Here, the variability could be due to several components of error, for example, the measurement tool can be randomly stretched by differing amounts, each time we place the beginning of the tape at slightly different positions, and so on. These differences between repeated measurements can be characterized by the variability about the mean, that is, the *variance*; the smaller that variance is, the more *precise* these measurements are said to be.

*Bias* is a quantitative estimate of systematic measurement error, that is, even if the random error were zero, the measured number would be systematically different from the truth if bias were nonzero. Examples of this include systematic over/under estimation of some measured property (again, such as volume). *Accuracy*, a qualitative term also frequently used in radiologic literature, is quantified by the measurement of bias. Accuracy improves as the measured bias decreases. The measurement of bias is discussed in more detail in the section on estimating variance and bias for the case of no volume change.

Continuing the previous cloth tape analogy, all measurements would be positively biased, that is, longer, if the cloth tape we used had been unknowingly cut off at the beginning of the tape by 1 inch. Thus, we could average many measurements to reduce the variance and improve the precision, but still be wrong (biased or inaccurate) by 1 inch.
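The tape-measure analogy can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the true length, the 1-inch offset, and the noise SD are hypothetical, and the random stretching and placement errors are assumed to be Gaussian. It shows that averaging many readings tightens the estimate of the mean but leaves the 1-inch bias untouched.

```python
import random
import statistics

# Hypothetical values chosen only for illustration.
TRUE_LENGTH_IN = 30.0   # true object length (inches)
TAPE_OFFSET_IN = 1.0    # tape cut short at its start, so readings run long
NOISE_SD_IN = 0.2       # SD of random stretch/placement error (assumed Gaussian)

def measure(rng):
    """Return one simulated reading from the biased, noisy tape."""
    return TRUE_LENGTH_IN + TAPE_OFFSET_IN + rng.gauss(0.0, NOISE_SD_IN)

rng = random.Random(0)  # fixed seed so the run is repeatable
readings = [measure(rng) for _ in range(10_000)]

mean_reading = statistics.fmean(readings)
sample_variance = statistics.variance(readings)   # divides by N - 1
estimated_bias = mean_reading - TRUE_LENGTH_IN    # requires known truth

print(f"sample variance: {sample_variance:.4f}")
print(f"SD (precision):  {sample_variance ** 0.5:.3f} inch")
print(f"bias (accuracy): {estimated_bias:.3f} inch")
```

Note that the bias can be computed here only because the true length is known, which is the same reason truth data are needed for tumor measurements.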

For tumor change measurements, we will begin with the assumption that for similar physical imaging characteristics and subject setup, the variance and bias estimates are likely dependent on the size of the tumor as well as its complexity which includes factors such as heterogeneity, shape, and location; specifically, the derived parameters that describe each of the errors may in general be a function of these enumerated independent parameters.

In most experiments, we observe both effects simultaneously as they are not easily separated and only through the collection of sufficient data and the use of statistical analysis techniques such as GLMs with selected mixed effects are we able to separate estimates of error components. Such models are especially important when the measured quantities are truly changing with time. The modeling is complicated by having to choose the specific mixed effects and degrees of freedom (DOF). Owing to the model's large DOF, the amount of test data needed and collected under known conditions also increases. When a single measured quantity is stationary, as in the section on Estimating Variance and Bias for the Case of No Volume Change, we can also approach the problem as a simpler, ordered discovery of the two separate components.

Estimating Variance and Bias for the Case of No Volume Change

In the following discussion, we will describe an ordered quantification of both random error and bias *around the operating point of no change*. This is a crucial operating point because in many practical clinical applications, we wish to discover real change in as short a time interval as possible to affirm or refute the assumption that the applied therapy is effective. The urgency arises from the desire to prevent continued treatment of the patient with a high-cost and high-risk regimen with no demonstrated benefit as well as from the need to rapidly switch an *individual* patient to another therapy that may increase individual efficacy. Thus, measurement noise observed in the case of no change for a specific patient is a sample of the null hypothesis that must be quantified before we can determine with some stated probability that *any* measured change represents true change.

Estimating variance for the case of no volume change

Under the simplifying assumption that the random error is additive, we can estimate its variance by using input data sets where we know the underlying truth is no change. There are two main types of experiments to be considered here: “coffee break” studies, that is, very short interval examinations, and longitudinal studies, both used for gathering input data sets from which we can estimate variance.

- (*a*) Coffee break studies. In a coffee break study, the time interval between scans is small compared with the time required for the tumor to change macroscopically, that is, the subject is scanned more than once in a given session or day. Requirements for these studies typically include a special prospective image acquisition protocol with local institutional review board approval, the acquisition of no more than two scans per patient owing to radiation or contrast dose considerations, and use of the same scanner for both scans so that confounding effects such as scanner calibration drift, change of physical scanner, change in image acquisition, or reconstruction protocols are minimized. Although these efforts may seem overly constraining to be required in clinical practice, to achieve clinical measurements of early tumor change during as short an interval as 1 to 2 weeks, patients should be reassigned to the same scanner with the same specified image acquisition protocol despite any implementation difficulties. We would expect that maximum sensitivity to the measurement of change would be obtained under these conditions because of the reduced number of possible noise sources. Although performance of repeated imaging on manufactured phantoms may provide useful information about imaging equipment variations such as calibration drift, obtaining this imaging on patients is advantageous for predicting variance in clinical applications such as measuring tumor change. Because of a variety of factors including homogeneity of the background, simplistic shapes of simulated tumors, and lack of interfering adjacent structures such as penetrating vessels, using physical phantoms to estimate the variance of image-derived parameters will, as a rule, underestimate the variance of image-derived parameters obtained in clinical practice, but such phantoms will have use in estimating bias as described later.

- (*b*) Long-term clinical surveillance studies. A second possible source of input data sets for estimating variance is imaging examinations taken during multiple quarterly, semiannual, or annual intervals in which no statistically significant trend is observed. We would expect to find more random variability in the measurement of tumor change in this setting due to effects unrelated to tumor change such as long-term physiological change in the subjects and scanner changes, for example, different 1) scanners, 2) acquisition protocols, and 3) hardware and software owing to upgrades including image reconstruction algorithm changes. The main advantage of this approach is that cases may be retrospectively selected from clinical archives and special prospective institutional review board protocols would not likely be required. The difficulties here are that 1) the analysis of these longitudinal data to verify that the tumor volumes are statistically stationary over time is slightly more complicated than the simplistic analysis we describe for the coffee break data, and 2) we are only studying tumors that are stable and these tumors are not necessarily representative of cancers as a whole. Because stable tumors are generally more homogeneous than tumors that are rapidly growing, the measurement task may be simpler and the variance reduced compared with that obtained in malignant tumors. In addition, because these tumors may actually be slowly changing, these data may be useful for testing the relative comparison of algorithm variance using the estimator for slowly varying nodules described later.

Although we are limiting our consideration to the measurement of change near the operating point of no change, we need to make the measurement of variance for tumors of differing sizes. There is significant evidence from manual and semiautomatic segmentation that SD and therefore variance is a function of tumor size; see [17]. Thus, we need a source of truth data, for example, coffee break examinations, which contain a spectrum of scanned tumor sizes to characterize the performance of the change measurement analysis for different size tumors.

Because sample variance is a noisy measurement of the underlying distribution's variance, we will need many measurements of tumors with no size change. There are two possible approaches to increase the number of observations of variance to approximate the variance of the underlying distribution.

- (*c*) For every patient with *N* interval examinations of a tumor that is conservatively judged to show no change, we can compute *N*! / ((*N* − 2)! × 2!) = *N*(*N* − 1)/2 different, but partially correlated, interval pairs of examinations from which we can estimate the variance of change measurements; DOF will need to be adjusted to account for the correlation in the data pairs.
- (*d*) Because estimates of random error are potentially dependent on tumor characteristics (e.g., shape, content, surroundings, volume, acquired voxel size), we should only use interval examinations containing tumors of similar characteristics that are conservatively judged to show no change. These variance estimates can then be aggregated to narrow our confidence limits for estimating the underlying population variance for tumors characterized by that specific volume and constitutive complexity.

The estimate of the random error's variance may be sensitive to the estimator used, particularly in case of an error in classifying a tumor as having no size change. For example, for measurements *X*_{i} with mean *X̄*, the obvious estimator is the sample variance determined by:

$$\hat{\sigma}_{1}^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_{i}-\bar{X}\right)^{2}$$

which is exactly the same estimator as

$$\hat{\sigma}_{2}^{2} = \frac{1}{N(N-1)}\sum_{i<j}\left(X_{i}-X_{j}\right)^{2}$$

a *U* statistics-based estimator [21] as suggested in (*c*) above. However, estimators $\hat{\sigma}_{1}^{2}$ and $\hat{\sigma}_{2}^{2}$ are valid only under the assumption that there is no change and will be biased if the tumor varies with time. Other model-based methods are more appropriate should the tumor's volume vary with time. For long-term clinical surveillance studies of slowly varying nodules, a third estimator should be less heavily biased and can be justified based on simple assumptions on the nodule growth and the homogeneity of the variance.

Robust estimation is especially useful for small data sets when outliers may make a big difference. Huber [22] and Hoaglin et al. [23] give extensive discussion of the advantages of robust estimation in practice.
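The sensitivity of the variance estimate to the choice of estimator can be seen in a short numerical sketch. The first two functions below are the sample variance and its pairwise (*U* statistic) form, which are algebraically identical. The third, a successive-difference estimator, is an assumption introduced here as one common choice for slowly drifting series; it is not confirmed to be the estimator intended in the text. The volume values are hypothetical.

```python
from itertools import combinations

def sample_variance(x):
    """Usual unbiased sample variance about the mean."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)

def pairwise_variance(x):
    """U-statistic form: uses all N(N-1)/2 examination pairs."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in combinations(x, 2)) / (n * (n - 1))

def successive_difference_variance(x):
    """Assumed alternative: uses only differences between adjacent examinations,
    so a slow trend contributes far less than it does to the estimators above."""
    n = len(x)
    return sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))

# Hypothetical nodule volumes (mL) at five examinations: slow growth plus noise.
volumes = [10.1, 10.4, 11.2, 11.4, 12.1]

v_sample = sample_variance(volumes)
v_pairs = pairwise_variance(volumes)
v_successive = successive_difference_variance(volumes)

print(f"sample variance:        {v_sample:.4f}")
print(f"pairwise (U statistic): {v_pairs:.4f}")
print(f"successive differences: {v_successive:.4f}")
```

With a drifting series such as this one, the first two estimators absorb the trend into the variance, whereas the successive-difference form is inflated much less.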

Estimating bias for the case of no volume change

Once the random error's variance has been approximated, we can explicitly compute the number of observations we must obtain to test for the presence of a *bias* effect at some stated level of significance. The number of observations (experiments performed to measure the bias) depends on the variance of the previously determined random error distribution, the size of the bias effect we wish to measure, and the probability that we will measure such an effect, that is, reject the null hypothesis, at a stated level of confidence. The required number of observations, that is, measurements of volume change, *increases* as

- (*a*) the size of the bias we wish to measure *decreases* for a fixed variance,
- (*b*) the variance as measured above for the random error component *increases* for a fixed bias, or
- (*c*) the power, that is, likelihood of detecting the change at the given level of significance, *increases* (an interactive demonstration of powering a test is available at: http://wise.cgu.edu/power/power_applet.html).
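These three dependencies can be made concrete with a minimal sketch. It assumes a two-sided, one-sample test with Gaussian errors and a known SD, for which the usual normal-approximation sample size is n = ((z_{1−α/2} + z_{power}) σ / δ)², where δ is the bias to be detected. The function name and the numeric values are hypothetical; a real study design would use a *t*-based or model-based calculation.

```python
import math
from statistics import NormalDist

def observations_needed(bias, sd, alpha=0.05, power=0.80):
    """Approximate number of change measurements needed to detect a bias of the
    given size (two-sided z-test, known SD; a simplifying assumption)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd / bias) ** 2)

# Hypothetical example: bias of 1 mL, random-error SD of 2 mL.
baseline = observations_needed(bias=1.0, sd=2.0)
smaller_bias = observations_needed(bias=0.5, sd=2.0)       # case (a)
larger_variance = observations_needed(bias=1.0, sd=4.0)    # case (b)
higher_power = observations_needed(bias=1.0, sd=2.0, power=0.95)  # case (c)

print(baseline, smaller_bias, larger_variance, higher_power)
```

Halving the bias or doubling the SD each roughly quadruples the required number of observations, because both enter the formula as a squared ratio.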

The measurement of bias is important because if present it will lead to a propensity for false-positives/negatives, depending on whether the change measurement bias is positive/negative, respectively. As the name implies, bias is a systematic error whose cause can be discovered and removed or at least modeled and ameliorated.

Estimating Variance and Bias for the Case of True Volume Change

The determination of bias and variance in the presence of true volume change is needed if we want a completely generalized statistical characterization of a measurement method. Note that if we want to quantify volume change, not simply determine whether there was or was not a change, the truth data required for this task are more difficult to obtain. Because estimates of bias and variance in volume change may be dependent on tumor volume as well as tumor volume change (along with other characteristics such as shape, type, acquisition/reconstruction protocols, and possible motion), we will want to regress both bias and variance as a function of both tumor volume and tumor volume change through GLMs. Truth data for this task can only be known from manufactured phantoms; a method for obtaining volume change truth for real tumors is difficult to devise and has yet to be defined for RIDER. Here, the “coffee break” null change paradigm for real patient scan data is of little use because the “truth” of tumor size is not known (only the null change in tumor size is known); instead, we need estimates of true change from other accurate sources.

The key issue is that we currently have no measurement method that will provide the true change in size of an actual tumor. For real tumors that do change in size between interval scans, we are restricted to using image measurements made by expert radiologists, and this measurement method is itself subject to bias and random error [9,17]. The only way we can obtain scans with known truth for size change is to scan manufactured phantoms with known tumor characteristics and different sizes or to embed simulated, mathematically defined tumors in actual patient scan data; the critical concern here is how well such phantoms represent real tumors and their growth.

General Overview of Methods Useful for Assessing Tumor Change

In the preceding section, methods for assessing the relative performance of algorithms specifically for measuring tumor *volume* change for the purpose of early assessment of tumor response to therapy were discussed. Whereas the use of volume was explicitly examined, we could use exactly the same techniques to examine any other single parameter, for example, average mass, elasticity, etc., and the same techniques for assessment of performance would apply, that is, the measurement of variance and bias. There is, however, an explicit difference between volume and most other single parameters: volume is necessarily a singular, summary parameter whereas other parameters have tumor-dependent, heterogeneous spatial distributions of values within that volume which can be characterized in several ways including a one-dimensional histogram of its values and the histogram's summary statistics, that is, mean, variance, skew, kurtosis, and other higher moments.

The function of the sparsely filled table is to demonstrate the relative relationships of some different outcomes analysis methods and computed parametric models previously contributed to NCI's public archive (https://imaging.nci.nih.gov/ncia/), now called the National Biomedical Imaging Archive, through the efforts of previous RIDER groups as well as a few related methods previously published.

As seen in the rows of the Outcomes Analyses table, most processing is first subjected to segmentation, that is, defining the volume of interest (VOI) for further processing as the volume of clinical interest. Registration commonly follows segmentation in that registration of the whole, complex set of organ systems is very computationally intensive and challenging given that some organs deform and slide along slip walls, for example, lung compression and slippage along the pleural surface of the rib cage. Thus, registration of the lung alone is far simpler than attempting to register the lung and chest wall simultaneously owing to the discontinuity of velocity vectors at the pleural surface. Hence, registration of a segmented lesion with itself across interval examinations is typically preferred.

After segmentation and registration, differing outcomes analyses are applied to the following potential change descriptors for detecting/measuring response to therapy:

**Volume**: In the case of estimating a tumor's volume change, two of the methods we discussed in the previous section are shown in the table in the second set of major columns from the right. Whereas the imaging modalities associated with volume estimation are typically those of CT and MRI owing to better spatial resolution, segmentation-based implementations can be applied to PET and single photon emission computed tomography (SPECT) as well.

The method of tumor segmentation followed by summing the volume of voxels inside a VOI yields a single numeric characterization of the volume of the tumor in the examination; subtraction of the results for any two interval examinations yields a single numeric characterization of the tumor's volumetric change. Although this method, which can include manual as well as many sophisticated semiautomatic implementations, is the current method of choice of most groups for computing volume change, we have referenced only two papers here [18,19]. Note that this method yields only a single number, not a spatial distribution that can be summarized by other single metrics, such as the mean; because none of the other parametric methods we will discuss yields only a single number, the remainder of this row has been grayed out.

Another method we described previously that directly measures volume change is based on registration of the earlier interval examination onto the later. Assuming that the information content of the imaging modality is sufficient to support accurate registration, such methods provide a spatial distribution of local scale changes over the volume of the reference tumor as represented by the determinant of the Jacobian matrix, that is, the determinant of the matrix of first partial derivatives of the local mapping in all cardinal directions. The resulting scale distribution yields local measures of heterogeneous volume changes if they exist. Again we cite only a few reference examples using such methods [20,25,26].
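The local scale change described above can be sketched as follows. This is an illustration only, not the implementation used in the cited work: the mapping functions are hypothetical stand-ins for the output of a registration algorithm, and the Jacobian is approximated by central finite differences. A determinant of 1 means no local volume change, values above 1 indicate local expansion, and values below 1 indicate local shrinkage.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def local_volume_scale(mapping, point, step=1e-4):
    """Jacobian determinant of mapping(x, y, z) -> (x', y', z') at a point,
    using central differences along each cardinal direction."""
    columns = []
    for axis in range(3):
        plus, minus = list(point), list(point)
        plus[axis] += step
        minus[axis] -= step
        moved_plus, moved_minus = mapping(*plus), mapping(*minus)
        columns.append([(p - m) / (2.0 * step)
                        for p, m in zip(moved_plus, moved_minus)])
    # Rows are output components, columns are input directions.
    jacobian = [[columns[c][r] for c in range(3)] for r in range(3)]
    return det3(jacobian)

# Hypothetical registration results.
def uniform_growth(x, y, z):
    return (1.10 * x, 1.10 * y, 1.10 * z)   # 10% longer along every axis

def anisotropic_change(x, y, z):
    return (1.20 * x, 1.00 * y, 0.90 * z)   # grows in x, shrinks in z

scale_uniform = local_volume_scale(uniform_growth, (5.0, 5.0, 5.0))
scale_aniso = local_volume_scale(anisotropic_change, (5.0, 5.0, 5.0))

print(f"uniform growth: local volume scale = {scale_uniform:.3f}")
print(f"anisotropic:    local volume scale = {scale_aniso:.3f}")
```

Averaging this determinant over the voxels of the reference tumor would give the overall fractional volume change, while its spatial map shows where the change is heterogeneous.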

**Uptake**: In PET, biologic chemists have had significant success in tagging specific physiological metabolites with radiotracers. Normalized standard uptake values (SUVs), calculated typically as the ratio of measured radioactivity concentration to injected dose divided by patient body weight, are proposed for quantifying tumor response to therapy [27]. In the accompanying article as well as in a predecessor article [28], Kinahan et al. describe methods of PET quality control and measurement for assuring that measured SUV changes are related to the tumor's physiological changes in response to therapy. The accompanying article additionally describes useful test data contributed to the RIDER data collection that help define tumor change effect sizes that are required to identify a meaningful change. Because SUV is a spatial distribution of values over the segmented tumor, its measurement is typically reported as the maximum and/or the mean and SD of the underlying one-dimensional histogram of values within the delineated VOI.

**Perfusion**:
- (*a*) Many perfusion models exist, but in dynamic contrast enhancement (DCE) MRI, a simple, often used model is the two-compartment model: one compartment for the intravascular input contrast concentration, and the other for the extravascular-extracellular compartment. Common assumptions here are that the current gadolinium-based contrast agents do not penetrate cells and that the intravascular concentration of contrast only contributes to contrast enhancement in the extravascular-extracellular compartment by passive diffusion. The relationship between the change in signal amplitude due to T1 relaxivity and contrast concentration must first be established to convert voxel amplitudes into contrast concentrations. Then from Fick's law, the time rate of change for contrast material in the extracellular tissue is driven by the difference between the two concentrations, that is, the input from intravascular plasma and the loss from the tissue surrounding the capillaries back into the plasma. For a two-compartment model, this statement is typically written in equation form as

  $$\frac{dC_{t}(t)}{dt} = K_{trans}\,C_{p}(t) - k_{ep}\,C_{t}(t)$$

  where *C*_{t}(*t*) is the time-dependent extracellular tissue concentration, *C*_{p}(*t*) is the time-dependent plasma concentration, *K*_{trans} is the rate coefficient for contrast flow from the plasma into the tissue, and *k*_{ep} is the rate coefficient in the opposite direction. Because these rates are due to passive diffusion mechanisms through the capillary endothelial cells, that is, no active “pumps,” and volume normalization is applied, we can derive that *k*_{ep} = *K*_{trans}/*v*_{e}, where *v*_{e} is the fractional extracellular-extravascular volume and *v*_{p} is the fractional blood plasma volume; see Tofts et al. [29]. Computing coefficients from differential equations is very sensitive to noise so a more robust approach is to model the time integral of the equation given above, which converts the solution to the convolution of *K*_{trans}*C*_{p}(*t*) with the kernel $e^{-k_{ep}t}$, that is, $C_{t}(t) = K_{trans}\int_{0}^{t} C_{p}(\tau)\,e^{-k_{ep}(t-\tau)}\,d\tau$, where noise is now attenuated owing to the averaging of the integral. Computing the coefficients for this simplified model is still fraught with some difficulties, for example, picking a good model of the plasma input function to determine the convolution kernel as well as attempting to avoid numerical instabilities in the discrete implementation of the convolution. Here, additional noise reduction can be achieved using the singular value decomposition in the discrete modeling of the convolution integral and elimination of the smaller singular values in formulating the inverse [30].
- (*b*) A simpler modeling approach is that the temporal integration of the T2* relaxivity change in the first pass of an intravenous bolus contrast injection at each voxel in brain yields the relative cerebral blood volume (rCBV) change [31] under the assumption that the blood-brain barrier is intact, or in cases of fenestration, that appropriate deconvolution modeling is used [32,33]. The relative mean transit time (rMTT) is obtained from the integral of the time-weighted concentration normalized by rCBV, and thus relative cerebral blood flow is defined by the ratio rCBV/rMTT. Owing to the rapid time rate of change of blood flow, dynamic susceptibility contrast MRI sampling is accomplished using rapid acquisition sequences such as echo-planar imaging. A detailed pictorial review accompanied by equations of these concepts and others that follow is provided in Jackson [34].

Whereas only MRI acquisition methods were described above for the perfusion models in sections (*a*) and (*b*), CT, PET, and SPECT are all capable of capturing images sufficiently rapidly to derive meaningful coefficients for the perfusion model in section (*a*) as described by Tofts. Only CT and MRI, however, are readily capable of the increased acquisition rates necessary to derive coefficients for the model in section (*b*) above. Coefficients for models described in both sections (*a*) and (*b*) can be computed on a voxel-by-voxel basis; thus, outcomes analyses can be computed in a number of ways. A good review of methods for perfusion (as well as diffusion) MRI is presented in Provenzale et al. [35]. By far, most outcomes analyses for perfusion coefficient models use summary statistics from VOIs to report a mean and standard error of the VOI mean (SEM); necessarily limited references to these approaches are included herein [36–41].
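The integral form of the two-compartment model in (*a*) can be sketched numerically. The code below is a forward model only, assuming the standard Tofts convolution C_t(t) = K_trans ∫ C_p(τ) exp(−k_ep (t − τ)) dτ with k_ep = K_trans / v_e, evaluated with the trapezoid rule. The function name, the step-shaped plasma input, and the parameter values are hypothetical; fitting measured data and the singular value decomposition regularization mentioned above are not shown.

```python
import math

def tofts_tissue_curve(times, plasma_conc, ktrans, ve):
    """Tissue concentration C_t at each time point from the plasma curve C_p,
    using trapezoid-rule evaluation of the convolution integral."""
    kep = ktrans / ve
    tissue = []
    for i, t in enumerate(times):
        integral = 0.0
        for j in range(1, i + 1):
            dt = times[j] - times[j - 1]
            left = plasma_conc[j - 1] * math.exp(-kep * (t - times[j - 1]))
            right = plasma_conc[j] * math.exp(-kep * (t - times[j]))
            integral += 0.5 * (left + right) * dt
        tissue.append(ktrans * integral)
    return tissue

# Hypothetical inputs: 5 minutes sampled every 0.01 min, constant plasma level.
times = [0.01 * k for k in range(501)]   # minutes
cp = [1.0] * len(times)                  # plasma concentration (mM), step input
ktrans, ve = 0.25, 0.5                   # 1/min and volume fraction (assumed)

ct = tofts_tissue_curve(times, cp, ktrans, ve)

# For a constant input the model has the closed form ve * (1 - exp(-kep * t)).
analytic_end = ve * (1.0 - math.exp(-(ktrans / ve) * times[-1]))
print(f"numeric C_t(5 min)  = {ct[-1]:.5f} mM")
print(f"analytic C_t(5 min) = {analytic_end:.5f} mM")
```

Comparing against the closed-form answer for a step input is a simple check on the discrete convolution before it is applied to a measured plasma input function.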

More to the point of this article's emphasis, tumor change analysis in perfusion is often computed as the change in these summary statistics with *t*-tests performed to assess whether the treatment effect measured was different than the null hypothesis. In the case that there are multiple VOI pairs for an interval examination from which change is assessed, or where each voxel pair in registered interval data sets is treated as a separate “VOI,” the statistical test must be corrected for false-positives arising from multiple comparisons. If the one-sided level of significance were picked at α = 0.05 and the null hypothesis were true, that is, there was no tumor change, approximately 5 of 100 voxels would test as positive, that is, falsely changed, simply because we applied the test 100 times to the null Gaussian distribution. Good descriptions of possible correction methods (Bonferroni, family-wise error rate, false discovery rate, etc.) for multiple comparison tests are presented in Wiens [42], Perneger [43], and Genovese et al. [44]. A correction must be applied wherever multiple comparisons occur for the stated *P* values to be meaningful, whether related to perfusion, diffusion, or other metrics.
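Two of the corrections named above can be sketched in a few lines. The Bonferroni rule controls the family-wise error rate by testing each *P* value against α divided by the number of tests, and the Benjamini-Hochberg step-up rule controls the false discovery rate. The *P* values are hypothetical, and the Benjamini-Hochberg rule as written assumes independent (or positively dependent) tests, which interpolated voxel data may not satisfy.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only tests with p <= alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up false discovery rate control: find the largest rank k with
    p_(k) <= (k / m) * alpha and reject the k smallest p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected

# Hypothetical P values from six VOI comparisons.
p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.65]

uncorrected = [p <= 0.05 for p in p_values]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)

print("uncorrected rejections:       ", sum(uncorrected))
print("Bonferroni rejections:        ", sum(bonf))
print("Benjamini-Hochberg rejections:", sum(bh))
```

The uncorrected count is the largest and the Bonferroni count the smallest, reflecting the trade-off between false-positives and sensitivity that the choice of correction entails.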

**Diffusion**: *In vivo* assessment of organ system and tumor apparent diffusion coefficient (ADC) measures is available using MR diffusion-weighted imaging (DWI or DW-MRI). The formula for computing ADC in the direction of the diffusion gradient is

$$\mathrm{ADC} = \frac{\ln\left(S_{l}/S_{h}\right)}{b_{h}-b_{l}}$$

where *S*_{h} and *S*_{l} are the signal amplitudes of the isotropic DW images corresponding to the use of the high and low *b* values, *b*_{h} and *b*_{l}, respectively, where the high and low *b*_{x} values are a function of the applied amplitude of the diffusion sensitizing gradient pulses, as well as the temporal duration and temporal separation of the pulses. Owing to the dipole nature of the coils used to apply the diffusion gradients, the results are anisotropic for all but the case where *b*_{l} = 0, which is isotropic because there is no diffusion gradient applied. Many gradient directions can be acquired [45] subject to scan time constraints to improve the resulting signal-to-noise ratio (SNR). Singular value decomposition of the amplitude response of all these components at each voxel yields the amplitude response for each of three principal axes, that is, the eigenvalues (λ_{1}, λ_{2}, λ_{3}). The singular value decomposition result is the complete summary of all excitations at each voxel, but the fractional anisotropy (FA) is a normalized scalar that is commonly used to characterize the variation in the eigenvalues for each voxel. FA is expressed as

$$\mathrm{FA} = \sqrt{\frac{3}{2}}\;\frac{\sqrt{\left(\lambda_{1}-\bar{\lambda}\right)^{2}+\left(\lambda_{2}-\bar{\lambda}\right)^{2}+\left(\lambda_{3}-\bar{\lambda}\right)^{2}}}{M}$$

where *M* is the vector magnitude, that is,

$$M = \sqrt{\lambda_{1}^{2}+\lambda_{2}^{2}+\lambda_{3}^{2}}$$

and λ̄ is the mean of the eigenvalues, that is, λ̄ = (λ_{1} + λ_{2} + λ_{3}) / 3. Note that FA varies between the limits of 0 and 1; 0 is obtained for the isotropic diffusion case (as in a pure cyst where λ_{1} = λ_{2} = λ_{3} = λ̄), whereas FA = 1 is obtained in the most anisotropic condition, that is, only one of the three principal axis magnitudes is nonzero (approximated by a straight segment of white matter tract in brain).
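The ADC and FA definitions can be checked with a short sketch. It assumes the standard two-point form ADC = ln(S_l / S_h) / (b_h − b_l) and the usual FA normalization by the eigenvalue vector magnitude; the function names and all signal, *b* value, and eigenvalue inputs are hypothetical.

```python
import math

def apparent_diffusion_coefficient(signal_low_b, signal_high_b, b_low, b_high):
    """Two-point ADC from the signals acquired at the low and high b values."""
    return math.log(signal_low_b / signal_high_b) / (b_high - b_low)

def fractional_anisotropy(l1, l2, l3):
    """FA from the three diffusion eigenvalues; 0 = isotropic, 1 = one axis only."""
    mean = (l1 + l2 + l3) / 3.0
    magnitude = math.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    if magnitude == 0.0:
        return 0.0  # no diffusion signal at all; treat as isotropic
    deviation = math.sqrt((l1 - mean) ** 2 + (l2 - mean) ** 2 + (l3 - mean) ** 2)
    return math.sqrt(1.5) * deviation / magnitude

# Hypothetical voxel: signal decays from 1000 to 1000 * exp(-0.8) between
# b = 0 and b = 1000 s/mm^2, which corresponds to ADC = 0.8e-3 mm^2/s.
adc = apparent_diffusion_coefficient(1000.0, 1000.0 * math.exp(-0.8), 0.0, 1000.0)

fa_cyst = fractional_anisotropy(1.0, 1.0, 1.0)    # isotropic limit
fa_fiber = fractional_anisotropy(1.0, 0.0, 0.0)   # single-axis limit
fa_mixed = fractional_anisotropy(1.5, 0.4, 0.3)   # hypothetical white matter voxel

print(f"ADC = {adc:.2e} mm^2/s")
print(f"FA (cyst) = {fa_cyst:.2f}, FA (fiber) = {fa_fiber:.2f}, FA (mixed) = {fa_mixed:.2f}")
```

The two limiting cases reproduce the bounds of 0 and 1 stated in the text, which is a useful sanity check on the normalization constant.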

Alternative outcomes analyses

Instead of using manually drawn multiple VOIs as the only means to approximate following spatial changes in the same tumor across interval examinations, registration of the interval data sets can be implemented to reduce the increased variance associated with manual misplacement of VOIs drawn on interval examinations. More importantly, depending on the accuracy of its implementation, registration is capable of supporting voxel-by-voxel change analysis. After registration, a single VOI may be used on registered interval examinations to limit voxel-by-voxel analysis to, or generate summary statistics from, registered differences considered important by the investigator. Such registered differences may come from the same enhancing region as used to define a VOI on a registered T1-postGad series that has been mapped, that is, warped to, the series (one or more) of analytical interest. Summary statistics, typically mean and variance, can be compiled

- from the one-dimensional histograms of values from the VOI registered onto the pretherapy and posttherapy examinations and compared for statistically significant changes, or
- from two one-dimensional histograms of values from the VOI registered onto two pretherapy baseline examinations to sample baseline noise; see [46,47] as examples. The experiment described in these references assesses the repeatability of measurements across patients for which we expect no difference, for example, between two baseline examinations acquired during a short interval. Bland and Altman [48] first described an appropriate method of making this assessment in their study based on plotting the difference between a pair of measurements on the same subject *versus* the mean of the two measurements. The 95% confidence interval for these differences is the definition of repeatability, that is, ±2σ or two times the SD of the differences; repeatability improves as the SD and thus variance decrease. Importantly, the same study of Bland and Altman also describes in a similar fashion how to measure and characterize agreement between two methods, an assessment often mistakenly attempted through correlation or regression.
- Further, the voxel-by-voxel analysis can consist of a two-dimensional, co-occurrence plot of the registration-paired voxel values, or its joint density histogram constructed by summing the number of co-occurrences in bins. This is just the usual *t*-test with the exception that the distribution is now two-dimensional instead of the usual one-dimensional distribution that we commonly use. The somewhat hidden issue here is what are the DOF of the estimate of the mean, that is, how many independent samples contributed to its computation? Typically, in acquisition of functional MRIs, the number of acquired data points (whose independence is also based on slice profiles) in *k*-space is zero padded and interpolated up to some desired array display matrix size several times larger than the actual data acquisition matrix. These data acquisition parameters can be gleaned only by careful reading of the data's DICOM header; but even then, additional vendor signal-processing specifics may remain hidden in vendor-specific encoded DICOM header regions. Clearly, when the data have been interpolated, either by the MR vendor or in the process of registration, the DOF of the estimate of the mean are only indirectly related to the number of data samples. The multipliers associated with vendors' signal processing and user's interpolation associated with registration for voxel-by-voxel analysis must be used to correct DOF [49].

Summary statistics for the voxel-by-voxel analysis may be expressed by one or more of the following metrics:

- (*a*) displacement of the joint mean relative to the covariance of the mean's null distribution, which, for this case, is tested for significance by the multivariate version of Student's *t*-test, also known as Hotelling's *T*^{2} test,
- (*b*) Kullback-Leibler (KL)-directed “distance” [50] between the treatment effect and null distributions (this metric is sensitive to any differences between the two distributions), or
- (*c*) percent change (%change) of tumor voxels that have a significant change in perfusion above a threshold, for example, the two-tailed 95th percentile determined from the null distribution.

In (*b*) above, the KL-directed distance metric is defined as the log-weighted, average distance from distribution *p*_{1}(*a*) to distribution *p*_{2}(*a*), that is,

$$D_{\mathrm{KL}}(p_1 \,\|\, p_2) = \sum_{a} p_1(a) \, \log \frac{p_1(a)}{p_2(a)}.$$

Note that this definition is sensitive to small differences in the two distributions wherever they occur but is weighted to be more sensitive near the mode of *p*_{1}(*a*). In clinical applications, *p*_{1}(*a*) could be that of the treatment effect and then *p*_{2}(*a*) could be the null distribution. Note that this measure is not intrinsically symmetric, that is, the “distance” from *p*_{1}(*a*) to *p*_{2}(*a*) is typically not the same as from *p*_{2}(*a*) to *p*_{1}(*a*) when we exchange 1's and 2's in the definition, but it can be made symmetric by taking the average of both directed distances. The KL-directed distance metric can obviously be applied to any number of variables; for example, the univariate version as shown in the definition above assumes the variable *a* is a scalar, but *a* could be a vector as well.
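The directed and symmetrized forms of the KL distance can be illustrated on binned histograms. This is a minimal sketch under stated assumptions: the distributions are discrete (histogram counts or probabilities), the natural logarithm is used, and empty bins of *p*_{2} are floored at a small `eps` to avoid division by zero. The function names, the `eps` floor, and the example values are hypothetical and are not taken from the cited work.

```python
import numpy as np

def kl_directed_distance(p1, p2, eps=1e-12):
    """Directed KL distance from p1 to p2 for binned distributions.

    Inputs may be 1-D histograms (scalar a) or N-D joint histograms
    (vector a); they are flattened and normalized to sum to 1.
    """
    p1 = np.asarray(p1, dtype=float).ravel()
    p2 = np.asarray(p2, dtype=float).ravel()
    p1 = p1 / p1.sum()
    p2 = p2 / p2.sum()
    occupied = p1 > 0                      # bins with p1 = 0 contribute nothing
    ratio = p1[occupied] / np.maximum(p2[occupied], eps)  # eps floor is an assumption
    return float(np.sum(p1[occupied] * np.log(ratio)))

def kl_symmetric_distance(p1, p2):
    """Average of the two directed distances, which is symmetric."""
    return 0.5 * (kl_directed_distance(p1, p2) + kl_directed_distance(p2, p1))

# Hypothetical three-bin histograms for illustration only.
treatment_hist = [0.1, 0.2, 0.7]   # plays the role of p1 (treatment effect)
null_hist = [0.3, 0.4, 0.3]        # plays the role of p2 (null distribution)

d_treatment_to_null = kl_directed_distance(treatment_hist, null_hist)
d_null_to_treatment = kl_directed_distance(null_hist, treatment_hist)
d_symmetric = kl_symmetric_distance(treatment_hist, null_hist)
```

Because the inputs are flattened before the sum, the same function applies to a two-dimensional joint density histogram, provided both histograms share the same bin edges.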

In (*c*) above, under the assumption that the VOI encompasses primarily the tumor, estimates of percent change are accomplished by applying a threshold to the treatment effect distribution where the threshold's parametric value is defined by selecting a percentile on the null distribution, for example, the 97.5th percentile to minimize false-positives. By measuring the percentage of the treatment effect above that threshold and subtracting the percentage expected for the null distribution, for example, 2.5% for the 97.5th percentile suggested above, the percent of voxels that have significantly changed in the tumor can be reported and their spatially coherent loci in the tumor demonstrated [51–54]. Further, given the current chaos in attempted change analysis generated from perfusion data, there is some hope from recent results [55] that suggest that the voxel-by-voxel analysis may more accurately support detection of change in heterogeneous tumors (such as glioblastoma multiforme) than simple, mean histogram VOI analysis applied in current practice.
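The thresholding procedure described for metric (*c*) can be sketched as follows. This is an illustrative example only: the voxel values are synthetic, drawn from assumed normal distributions, the function name `percent_significant_change` is hypothetical, and only the upper tail (increases) is tested. A decrease would be handled the same way with the 2.5th percentile.

```python
import numpy as np

def percent_significant_change(null_values, treatment_values, percentile=97.5):
    """Percent of voxels exceeding a null-derived threshold, corrected for
    the percentage expected above that threshold under the null."""
    null_values = np.asarray(null_values, dtype=float).ravel()
    treatment_values = np.asarray(treatment_values, dtype=float).ravel()
    threshold = np.percentile(null_values, percentile)   # set on the null distribution
    above = treatment_values > threshold                 # per-voxel significance mask
    observed_pct = 100.0 * above.mean()
    expected_pct = 100.0 - percentile                    # e.g., 2.5% for the 97.5th percentile
    return observed_pct - expected_pct, threshold, above

# Synthetic data (assumption): voxel-wise differences between registered exams.
rng = np.random.default_rng(0)
null_diff = rng.normal(0.0, 1.0, size=20000)     # baseline-to-baseline differences
unchanged = rng.normal(0.0, 1.0, size=14000)     # tumor voxels with no real change
responding = rng.normal(5.0, 1.0, size=6000)     # tumor voxels with a large change
treatment_diff = np.concatenate([unchanged, responding])

pct_changed, threshold, changed_mask = percent_significant_change(null_diff, treatment_diff)
```

In this synthetic case 30% of the voxels were constructed to change, and the estimate should land close to that figure. Reshaping `changed_mask` back to the VOI geometry would show where the significantly changed voxels lie in the tumor.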