For many clinical applications, single b-value DW-MRI at relatively high diffusion weighting offers exceptional sensitivity to detect disease (this is evident from experiences in the brain for the early detection of stroke (on high-b value images) and in the liver for the detection of lesions using “black blood” low-b value images). However, image signal analysis from single b-value images is inadequate for even rudimentary quantitative analysis of water mobility in tissue.
Multiple b values are necessary to calculate the ADC. At least two b values are needed for basic ADC calculations and this can be done on most clinical systems. Implicit in two b-value ADC calculations is the application of a monoexponential decay model. That is, an ADC map is generated by the natural logarithm of the ratio of low-b value over high-b value image, scaled by inverse of the b-value difference. Albeit overly simple, this two-point method is adequate in instances where multiexponential features are negligible over the acquired b-value range. Moreover, the adoption of this basic analysis has led to reasonable agreement across centers and MRI vendors for ADC quantification of the human brain (b-value range 0–1000 sec/mm2). Many observers believe this may be sufficient for clinical usage, although it is likely that physical measurement of water diffusion in tissue is more complex than described by the monoexponential decay model.
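The two-point calculation described above can be written as ADC = ln(S_low / S_high) / (b_high − b_low). The following is a minimal sketch of that calculation for a single voxel, not a vendor implementation; the function name `two_point_adc` and the synthetic signal values are illustrative assumptions.

```python
import math

def two_point_adc(s_low, s_high, b_low, b_high):
    """Two-point ADC (mm^2/s) under a monoexponential decay model.

    s_low, s_high: signal intensities at the lower and higher b value.
    b_low, b_high: b values in s/mm^2.
    """
    if b_high <= b_low:
        raise ValueError("b_high must exceed b_low")
    if s_low <= 0 or s_high <= 0:
        return float("nan")  # logarithm undefined; flag the voxel
    return math.log(s_low / s_high) / (b_high - b_low)

# Synthetic voxel (hypothetical values): true ADC of 1.0e-3 mm^2/s
true_adc = 1.0e-3
s0 = 1000.0
s_b0 = s0 * math.exp(-0 * true_adc)
s_b1000 = s0 * math.exp(-1000 * true_adc)
adc = two_point_adc(s_b0, s_b1000, 0, 1000)
```

On noise-free monoexponential data the function recovers the input ADC; with real data the result depends on the chosen b-value pair, as discussed in the text.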
In highly vascular tissue, blood flow/perfusion may impart significant signal attenuation over the low b-value range (from b = 0 to b = 100 sec/mm2), which artificially inflates diffusion estimates. As described above, nonzero lower b values should be used to eliminate vascular contributions to the calculated ADC. The minimum b value threshold to suppress perfusion effects will depend on the vascular properties of tissues, although for most applications, a lower b value of 100 to 150 sec/mm2 is probably adequate.
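The effect of the lower b-value threshold can be illustrated with a log-linear least-squares fit that optionally discards b values below a cutoff. This is a sketch under stated assumptions: the function name `fit_adc_loglinear`, the 10% fast (perfusion-like) signal fraction, and the two decay coefficients are hypothetical values chosen only to make the effect visible.

```python
import math

def fit_adc_loglinear(b_values, signals, b_min=0.0):
    """Monoexponential ADC (mm^2/s) from the least-squares slope of ln(S)
    versus b, using only measurements with b >= b_min."""
    pts = [(b, math.log(s)) for b, s in zip(b_values, signals)
           if b >= b_min and s > 0]
    if len(pts) < 2:
        raise ValueError("need at least two usable b values")
    n = len(pts)
    mean_b = sum(b for b, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((b - mean_b) ** 2 for b, _ in pts)
    if sxx == 0:
        raise ValueError("b values must not all be identical")
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in pts)
    return -sxy / sxx  # decay rate is the negative slope

# Synthetic signal (hypothetical): 10% fast perfusion-like component
b_values = [0, 100, 500, 1000]
f, d_fast, d_slow = 0.10, 20e-3, 1.0e-3
signals = [f * math.exp(-b * d_fast) + (1 - f) * math.exp(-b * d_slow)
           for b in b_values]

adc_all = fit_adc_loglinear(b_values, signals)                # includes b = 0
adc_no_perf = fit_adc_loglinear(b_values, signals, b_min=100) # excludes b = 0
```

In this synthetic case the fit that includes b = 0 yields a larger ADC than the fit restricted to b ≥ 100 sec/mm2, and the restricted estimate lies closer to the slow coefficient.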
It is recommended to also continue to acquire the nominal “b = 0” image to provide anatomic information and to maintain consistency with prior work. Usually, the b = 0 image can be obtained at nearly no cost in scan times using single-shot techniques, particularly because acquisition along three-orthogonal axes is not performed for the b = 0 weighting.
For applications where DW-MRI is acquired over larger ranges of diffusion sensitivities, and assuming perfusion effects have been effectively removed by the proper choice of the lower b value, simple monoexponential models may not adequately characterize the decay curve. Usually, evidence of true multiexponential features (not related to perfusion effects) requires substantially higher b values (e.g., 2000–6000 sec/mm2), much greater than is typically acquired in clinical studies owing to practical SNR limits. Proper analysis of these data types requires multiexponential models where signal decays are modeled as weighted sums of two or more exponentials (provided that the signals at the highest b value are above the noise level) [10,11] or alternative models such as stretched exponentials that allow a distribution of diffusion coefficients in each voxel [12]. As with other curve fitting challenges, the reliability with which multiple decay coefficients can be isolated depends on the difference between the true Dfast and Dslow, SNR, b-value range, and number of b values acquired. Rejection of low SNR pixels and/or incorporation of SNR weights in the multiexponential fitting routine should be used to mitigate fitting errors. An unfortunate tradeoff in acquisition of DW-MRI over many b values and/or averaging to increase SNR to support multiexponential diffusion analysis is the commensurate increase in scan times, which may not be practical in many clinical settings.
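For reference, the signal models named in this paragraph can be written in their commonly used forms. The symbols are notational assumptions rather than definitions taken from this text: S_0 is the signal without diffusion weighting, f is the signal fraction of the fast component, and DDC and α are the distributed diffusion coefficient and stretching exponent of the stretched exponential model.

```latex
% Monoexponential decay
S(b) = S_0 \, e^{-b \cdot \mathrm{ADC}}

% Biexponential decay (weighted sum of two exponentials)
S(b) = S_0 \left[ f \, e^{-b D_{\mathrm{fast}}} + (1 - f) \, e^{-b D_{\mathrm{slow}}} \right],
\qquad 0 \le f \le 1

% Stretched exponential decay
S(b) = S_0 \, e^{-(b \cdot \mathrm{DDC})^{\alpha}},
\qquad 0 < \alpha \le 1
```

The biexponential model reduces to the monoexponential form when f = 0 or when Dfast = Dslow, and the stretched exponential does so when α = 1, which is why separating the components becomes unreliable as Dfast approaches Dslow.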
Diffusion in some tissues is known to be directionally dependent, that is, anisotropic (e.g., in the central nervous system and in muscle). If it is known a priori that the tissue of interest is isotropic (e.g., most tumor models) then a single gradient direction is usually sufficient to properly document diffusion properties. In general, however, it is safer to assume the lesion of interest and its surrounding tissues may have directional dependencies so it is best to measure water mobility along at least three orthogonal diffusion gradient directions yielding, say, ADCx, ADCy, and ADCz. The simple average of these into a mean diffusivity value effectively removes confounding influences of the relative orientation between tissue and the imaging system. This mean diffusivity bears the same desirable rotational independence as the trace of the full diffusion tensor without having to acquire or process DTI [70].
If further information is specifically desired regarding the strength and spatial patterns of anisotropy, at least six gradient directions are required to generate the full diffusion tensor, although additional gradient directions (9–32 commonly) generally improve the quality of the tensor analysis results.
Most MRI vendors offer the option to acquire and process DTI scans in a reasonably efficient manner. Intrascan image registration should be applied before tensor analysis if there are systematic shifts and image distortions between gradient directions. Multiple indices are available to quantify the degree of anisotropy (e.g., FA, relative anisotropy [70]). In addition, the direction and strength of anisotropy can be color-encoded using the principal eigenvector of the diffusion tensor. Furthermore, the connectivity of anisotropic domains can be represented in tractography, which allows visualization of tissue fiber tracts in three dimensions [56]. As suggested above, diffusion anisotropy is relatively strong in the CNS. Outside the CNS, with the exception of the kidney and muscle, anisotropy is rather modest, and therefore, most tumor analyses have been directed toward isotropic diffusivity indices (i.e., ADC calculations).
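As an illustration of the indices mentioned above, the sketch below computes mean diffusivity and fractional anisotropy (FA) from the three eigenvalues of a diffusion tensor using the standard FA definition. The function names and the example eigenvalues are assumptions made for illustration; tensor estimation itself is not shown.

```python
import math

def mean_diffusivity(l1, l2, l3):
    """Mean of the three tensor eigenvalues (one third of the trace)."""
    return (l1 + l2 + l3) / 3.0

def fractional_anisotropy(l1, l2, l3):
    """FA = sqrt(3/2) * sqrt(sum((l_i - MD)^2)) / sqrt(sum(l_i^2))."""
    md = mean_diffusivity(l1, l2, l3)
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    if den == 0:
        return 0.0  # no measurable diffusion; treat as isotropic
    return math.sqrt(1.5 * num / den)

# Hypothetical eigenvalues in mm^2/s
fa_isotropic = fractional_anisotropy(1.0e-3, 1.0e-3, 1.0e-3)
fa_fiberlike = fractional_anisotropy(1.7e-3, 0.3e-3, 0.3e-3)
```

FA is zero when all eigenvalues are equal and approaches one when diffusion is confined to a single axis, whereas mean diffusivity is unchanged by rotation of the tensor.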
Regions of interest definitions
To study the diffusion properties of tumors, lesion boundaries must be properly delineated for subsequent quantification. Ideally, the region of interest (ROI) is contoured around lesions using images with the highest contrast between lesion and normal tissue. Subjective placement of smaller ROIs within lesions is not recommended, particularly for response assessment studies.
Traditional high-contrast anatomic images, such as T2-weighted and contrast medium-enhanced T1-weighted, which are independent of the DW-MRI sequences are preferred, but translation of ROIs to the DW-MR image set is then required. Transferal of such ROIs to the DW-MRI data set requires image registration unless prescription of the traditional and DW-MRI scans was identical (ignoring for the moment other systematic distortions). In some instances, the DW images themselves can offer strong lesion/tissue contrast, in which case these are sufficient for ROI definition. Ideally, the b0 T2-weighted image (or a very low b value image) should be used, although, occasionally, higher b-value images may have to be used.
There is debate as to which b-value image best delineates tumor from normal tissue/necrotic tissues. When ROIs are drawn on high-b value images for the estimation of ADC values, such ROIs are said to represent “viable tumor” because the detrimental effects of necrosis are ameliorated. However, such a method for defining ROIs is occasionally prone to error because of T2-shine through effects. Furthermore, in the presence of necrosis/cystic structures, lesion extent may be underestimated. It is also important to remember that well-differentiated tumors may not be seen on high-b value images. Whatever the method used to define ROIs, a standard, recorded strategy should be applied to ensure consistency within any given study.
In the ADC calculation methods described above, low SNR pixel values should be eliminated before the ADC map calculation, and these pixels should be flagged as “not-a-number” for exclusion. However, elimination of low SNR pixels for ADC calculations and/or using high-b value images for ROI definitions can be problematic when evaluating the therapeutic effects of some drugs. For example, chemotherapy for teratoma can cause a poorly differentiated tumor to become well differentiated (a favorable outcome measure) and some drugs/therapies induce necrosis and cystic degeneration. In both cases, ROIs placed solely on areas of “viable tumor,” however defined, would underestimate/mask favorable therapeutic effects. Counting the pixels with zero ADC values before and after therapy would be one way of dealing effectively with these issues.
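The flagging and counting steps can be sketched as follows. This is an illustrative example only: the SNR threshold of 3, the noise estimate, the function names, and the pixel values are hypothetical and would need to be defined by the study protocol.

```python
import math

def masked_adc_map(low_b, high_b, b_low, b_high, noise_sd, snr_min=3.0):
    """Pixel-wise two-point ADC; pixels below the SNR threshold become NaN."""
    floor = snr_min * noise_sd
    adc = []
    for s_lo, s_hi in zip(low_b, high_b):
        if s_lo <= 0 or s_hi <= 0 or s_lo < floor or s_hi < floor:
            adc.append(float("nan"))  # flagged for exclusion
        else:
            adc.append(math.log(s_lo / s_hi) / (b_high - b_low))
    return adc

def summarize(adc):
    """Mean over valid pixels plus the number of flagged pixels."""
    valid = [v for v in adc if not math.isnan(v)]
    n_flagged = len(adc) - len(valid)
    mean = sum(valid) / len(valid) if valid else float("nan")
    return mean, n_flagged

# Three pixels (hypothetical intensities); the second has low high-b signal
adc_map = masked_adc_map([900.0, 800.0, 50.0], [330.0, 12.0, 40.0],
                         b_low=100, b_high=1000, noise_sd=10.0)
mean_adc, n_flagged = summarize(adc_map)
```

Reporting the flagged-pixel count alongside the mean, before and after therapy, keeps treatment-induced changes in excluded tissue visible rather than silently discarding them.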
Conservative ROI definitions would only include apparently viable tumor based on robust Gd contrast enhancement on T1-weighted images. A more generous tumor extent would include contrast-enhanced and hyperintense tissues on T2-weighted images. However, inclusion of necrotic and cystic zones can include extremes in water ADC values, which may adversely bias image analysis. It is important that standardized software be developed in which criteria of undesirable tissue be clearly defined and that individual subjective decision making by observers is kept to a minimum. Different scenarios may be adopted to exclude these nonviable tissue regions. It should be kept in mind that a particular treatment might induce more nonviable voxels, which, if eliminated from analysis, would falsely reduce the apparent impact on ADC.
The entire three-dimensional volume of interest (VOI), a composite of ROIs over multiple slices, of the lesion should be delineated particularly if the tumor is being followed over time.
Volume of interest analysis methods fall into three general areas: whole-tumor summary statistics, histogram, and voxel-wise analyses.
- Whole-tumor summary statistics (e.g., mean and median) are a common method for reduction of the tumor into a single quantity. The advantage of this technique is its simplicity, although it fails to fully address the important issue of tumor heterogeneity.
- The histogram-based approach subclassifies different tumor diffusion environments. For example, the volume of the tumor within a specified range of the diffusion histogram has been investigated as an approach to document tumor evolution in response to treatment. It is vital that consistent “binning” procedures be used to analyze data across a multi-institutional trial because errors in histogram comparisons have occurred where this was not standardized. When using histograms, redistributions of high and low values may occur, and this may not be reflected in mean changes. This occurs because histograms cannot depict the spatial information as to the origin of changes, which may be crucial to detect the evolution of diffusion changes within lesions. Ideally, advanced histogram analysis techniques should be used so that a single scalar value can be used for developing response criteria on DW-MRI; an example approach is principal component analysis.
- Ideally, to track the spatial origin of changes induced by treatments, it is necessary to have spatial tags to accurately monitor the change in diffusivity [55,72]. The retention of spatial information requires voxel-wise approaches incorporating registration of image data sets, typically between treatment interval examinations. This method seems to work well in the brain and in bones but is less likely to be applicable to whole-body measurements owing to problems associated with image registration and changes in lesion sizes with therapy [55,73]. This approach enables ADC in individual tumor voxels to be followed and so enables the depiction of the fractions exhibiting a significant change (increase or decrease) in ADC. The spatial location of ADC changes within the tumor can be made available to potentially guide spatially directed therapies.
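The three analysis levels listed above can be sketched on a toy VOI. The example assumes that the pre- and post-treatment ADC values belong to the same spatially registered voxels; the function names, the bin edges, and the change threshold of 0.4 (in units of 10^-3 mm2/sec) are hypothetical choices for illustration.

```python
import statistics

def summary_stats(adc_values):
    """Whole-tumor summary statistics."""
    return {"mean": statistics.mean(adc_values),
            "median": statistics.median(adc_values)}

def fixed_bin_histogram(adc_values, lo, hi, n_bins):
    """Histogram with bin edges fixed in advance so that counts are
    comparable across examinations and institutions."""
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in adc_values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def voxelwise_change_fractions(pre, post, threshold):
    """Fractions of registered voxels whose ADC increased, decreased,
    or stayed within +/- threshold."""
    if len(pre) != len(post) or not pre:
        raise ValueError("pre and post must be non-empty and equal in length")
    inc = sum(1 for a, b in zip(pre, post) if b - a > threshold)
    dec = sum(1 for a, b in zip(pre, post) if a - b > threshold)
    n = len(pre)
    return inc / n, dec / n, (n - inc - dec) / n

# Toy VOI of four voxels, ADC in units of 1e-3 mm^2/s (hypothetical values)
pre = [1.0, 1.2, 0.8, 1.5]
post = [1.6, 1.2, 0.3, 1.5]

stats_pre, stats_post = summary_stats(pre), summary_stats(post)
hist_pre = fixed_bin_histogram(pre, 0.0, 3.0, 6)
hist_post = fixed_bin_histogram(post, 0.0, 3.0, 6)
fractions = voxelwise_change_fractions(pre, post, threshold=0.4)
```

In this toy case the mean ADC barely moves because one voxel rises and another falls, whereas the histogram shifts and the voxel-wise analysis shows that half of the voxels changed, which mirrors the limitation of summary statistics noted in the list.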
Choice of monoexponential versus multiexponential modeling of signal decay with b value depends on features apparent in the data, SNR, number, and range of acquired b values.
Data typically obtained in most clinical applications for b-value ranges of 100 to 1000 sec/mm2 are reasonably well modeled using monoexponential decay fits.
Tumor ROI/VOI definitions may be done on traditional high-contrast images such as T2-weighted or T1-weighted contrast-enhanced images. High-contrast, high-b value DW images can also be used.
Descriptions of diffusion properties within lesions or tissues of interest may be reported at several levels classified as follows: 1) traditional summary statistic over the entire ROI/VOI; 2) histogram analysis, which allows segmentation of the tissue based on diffusion properties; and 3) voxel-by-voxel analyses where spatial information is retained over interval examinations such that fractional volume of tissue exhibiting change in diffusion properties is measurable. However, the latter requires methods of tracking individual voxels over time.
Correlations with End Points
The determination of outcome measures or end points must be dictated by the nature of the question being addressed (clinical, biologic, physical, or pharmaceutical). For instance, if the purpose of a trial is to determine whether DW-MRI can characterize the biologic aggressiveness of a tumor, then ADC values need to be correlated with recognized measurements of aggressiveness. This could include tumor grade, time to progression, progression-free survival, or overall survival. However, if the goal is to determine whether DW-MRI is an early marker of treatment success, then intermediate end points such as pathologic response could be used. Potentially, DW-MRI results could be compared with other biomarker changes such as serum markers of cancer (e.g., carcinoembryonic antigen, prostate-specific antigen, etc.), RECIST, and WHO measurements of tumor size. However, firmer and more robust end points reflecting therapy efficacy in patient outcomes are preferred where possible, such as time to progression, progression-free survival, and overall survival.
If DW-MRI is being evaluated as an early biomarker of therapy response then the timing of follow-up studies should be such that DW-MRI is acquired before changes in size are expected to occur. Intermediate time points may become influenced by necrosis and liquefaction, and DW-MRI may become less useful. Long-term data points may “normalize” because liquefactive necrosis resolves and the residual mass contains fibrotic dehydrated tissues.
For response assessment studies, it is important to have predetermined whether DW-MRI changes are expected to occur, the magnitude/direction of the likely change, and the timeline as to when and for how long changes are expected to last. Animal validation studies before human studies may provide information on the appropriateness of using DW-MRI and on the optimal timing for doing imaging in human studies.
Diffusion experiments generate large numbers of magnitude b-value images. When these are combined with morphologic images, many hundreds of images are produced, which need to be reduced for diagnostic interpretations.
The most valuable images required for interpretation are high-b value images and ADC maps, which should always be evaluated with morphologic imaging. Because high-b value DW-MR images have high background suppression, tumor localization is usually straightforward. However, very high signal on high b value may also be due to T2-shine through effects; conversely, liquefaction or necrosis can result in an underestimation of lesion extent, so comparisons with anatomic images are important.
Although no color scales are especially suited for the display of high-b value magnitude images, convention has it that “inverted grayscale” be used (ADC maps, however, are better displayed using conventional grayscale). Indeed, whole-body DW imaging with background suppression can produce images that superficially resemble FDG-PET scans (Figure 5). This is because of the high contrast on high-b value images, which, when used with three-dimensional displays, are amenable to multiplanar reconstructions and three-dimensional renderings (maximum-intensity projections, surface shaded display, volume renderings).
Figure 5. Metastatic renal cancer. Whole-body DW images (left to right): coronal computed tomographic image, coronal T1-weighted MRI at the same plane of section, DW imaging (b800) demonstrating left chest wall metastasis, and the fused ADC and T1-weighted ...
A common method of analyzing high-b value images is to use fusion imaging techniques. Modern three-dimensional fusion imaging visualization software works in three steps. (1) Superimposition: data sets do not need to be acquired in the same plane and to have identical FOVs and matrix sizes, but most ADC data sets are aligned and obtained with similar parameters. (2) Alignment: algorithms work with multiple degrees of freedom (translation and rotation) based on anatomic landmarks with the ability to work automatically with manual overrides if necessary. (3) Visualization: blending of grayscale with pseudo color images with adjustable balance between the two superimposed data sets. When blending is used for data display, the level of blending should be kept constant across a study and reported in manuscripts.
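The blending step can be expressed as a weighted sum of the grayscale anatomic value and the pseudo color overlay. The sketch below shows this for a single pixel; the function name `blend_pixel` and the weight `alpha` are illustrative assumptions and do not describe any particular fusion software.

```python
def blend_pixel(gray, color_rgb, alpha):
    """Blend one grayscale anatomic pixel with one pseudo color overlay pixel.

    gray: anatomic intensity scaled to 0..1.
    color_rgb: (r, g, b) overlay color, each scaled to 0..1.
    alpha: overlay weight; 0 shows only anatomy, 1 shows only the overlay.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return tuple((1.0 - alpha) * gray + alpha * c for c in color_rgb)

# Equal weighting of a mid-gray anatomic pixel and a red overlay pixel
fused = blend_pixel(0.5, (1.0, 0.0, 0.0), alpha=0.5)
```

Because the displayed appearance depends directly on the weight, recording a single `alpha` value per study is one simple way to satisfy the recommendation that the blending level be kept constant and reported.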
Other potential artifacts appearing on fused images include misregistration of anatomic and DW images due to bladder filling and internal organ movements. Susceptibility artifacts caused by luminal air are exaggerated on high-b value images, although their effects are minimized on ADC maps.
A major challenge to the widespread implementation of DW-MRI is the lack of a standard approach to data collection and analysis. This creates challenges for support of DW-MRI by commercial MRI vendors and makes deployment of DW-MRI techniques limited to sites with significant experimental MRI expertise. Furthermore, the lack of standard approaches impairs validation and makes the ultimate qualification of DW-MRI as a biomarker extremely difficult.
In large part, the lack of standardization is related to the technical challenges in performing DW-MRI acquisitions. In most practical applications of DW-MRI, performing “ideal” data acquisitions is impractical owing to limits in technology and patient compliance.
Approaches that accommodate technical limitations through compromises in acquisition and/or in analysis have been developed to allow the practical implementation of this technique. Examples include reducing the number of b values for modeling of data, reducing spatial resolution, limiting volume of imaging, averaging free breathing studies instead of gating, using empiric analyses (e.g., visual assessments of signal intensity on high-b value images), creative techniques for reducing acquisition time, and so on.
Standardized data sets should be acquired systematically using “ideal techniques” with great intrinsic redundancy to test the effect of various technical compromises on measuring the signal associated with response. Such data should be made widely available for investigators to test their analytic software. These ideal data sets should be limited to single organs/single treatments starting with the least challenging. Ideally these should be documented, anonymous, and be available on the Web.
Similarly, it would be desirable for research groups to make their analysis methods available either by publication of open code or under specific bilateral agreements. In the longer term, specific standardized software for analysis would be advantageous, but this should not restrict the continual evolution of measurement and analysis approaches.
Standard methods of diffusion assessment should be established and validated against phantoms appropriate to specific body locations, with their measurement reproducibility being established.
Recommendations for standardization
Basic standards for measurements/analysis and reporting of tissue diffusion coefficient should be established and adhered to. They should be tested against relevant phantoms, and reproducibility should be established.
New techniques need to demonstrate specific advantages over existing methods, providing comparison data that defines the benefit.
Studies should include routine measurement and QA analysis.
Standardized data sets need to be made available to allow testing and comparison of analysis approaches.
Research groups should make analysis methods available, either as open source code or by specific agreements where there are confidential or commercial issues.
Standardization of software for analysis would be desirable.
To support the use of DW-MRI parameters in decision making about pharmaceuticals, it is important to link DW-MRI to underlying pathophysiological processes both before and after interventions.
Initially, this should be performed in well-defined model systems and then, where possible, confirmed by clinical measurements using biopsy specimens or surrogate tissues. The link between DW-MRI biomarker change and therapy response should also be established in xenografts and then clinically using both clinical outcome measures as well as pathologic surrogates of outcomes. Ideally, these biologic end points should relate specifically to the mechanism of action of the compound.
Suggested histologic validation of DW-MRI includes exploring links with measurements of proliferation index (Ki 67), cellularity index (cells/high-power field), tumor grade, and apoptosis. It will also be useful to explore/correlate DW-MRI with other MR measures of perfusion (dynamic contrast-enhanced (DCE) MRI, dynamic susceptibility contrast MRI, blood oxygenation level-dependent MRI), arterial spin labeling or metabolism magnetic resonance spectroscopy, and other imaging tests (e.g., FDG-PET, thymidine-PET, or annexin imaging for apoptosis).
Initially, clinical studies should validate practical approaches developed using the standardization guidelines described above, in more generalized applications such as chemotherapy response at varieties of anatomic sites. Neoadjuvant clinical trials are particularly suitable for these purposes because pathologic materials obtained can serve as rapid intermediate readouts/end points. If these are successful then novel therapeutics in early phase 1/2 studies can be evaluated.
Validation of DW-MRI in Relation to End Points
Requires correlation between size and type of biologic effect and relevant DW-MRI parameter, in animal models, supported by clinical biopsy or histology data.
Time course of effects will define the timing of imaging in clinical trials.
Attempt to derive hypothesis-driven relationships between imaging and specific biologic end points.
Biologic end points should relate to the purported mechanism of activity of the compound.
It would be desirable to be able to predict the magnitude of the MR effect based on animal models, allowing trial design to monitor dose-related change.
Reproducibility (See Also Appendix 3 for Detailed Methods)
To allow appropriate study design and to assess the significance of change, centers should demonstrate the reproducibility of their clinical measurements, in a manner that is traceable, providing information on individual and intergroup reproducibility. This information should be combined with evidence of the expected magnitude of therapeutic effect, such that studies can enable assessments of dose-related changes.
Reproducibility assessments are facilitated by incorporating baseline repeated measurements to provide information directly relevant to the body sites chosen. It is important to identify major sources of error leading to nonreproducible results. To determine whether changes in tumors induced by treatments are significant, three factors should be known. These are the natural biologic variability of parameters such as ADC, the variability inherent in the measuring instruments, and knowledge of additional errors induced by appraisers or analysis techniques. This implies that diffusion parameter measurement changes cannot be taken at face value without due consideration of measurement errors. Estimates of measurement errors enable us to decide whether changes in ADCs are “real” for both group and individual observations.
Few published studies have documented measurement error in body DW-MRI, and the major contributors toward errors are not documented. However, from previous studies of other functional imaging techniques (e.g., DCE MRI), it is likely that DW-MRI measurement error will be dependent on a number of factors. These include imaging instrumentation and setup procedures, data acquisition techniques, and the time interval between repeated measurements. Data analysis techniques are also likely to add to measurement error including modeling techniques used (including range and noise of b value images used and implicit assumptions (monoexponential vs biexponential or multiexponential fitting)). Patient-related factors include tumor type, anatomic region being evaluated, and underlying physiologic status of patients.
It is important that clinical trials evaluating DW-MRI responses to treatment assess measurement variability as an intrinsic part of clinical trial design. The measurement error estimate component should be of sufficient statistical power (i.e., on enough patients) and needs to be performed on the study patients or in other patients who are representative of those being examined in the main study. To compare measurement errors of DW-MRI parameters at diverse anatomic sites and pathologies, it is important that similar statistical methods be used and that the meaning and limitations of statistical measures are understood.
Before statistical tests are applied, assumptions intrinsic to reproducibility analysis must be verified (e.g., normality of data and the nature of any relationship between measurement error and the magnitude of the parameters). Appropriate statistical parameters, including the within-subject SD, the coefficient of variation, and the intraclass correlation coefficient, should be quoted in communications (as detailed in Appendix 3). The repeatability statistic is a useful parameter for DCE-MRI studies because it informs on whether changes in a particular patient are significant.
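A minimal sketch of these statistics for paired baseline measurements is given below. It assumes two repeated measurements per patient, normally distributed differences, and measurement error that is independent of the parameter magnitude; the function name and the example ADC values are hypothetical, and the intraclass correlation coefficient is omitted for brevity. The within-subject coefficient of variation is computed here as the within-subject SD divided by the grand mean, which is one of several definitions in use.

```python
import math

def repeatability_stats(first, second):
    """Repeatability statistics from two baseline measurements per patient."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(first, second)]
    # For two repeats, the within-subject variance is sum(d^2) / (2n)
    wsd = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return {
        "within_subject_sd": wsd,
        "within_subject_cv": wsd / grand_mean,
        # Smallest change in one patient exceeding measurement error (95%)
        "repeatability": 1.96 * math.sqrt(2) * wsd,
    }

# Hypothetical double-baseline mean tumor ADCs, in units of 1e-3 mm^2/s
scan_1 = [1.00, 1.20, 0.90, 1.10]
scan_2 = [1.04, 1.16, 0.94, 1.06]
result = repeatability_stats(scan_1, scan_2)
```

Under these assumptions, a posttreatment change in an individual patient that is smaller than the `repeatability` value cannot be distinguished from measurement error at the 95% level.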
Centers should define reproducibility of data that is traceable, for individuals and intergroup comparisons, allowing the power of studies to be defined prospectively for a defined end point. Where possible, and in the absence of existing reproducibility data specific to the method, two baseline measurements should be incorporated to allow assessment of individual patient reproducibility.
Multiple lesions per organ should be taken into account.
A standardized minimum statistical approach for reproducibility analysis should be reported.
In multicenter trials using identical (preferred) or similar methods (such as maintaining a constant field strength, imaging a single organ, etc.), comparison of precision and accuracy should be determined on phantoms to provide a basis for pooling of data, with account taken of corrections for machine-specific factors, and for sensitivity to motion effects not seen in phantoms.
Site qualification should be undertaken by the performance of measurements validated at a central analysis site before recruitment using standardized data from each site. Readers should refer to Appendix 4 on QA procedures and diffusion phantoms for further details.
Analyses of DW-MRI data in multicenter trials should be performed at a single center using standardized, validated software. The reliability of analyses should be assured using data from each participating center before starting the trial.
In each study, patient and lesion selection, as well as the number of studies per patient including reproducibility assessments, should be defined prospectively. Reproducibility studies should be done at each imaging site because they not only provide estimates of measurement error in multicenter settings but also serve as a quantitative QA measurement of site performance. Standardized QA procedures should be enforced at all participating institutions to keep the data as uniform as possible.
Every effort should be made to ensure that the study can proceed on a given MR unit even if the unit is upgraded to a higher software level. It is important for the viability of DW-MRI as a biomarker that implemented DW-MRI methods be impervious to upgrades and software changes; otherwise, its future as a biomarker is in question.
Robust data acquisition protocols that are able to deal effectively with physiological motions should be instituted and adhered to.
Central data collections should incorporate appropriate QA and quality control procedures. Fast feedback to imaging sites is recommended to minimize data loss due to incomplete or incorrect imaging. The number and causes of failed examinations/analyses should be prospectively recorded. Ideally, failed examination/analysis rates should be <5% to 10%.
Data analysis should use software that is fit for the purpose, is validated, and is preferably Food and Drug Administration 21 CFR Part 11-compliant. 21 CFR Part 11 sets forth the requirements that need to be met to have the Food and Drug Administration consider electronic signatures and records equally trustworthy and just as reliable as handwritten signatures. Validation of software algorithms via multicenter trials is an essential need for obtaining regulatory approval to use DW-MRI as an accepted surrogate biomarker.
To promote the comparisons of ADC values obtained from different centers and for differing therapies and to overcome the dependence of ADC on the range of b values chosen for any particular study, perfusion-insensitive ADC values (by excluding the b = 0 sec/mm2 image from the ADC calculation) should always be quoted.
Additionally, study data should be publicly available to enable alternative analytic approaches that might be superior to the ones used in the study.
Animal validation should be undertaken before human studies to provide information on the appropriateness of using DW-MRI and may be able to indicate the optimal timing for doing imaging in human studies.
Double-baseline studies should be done to provide data about measurement error of imaging specific to the study and thus knowledge of what constitutes a significant change in an individual and in a group of patients (powering studies).
Quantified parameters such as ADC should be measured to derive physiologic meaning that can be related to drug mechanisms of action. Quantified parameters have the advantage of allowing interpatient and intrapatient comparisons to be made. Good quality control and QA are keys to success for multicenter studies.