To summarize the findings of anthropomorphic proton phantom irradiations analyzed by the Imaging and Radiation Oncology Core (IROC) Houston.
103 phantoms were irradiated by proton therapy centers participating in clinical trials. The anthropomorphic phantoms simulated heterogeneous anatomy of a head, liver, lung, prostate, and spine. Treatment plans included those for scattered, uniform scanning, and pencil beam scanning beam delivery modalities using five different treatment planning systems. For every phantom irradiation, point doses and planar doses were measured using TLD and film, respectively. The difference between measured and planned dose was studied as a function of phantom, beam delivery modality, motion, repeat attempt, treatment planning system, and date of irradiation.
The phantom pass rate (overall 79%) is high for simple phantoms and lower for phantoms that introduce higher levels of difficulty, such as motion, multiple targets, or increased heterogeneity. All treatment planning systems overestimated dose to the target, as compared to TLD measurements. Errors in range calculation resulted in several failed phantoms. There was no correlation between TPS and pass rate. The pass rates for each individual phantom are not improving over time, but when individual institutions received feedback about failed phantom irradiations, pass rates did improve.
The proton phantom pass rates are not as high as desired and emphasize potential deficiencies in proton therapy planning and/or delivery. There are many areas for improvement with the proton phantom irradiations, such as TPS dose agreement, range calculations, accounting for motion, and the irradiation of multiple targets.
The use of proton therapy as a treatment modality for cancer has grown, with nearly 20 clinically active centers in the USA. This growth in availability, together with the success of several single-institution phase II clinical trials using protons (1–3), has led to its use in the National Cancer Institute’s (NCI’s) clinical trials. Clinical trial groups sponsored by the NCI, including the Children’s Oncology Group and NRG Oncology, have opened and are currently developing phase III trials that either allow proton therapy or compare its therapeutic outcomes to other treatment arms.
As with all multi-institutional clinical trials, comparability and consistency of treatments are crucial for the trial to have meaningful outcomes not obscured by uncertainty and variability. The mission of the Imaging and Radiation Oncology Core (IROC) cooperative, specifically the IROC Houston QA Center, is to provide quality assurance programs to support NCI’s National Clinical Trial Network, assuring high quality data for clinical trials. The NCI’s guidelines for the use of proton radiation therapy in NCI-sponsored clinical trials are intended to ensure that proton therapy treatments and data are per protocol (4). These guidelines specify the approval processes for proton therapy site participation, credentialing techniques, protocol design, and dose specifications. Over the past five years, IROC Houston has implemented a QA program that ensures proton institutions meet the NCI Guidelines, thereby ensuring a level of consistency and comparability between proton centers that allows the pooling of patient data from approved/credentialed centers.
Every proton center wishing to participate in NCI-funded clinical trials must complete a baseline approval process and credentialing through IROC Houston before being allowed to enroll patients on trials. The approval and credentialing steps include a facility questionnaire, remote output check, on-site visit, and anthropomorphic phantom irradiations. For phantom irradiation audits, IROC Houston currently uses five different anthropomorphic phantoms, simulating five distinct anatomical sites. These phantoms provide assurance that a proton center is fully capable of identifying and treating the target and of accurately delivering the intended proton dose distribution, especially with the introduction of Intensity Modulated Proton Therapy (IMPT) as a treatment modality.
IROC Houston has processed 103 phantom irradiations from 17 proton therapy centers. Although these phantoms have relatively loose passing criteria (5–7% point dose agreement and >80–85% of pixels passing 5–7%/4–5mm gamma criteria), many irradiations do not show acceptable agreement between the intended and delivered dose distributions. This has also been documented in photon therapy, where disagreements have highlighted several issues (5–7). Phantom data are presented here to detail and explore the results seen for proton therapy.
IROC Houston’s five anthropomorphic phantoms for proton therapy approval/credentialing include the head, liver, lung, prostate, and pediatric spine (Figure 1). These phantoms are end-to-end tests of the simulation, treatment planning, image guidance, and treatment delivery process. The phantoms have been designed to mimic human anatomy and are modeled after tumor disease sites that are common in clinical trials. For proton therapy, with a steep dose gradient at the end of range, it is important to include phantom heterogeneities that test the treatment planning system’s ability to accurately model range. The phantoms are overwhelmingly made out of proton-equivalent plastics, that is, plastics that, when simulated with a CT scanner, provide tissue/lung/bone-equivalent relative linear stopping powers (RLSP). These materials all fall on a clinical HU-RLSP conversion curve and do not require overrides from the treatment planning system (8). All phantoms have an imaging insert that defines the target for planning purposes. The insert may also include dosimeters, or the dosimeters may be housed in a separate dosimetry insert that is spatially registered with the target.
The proton head phantom is made of RANDO® water-equivalent plastic (The Phantom Laboratory, Salem, NY; RLSP=1.00) with an embedded human skull (Figure 1a). The imaging insert has a spherical target (2 cm diameter) that mimics a meningioma that can be visualized with both CT and MRI. The dosimetry insert is solid polyethylene (RLSP=1.00) that contains radiochromic film in the coronal and sagittal planes and two TLD capsules in the target.
The proton liver phantom is a water-filled phantom with a solid blue water (Standard Imaging, Madison, WI; RLSP=1.07) insert containing two non-coplanar targets: one spherical (3 cm diameter) and one oblong (2 cm diameter, 2.5 cm long) (Figure 1b). The insert also contains film in the coronal and sagittal planes and two TLD capsules in each target. There are two lateral normal tissue structures that are designated as organs-at-risk (OARs). The phantom can be irradiated on a moving platform depending on the institution’s respiratory motion management process. The motion amplitude is 1 cm. Institutions can choose to treat this phantom with a single isocenter or multiple isocenters.
The proton lung phantom is a Solid Water® (Gammex, Middleton, WI) phantom simulating a single lung (Figure 1c). The phantom has high density rib structures (RLSP=1.30) inlaid in the Solid Water®. The lung is compressed cork (RLSP=0.28) and the moveable lung dosimetry insert is low density balsa wood (RLSP=0.31) with an oblong target (3 cm diameter, 5 cm long). The insert contains film in all three planes and two TLD capsules in the target. The lung insert can move (independently of the outer shell) simulating different respiratory motion management processes. The motion amplitude is 2 cm.
The proton prostate phantom is a water-filled phantom with an imaging insert with structures simulating a prostate, bladder and rectum (Figure 1d). The prostate is a spherical structure (5 cm diameter). The phantom also contains two femoral head structures. The solid polystyrene (RLSP=1.02) dosimetry insert contains film in the coronal and sagittal planes and two TLD capsules in the target.
The proton spine phantom is a solid phantom with actual vertebral bones (Figure 1e). There are two radiochromic films located in the sagittal and coronal planes along with two TLD capsules in the vertebral structures.
TLD-100 LiF:Mg,Ti dosimeters, which have a dosimetric precision of 3% (9, 10), were used in each phantom to measure point doses. The ratio of the TLD dose to the TPS-calculated dose is computed; all TLD in the PTV must agree within the established criteria (Table 1). GafChromic EBT2 film is used to measure planar dose distributions, and offers spatial agreement to within 1 mm (9). The film planes are generally rotated 3°–5° from normal to avoid the streaming of protons that occurs if the beam is parallel to the film (11). Typically, a gamma analysis is performed comparing the measured film plane to the corresponding dose plane from the TPS. Criteria for each phantom are presented in Table 1. The exception is the spine phantom, where a distance-to-agreement criterion is applied to the measured and calculated distributions in the field junction region and penumbra. The point dose and planar criteria differ slightly by phantom, based on the heterogeneities present, whether the phantom is moving, and the proximity and complexity of the OAR structures.
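The two checks described above can be illustrated with a minimal sketch. This is not IROC Houston's analysis software: it assumes a 1D dose profile on a uniform grid, global normalization to the planned maximum, and a brute-force search over planned points. The function names, the sample profiles, and the tolerances used here (7% point dose, 7%/4 mm gamma, 85% of pixels) are hypothetical placeholders taken from within the ranges quoted earlier; the actual per-phantom values are those in Table 1.

```python
import math

def tld_within_tolerance(tld_dose, tps_dose, tolerance=0.07):
    """True if the measured/planned dose ratio is within +/- tolerance of unity."""
    return abs(tld_dose / tps_dose - 1.0) <= tolerance

def gamma_1d(measured, planned, spacing_mm, dose_tol=0.07, dta_mm=4.0):
    """Brute-force global gamma index for each measured point (uniform 1D grid)."""
    norm = max(planned)  # global normalization: an assumption of this sketch
    gammas = []
    for i, d_meas in enumerate(measured):
        best = math.inf
        for j, d_plan in enumerate(planned):
            dist_term = ((i - j) * spacing_mm / dta_mm) ** 2
            dose_term = ((d_meas - d_plan) / norm / dose_tol) ** 2
            best = min(best, dist_term + dose_term)
        gammas.append(math.sqrt(best))
    return gammas

def pixel_pass_rate(gammas):
    """Fraction of points with gamma <= 1."""
    return sum(1 for g in gammas if g <= 1.0) / len(gammas)

# Hypothetical profiles (arbitrary dose units), sampled every 1 mm.
planned = [0, 10, 50, 100, 100, 100, 50, 10, 0]
measured = [0, 0, 10, 50, 100, 100, 100, 50, 10]  # planned profile shifted by one sample
gammas = gamma_1d(measured, planned, spacing_mm=1.0)
rate = pixel_pass_rate(gammas)
print(f"pixels passing: {rate:.0%}; film check passes: {rate >= 0.85}")
```

In this toy case a 1 mm shift stays well inside the 4 mm distance tolerance, whereas the same one-sample shift on a 5 mm grid would push the penumbra points above gamma = 1, which mirrors how alignment shifts show up on film while the point dose in the target center remains acceptable.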
IROC Houston performs extensive commissioning and routine QA tests of these phantoms. Each phantom is commissioned with several controlled test irradiations and is checked each time it is returned to IROC Houston to verify its physical integrity (no cracks, leaks, or broken components); motion platforms are tested before being shipped to an institution. We expect that each institution performing the end-to-end test will follow the provided instructions and their clinical workflow to deliver the highest quality treatment; failure to do this could result in an increased risk of suboptimal proton therapy delivery.
Of the 103 phantoms processed, three phantoms were excluded from analysis due to non-clinically relevant setup errors (e.g. physicist not loading dosimetry insert properly, or motion platform malfunctioning). The remaining 100 phantoms include all three beam delivery techniques: scattered (n=31), uniform scanning (n=33), and pencil beam scanning (PBS; n=36). There have been five different treatment planning systems (TPSs) used for phantom irradiations: Eclipse (Varian Medical Systems, Palo Alto, CA), RayStation (RaySearch, Stockholm, Sweden), CMS XiO (Elekta, Stockholm, Sweden), and two in-house treatment planning systems. Most of the current TPSs use pencil beam algorithms. While some planning systems are beginning to implement Monte Carlo-based proton planning, there has only been one phantom irradiation performed with a Monte Carlo algorithm. This irradiation was included with the in-house TPS analysis.
The pass rates vary widely by phantom type, as shown in Table 1, and these differences were statistically significant (p=0.010, Chi-Square test). The proton head phantom has a 100% pass rate. It is the simplest phantom, with a small, spherical target and no OARs; it is also static, and it tends to be easier to strike a nonmoving target. The liver phantom has the lowest pass rate at 38% and is the most challenging phantom. It is irradiated on a moving platform and contains two targets and several organs at risk to avoid. The overall pass rate of all phantoms, 79%, is well below a desired performance, particularly given the relatively wide acceptance criteria.
The TLD results (Figure 2) and gamma results (Figure 3) elucidate the pass rates of each phantom. The mean TLD ratios for all phantoms were statistically different from unity (p≤0.028, t-test). This means that the TPSs are overestimating the dose that is actually delivered to the target, in a variety of different heterogeneous tissue-equivalent media. The head phantom showed the best TLD agreement, with the mean ratio of TLD to TPS measuring 0.99 (σ 0.03). The liver, lung and prostate showed statistically significantly lower mean TLD values (p<0.001, ANOVA). The liver phantom had the worst TLD agreement, with the mean ratio of TLD to TPS measuring 0.95 (σ 0.02). The lung phantom showed the second worst TLD agreement, 0.96 (σ 0.02). This low result for the lung phantom was comparable to the TLD results for the photon lung phantom, which showed a systematic overestimate of dose in the PTV by photon algorithms as well, on the order of 3–5% for convolution superposition, anisotropic analytic, and pencil beam algorithms (7).
The mean percent of pixels passing gamma criteria (Figure 3) was statistically different between phantom types (p<0.001, Kruskal-Wallis). Similar to the TLD results, the head phantom showed the best gamma agreement, with a mean pixel pass rate of 96% (σ 4.7). The lung and liver phantoms had the worst gamma agreement, 84% (σ 14.1 and 14.9, respectively). As shown in Table 1, a majority of phantom failures (19 out of 21) resulted from institutions failing the film criteria (gamma or distance-to-agreement). Only two phantom failures resulted from both the TLD and the film failing criteria. For the lung phantom, most of the institutions that failed (n=3) had a large shift in the superior/inferior direction that caused the film to fail the gamma analysis. These shifts were caused by a failure to properly account for the position or motion cycle of the target in the lung phantom. For example, if an institution imaged the phantom at the exhale phase but used alignment intended for average target position, the phantom irradiation would be offset in the superior/inferior direction.
For the past four years, the cumulative pass rate averaged over all phantoms has stayed between 75% and 80%. Figure 4 shows the cumulative pass rate of each phantom. A per annum pass rate is not presented in the figure, as some years had too few or no phantom irradiations. Overall, the per annum pass rate has fluctuated with the introduction of more challenging phantoms, such as those with multiple targets (liver) and motion (lung and liver). The lung phantom was originally irradiated without motion. When motion was introduced in 2013, we observed a drop in the pass rate, likely related to the increased difficulty of irradiating a moving target. The liver phantom showed a precipitous decline in cumulative pass rate, although this largely reflects a challenging phantom with few irradiations in the first year. Of note, no phantom has shown a meaningful improvement with time (R²<0.6, regression analysis).
We reviewed the pass rates of first irradiations compared to repeat irradiations, excluding the head phantom since it did not have any repeats. The repeat irradiations (n=21) had a higher pass rate (90%) than first-time irradiations (69%, n=61), and the difference was statistically significant (p=0.050, Chi-Square test). When an institution fails a phantom, the IROC physicists provide feedback to the institution to help root out errors.
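The comparison of first-time and repeat irradiations can be reproduced approximately with a 2×2 chi-square test. The sketch below is illustrative only. The pass/fail counts are assumptions back-calculated from the rounded percentages (90% of 21 taken as 19 passes, 69% of 61 taken as 42 passes), and the use of an uncorrected Pearson statistic is also an assumption, since the text does not say whether a continuity correction was applied. The function name is hypothetical.

```python
import math

def chi_square_2x2(pass_a, fail_a, pass_b, fail_b):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    n = pass_a + fail_a + pass_b + fail_b
    numerator = n * (pass_a * fail_b - fail_a * pass_b) ** 2
    denominator = ((pass_a + fail_a) * (pass_b + fail_b)
                   * (pass_a + pass_b) * (fail_a + fail_b))
    chi2 = numerator / denominator
    # For one degree of freedom the chi-square survival function is erfc(sqrt(x/2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Assumed counts inferred from the reported percentages (not given in the text).
chi2, p = chi_square_2x2(pass_a=19, fail_a=2, pass_b=42, fail_b=19)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Under these assumptions the p-value comes out close to 0.05, in line with the reported value, which suggests the inferred counts are at least plausible.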
Proton phantoms irradiated with motion have higher failure rates than static phantoms. The pass rate for all static phantoms was 85% (n=73), while the pass rate for all moving phantoms was 63% (n=27). These pass rates were statistically significantly different (p=0.017, Chi-Square test). The pass rates for ITV (n=19) and breathhold (n=8) techniques were the same: 63%. The lung and liver phantoms, which are the only moving phantoms, have the additional challenges of many heterogeneities (lung) and multiple targets (liver). The motion, therefore, coincides with increasing complexity.
The pass rates were evaluated across proton beam delivery modality. While the results were not identical across every phantom, there was a general trend in that phantoms irradiated with scattered (n=31) and uniform scanning (n=33) beam delivery had comparable pass rates (74% and 73%, respectively) that were lower than phantoms irradiated with pencil beam scanning (n=36, 89%). This difference was not statistically significant (p=0.189, Chi-Square test), but it warrants further study as more phantom results are accrued and higher statistical power can be achieved. The trend is somewhat counterintuitive: pencil beam scanning is often considered more complex, and one might assume that it would show a lower phantom pass rate. However, most institutions that irradiated phantoms started treating patients with either scattered or uniform scanning beams, and then upgraded or added pencil beam scanning after routine treatment practices were well established. Therefore, many of the institutions that irradiated phantoms with PBS had more clinical experience, as well as feedback from previous phantom irradiations.
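The three-way comparison across delivery modalities corresponds to a chi-square test on a 3×2 pass/fail table with two degrees of freedom. The following sketch is a hedged illustration: the counts are assumptions inferred from the rounded pass rates (23 of 31, 24 of 33, and 32 of 36), and the function names are hypothetical. For two degrees of freedom the p-value has the closed form exp(−χ²/2), so no statistics library is needed.

```python
import math

def chi_square_pass_fail(groups):
    """Pearson chi-square for a list of (passed, failed) pairs; returns (chi2, dof)."""
    total_pass = sum(passed for passed, _ in groups)
    total_fail = sum(failed for _, failed in groups)
    total = total_pass + total_fail
    chi2 = 0.0
    for passed, failed in groups:
        n = passed + failed
        expected_pass = n * total_pass / total
        expected_fail = n * total_fail / total
        chi2 += (passed - expected_pass) ** 2 / expected_pass
        chi2 += (failed - expected_fail) ** 2 / expected_fail
    return chi2, len(groups) - 1

def p_value_two_dof(chi2):
    """Chi-square survival function, valid only for two degrees of freedom."""
    return math.exp(-chi2 / 2.0)

# Assumed (passed, failed) counts inferred from the reported percentages.
modalities = {"scattered": (23, 8), "uniform scanning": (24, 9), "PBS": (32, 4)}
chi2, dof = chi_square_pass_fail(list(modalities.values()))
p = p_value_two_dof(chi2)
print(f"chi2 = {chi2:.2f} on {dof} dof, p = {p:.3f}")
```

With these inferred counts the result is not significant at the 0.05 level, consistent with the conclusion stated above.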
An analysis of the pass rate vs. TPS for all phantoms showed no statistically significant difference between the planning systems (p=0.162, Chi-Square test). To compare the planned dose with the delivered dose, the analysis was broken up by phantom because each phantom has unique heterogeneities. The results are presented in Table 2. While some of the results were statistically significant, there was no consistent trend of one TPS performing better than the others. For example, the analysis for the prostate phantom showed statistically significant differences for both TLD (p=0.006, ANOVA) and film (p=0.048, Kruskal-Wallis) among the TPSs, but while the in-house TPSs showed the best agreement with TLD, they showed the worst agreement for film. For many failed prostate phantom irradiations, the cause was determined to be a shift in phantom alignment (n=5) or an error in the range calculation (n=4). These errors are evident in the gamma analysis of the films, but may not affect the TLD doses, which explains why the TLD agreement does not necessarily correspond to pass rate for the prostate phantom. Accrual of more data will provide more insight into these results, as the study was underpowered for detailed analysis of the causes of phantom failures.
With new delivery methods and more challenging targets, we expect an initial period of low pass rates as institutions learn the limitations of their radiation delivery system. This was seen for IMRT with the head and neck phantom, as initial pass rates were only 65% but are currently around 85% (6). We have not yet observed an upward trend in pass rates for proton therapy phantoms. Lower pass rates were particularly apparent in phantoms with motion and/or multiple targets. Correspondingly, it is of great importance that proton institutions review their methods of managing these sorts of cases.
For photon phantom irradiations, most phantom failures are caused by an institution failing to meet the TLD criteria (6). For proton phantoms, the opposite is true; most phantom failures are caused by institutions failing to meet the gamma criteria (Table 1). In other words, proton irradiations are delivering acceptable dose to the center of the target, but the edge of the treatment field is not where it is calculated to be. Errors in range calculations have played a role in phantom failures. One example was seen with the prostate phantom, where the failing institution’s film showed a discrepancy with the TPS in target coverage in the range direction. Upon further investigation, it was discovered that the CT number to RLSP conversion curve that the institution was using had an unusual shape near water. When the institution re-ran its stoichiometric calculations, a mathematical error was discovered and fixed. The institution resubmitted the treatment plan using the new conversion curve, and it matched much better with the delivered dose distribution and passed the phantom criteria.
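The conversion-curve example suggests that simple automated checks on a CT number to RLSP curve could flag an odd shape before it reaches planning. The sketch below is a hypothetical illustration, not the institution's or any TPS's method. The calibration points are invented, the curve is treated as piecewise linear, and the two checks (a non-decreasing curve, and an RLSP close to 1.00 at 0 HU) are assumptions about what a reasonable curve looks like; the specific error found in the case above is not described in enough detail to reproduce.

```python
from bisect import bisect_right

# Hypothetical calibration points (HU, RLSP); illustrative values only.
CURVE = [(-1000, 0.001), (-200, 0.80), (0, 1.00), (200, 1.12), (1500, 1.85)]

def rlsp_from_hu(hu, curve=CURVE):
    """Piecewise-linear lookup, clamped at both ends of the curve."""
    hus = [h for h, _ in curve]
    if hu <= hus[0]:
        return curve[0][1]
    if hu >= hus[-1]:
        return curve[-1][1]
    i = bisect_right(hus, hu)  # index of the first calibration point above hu
    (h0, r0), (h1, r1) = curve[i - 1], curve[i]
    return r0 + (r1 - r0) * (hu - h0) / (h1 - h0)

def curve_warnings(curve, water_tol=0.02):
    """Return human-readable warnings for suspicious features of the curve."""
    warnings = []
    for (h0, r0), (h1, r1) in zip(curve, curve[1:]):
        if h1 <= h0 or r1 < r0:
            warnings.append(f"non-monotonic segment between {h0} and {h1} HU")
    if abs(rlsp_from_hu(0, curve) - 1.0) > water_tol:
        warnings.append("RLSP at 0 HU deviates from 1.00")
    return warnings

print(rlsp_from_hu(100))
print(curve_warnings(CURVE))
```

A check of this kind cannot replace an end-to-end measurement, but it is cheap to run whenever a curve is regenerated.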
Another range error was observed in the lung phantom. For this irradiation, it was determined that a contour of the body used for dose calculation clipped part of the phantom, and thus the treatment plan calculated a deeper beam penetration than was observed on the film. When the contour was extended to include the whole phantom, the TPS comparison with the film passed the analysis criteria.
Yet another range error was observed when an institution overrode the RLSP value of one of the phantom materials but, instead of entering the correct RLSP, entered the material’s physical density in its place. For many of these phantom materials, and presumably patient implant materials, the density is not the same as the RLSP. Physicists are advised to use great caution when overriding areas of the CT scan during treatment planning, particularly when the TPS only allows input of density and not direct input of RLSP.
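The size of such an error can be estimated from the water-equivalent thickness (WET) along a beam path, taken here as the sum of each material's thickness multiplied by its RLSP. The sketch below is a simplified illustration under stated assumptions: the geometry, the RLSP of 1.20, and the density of 1.35 g/cm³ are all hypothetical numbers, and the function name is invented for this example.

```python
def water_equivalent_thickness(segments):
    """Sum of thickness_cm * RLSP over the materials crossed by a ray."""
    return sum(thickness_cm * rlsp for thickness_cm, rlsp in segments)

# Hypothetical material: true RLSP 1.20, physical density 1.35 g/cm^3.
TRUE_RLSP = 1.20
DENSITY_ENTERED_AS_RLSP = 1.35

# Hypothetical ray: 5 cm water-like, 2 cm of the overridden material, 4 cm water-like.
correct_ray = [(5.0, 1.00), (2.0, TRUE_RLSP), (4.0, 1.00)]
override_ray = [(5.0, 1.00), (2.0, DENSITY_ENTERED_AS_RLSP), (4.0, 1.00)]

range_error_cm = (water_equivalent_thickness(override_ray)
                  - water_equivalent_thickness(correct_ray))
print(f"planned minus true WET: {range_error_cm:.2f} cm")
```

The same bookkeeping applies to the clipped body contour described earlier: omitting a segment from the planned ray lowers the planned WET, so the plan predicts deeper penetration than is actually delivered.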
These examples demonstrate the process of identifying a problem through the phantom irradiation, providing feedback to the institution, and seeing improvements in delivery upon re-irradiation of the phantom.
The overestimation of dose by all planning systems was somewhat surprising. The phantoms all contain proton-equivalent heterogeneities, but each phantom has unique materials and structures. Once enough phantoms have been irradiated with Monte Carlo-based algorithms, the mean TLD doses will be compared to determine if there is a significant improvement in the dose calculation. Preliminary studies have shown that Monte Carlo based algorithms do predict lower doses for proton beams (12, 13), which may agree better with the phantom measurements. Monte Carlo-based planning may also improve the agreement at the edge of the treatment field in the shoulder region of the dose profiles.
The proton phantom irradiations emphasize deficiencies in the agreement between planned and delivered doses in clinical conditions, particularly in challenging cases involving multiple targets and/or motion. The field of radiation oncology should continue to work to improve pass rates. This can be done by improving motion management and phantom setup, improving the dose agreement between treatment planning systems and delivered dose, and improving range calculations.
We compiled data from 100 irradiations performed on IROC Houston’s anthropomorphic proton phantoms: the head, liver, lung, prostate, and spine. The overall pass rate is 79%. Phantom failures correspond to increased difficulty of the phantom (motion, multiple targets), as well as range and dose calculation errors.
The IROC Houston QA Center’s proton radiation therapy efforts were supported by the Federal Share of program income earned by Massachusetts General Hospital Proton Therapy Research and Treatment Center on C06 CA059267, and PHS grants CA081647, CA10953, CA180803 awarded by the NCI, DHHS.
Conflict of interest: None