To quantify the multi-institutional and multi-observer variability of target and organ-at-risk (OAR) delineation for breast-cancer radiotherapy (RT), and its dosimetric impact, as the first step of an RTOG effort to establish a breast cancer atlas.
Nine radiation oncologists specializing in breast RT from eight institutions independently delineated targets (e.g., lumpectomy cavity, boost planning target volume, breast, supraclavicular, axillary and internal mammary nodes, and chest wall) and OARs (e.g., heart, lung) on the same CT images of three representative patients with breast cancer. Inter-observer differences in structure delineation were quantified with regard to volume, distance between centers of mass, percent overlap, and average surface distance. The mean, median and standard deviation for these quantities were calculated for all possible combinations. To assess the impact of these variations on treatment planning, representative dosimetric plans based on observer-specific contours were generated.
The variability in contouring the targets and OARs between the institutions/observers was substantial. The structure overlaps were as low as 10% and the volume variations had standard deviations up to 60%. The large variability was related both to differences in opinion regarding target and OAR boundaries and to differing approaches to incorporating setup uncertainty and dosimetric limitations in target delineation. These inter-observer differences result in substantial variations in dosimetric planning for breast RT.
The differences in target and OAR delineation for breast irradiation between institutions/observers appear to be clinically and dosimetrically significant. A systematic consensus is highly desirable, particularly in the era of IMRT/IGRT.
Conservative surgery with radiation therapy (RT) has been established as an alternative to mastectomy for the management of early stage breast cancer. It is generally considered desirable to deliver a relatively uniform dose distribution inside the breast volume with minimal hot spot regions, as it has been shown that excessive hot spots result in a poorer cosmetic outcome (1-5). A major challenge to improving dose uniformity is the irregular shape and size of the breast. In addition, it is necessary to adequately spare the organs at risk (OARs), e.g., lung and heart, in order to minimize the risk of treatment-related complications (6-8). In recent years, conformal RT, particularly intensity-modulated RT (IMRT), has become popular for breast irradiation as it offers reduced hot spots and/or increased normal tissue sparing (9-17). Moreover, recently available image-guided RT (IGRT) can significantly improve the accuracy of conformal treatment delivery (18-20).
Accurate delineation of the target and OAR volumes is a critical prerequisite for conformal RT, as all subsequent decisions on treatment planning and delivery are based on these volumes. A few previous studies have reported significant variations in defining target volumes for breast RT (21-27). For example, Hurkmans et al (22) observed, in a single-institution study, that the clinical target volume (the breast volume) delineated on CT by multiple observers varied by 17.5%. In another single-institution study, Struikmans et al (26) showed that two volumes delineated by different observers overlapped on average by 87% and 56% for breast and boost volumes, respectively. Recently, Landis et al (21) reported large variations in delineating the lumpectomy cavity among four academic radiation oncologists who specialize in breast RT. Some of these studies have shown that detailed delineation instructions and/or training can improve delineation consistency (22-24,26). Dijkema et al reported that the anatomically based guidelines they developed were useful for delineation of regional nodes for conformal local-regional irradiation of the breast (28).
Inconsistencies in delineating target and OAR volumes, which have also been identified for several other tumor sites, e.g., prostate, lung, and brain (29-36), present a significant problem that can hinder the use of conformal RT and other advanced technology (e.g., IMRT, IGRT) to improve treatment outcomes. To reduce these inconsistencies, the Radiation Therapy Oncology Group (RTOG) is conducting an effort to establish a breast cancer atlas for RT. This work, as the first step of this RTOG effort, aims to systematically quantify the multi-institutional and multi-observer variability of target and OAR delineation for breast RT, and its dosimetric impact. The targets considered include lumpectomy cavity, breast, chest wall, and nodal regions, and the OARs include lung and heart.
Nine radiation oncologists (Observers #1-9) specializing in breast RT from eight institutions independently delineated targets and OARs on the same CT images of the following three representative cases of breast cancer patients:
Case A: A patient with stage I (T1c, N0, M0) left breast cancer who had undergone lumpectomy and sentinel node biopsy and was planned for breast irradiation. Six metal clips were placed at the lumpectomy site. Various radio-opaque external markers, including four wire markers for the cephalad, caudad, medial, and lateral extent of the anticipated tangent beams, a wire extending around the inframammary fold, and a wire over the lumpectomy scar, are evident on the CT images. The targets and OARs to be delineated include the lumpectomy cavity, the boost PTV (i.e., lumpectomy cavity plus margin), the breast volume, the heart, and the left lung.
Case B: A patient with stage IIIB (T3, N3, M0) left breast cancer, tumor size 7 cm, 11/15 nodes positive, who had undergone mastectomy and axillary node dissection and was planned for chest wall and nodal irradiation. Various external markers, including wire markers for the caudad, medial and lateral extent of the anticipated tangent beams, a marker at the anterior-posterior (AP) set-up point at the anticipated matchline with the anticipated supraclavicular field, and a wire over the mastectomy scar, are shown on the CT images. The structures to be delineated are the supraclavicular, axillary and internal mammary nodes, the chest wall and the boost PTV (boost chest wall volume), the heart, and the left lung.
Case C: A patient with stage IIIA (T2, N2, M0) right breast cancer, tumor size 3 cm, 4/18 nodes positive, who had undergone lumpectomy and axillary node dissection and was planned for breast and nodal irradiation. Six clips were left at the lumpectomy site. CT images were acquired with various external markers, including wire markers for the caudad, medial and lateral extent of the anticipated tangential beams, a marker at the AP set-up point at the anticipated matchline with the anticipated supraclavicular field, a wire extending around the inframammary fold, and a wire over the lumpectomy scar. The structures to be delineated are the supraclavicular, axillary apex and internal mammary nodes, the lumpectomy cavity, the boost PTV (i.e., lumpectomy cavity plus margin), the breast volume, and the right lung.
The participants were instructed to delineate the required structures using their own segmenting tools with a window/level setting of 600/40 for soft tissue during the contouring. Since the main purpose of this work, as the first step of generating an atlas for breast RT, is to quantify multi-institutional and multi-observer variability in routine clinical use, no specific instructions were provided to the investigators as to how to delineate the targets of interest and the OARs. The variability obtained therefore represents a worst-case scenario, because the use of a delineation guideline would improve delineation consistency, as shown in previous studies (22-24,26). Except for the lung, all structures were delineated manually.
All delineated structures (a total of 144) were input into an image registration software tool (37) for analysis. The volume and the center of mass (COM) of each structure were computed using the software. To quantify the differences between the contours drawn by different observers, two additional quantities, percent overlap (PO) and average surface distance (ASD), were introduced. For two given contours V1 and V2, the percent overlap is defined as:
$$\mathrm{PO} = \frac{V_1 \cap V_2}{V_1 \cup V_2} \times 100\%$$
where V1 ∩ V2 is the volume of the intersection and V1 ∪ V2 is the volume of the union. The average surface distance is calculated from the distances between the surfaces of the two contours, where A and B denote the two sets of points on the surfaces of the two contours (V1 and V2) and d(a,b) is the Euclidean distance between a point a in A and a point b in B.
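As an illustration of these two quantities, the following minimal Python sketch computes PO on boolean voxel masks and ASD on arrays of surface points. It is not the software used in this study. The function names are hypothetical, equal voxel volumes are assumed, and the ASD is taken in a commonly used symmetric form (the mean of the two directed average nearest-point distances), which is an assumption about the exact expression.

```python
import numpy as np

def percent_overlap(mask1, mask2):
    """Percent overlap: intersection volume over union volume, times 100.

    Masks are boolean voxel arrays on the same grid (equal voxel size assumed).
    """
    m1 = np.asarray(mask1, dtype=bool)
    m2 = np.asarray(mask2, dtype=bool)
    union = np.logical_or(m1, m2).sum()
    if union == 0:
        raise ValueError("both masks are empty")
    return 100.0 * np.logical_and(m1, m2).sum() / union

def average_surface_distance(points_a, points_b):
    """Symmetric average surface distance between two surface point sets.

    Assumed form: 0.5 * (mean over a of min_b d(a, b)
                         + mean over b of min_a d(a, b)).
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```

The brute-force distance matrix is adequate for a few thousand surface points; for densely sampled surfaces a spatial index would be the usual substitute.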
To quantify the inter-observer variations in the delineation, the volume difference, distance between COMs (DCOM), PO and ASD were calculated for every possible pair among the nine sets of contours drawn by the nine observers for the same structure. The mean, median and standard deviation (SD) of these quantities over all possible pairs for a given structure are reported.
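The pairwise analysis can be sketched as follows: nine observers give 36 unordered pairs per structure, and each metric is summarized over those pairs. This is an illustrative sketch only. The helper name `pairwise_summary` and all numeric values are hypothetical, the absolute volume difference is one possible reading of "volume difference", and the sample SD is assumed because the text does not state which estimator was used.

```python
from itertools import combinations
from math import dist
from statistics import mean, median, stdev

def pairwise_summary(items, metric):
    """Apply `metric` to every unordered pair of items and summarize."""
    scores = [metric(x, y) for x, y in combinations(items, 2)]
    return {
        "n_pairs": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "sd": stdev(scores),  # sample SD (assumption)
    }

# Hypothetical volumes (cm3) of one structure as drawn by nine observers.
volumes_cm3 = [30.1, 34.5, 28.9, 41.2, 33.0, 36.7, 29.8, 38.4, 35.2]
volume_stats = pairwise_summary(volumes_cm3, lambda v1, v2: abs(v1 - v2))

# Hypothetical centers of mass (cm) for three observers; DCOM is the
# Euclidean distance between two centers of mass.
coms_cm = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (0.0, 0.4, 0.0)]
dcom_stats = pairwise_summary(coms_cm, dist)
```

The same helper accepts any two-argument metric, so PO and ASD can be summarized in the same way by passing masks or surface point sets as the items.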
To assess the impact of the variation in delineation on dosimetric treatment planning, nine 3D dosimetric plans based on the nine sets of observer-specific contours were generated for each case by an experienced dosimetrist using a commercial planning system (CMS, St. Louis, MO). The planning goals, aiming to cover the targets (e.g., the boost PTV, the breast, and nodal regions) as drawn, for the three representative cases were: (1) Case A: 45 Gy to 95% of breast volume, 60 Gy to 95% of the boost PTV with less than 5% of dose heterogeneity, 60 and 54 Gy to less than 30 and 50% of breast volume, respectively, 20 and 5 Gy to less than 10 and 20% of ipsilateral lung, respectively, and 22.5 Gy to less than 5% of heart; (2) Case B: 45 Gy to 95% of chest wall, 60 Gy to the boost PTV with less than 10% of dose heterogeneity, 48 Gy to 95% of supraclavicular nodal volume, 45 Gy to 95% of both axillary and internal mammary nodes, 20 Gy to less than 25% of ipsilateral lung, and 25 Gy to less than 9% of heart; and (3) Case C: 50 Gy to 95% of the breast volume with less than 5% of dose heterogeneity, 60 Gy to 95% of the boost PTV, 50 Gy to 95% of supraclavicular nodal volume, 45 Gy to 95% of both axillary and internal mammary nodes, and 20 and 5 Gy to less than 10 and 20% of ipsilateral lung, respectively. For each set of contours, the dosimetric plans were generated via forward planning by adjusting the beam apertures, angles, weights, energies, and wedges so that the planning goals on target coverage and OAR sparing could be met as closely as possible. Beam apertures were selected to fully cover the targets for each set of contours. Photon beams of 6 and/or 15 MV were used to irradiate the breasts, chest wall and boost PTVs tangentially, and the supraclavicular and axillary nodes anteriorly and posteriorly. A combination of 6 MV photon and electron beams was used for the internal mammary nodes.
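A coverage goal such as "45 Gy to 95% of breast volume" can be read as requiring that the dose covering 95% of the structure (often written D95) be at least 45 Gy. The sketch below shows one way to evaluate such a goal on a voxel dose grid. It is illustrative only and does not describe the planning system: the function names are hypothetical, voxels are assumed to have equal volume, and NumPy's default linear percentile interpolation is used.

```python
import numpy as np

def dose_covering(dose_gy, mask, volume_percent):
    """Dose (Gy) received by at least `volume_percent` of the structure.

    For volume_percent = 95 this is D95, i.e. the 5th percentile of the
    voxel doses inside the structure (equal voxel volumes assumed).
    """
    doses = np.asarray(dose_gy, dtype=float)[np.asarray(mask, dtype=bool)]
    if doses.size == 0:
        raise ValueError("structure mask is empty")
    return float(np.percentile(doses, 100.0 - volume_percent))

def meets_coverage_goal(dose_gy, mask, volume_percent, goal_gy):
    """True if `goal_gy` is delivered to at least `volume_percent` of the structure."""
    return dose_covering(dose_gy, mask, volume_percent) >= goal_gy
```

Applied to the same dose grid with each observer's mask in turn, such a check indicates which observer-specific contours a given plan would cover.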
A comparison of contours drawn by different institutions/observers for selected targets and OARs is presented in Figures 1, 2 and 3 for Cases A, B and C, respectively. The contours of the lumpectomy cavity, the boost PTV, the breast volume, and the heart for Case A, projected on an axial and a coronal image, are shown in Fig. 1. In Fig. 2, the contours of the internal mammary nodes, the chest wall, the boost PTV and the heart in an axial plane and the contours of the supraclavicular and axillary nodes, the chest wall and the boost PTV, and the heart in a coronal plane are presented for Case B. Fig. 3 shows, for Case C, the contours of the internal mammary nodes, the lumpectomy cavity, the boost PTV, and the breast volume in an axial plane and those of the supraclavicular and axillary nodes, the lumpectomy cavity, the boost PTV and the breast volume in a coronal plane.
The variations between observers presented in Figs. 1-3 can be quantified using volume, DCOM, PO and ASD. The minimum, maximum, median, mean and SD of these quantities for selected structures are tabulated in Tables 1, 2 and 3 for Cases A, B and C, respectively. The variation for automatically delineated lung volumes was found to be negligible. Therefore, the data for the lung are not included in Figs. 1-3 and Tables 1-3.
It is seen from Figs. 1-3 and Tables 1-3 that the variations between institutions/observers are generally substantial for all the structures delineated, except for the lung. The least variability was observed for the lumpectomy cavity, where the SDs of the volumes were 5.1 (15%) and 8.4 (49%) cm3, the mean DCOMs were 0.16 and 0.29 cm, the mean POs were 86 and 73%, and the mean ASDs were 0.14 and 0.28 cm for Cases A and C, respectively. Note that the seroma and surgical clips were clearly visible for both cases. The most variability was observed for the nodal regions. For the internal mammary nodes of Case C, for example, the SD of the volumes was 3.6 cm3 (54%), and the PO between two observers was as low as 10%.
The significant variability observed in the target and OAR delineation between different institutions/observers will inevitably result in variations in dosimetric planning for breast RT. It was observed during the dosimetric planning that, because of the large variation in the target and OAR delineation, it was not possible to achieve the planning goals for all sets of contours. Dose volume histograms (DVHs) obtained based on the contours delineated by different observers are compared in Figs. 4-6. To take into account the build-up region, the volumes used to generate DVHs were modified to within 5 mm below the skin if necessary. The DVHs for Case A for the breast volume and the lung delineated by the nine observers are shown in Fig. 4. It was found that, although the coverage of the PTV contours does not vary dramatically (not shown), the variation in the coverage of the other target (breast volume) and in the dose to critical structures [the lung and the heart (not shown)] is substantial. For example, the breast volumes covered by the prescription dose (45 Gy) in the nine plans are in the range of 85-95%, the V20 (percentage volume receiving 20 Gy) for the lung varies between 5% and 25%, and the V20 and V10 for the heart range from 0 to 7% and from 2 to 20%, respectively.
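For reference, Vx values such as V20 and the points of a cumulative DVH can be computed from a voxel dose grid and a structure mask as in the minimal sketch below. It assumes equal voxel volumes and counts voxels receiving at least the threshold dose. The function names are hypothetical, and this is not the algorithm of the planning system used here.

```python
import numpy as np

def volume_receiving(dose_gy, mask, threshold_gy):
    """Vx: percentage of the structure volume receiving at least `threshold_gy`."""
    doses = np.asarray(dose_gy, dtype=float)[np.asarray(mask, dtype=bool)]
    if doses.size == 0:
        raise ValueError("structure mask is empty")
    return 100.0 * np.count_nonzero(doses >= threshold_gy) / doses.size

def cumulative_dvh(dose_gy, mask, dose_levels_gy):
    """Cumulative DVH sampled at the given dose levels (list of Vx values)."""
    return [volume_receiving(dose_gy, mask, level) for level in dose_levels_gy]
```

Evaluating `volume_receiving` for the same plan with each observer's lung or heart mask gives the spread of V20 or V10 attributable to contouring alone.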
For Case B, the DVHs based on the contours from all nine observers for (a) the chest wall and the heart and (b) the supraclavicular nodes and the lung are presented in Fig. 5. It was found that the planning goal for the boost PTV was achieved for all PTV contours (not shown), while it was not possible to reach the planning goal for all contours of the supraclavicular nodes. The V20 for the lung varies between 15 and 35%, and the V20 and V10 for the heart range from 15 to 35% and from 20 to 45%, respectively.
Figure 6 presents the DVHs for the boost PTV, the breast volume and the lung for Case C. While the coverage for all boost PTVs is satisfactory and with small variation, the coverage for the breast volumes by the prescription dose varies between 70 and 95%. The V20 for the lung ranges from 15 to 30%.
Despite the proliferation of advanced RT technology (e.g., IMRT, IGRT) for breast cancer, a consensus on both the target and OAR definition is lacking. Studies to systematically quantify the variability in structure delineation and its dosimetric impact for breast RT remain sparse. The inconsistencies in contouring can undermine the advantages of IMRT and IGRT and can compromise the validity of clinical trial results.
This study documents that the variations in delineating the targets and OARs for breast RT by well-experienced observers from different institutions are substantial for all relevant structures. The delineated structures overlapped by as little as 10% and the volumes varied with SDs up to 60%. For the breast and chest wall, the greatest variation was observed in delineating their borders in the medial-lateral and cranial-caudad directions. The variation in the delineation of regional nodes was generally higher than for other structures.
This substantial variability can result in dramatic variations in dosimetric planning for breast RT. For example, the dose-volume relationship for critical structures delineated by different observers can be remarkably different, although the dosimetric coverage of primary targets may be consistent. In particular, the large variation for the breast and chest wall borders in the medial-lateral direction contributes most significantly to the variation in normal structure dose. As a consequence, this contour and dosimetric variability would contribute to the variation observed when comparing radiation-related toxicity data from multiple institutions. Understanding these variations would certainly improve, for example, retrospective dose-response analyses based on multi-institutional clinical data.
It should be noted that, although in clinical practice radiation oncologists may choose to modify their contours based on the plan obtained during the planning process, the participating radiation oncologists were not involved in generating or reviewing the dosimetric plans in this work. Contour modification was not allowed because it would add another variable and complicate objective quantification of the inter-observer variation in delineation. In this sense, the dosimetric variability data generated in this work may represent the worst-case scenario.
The variability observed in the delineation of targets and OARs may be related both to differences in opinion regarding clinical target and OAR boundaries and to differing approaches to incorporating treatment setup uncertainty and dosimetric limitations (e.g., beam penumbra) in target delineation. For example, because there was no obvious border to the breast superiorly or laterally, a large difference was seen in delineating the breast borders in these two directions. Even though the seroma and surgical clips were clearly visible for Case C, the delineated lumpectomy cavity volumes varied with an SD of 49%. To reduce or eliminate this variability, the authors are working to establish a systematic consensus as an extension of this work. This consensus, including anatomic boundaries of targets and OARs, will be reported in the near future and would serve as part of the guidelines for future prospective studies of breast RT, particularly those involving IMRT and IGRT. A breast RT atlas will be generated as a part of this effort and will be uploaded onto the RTOG website.
In conclusion, the differences in target and OAR delineation for breast irradiation between institutions/observers appear to be clinically and dosimetrically significant. A systematic consensus on the delineation needs to be established.
We wish to thank CMS Inc. for providing contouring tools, Siemens OCS for making segmentation software available for this work, Joann Herman, CMD, for helping with dosimetry planning, and Ran Tao and Edward Yang for helping with data collection. XAL and JW were partially supported by a grant from the Susan Komen Breast Cancer Foundation.
Presented as an oral talk at the ASTRO annual meeting 2007.
Conflict of Interest: The authors declare no conflict of interest.