A study of interobserver variation in the segmentation of the post-operative clinical target volume (CTV) and organs at risk (OARs) for parotid tumours was undertaken. The segmentation exercise was performed as a baseline, and repeated after 3 months using a segmentation protocol to assess whether CTV conformity improved.
Four head and neck oncologists independently segmented CTVs and OARs (contralateral parotid, spinal cord and brain stem) on CT data sets of five patients post parotidectomy. For each CTV or OAR delineation, total volume was calculated. The conformity level (CL) between different clinicians' outlines was measured using a validated outline analysis tool. The data for CTVs were reanalysed after use of the cochlear sparing therapy and conventional radiation (COSTAR) segmentation protocol.
Significant differences in CTV morphology were observed at baseline, yielding a mean CL of 30% (range 25–39%). The CL improved after using the segmentation protocol with a mean CL of 54% (range 50–65%). For OARs, the mean CL was 60% (range 53–68%) for the contralateral parotid gland, 23% (range 13–27%) for the brain stem and 25% (range 22–31%) for the spinal cord.
There was low conformity for CTVs and OARs between different clinicians. The CL for CTVs improved with use of a segmentation protocol, but the CLs remained lower than expected. This study supports the need for clear guidelines for segmentation of target and OARs to compare and interpret the results of head and neck cancer radiation studies.
Carcinoma of the salivary glands constitutes 1–3% of head and neck cancer (HNC), with most tumours arising in the parotid glands. The mainstay of treatment is surgery, with post-operative radiotherapy in selected cases. Post-operative radiotherapy reduces local recurrence rates from 30% to 10% but has not been shown to affect overall survival. The aim of radiation therapy is to deliver homogeneous high-dose radiation to the target area with relative sparing of the healthy surrounding tissues. The physical delivery of radiotherapy has improved with the introduction of intensity-modulated radiotherapy (IMRT), which allows greater sparing of the surrounding healthy tissue and reduces the risk of long-term treatment morbidity, and image-guided radiotherapy (IGRT), which helps to improve geometric localisation of the high-dose volume to the target.
Sources of geometric uncertainty in dose delivery have been well characterised in modern IMRT workflows. Despite this, geometric uncertainty due to target volume and normal tissue structure segmentation has escaped close scrutiny. It has been suggested in a number of tumour sites that the largest systematic error in radiotherapy planning today is related to the radiation oncologist's ability to localise and delineate the target volume [4-15]. When using IMRT and inverse planning, accurate and consistent segmentation of target and normal tissues impacts not only on the spatial localisation of the high-dose volume, but also on accuracy of dose calculation to organs at risk (OARs). Inaccuracies in segmentation can compromise treatment, either by unnecessarily treating the healthy surrounding tissue (which may increase treatment morbidity) or by inadequate target segmentation risking marginal tumour recurrence.
Previously published studies have assessed techniques to reduce interobserver and intraobserver variation in segmentation of gross tumour volume (GTV) and clinical target volume (CTV). These include the use of multimodality imaging such as MRI and positron emission tomography (PET)–CT [10,16-20], multidisciplinary team contouring [5,21], use of segmentation guidelines and use of auto-delineation software [23,24]. Studies have looked at interobserver and intraobserver variation in GTV (primary tumour) segmentation for HNC [10,13,25,26]. Changes in the patient anatomy post surgery introduce difficulties for post-operative CTV segmentation. There are limited data on interobserver variation in CTV segmentation in HNC. Rasch et al found significant interobserver variation in CTV segmentation in patients with paranasal sinus tumours, especially in the anterior border of the nasal cavity, possibly due to lack of anatomical landmarks to define the edge of the volume.
We studied interobserver variation in the segmentation of CTVs and OARs for patients receiving post-operative radiotherapy after parotidectomy and the impact of introducing segmentation guidelines for the CTV. We tried to replicate the clinical setting by asking radiation oncologists to outline both the primary tumour bed CTV and high-risk nodal groups (if appropriate). Currently, there are well-established guidelines for segmentation of CTV nodal volumes in HNC, but there are no standard segmentation guidelines for patients receiving post-operative radiotherapy to the parotid bed. Cochlear sparing therapy and conventional radiation (COSTAR) is a UK National Cancer Research Institute-supported head and neck trial, designed to demonstrate a difference in sensori-neural hearing loss with cochlear sparing IMRT compared with conventional radiotherapy. The COSTAR trial management group created a segmentation protocol for the parotid bed CTV, which includes landmarks for defining extent of microscopic disease and recommends guidelines for elective and post-operative nodal irradiation. We obtained written permission to use the COSTAR segmentation guidelines for this study.
Planning CT data sets of five patients were randomly selected from the Addenbrooke's radiotherapy database. All five patients had been diagnosed with malignant parotid tumours (two acinic cell carcinoma, one pleomorphic carcinoma, one adenocarcinoma and one neuro-endocrine tumour) and had completed post-operative radiotherapy. Each patient had been immobilised in a perspex shell, and non-contrast CT images were acquired on a CT scanner (GE Healthcare, Waukesha, WI) at 120 kV energy with a 50-cm field of view, 3-mm slice thickness and pixel size of 0.97 mm. Each data set was anonymised and distributed directly to the participating oncologists in the three regional cancer centres (Cambridge, Norwich and Ipswich). This study was registered as a service evaluation in the Anglia Cancer Network.
Four consultant radiation oncologists with a special interest in HNC were invited to participate. Each oncologist sees nearly 100 new patients with HNC per annum and has more than 5 years of experience in the use of IMRT for HNC. Radiation oncology trainees were not included in the study to minimise variability due to lack of experience. Each observer was identified by a letter (A–D) in order to preserve anonymity.
The four observers were asked to segment the CTVs and OARs on axial slices of the planning CT data sets. The baseline segmentation exercise was performed according to the oncologist's routine practice. The CT window levels were adjustable to suit individual preferences for segmentation. Information on clinical history, surgical findings, staging and histology report (Table 1) was provided. All four observers were later provided with the COSTAR segmentation protocol and asked to resegment the CTV on the same CT data sets with a minimum interval of 3 months to avoid recall bias. The COSTAR protocol defines the CTV of the parotid bed and the indication for ipsilateral neck irradiation. CTV1 defines the high-risk volume, which includes the parotid bed, adjacent tissues at risk of microscopic spread and ipsilateral lymph node levels Ib, IIa and IIb. CTV2 defines the low-risk volume, which includes ipsilateral node levels III, IV and V. CTV1 and CTV2 were fused together as CTV for the purpose of the analysis. The OARs were not resegmented by the observers as the COSTAR guidelines have no segmentation protocol for OARs.
These measurements were grouped together and compared to assess:
We calculated the total volume outlined by each of the four clinicians for the CTV and OARs. The degree of agreement between observers' delineations was quantified using the conformity level (CL) proposed by Kouwenhoven et al. This CL is a generalisation of the Jaccard coefficient for more than two delineated volumes (the Jaccard coefficient for two delineated volumes A and B is equal to the ratio of the common and encompassing volumes, i.e. |A∩B|/|A∪B|). It has the desirable properties that it is unbiased with respect to the number of delineations and equals the Jaccard coefficient in the case of two delineations. The CL ranges between 0% and 100%, where 100% is full concordance and 0% is no concordance. Hence, the CL can express conformity of a group of contours without requiring the identification of a gold standard contour and is therefore particularly appropriate for assessing conformity of a group of outlines.
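As a minimal sketch of how such a measure can be computed, the Python below represents each delineation as a set of voxel indices and evaluates the pairwise Jaccard coefficient together with a generalised conformity level, taken here as the sum of pairwise common volumes divided by the sum of pairwise encompassing volumes. This formulation is an assumption (the text does not state the formula), and the function names and toy voxel sets are hypothetical; they are not the outline analysis tool used in the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A∩B| / |A∪B| for two voxel sets."""
    union = a | b
    if not union:
        raise ValueError("both delineations are empty")
    return len(a & b) / len(union)


def conformity_level(delineations):
    """Generalised conformity level (assumed form): sum of pairwise
    common volumes divided by sum of pairwise encompassing volumes."""
    if len(delineations) < 2:
        raise ValueError("need at least two delineations")
    pairs = list(combinations(delineations, 2))
    common = sum(len(a & b) for a, b in pairs)
    encompassing = sum(len(a | b) for a, b in pairs)
    if encompassing == 0:
        raise ValueError("all delineations are empty")
    return common / encompassing


# Toy 1-D "voxel" sets for four hypothetical observers (not study data).
observers = [
    set(range(0, 10)),
    set(range(2, 12)),
    set(range(4, 14)),
    set(range(5, 10)),
]
cl = conformity_level(observers)
print(f"CL = {100 * cl:.0f}%")
```

With exactly two delineations this quantity reduces to the Jaccard coefficient, and no reference ("gold standard") contour is needed at any point.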
Some variation in the CL for spinal cord and brain stem OARs was due simply to variation in the superior and inferior extent of the OARs outlined, because of variation in the superior–inferior extent of the CTV. Such variation would have minimal clinical impact as the radiotherapy plans are optimised based on maximum point dose and are independent of the length of OARs outlined. We therefore also calculated the CL for brain stem and spinal cord after restricting the analysis to the CT slices on which OARs were segmented by all four clinicians; this we denote CL (axial).
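The restriction behind CL (axial) can be sketched as follows, assuming each delineation is stored as a set of (x, y, z) voxel tuples with z as the axial slice index, and assuming the same pairwise-sum form of the conformity level as a stand-in for the published measure. All names and the toy outlines are hypothetical.

```python
from itertools import combinations


def conformity_level(delineations):
    """Assumed generalised conformity level: sum of pairwise common
    volumes divided by sum of pairwise encompassing volumes."""
    if len(delineations) < 2:
        raise ValueError("need at least two delineations")
    pairs = list(combinations(delineations, 2))
    common = sum(len(a & b) for a, b in pairs)
    encompassing = sum(len(a | b) for a, b in pairs)
    if encompassing == 0:
        raise ValueError("no voxels to compare")
    return common / encompassing


def restrict_to_common_slices(delineations):
    """Keep only voxels on slices where every observer drew an outline."""
    if not delineations:
        raise ValueError("need at least one delineation")
    slice_sets = [{z for (_, _, z) in d} for d in delineations]
    common_slices = set.intersection(*slice_sets)
    return [{v for v in d if v[2] in common_slices} for d in delineations]


def conformity_level_axial(delineations):
    """Conformity level evaluated on the commonly segmented slices only."""
    return conformity_level(restrict_to_common_slices(delineations))


# Two hypothetical observers: identical in-plane outlines, different lengths.
obs_a = {(x, 0, z) for z in range(0, 4) for x in (0, 1)}  # slices 0-3
obs_b = {(x, 0, z) for z in range(1, 6) for x in (0, 1)}  # slices 1-5
print(conformity_level([obs_a, obs_b]))        # penalised by length mismatch
print(conformity_level_axial([obs_a, obs_b]))  # in-plane agreement only
```

In this toy case the outlines agree perfectly on every shared slice, so any shortfall in the unrestricted value comes purely from the difference in superior–inferior extent.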
A total of 40 CTV outlines and 20 OAR outlines each for spinal cord, brain stem and contralateral parotid gland were compared using the outline analysis method.
There was a significant variation in the total volume of the CTV among clinicians, with observer B consistently delineating the largest and observer C consistently delineating the smallest CTV. The mean volume and standard deviation (SD) for each case are presented in Table 2. A significant difference in CTV morphology was observed between the four observers, with a mean CL of 30% (range 25–39%; Table 2).
Using the segmentation guidelines, the mean CL improved from 30% to 54% (p=0.043 using the Wilcoxon paired analysis; Table 2). This improvement in concordance between clinicians was also confirmed by reduced variation in CTV volume (reduced SD; Table 2). Using the segmentation guidelines, observer B's volumes were largest in four patients, with no observer consistently responsible for the smallest volume. Overall, there was a significant increase in the mean CTV volume calculated for each patient after the use of the COSTAR segmentation protocol from 170 to 273 cm³ (p=0.043 using the Wilcoxon paired analysis).
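Both Wilcoxon comparisons report p=0.043 for five patients. As a hedged illustration, the sketch below computes the two-sided signed-rank p-value under the normal approximation without continuity correction, which gives this value when all five paired differences share the same sign and none are tied. That the analysis used this variant is an assumption, and the paired values are placeholders, not the data in Table 2.

```python
import math


def wilcoxon_normal_approx(before, after):
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation
    (no continuity correction). Zero or tied differences are not handled."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    abs_diffs = [abs(d) for d in diffs]
    if 0 in abs_diffs or len(set(abs_diffs)) != len(abs_diffs):
        raise ValueError("zero or tied differences are not handled in this sketch")
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs_diffs[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, z, p


# Placeholder paired values: every patient increases, no tied differences.
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [11.0, 22.0, 33.0, 44.0, 55.0]
w_plus, z, p = wilcoxon_normal_approx(before, after)
print(f"W+ = {w_plus}, z = {z:.3f}, p = {p:.3f}")
```

With only five pairs, the p-value under this approximation depends solely on the signs and ranks of the differences, not on their magnitudes.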
There was a significant variation in the volume of contralateral parotid gland between clinicians (Table 3) with a mean CL of 60% (range 53–68%). The concordance for brain stem and spinal cord was poor, with mean CLs of 23% (range 13–27%) and 25% (range 22–31%), respectively (Tables 4 and 5). The interobserver variation for brain stem and spinal cord was predominantly related to the segmentation of different lengths of the OARs; however, significant interobserver variation was also noted in the axial direction. The mean CLs (axial) for brain stem and spinal cord were 45% (range 42–51%) and 60% (range 58–64%), respectively.
This study demonstrates variability in target segmentation between experienced HNC oncologists with a mean CL for the CTV of 30%. This degree of uncertainty is not accounted for in the CTV-to-PTV margin calculation (typically 5 mm for HNC). It also confirms considerable variation among radiation oncologists in segmenting OARs. The uncertainty in segmentation of OARs is not accounted for in the currently used planning OAR volume (PRV) margin calculation (3 mm in our unit). Use of the COSTAR segmentation protocol improved the mean CTV CL (from 30% to 54%), and this was associated with an increase in the mean CTV volume for all five patients.
This study has a number of limitations. The number of clinicians involved was small, and, though all clinicians had more than 5 years of experience in HNC IMRT, each would see fewer than five patients each year with malignant salivary tumours. Clinicians had no access to the pre-operative radiology and surgical notes. The CT images used were non-contrast; use of contrast agent to highlight vascular structures may facilitate nodal delineation. However, this was standard practice in our region at the time, and it is unlikely that use of contrast would have significantly altered the results, as all clinicians had extensive experience of delineating without contrast. Finally, it should be remarked that this study was concerned with interobserver variation; there was no attempt to analyse intraobserver variation.
The potential reasons for significant interobserver variations in CTV outlines in the absence of a segmentation protocol (Figure 1, left image) include:
The use of the COSTAR segmentation protocol improved the CL in target segmentation from 30% to 54%. There remained significant variation in CTV segmentation anteriorly and in the infratemporal region between different observers after using the COSTAR protocol, possibly due to lack of bony reference points in the anterior part of the neck and in the infratemporal region. These results are consistent with previous publications for other tumour sites, including prostate cancer, oesophageal cancer and HNC. Mitchell et al showed significant interobserver variation in CTV segmentation for post-prostatectomy radiotherapy, which was reduced by the use of a contouring protocol. Tai et al reported an improved consistency in cervical oesophagus target segmentation after one-to-one training. Three cases were used to represent different clinical situations and the use of a segmentation protocol reduced the variation of the longitudinal positions of GTV, CTV and PTV for all cases. Berson et al asked two neuroradiologists and two radiation oncologists to delineate the GTV on PET–CT fusion images for 16 HNC patients after a short tutorial and using a contouring protocol. They reported reduced interobserver variation after use of the contouring protocol. The observed increase in mean CTV volume with use of a segmentation protocol is also consistent with previously published studies [12,30].
For the contralateral parotid gland, the mean CL was 60% (range 53–68%). This variation appeared to be due mainly to difficulty in defining the deep lobe of the parotid gland on CT, lack of bony landmarks anteriorly and the use of different CT windowing during segmentation. The total volume outlined by the four observers differed significantly.
We found significant variation in the length of brain stem and spinal cord outlined. This was partly related to the variation in the length of the CTV; however, these large variations between experienced clinicians highlight that any contouring protocol used needs to be meticulous about defining the extent of OARs by agreed anatomical boundaries. We also found significant variation between outlines in the axial direction (Figure 2), which may have an impact on radiotherapy plan optimisation. We suspect that poor soft tissue characterisation on CT and use of different CT windowing contributed to the observed axial variation. We also found a significant variation among observers in defining the junction between brain stem and spinal cord. The inferior border of the brain stem is at the level of the foramen magnum, and as the spinal cord is more radiosensitive than the brain stem, we propose that the lower end of the foramen magnum (posterior lip of the foramen magnum) is used as the junction between spinal cord and brain stem.
To our knowledge, this study is the first to look at interobserver variation in CTV segmentation for parotid tumours that has included nodal segmentation and assessed the impact of segmentation guidelines. We found that the CL improves after using the COSTAR protocol, but the results were still below 80%. Complete agreement between two or more observers (CL of 100%) is rare and a conformity of >80% is generally accepted as highly concordant. These levels of concordance are typically observed in the segmentation of intact gross tumour with non-infiltrative margins and high-quality soft tissue imaging. CTV segmentation in the post-operative setting is more difficult, as the clinician must make allowances for altered anatomy post surgery and estimate the extent of potential microscopic disease that cannot be imaged using CT or MRI. The use of contouring protocols can reduce interobserver variation but cannot completely eliminate it. Discussion with the head and neck radiologist in interpretation of CT-based cross-sectional anatomy and defining the optimal window setting for segmentation may be useful.
This study highlights the need for international segmentation guidelines for OARs. The advantage of high-dose sculpting with IMRT to reduce dose to OARs is neutralised if the segmentation is inaccurate or inconsistent. The low CLs for OARs observed in our study are similar to previously published results [25,33]. For example, Geets et al showed poor reproducibility between five observers in segmenting the parotid gland and spinal cord on both CT and MRI. The MRI-based parotid volumes were, however, smaller than the CT-based volumes and more accurate in delineation of the deep lobe. The interobserver variation for OARs needs to be minimised in order to accurately compare and interpret treatment-related morbidity from different radiotherapy centres. To our knowledge, there are no studies looking at the effect of segmentation guidelines for OARs for HNC. van de Water et al have defined segmentation guidelines for OARs involved with salivary function, which we anticipate will reduce interobserver variation.
This study demonstrates significant interobserver variation in CTV and OAR segmentation between experienced HNC radiation oncologists. The CL for CTV delineation improved significantly after using the COSTAR segmentation protocol, but delineations were still not highly concordant. Successful implementation of newer radiotherapy techniques such as IMRT and IGRT in HNC depends crucially on minimising this systematic treatment planning error. We strongly recommend that contouring protocols are used in the segmentation of the target volume in the post-parotidectomy radiotherapy setting and segmentation guidelines are developed for OARs. Outcome data for patients treated with IMRT should include assessment of local failure in relation to target delineation. More efforts are still needed to minimise systematic errors in radiotherapy planning.
We would like to acknowledge Dr Christopher Nutting, Principal Investigator for the COSTAR trial, and Dr Teresa Provost, Medical Statistician at the Centre of Medical Applied Statistics, Cambridge.