Despite the well-known consequences of geometric inaccuracy in target volume delineation15–17
, inter-observer variability in target definition has been demonstrated in a host of studies, in various anatomical sites18
. Simply put, “inter-observer variability in the definition of GTV and CTV is a major – for some tumor locations probably the largest – factor contributing to the global uncertainty in radiation treatment planning”18
. Consequently, there are continuing efforts to implement solutions to possible sources of variability/error in the target volume delineation process. These solutions have included optimization of imaging inputs19–22
, instructional protocol modification 5, 23, 24
, integration of specific training programs25, 26
, development of software tools27–31
, and implementation of standardized guidelines32–37
for distinct anatomic sub-sites. For clinical trials, the situation is potentially more vexing: ensuring adequate treatment uniformity between comparison cohorts requires increased attention to both protocol construction and enrollee plan review, at a significant cost in time and resources for the primary investigator(s).
In terms of feasibility, the study was readily completed (total study duration of 5 months). A total of 12/26 (46%) invited SWOG institutions confirmed intent to participate; however, only 8/26 (31%) ultimately provided submissions. Nonetheless, our findings suggest that a reasonably powered target delineation trial might be implemented in a timely manner with a modicum of cooperative group resource allocation, and that such a study is both technically and logistically feasible. Analysis of the resultant data points to the difficulty of executing clinical trials in the conformal radiotherapy era. The high proportion of major protocol deviations was consistent with previous reports. The substantial variation from expert reference and median contour surfaces observed for all users pre-intervention ( and , ) suggests that efforts to further minimize inter-observer variability are imperative. As demonstrates, substantial inter-observer surface deviation was observed for multiple CTVA sub-regions before atlas implementation. After atlas administration, the upper limit of the standard deviation from the median isosurface was reduced by 0.3, 0.6 and 0.8 cm for the upper-anterior, lower-lateral and lower-posterior CTVA sub-regions, respectively. Although a margin of >1 cm would still be needed to cover 95% of all contouring variability, the achieved reductions would permit a decrease in the required PTV expansion margins. However, further reduction of variation is desirable, because the PTV margins required to encompass the residual variation in target delineation would limit the practical advantages of IMRT over conventional RT.
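The link between delineation spread and the margin needed to cover a given share of contouring variability can be illustrated with a minimal sketch. It is not the analysis used in this study. It assumes that signed surface deviations from the median contour are approximately normally distributed, so that a symmetric margin covering 95% of deviations is about 1.96 standard deviations. The function name `coverage_margin` and the distance values are hypothetical and serve illustration only.

```python
from statistics import NormalDist, stdev


def coverage_margin(signed_distances_cm, coverage=0.95):
    """Margin (cm) covering `coverage` of deviations, ASSUMING normality.

    signed_distances_cm: signed distances (cm) from the median surface
    to each observer's contour at one surface location (hypothetical input).
    """
    sd = stdev(signed_distances_cm)  # sample standard deviation
    # Two-sided quantile of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return z * sd


# Illustrative values only; these are NOT measurements from the study.
pre_atlas = [-0.6, -0.2, 0.0, 0.3, 0.5]
post_atlas = [-0.3, -0.1, 0.0, 0.15, 0.25]  # same pattern, half the spread

margin_pre = coverage_margin(pre_atlas)
margin_post = coverage_margin(post_atlas)
print(f"pre-atlas margin:  {margin_pre:.2f} cm")
print(f"post-atlas margin: {margin_post:.2f} cm")
```

Under this assumption the margin scales linearly with the standard deviation, so any reduction in inter-observer spread translates proportionally into a smaller required expansion.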
Several limitations to this pilot study are evident. The sample size of this study is limited, and only a single case was contoured. Utilization of a reference expert’s contours as a de facto “gold standard” points to the fact that the “ground truth” in contouring clinical target volumes remains ambiguous (, noting variation within the reference expert user’s sequential contours). Some variance in the study might be attributable to instructions that were distinct from standard clinical practice (e.g. the external iliacs are not typically contoured for T3 rectal cases). Our invitation was limited to SWOG institutions, creating potential sampling bias, and the fact that only interested observers participated creates an avenue for selection bias. Nonetheless, our data suggest that inclusion of a visual atlas in addition to written instructions can improve conformance to a reference expert’s contours (), as well as reduce inter-observer variability, to a statistically detectable degree (). However, our data also suggest substantial residual variability in rectal target volume delineation, even after atlas utilization (, ).
This study is consistent with previous investigations of educational interventions and consensus guideline application in contouring studies. Recently, Bekelman et al.25
demonstrated improvement in contour quality after a directed teaching intervention, echoing previous work by Tai et al. showing increased protocol compliance after a site-specific educational experience26
. With regard to consensus guideline application as an avenue towards target variability reduction, Dimopoulos et al.32
detail a study in which 19 cervical cancer cases were contoured using GEC-ESTRO guidelines by two observers, with a resultant between-user conformity index (CI)11, 21
in the range 0.6–0.7 for target volumes, roughly consistent with CNs in the current series. Likewise, Wong et al. recently demonstrated, using a test-retest sequence, that improved consistency in seroma contouring could be observed after exposure to consensus guidelines38
. In the clinical trial setting, it is likely that “trial specific” atlases should be used, based on patterns of failure data (as per Roels et al.39
) or, possibly, after a pilot contouring trial similar to the present study. For instance, RTOG anorectal consensus guidelines stipulate coverage “extending CTVA ~1 cm into the posterior bladder, to account for day-to-day variation in bladder position10
.” This incorporation of motion into CTV generation, rather than PTV expansion, represents a conceptual break with ICRU 6240
and other guidelines39
, wherein the posterior bladder wall would not be contoured. No users in Group_A included significant portions of posterior bladder pre-atlas, whereas a majority did so after atlas exposure (in compliance with the presented atlas10
and consistent with the reference expert).
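For readers unfamiliar with between-user conformity indices such as those cited above, the following is a minimal sketch of one common definition: the volume shared by two contours divided by the volume encompassed by either (intersection over union). This definition is an assumption here; the cited studies may define their indices differently. The helper names `conformity_index` and `box`, and the toy voxel grids, are hypothetical.

```python
from itertools import product


def conformity_index(voxels_a, voxels_b):
    """Intersection-over-union of two contours given as sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    union = a | b
    if not union:
        raise ValueError("both contours are empty")
    return len(a & b) / len(union)


def box(x0, x1, y0, y1, z0, z1):
    """Hypothetical helper: all integer voxel indices inside a half-open box."""
    return set(product(range(x0, x1), range(y0, y1), range(z0, z1)))


# Toy example: two 10 x 10 x 10 voxel "contours", the second shifted by
# 2 voxels along x. These are illustrative shapes, not study data.
observer_1 = box(0, 10, 0, 10, 0, 10)
observer_2 = box(2, 12, 0, 10, 0, 10)

ci = conformity_index(observer_1, observer_2)
print(f"conformity index: {ci:.2f}")
```

An index of 1 indicates identical contours and 0 indicates no overlap; in this toy case a shift of one fifth of the target width already lowers the index to roughly the 0.6–0.7 range.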
Future studies will be required to ascertain if observed effects of atlas administration are transferable to other anatomic sites with potentially more complicated anatomic relationships5, 24
. The SWOG Radiation Therapy Committee intends to suggest building target delineation studies into clinical trial protocol development/quality assurance processes. Aspects of this dataset may also be integrated into design of educational materials for a proposed Dutch cooperative group rectal study workshop. We plan to use portions of this dataset to construct composite models accounting for rectal motion and set-up variability14, 41
, as well as to develop novel software strategies for evaluation42
and minimization of target delineation variance.