Target volume delineation is a critical, but time-consuming step in the creation of radiation therapy plans used in the treatment of many types of cancer. However, variability in target volume definitions can introduce substantial differences in resulting doses to tumors and critical structures. We developed TaCTICS, a web-based educational training software application targeted towards non-expert users. We report on a small, prospective study to evaluate the utility of this online tool in improving conformance of regions-of-interest (ROIs) with a reference set. Eight residents contoured a set of structures for a head-and-neck cancer case. Subsequently, they were provided access to TaCTICS as well as contouring atlases to allow evaluation of their contours in reference to other users as well as reference ROIs. The residents then contoured a second case using these resources. Volume overlap metrics between the users showed a substantial improvement following the intervention. Additionally, 66% of users reported that they found TaCTICS to be a useful educational tool and all participants reported they would like to use TaCTICS to track their contouring skills over the course of their residency.
Modern radiotherapy requires delivery of dose in a manner that conforms tightly to irregular tumor volumes [1–3]. Because these tumor volumes are often adjacent to radiosensitive normal structures, conformal dose deposition requires computer-optimized planning to select radiation beam angles that will provide the best ratio of tumor cell killing to normal structure dose.
The planning process for conformal radiotherapy is predicated on dose calculations derived from the Hounsfield units of given voxels on a simulation CT DICOM image set. Voxels within this dataset are then designated as regions of interest (ROIs) and assigned nominal designations as gross tumor volume (GTV), clinical areas of presumed microscopic spread (clinical target volume, CTV), or organs-at-risk (OARs). These ROIs, as DICOM-RT structures, can then be used for dose calculation by the radiotherapy treatment planning software. Only those volumes defined by the physician observer are utilized for dose optimization, either to ensure sufficient dose to afford tumor control or to spare OARs via dose constraints. Because human users define these ROIs, substantial geometric variability can be introduced by differences in target delineation. Target and OAR delineation is critical because it is the initial step in the planning process, and errors introduced here will be propagated through the creation of the treatment plans. Despite this criticality, inter-observer variability in target definition has been demonstrated in a series of studies, in multiple organ/anatomical sites. In fact, to quote a recent analysis, “interobserver variability in the definition of GTV and CTV is a major – for some tumor locations probably the largest – factor contributing to the global uncertainty in radiation treatment planning”.
In this paper, we briefly review the intended function and architecture of TaCTICS (Target Contour Testing/Instructional Computer Software). TaCTICS is a prototype statistical software application for target volume delineation of tumors and nearby organs-at-risk (OARs), intended for use in radiotherapy. It is a web-based tool that allows for near real-time data analysis and reporting of quantitative scoring metrics. These can include comparisons to reference contours, to the contours of other users, or to consensus structures derived from those of a set of designated experts. TaCTICS also provides users with visual feedback on their target volume delineations in the form of structure overlays.
In the pre-CT era, radiotherapy was often delivered using bony anatomic landmarks with relatively large amounts of normal tissue coverage. The ability to use CT for imaging of target and normal structures allowed more accurate dose calculation as well as increased conformality, sparing non-target OARs [9, 10] and providing an avenue for safe dose escalation. More recently, computer optimization of beam position and intensity has allowed even greater conformality with regard to dose delivery using a technique called intensity modulated radiation therapy (IMRT). Nonetheless, all radiotherapy is dependent on physician-derived inputs for targeting and dose avoidance. The physician user manually designates voxels as named ROIs on a DICOM file derived from a patient’s CT scan, using a conceptual framework defined by ICRU reports 50 and 62 [12, 13]. ROIs within the aforementioned DICOM file are assigned nominal designations as visible tumor (GTV), clinical areas of presumed microscopic spread (CTV), or various named OARs. These ROIs are then utilized as DICOM-RT structures for dose calculation by the radiotherapy treatment planning software. Since these ROIs provide the definitions for dose constraints used for treatment planning, voxels which are uncontoured will not have dose monitoring, and incorrect designation will result in misattribution of dose delivery. For instance, if tumor volumes are not contoured, they will likely be underdosed, potentially resulting in persistent cancer, morbidity, and death. If normal structures are miscontoured, the actual anatomic structure may be overdosed, leading to side effects. Simply put, “garbage in, garbage out”.
In the pre-conformal radiotherapy era, standardized fields were used to prevent missing tumor. However, in the era of DICOM-based target delineation, data suggest that considerable observer-specific variation exists in target delineation and consequent dose distribution. High inter-observer variability complicates clinical trial quality assurance and prevents ready comparison of treatment protocols. Clinical trial cooperative groups have recognized this as a major concern, with the largest cancer cooperative group stating emphatically:
“Most critical in the era of highly conformal radiation therapy is the delineation of the tumor and organs at risk… New three-dimensional and metabolic and molecular imaging studies are likely to become standard ….and their use therefore will need to be standardized… Indeed if this is not accomplished the future of reliable multidisciplinary cooperative group studies is at risk. Therefore, the Radiation Oncology Committee will need to develop methodologies that assure consistent and correct definitions of tumor and normal tissue when these technologies are employed.”
Consequently, there is a great unmet need for tools that allow evaluative measures of inter- and intra-user performance in target delineation to be collected and reported in a DICOM-RT environment, for use as self-assessment, teaching, testing, or credentialing end-points. Data have shown that formal educational efforts [51] and atlas-based interventions can improve contour standardization. Since formal training in radiotherapy practice begins in a 4-year post-medical school residency, when the delineation learning curve begins, this seemed a self-evident time to initiate a pilot study of software designed for such aims. The purpose of this effort is to develop a software application that allows users to delineate target structure ROIs in DICOM-RT compatible formats, followed by automated comparison and scoring of the user-derived ROIs against reference sets derived from expert users.
We begin with a brief review of the concept behind TaCTICS and its system architecture, as described previously. TaCTICS is an open source, open access software package as well as a hosted service, currently containing a set of CT studies and associated contours for a multitude of anatomical sites.
In essence, TaCTICS is intended to be a user-friendly website that allows users to upload their radiotherapy planning structures in DICOM-RT format and to subsequently receive feedback regarding the conformance of these structures with a reference or with those of other users. TaCTICS can be configured for a variety of scenarios, including educational intervention, multi-site clinical trials, the development of consensus atlases, and use as a library of “expert” contours for training in resource-poor settings.
The three primary components of TaCTICS are the data, a set of metrics for comparison of structures and a user-interface for visualizing the comparisons and metrics.
As described previously, the hosted TaCTICS website currently contains data from two IRB-exempt projects conducted under the auspices of the University of Texas Health Science Center San Antonio Institutional Review Board as well as two prospective studies.
The first retrospective dataset consists of DICOM-RT structures resulting from a double-blind study of an instructional intervention in which users twice contoured a standardized case presentation of T3N0M0 rectal cancer. Anonymized patient DICOM files of CT studies were used to create the target delineation datasets. In this study, half of the participants had access to a consensus-based anatomic atlas between the first and second sessions. Fifteen radiation oncologists, including both expert and non-expert observers, participated in the study and submitted a gross tumor volume (GTV) and 2–3 clinical target volumes (CTVs) for each of 2 contouring sessions, resulting in 94 distinct ROI structures available for analysis. Inter- and intra-observer variability was evaluated and previously reported.
The second retrospective dataset comes from a previous study examining the effect of the human-computer user interface device (UID) on target volume delineation efficiency. In this study, observers contoured several anatomical sites, including a prostate, brain, lung, and head and neck case presentation, twice. The first delineation was performed using a standard mouse-keyboard configuration and the second using a graphic tablet–pen interface. Twenty-one observers contoured target volumes with both UIDs, resulting in more than 400 structures. Two users had been designated as ‘experts’ for each site.
TaCTICS is currently capable of calculating a variety of volumetric overlap and surface distance metrics including Volumetric Difference (VD), Dice and Jaccard coefficients, and false positive and false negative Dice. Additionally, surface-based measures including the robust Hausdorff distance can be computed. 
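The overlap metrics named above can be illustrated with a minimal sketch, given here in Python for brevity. It assumes each structure has already been rasterized to a set of voxel indices. The function names, the signed definition of volumetric difference, the false positive/false negative Dice formulas, and the 95th-percentile default are common conventions chosen for illustration, not necessarily the definitions used inside TaCTICS. For simplicity the robust Hausdorff distance is computed over all voxels, in voxel units, rather than over extracted surfaces.

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def dice(user: Set[Voxel], ref: Set[Voxel]) -> float:
    """Dice coefficient: 2|A and B| / (|A| + |B|)."""
    total = len(user) + len(ref)
    return 2 * len(user & ref) / total if total else 1.0


def jaccard(user: Set[Voxel], ref: Set[Voxel]) -> float:
    """Jaccard coefficient: |A and B| / |A or B|."""
    union = len(user | ref)
    return len(user & ref) / union if union else 1.0


def false_positive_dice(user: Set[Voxel], ref: Set[Voxel]) -> float:
    """Share of the Dice denominator lying in the user ROI only (assumed definition)."""
    total = len(user) + len(ref)
    return 2 * len(user - ref) / total if total else 0.0


def false_negative_dice(user: Set[Voxel], ref: Set[Voxel]) -> float:
    """Share of the Dice denominator lying in the reference ROI only (assumed definition)."""
    total = len(user) + len(ref)
    return 2 * len(ref - user) / total if total else 0.0


def volumetric_difference(user: Set[Voxel], ref: Set[Voxel]) -> float:
    """Signed relative volume difference (assumed definition; ref must be non-empty)."""
    return (len(user) - len(ref)) / len(ref)


def robust_hausdorff(user: Set[Voxel], ref: Set[Voxel], percentile: float = 95.0) -> float:
    """Percentile (nearest-rank) of nearest-neighbour distances, taken symmetrically."""
    def directed(src: Set[Voxel], dst: Set[Voxel]) -> float:
        nearest = sorted(min(math.dist(p, q) for q in dst) for p in src)
        rank = max(1, math.ceil(percentile / 100 * len(nearest)))
        return nearest[rank - 1]
    return max(directed(user, ref), directed(ref, user))


# Toy example: a 2x2x2 reference cube and a user ROI shifted by one voxel in x.
ref = {(x, y, z) for x in range(2) for y in range(2) for z in range(2)}
user = {(x + 1, y, z) for (x, y, z) in ref}
print(dice(user, ref), jaccard(user, ref), robust_hausdorff(user, ref, percentile=100))
```

With these definitions, the Dice coefficient plus half the sum of the false positive and false negative Dice always equals 1, which makes the two error terms a convenient way to see whether a low score comes from over-contouring or under-contouring.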
TaCTICS also provides a number of techniques for creating a consensus from a set of contours. These include probability maps as well as the commonly used Simultaneous Truth and Performance Level Estimation (STAPLE), an expectation-maximization algorithm that computes a probabilistic estimate of the true segmentation given a set of manual contours. In addition to the expert-derived contours, we have created an additional set of “ground truth” contours using this algorithm.
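To make the two consensus approaches concrete, the following is a minimal sketch of a probability map and of the textbook binary STAPLE iteration, operating on flattened 0/1 masks. It is an illustration under stated assumptions rather than the TaCTICS implementation: the function names, the fixed global prior, the 0.99 starting values for sensitivity and specificity, and the fixed iteration count are all choices made here for brevity.

```python
from typing import List, Sequence, Tuple


def probability_map(masks: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of observers who included each voxel."""
    return [sum(column) / len(masks) for column in zip(*masks)]


def _clip(value: float, eps: float) -> float:
    return min(max(value, eps), 1.0 - eps)


def staple(masks: Sequence[Sequence[int]], iterations: int = 50,
           eps: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Binary STAPLE sketch: returns (voxel probabilities, sensitivities, specificities)."""
    n_voxels = len(masks[0])
    # Global foreground prior: overall fraction of voxels marked by any observer.
    prior = _clip(sum(map(sum, masks)) / (len(masks) * n_voxels), eps)
    sens = [0.99] * len(masks)
    spec = [0.99] * len(masks)
    weights = [0.0] * n_voxels
    for _ in range(iterations):
        # E-step: posterior probability that each voxel is truly foreground.
        for i in range(n_voxels):
            fg, bg = prior, 1.0 - prior
            for j, mask in enumerate(masks):
                if mask[i]:
                    fg *= sens[j]
                    bg *= 1.0 - spec[j]
                else:
                    fg *= 1.0 - sens[j]
                    bg *= spec[j]
            weights[i] = fg / (fg + bg)
        # M-step: re-estimate each observer's sensitivity and specificity.
        fg_total = sum(weights)
        bg_total = n_voxels - fg_total
        for j, mask in enumerate(masks):
            true_pos = sum(w for w, d in zip(weights, mask) if d)
            true_neg = sum(1.0 - w for w, d in zip(weights, mask) if not d)
            sens[j] = _clip(true_pos / fg_total, eps)
            spec[j] = _clip(true_neg / bg_total, eps)
    return weights, sens, spec


# Toy example: two observers agree, a third misses one voxel and adds another.
truth = [1] * 8 + [0] * 12
noisy = list(truth)
noisy[0], noisy[19] = 0, 1
weights, sens, spec = staple([truth, truth, noisy])
consensus = [int(w >= 0.5) for w in weights]
print(consensus == truth, [round(s, 3) for s in sens])
```

Thresholding the returned probabilities at 0.5 gives a binary consensus structure, while the estimated sensitivities and specificities indicate how much weight each observer effectively received.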
TaCTICS can be configured to allow users to compare their structures to the structures of all other users or to the structures from a particular “expert” user or consensus structures generated using STAPLE or a probability-map based approach.
The TaCTICS website itself was built using open-source software libraries, including the Ruby on Rails framework and the ruby-dicom library. After a user uploads the DICOM-RT file produced by their treatment planning software, TaCTICS extracts information about the studies contained therein, including the location of the CT slices containing contours, DICOM header metadata such as region or ROI names, and volume and slice information for each of the delineated structures. The system stores these data, together with the metrics derived from all users’ uploaded files, in a relational database. The actual processing of the structures themselves, along with the calculation of the metrics, is performed in C++ using the Insight Toolkit (ITK). The flow of data and user interaction for the system is described briefly below.
Users can log in and download CT slices for their desired study (Figure 1). They can then delineate the target volumes using their usual treatment planning system and upload the resulting DICOM RTSTRUCT file to the TaCTICS website. Depending on how the system is configured, users can then select expert or consensus structures for comparison.
Users can compare their own or other users’ structures against reference structures as previously described and, in doing so, may calculate any of the metrics described above and review histograms illustrating how their own performance compares with that of the other users of the system. They may also directly visualize their structure contours together with reference or expert contours overlaid on thumbnails of CT slices. By locating their position on a histogram of all users, they can judge their agreement with a reference both visually and numerically.
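As a rough illustration of how a user's position on such a histogram might be derived, the sketch below bins a list of Dice scores and reports the share of users scoring below a given value. The scores are invented for the example, and the function names and the choice of five equal-width bins on [0, 1] are assumptions, not details of the TaCTICS interface.

```python
from typing import List, Sequence


def histogram_counts(scores: Sequence[float], bins: int = 5) -> List[int]:
    """Count scores in equal-width bins on [0, 1]; a score of 1.0 falls in the last bin."""
    counts = [0] * bins
    for score in scores:
        index = min(int(score * bins), bins - 1)
        counts[index] += 1
    return counts


def percent_below(score: float, scores: Sequence[float]) -> float:
    """Percentage of users whose score is strictly lower than the given score."""
    return 100.0 * sum(1 for s in scores if s < score) / len(scores)


# Invented Dice scores for eight users comparing one structure to a reference.
all_scores = [0.42, 0.55, 0.61, 0.61, 0.70, 0.78, 0.83, 0.90]
my_score = 0.70
print(histogram_counts(all_scores), percent_below(my_score, all_scores))
```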
Since OAR contouring is potentially as important and error-prone as target definition, we sought to ascertain whether TaCTICS, in concert with an atlas-based intervention, might improve OAR delineation for complex, less commonly contoured anatomic structures (e.g. those which might not be used for dose constraint/monitoring in standard practice) in a comparatively complicated anatomic location, the head and neck. Since previous data suggest that failure to delineate OARs leads to clinically meaningful alteration of the side effect profile, the validation study is clinically pertinent to toxicity management. Eight observers in various stages of training (incoming PGY-1 residents through 9 months post-residency) were given a series of head and neck organs at risk (OARs) and anatomic lymph node levels (LNLs) to contour on a standardized normal (i.e. no lymphadenopathy) anatomic DICOM of a T1N0M0 head and neck cancer case (Case 1). Observers were asked to define as ROIs 26 OARs (parotids, submandibular glands, sublingual glands, upper and lower inner lips, soft palate, cochleas, vestibular apparatus, middle ear) and LNLs (levels II–V, retropharyngeals), designated right (R) and left (L) for lateralized ROIs. Users were asked to contour the case as they would normally based on their clinical and anatomic knowledge, without outside reference materials. Users had their contours submitted to TaCTICS, and were then asked to contour a similar case, this time with OAR and LN atlases [23–25]. Furthermore, users were given a credentialing scenario in which, using TaCTICS, ROIs could be compared to a “Reference Caution” volume, representing the anatomic borders of atlas-assisted delineation of OARs plus a 2.5 mm margin and anatomic LN levels with a 5 mm margin (Figure 2). ROIs outside the “Reference Caution” volume would be deemed a minor credentialing variation and, were the trial active, would result in a request to pay special attention to ROIs outside of the reference “safe zone”.
A “Reference Flag” ROI, consisting of atlas-assisted delineation of OARs plus a 5 mm margin and anatomic LN levels with a 1 cm margin, was also available (Figure 2). ROIs extending outside this volume would be “flagged” as contouring protocol non-compliant, and the user would be asked to review those ROIs that failed to meet criteria (Figure 3). Users could also compare their contours to those of other observers (Figure 4). Users were then given online access to peer-reviewed OAR/LNL contouring atlases, as well as real-time software feedback on submitted ROIs using TaCTICS to allow self-comparison on axial slices to reference ROIs demarcating atlas-compliant and non-compliant anatomic regions. Users then contoured a second DICOM set (Case 2), using TaCTICS and the online atlases. Online survey data regarding the users’ demographic data and subjective experience were collected after Case 2 completion. Non-parametric pairwise comparison of Dice similarity coefficient (DSC) values for Case 1 vs. Case 2 for all OAR/LNL ROIs was performed to evaluate inter-observer agreement.
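The two-tier credentialing check can be sketched as follows, assuming the user and reference ROIs are available as sets of voxel indices and the voxel spacing is known. The function names and the brute-force distance calculation between voxel centers are illustrative assumptions; a planning system would typically build the margin volumes by an explicit expansion of the reference structure instead. The margins in the example are the OAR values quoted above (2.5 mm for “Reference Caution”, 5 mm for “Reference Flag”).

```python
import math
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def max_excursion_mm(user: Set[Voxel], ref: Set[Voxel],
                     spacing: Tuple[float, float, float]) -> float:
    """Largest distance (mm) from any user voxel to its nearest reference voxel."""
    def distance_mm(p: Voxel, q: Voxel) -> float:
        return math.sqrt(sum(((a - b) * s) ** 2 for a, b, s in zip(p, q, spacing)))
    return max(min(distance_mm(p, q) for q in ref) for p in user)


def classify(user: Set[Voxel], ref: Set[Voxel], spacing: Tuple[float, float, float],
             caution_mm: float, flag_mm: float) -> str:
    """Place a user ROI in one of three tiers relative to the reference margins."""
    excursion = max_excursion_mm(user, ref, spacing)
    if excursion <= caution_mm:
        return "within reference caution volume"
    if excursion <= flag_mm:
        return "minor variation"
    return "flagged"


# Toy example: a 5x5 single-slice reference and user ROIs that stray by 2, 4 and 8 mm.
spacing = (1.0, 1.0, 2.5)
ref = {(x, y, 0) for x in range(5) for y in range(5)}
for stray_x in (6, 8, 12):
    user = ref | {(stray_x, 0, 0)}
    print(stray_x, classify(user, ref, spacing, caution_mm=2.5, flag_mm=5.0))
```

This check only looks at voxels that extend beyond the reference. Under-contouring would have to be detected separately, for example with the overlap metrics.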
This small, prospective study demonstrated the feasibility of providing a training intervention using TaCTICS in an educational setting. The mean inter-observer agreement improved in 23 of the 26 structures following the intervention using TaCTICS and the atlases. In particular, some structures showed very substantial improvements in inter-observer agreement between users, as seen in Table 1.
A statistical pairwise analysis indicated that this difference in average inter-observer agreement, as measured by the Dice similarity coefficient (DSC), was statistically significant (p ≤ 0.05) for the bilateral LNL I–IV, bilateral parotid, and bilateral submandibular gland ROIs (see Table 1).
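The text does not name the specific non-parametric test used. One common choice for paired before/after values is the Wilcoxon signed-rank test, sketched below with an exact two-sided p-value obtained by enumerating all sign assignments, which is practical only for small samples. The choice of test, the function names, and the DSC values are assumptions made for illustration. The numbers are invented and are not study data.

```python
from itertools import product
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wilcoxon_signed_rank(before: Sequence[float],
                         after: Sequence[float]) -> Tuple[float, float]:
    """Return (W+, exact two-sided p-value); zero differences are dropped."""
    diffs = [a - b for b, a in zip(before, after) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    extreme = total = 0
    # Under the null hypothesis every rank is equally likely to carry either sign.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - centre) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / total


# Invented paired DSC values for one ROI (Case 1 vs. Case 2), one pair per row.
case1 = [0.41, 0.48, 0.52, 0.55, 0.60, 0.62, 0.66, 0.71]
case2 = [0.63, 0.66, 0.69, 0.70, 0.72, 0.75, 0.74, 0.80]
print(wilcoxon_signed_rank(case1, case2))
```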
Additionally, the average Dice coefficient of all users (all structures, compared to all other users) improved following the intervention, as seen in Table 2.
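One way to compute such an average is to take the Dice coefficient for every pair of observers on each structure and then average the results. The sketch below does this with contours stored as voxel sets (integer voxel ids are used for brevity). The data layout, the function names, and the equal weighting of structures are assumptions for illustration, not a description of how the tables in this paper were produced.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Set


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient between two voxel sets."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def mean_pairwise_dice(contours: Dict[str, Set[int]]) -> float:
    """Average Dice over every pair of observers for one structure."""
    return mean(dice(a, b) for a, b in combinations(contours.values(), 2))


def mean_agreement(study: Dict[str, Dict[str, Set[int]]]) -> float:
    """Average of the per-structure agreement values, weighting structures equally."""
    return mean(mean_pairwise_dice(contours) for contours in study.values())


# Toy example: three observers, two structures (names are placeholders).
study = {
    "parotid_L": {
        "obs_a": set(range(0, 10)),
        "obs_b": set(range(2, 12)),
        "obs_c": set(range(0, 10)),
    },
    "cochlea_L": {
        "obs_a": set(range(0, 4)),
        "obs_b": set(range(0, 4)),
        "obs_c": set(range(0, 4)),
    },
}
print(mean_pairwise_dice(study["parotid_L"]), mean_agreement(study))
```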
The initial Dice coefficient, not surprisingly, was correlated with previous contouring experience and years of training, as seen in Figure 5.
Survey results showed 66% of survey respondents reported software feedback as “helpful/extremely helpful”, with 100% deeming atlas-based resources “helpful/extremely helpful”. All respondents reported they would like to use TaCTICS to track contouring skills over residency.
Our pilot data suggested that atlas-based intervention combined with real-time software feedback was feasible and resulted in more uniform ROI contours. This suggests that online training modules might be constructed with TaCTICS to allow longitudinal self-assessment of OAR/LNL/target delineation for resident trainees, and for cooperative group clinical trial credentialing.
We have observed substantial variations in the delineation of tumors and nearby organs in all of the multi-site, multi-user studies that are currently in the TaCTICS database. We believe that this underscores the need for training and credentialing modules, especially for more novice users and residents in order to achieve compliance with consensus structures.
In this paper, we have reported on the findings of a prospective study using the real-time feedback provided by TaCTICS in combination with an atlas-based intervention that resulted in a demonstrable standardization of ROIs. This suggests that TaCTICS may facilitate longitudinal self-assessment of students and residents in their performance of tumor and normal structure delineations. Additionally, as previously reported, TaCTICS has demonstrated utility in generating consensus structures from a set of expert contours.
Consequently, we believe that TaCTICS can be a useful tool in a variety of scenarios in radiation therapy target volume delineation. These include the educational and certification context, multi-institutional clinical trials, and the generation of consensus among experts. Additionally, we believe that such a web-based, software-assisted training platform may be beneficial in disseminating best practices in resource- or expertise-poor locales.
We have made numerous improvements to the system based on user-feedback and will continue to do so. These include additional options for the user-interface and visualization, additional metrics, and methods for creating consensus. We currently have data from 7 studies in the system and will continue to increase the size and variety of the data.
Additional, larger prospective studies are being initiated that will more formally evaluate the utility of this system in an educational setting.
JKC was supported by a K99-R00 grant from the National Library of Medicine (1K99LM009889-01A1).
SDB was supported by a training grant T15LM007088 from the National Library of Medicine.
C.D.F. was supported by a T32 Training Grant from the National Institutes of Health/National Institute of Biomedical Imaging and Bioengineering, (“Multidisciplinary Training Program in Human Imaging”, 5T32EB000817-04), and the National Institutes of Health/National Cancer Institute Clinical Research Loan Repayment Program (“Optimization of physician target volume delineation in the cooperative group setting”, L30 CA136381). The funder(s) played no role in study design, in the collection, analysis and interpretation of data, in the writing of the manuscript, nor in the decision to submit the manuscript.
*-JKC and CDF contributed equally to this work.
The authors wish to gratefully thank Daniel Baseman, MD, Jehee Choi, MD, Emma Ramahi, MD, Abhilasha Patel, MD, Elizabeth Maani, MD, Anna Harris, MD, Virginia Clyburn, MD, and William E. Jones, III, MD for their efforts and participation in the aforementioned study.