Target volume delineation is a critical step in the creation of treatment plans in radiation therapy. However, intra-observer and inter-observer variability in target volume definitions can introduce substantial differences in resulting doses between treatment plans from different users and institutions. Consequently, there is a need for tools that allow quantitative metrics to be collected and reported regarding inter- and intra-user performance in target volume delineation. We describe TaCTICS, a web-based educational training software application targeted towards residents and non-expert users. TaCTICS allows users to delineate target structures in DICOM-RT compatible formats using their preferred treatment planning system. After uploading the resulting structure file, users receive a score for their structures based on comparison to reference sets derived from expert users, using a variety of metrics for volume overlap and surface distance.
Conformal radiotherapy affords delivery of tumoricidal radiation doses to user-defined target volumes while minimizing dose to spatially adjacent non-target organs-at-risk. This precision of computer-enabled delivery allows exceptional dose-volume matching capability; nonetheless, the steep dose gradients imply that even minor geometric uncertainties may result in substantial dose deviations from intended prescriptions, which may in turn underdose tumors or overdose radiosensitive tissues. Target volumes (TV) and organs-at-risk (OARs) for treatment planning are manually defined by human users as regions of interest (ROIs), introducing possible geometric variability due to variations in gross tumor (GTV), clinical target (CTV) or internal target volume (ITV) delineation.
As the initial step in radiotherapy planning, target delineation becomes critical, since the most conformal plan is of reduced clinical utility if the delineated target volumes do not accurately depict actual targets or organ structures in 3D-space.
Despite the well-established clinical importance of accuracy in target volume delineation, inter-observer variability in target definition has been demonstrated in a series of studies, in multiple organ/anatomical sites [1–4]. Simply put, “interobserver variability in the definition of GTV and CTV is a major – for some tumor locations probably the largest – factor contributing to the global uncertainty in radiation treatment planning”.
In this paper, we discuss the development of a software platform, “Target Contour Testing/Instructional Computer Software (TaCTICS): A Novel Training and Evaluation Platform for Radiotherapy Target Delineation”. This is a prototype of a statistical software package for target delineation with a graphical user interface (GUI). It allows near real-time data analysis and reporting of quantitative scoring metrics that compare user-derived structures with reference sets derived from expert users, specifically to allow self-evaluation by residents and other non-expert users.
Conformal radiotherapy has brought the capacity to shape dose gradients in an effort to approximate three-dimensional tumors and target structures, such that tumoricidal dose may be delivered to areas at risk for neoplastic involvement, while sparing proximal non-target tissue. The planning process for conformal radiotherapy is predicated on dose calculations derived from the Hounsfield units of given voxels on a simulation DICOM image set (typically CT). Voxels within this dataset are then designated as ROIs and assigned nominal designations as GTV, clinical areas of presumed microscopic spread (CTV), or OARs. Only those volumes defined by the physician end user may be utilized to either prescribe sufficient dose to ensure tumor demise (GTV/CTV) or spare organs (OARs) through dose delivery constraints (International Commission on Radiation Units and Measurements, 1999). The ROIs are then utilized as DICOM-RT structures for dose calculation by the radiotherapy treatment planning software.
Since these ROIs provide the definitions for dose constraints used for treatment planning, accurate target delineation is crucial for precision radiotherapy.
In the pre-conformal radiotherapy era, standardized fields were utilized to ensure uniformity of treated regions. However, in the era of volume-based delineation, data suggest that considerable operator-dependent variation exists in target volume delineation and consequent dose distribution. This variability complicates clinical trial quality assurance and prevents ready comparison of treatment protocols. Several groups have also sought to account for systematic variability introduced in target delineation in order to optimize volume definition and treatment planning margins.
Recent survey data suggest that significant numbers of radiation oncologists receive minimal formal training in intensity-modulated radiotherapy (IMRT), and point to a need for greater research into optimum methodologies for user instruction in target delineation, among other aspects of IMRT practice. Furthermore, in recognition of the importance of proper target delineation, a host of didactic educational activities have been created to assist clinicians in developing target delineation skill sets for clinical practice. While there are software programs allowing interactive instruction (www.educase.edu, www.anatom-e.com), few extant software packages or devices provide automated or semi-automated real-time instructional feedback regarding target volume delineation skill development for trainees in radiation oncology. Few data exist regarding how to evaluate acceptable levels of user competency in target delineation. Despite great interest, comparatively few data have to date been presented regarding strategic optimization of target delineation itself, either as a function of standardized practices or as a function of a deliberate educational curriculum.
Several series have also established that user variability in target volume delineation may result in potentially significant dosimetric differentials between prescribing radiation oncologists. Additionally, collected data suggest that clinical trial data may be obfuscated by user-dependent differentials in prescription volume determination. While this may be partially ameliorated by modification of study criteria with regard to volume delineation, there remains, at present, no efficient, automated mechanism to evaluate target volumes.
Consequently, there is a great need for tools that allow evaluative measures to be collected and reported regarding inter- and intra-user performance in target delineation in a DICOM-RT environment. The purpose of this effort is to develop a software application that allows users to delineate target structure ROIs in DICOM-RT compatible formats, followed by automated comparison and scoring of user-derived ROIs against reference sets derived from expert users.
The steps in creating TaCTICS consisted of:
We will briefly discuss each of these steps next.
These prospective IRB-exempt projects were conducted under the auspices of the University of Texas Health Science Center San Antonio Institutional Review Board. As part of two separate target delineation protocols, anonymized patient DICOM files were used to construct target delineation datasets for comparison of inter- and intra-observer target delineation variability. In each of these datasets, observers contoured the same dataset twice, with either an instructional or a software modification as the testing variable.
Dataset A consists of DICOM-RT ROIs derived from a double-blind, randomized, hypothesis-generating pilot study [9, 10] designed to test the impact of instructional modification on user-generated contours. Users were asked to contour a standardized case presentation of T3N0M0 rectal cancer twice, with half of the users randomized after the initial contouring session to receive a (then unpublished) electronic PDF of a newly developed consensus-based anatomic atlas. Results from this study have been presented previously. The study enrolled 15 radiation oncologist observers (experts and non-experts), who submitted a GTV and 2–3 CTVs for each of 2 contouring sessions, resulting in 94 distinct ROI structures available for analysis.
Dataset B consists of a series of DICOM-RT files derived from a study of the effect of human-computer user interface device (UID) modification on target volume delineation efficiency. Observers were asked to contour stereotypical cases from several anatomical sites (a prostate, brain, lung, and head and neck case presentation) twice: once using a standard mouse-keyboard configuration, and once using a graphic tablet-pen interface. A total of 21 observers contoured brain, head and neck, lung, and prostatic GTV/CTV ROIs once with each UID, resulting in >400 collected ROI TV structures. For each of these sites, two users had been designated as ‘experts’ based on their experience and standing in the field.
Quantitative measures of conformity include volumetric measures of spatial overlap in 2D or 3D, as well as surface distance measures. Some of the most commonly used measures include:
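The Dice and Jaccard coefficients are given as:

$$D(A, B) = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad J(A, B) = \frac{|A \cap B|}{|A \cup B|},$$

where $A$ is the set of voxels in the user's structure and $B$ is the set of voxels in the reference structure.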
The Dice coefficient has been shown to be a special case of the kappa coefficient, a measure commonly used to evaluate inter-rater agreement. As defined, both of these measures are symmetric. However, in situations such as contouring for radiation oncology, where the cost of missing the tumor is higher than the cost of including extra tissue, false positive and false negative Dice measures can be used.
The false positive Dice (FPD) is a measure of the voxels that are labeled positive (i.e. 1) by the user but not by the expert, while the false negative Dice (FND) is a measure of the voxels that were considered positive by the expert but missed by the user being evaluated.
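The overlap measures discussed above can be illustrated with a minimal Python sketch. It is not the TaCTICS implementation (which the authors wrote in C++ with ITK). Structures are represented here as sets of voxel index tuples, the function name `overlap_metrics` is hypothetical, and the FPD and FND formulas are an assumption: they use the commonly cited forms 2|A \ B| / (|A| + |B|) and 2|B \ A| / (|A| + |B|), with A the user's voxels and B the expert's.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]  # (x, y, z) voxel index


def overlap_metrics(user: Set[Voxel], expert: Set[Voxel]) -> Dict[str, float]:
    """Return Dice, Jaccard, false positive Dice and false negative Dice.

    The FPD/FND formulas are assumed (see the text above); they are not
    taken from the paper itself.
    """
    if not user and not expert:
        raise ValueError("both structures are empty")
    inter = len(user & expert)        # voxels labeled by both
    union = len(user | expert)        # voxels labeled by either
    total = len(user) + len(expert)   # |A| + |B|
    return {
        "dice": 2 * inter / total,
        "jaccard": inter / union,
        "fpd": 2 * len(user - expert) / total,  # user only
        "fnd": 2 * len(expert - user) / total,  # expert only
    }


# Toy example: a 4x4 expert square and a user square shifted by one voxel in x.
expert = {(x, y, 0) for x in range(4) for y in range(4)}
user = {(x, y, 0) for x in range(1, 5) for y in range(4)}
m = overlap_metrics(user, expert)
print(m)
```

With these assumed definitions, Dice + (FPD + FND) / 2 = 1, so the two asymmetric measures split the Dice shortfall into over-contouring and under-contouring.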
The website was built using the Ruby on Rails framework. The ruby-dicom gem was used for parsing the DICOM files. A PostgreSQL database was used to store the user information; information about the studies, including the location of the CT slices; information extracted from the DICOM header, including names, volumes and slice information for the structures; and the metrics derived from all users. The main processing of the structures and calculation of the metrics was performed in C++ using the ITK toolkit. These procedures were wrapped in Ruby and called from the website to generate the report.
The flow of data and user interaction for the system is given below.
After a user logs in to the system, they can download the desired CT slices without the contours, contour the structures using their usual treatment planning system and upload the resulting DICOM RTSTRUCT file. They can then select an expert to be used as the reference. Alternatively, they can compare their structures to those created using STAPLE, as described in the previous section.
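For readers unfamiliar with STAPLE (Simultaneous Truth and Performance Level Estimation), the following is a simplified Python sketch of its binary expectation-maximization form. It is not the code used by TaCTICS. The function name `staple_binary`, the flat per-voxel input layout, the fixed global prior, the initial rates of 0.9, and the clamping constant are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

EPS = 1e-6  # assumed constant keeping rates away from exactly 0 or 1


def _clamp(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def staple_binary(
    decisions: Sequence[Sequence[int]], n_iter: int = 50
) -> Tuple[List[float], List[float], List[float]]:
    """decisions[j][i] is 1 if rater j marked voxel i as inside the structure.

    Returns (w, p, q): per-voxel probability of being truly inside,
    per-rater sensitivity, and per-rater specificity.
    """
    n_raters, n_vox = len(decisions), len(decisions[0])
    prior = sum(sum(d) for d in decisions) / (n_raters * n_vox)
    if not 0.0 < prior < 1.0:
        raise ValueError("raters must mark some, but not all, voxels")
    p = [0.9] * n_raters  # initial sensitivities (assumed)
    q = [0.9] * n_raters  # initial specificities (assumed)
    w = [prior] * n_vox
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly inside.
        for i in range(n_vox):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        sum_w = sum(w)
        sum_not_w = n_vox - sum_w
        for j in range(n_raters):
            hit = sum(w[i] for i in range(n_vox) if decisions[j][i])
            rej = sum(1.0 - w[i] for i in range(n_vox) if not decisions[j][i])
            p[j] = _clamp(hit / sum_w)
            q[j] = _clamp(rej / sum_not_w)
    return w, p, q


# Toy example: three raters labeling eight voxels.
raters = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],  # under-contours by one voxel
    [1, 1, 1, 1, 1, 0, 0, 0],  # over-contours by one voxel
]
w, p, q = staple_binary(raters)
consensus = [1 if wi >= 0.5 else 0 for wi in w]
print(consensus, p, q)
```

Thresholding the voxel probabilities at 0.5 gives a consensus structure that can serve as the reference, while the estimated sensitivities and specificities summarize how each rater deviates from it.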
The users are then e-mailed a report containing all the chosen metrics, histograms of all the corresponding metrics available in the system with their own values highlighted, as well as thumbnails of CT slices with their contours and those of the expert overlaid. An example of the histogram of the Dice coefficients for the prostate dataset for user #1 is shown below. Users can identify their place on a histogram of all users (red highlighted region) and can then, by their relative histogram position, judge visually as well as numerically their agreement with an expert-derived reference. They can also see how they performed compared to other users of the system or compared to their metrics from previous attempts.
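The idea of locating a user within the distribution of all users' scores can be sketched as follows. This is an illustrative Python example only. The scores are made-up values, and the helper names `bin_index`, `histogram_counts` and `percentile_rank`, as well as the choice of ten equal-width bins on [0, 1], are assumptions rather than details taken from TaCTICS.

```python
from bisect import bisect_left
from typing import List, Sequence


def bin_index(score: float, n_bins: int = 10, lo: float = 0.0, hi: float = 1.0) -> int:
    """Index of the equal-width histogram bin that contains `score`."""
    idx = int((score - lo) / (hi - lo) * n_bins)
    return max(0, min(idx, n_bins - 1))  # clamp so that score == hi lands in the last bin


def histogram_counts(scores: Sequence[float], n_bins: int = 10) -> List[int]:
    """Number of scores falling in each bin."""
    counts = [0] * n_bins
    for s in scores:
        counts[bin_index(s, n_bins)] += 1
    return counts


def percentile_rank(scores: Sequence[float], user_score: float) -> float:
    """Percentage of scores that are strictly below the user's score."""
    ordered = sorted(scores)
    return 100.0 * bisect_left(ordered, user_score) / len(ordered)


# Hypothetical Dice coefficients for eight users on one structure.
all_scores = [0.62, 0.71, 0.75, 0.78, 0.83, 0.88, 0.91, 0.93]
user_score = 0.83

counts = histogram_counts(all_scores)
user_bin = bin_index(user_score)  # the bin that would be highlighted in the report
rank = percentile_rank(all_scores, user_score)
print(counts, user_bin, rank)
```

The highlighted bin gives the visual cue and the percentile rank gives the numerical one. The same functions could be applied to a user's own earlier attempts to track progress over time.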
We have designed and implemented a contour evaluation software platform for use in radiation oncology. Our target audience for this phase was residents and other non-expert users who contour tumors and OARs as part of the process of creating a radiation therapy plan. In order to obtain sufficient data to extract meaningful comparisons between users, we have collected over 500 different structures for five anatomical locations.
Despite this great interest, comparatively few data have to date been presented regarding strategic optimization of target delineation itself, either as a function of standardized practices or as a function of a deliberate educational curriculum. Using previously collected pilot data, we have determined that there is substantial inter-observer variability in terms of target volume reproducibility. We hope that by constructing a GUI that allows users to analyze target volume ROIs and obtain meaningful “scores” regarding their performance, we will achieve more consistent target volume delineation.
We believe that the TaCTICS tool may be useful in the educational context, as well as in potentially reducing variability among users and sites during multi-institutional clinical trials. We have submitted a grant for a prospective study that aims to evaluate the effectiveness of TaCTICS in reducing inter-rater differences in target volume delineation and increasing the conformance of user contours to those of clinical experts. At present, our sample size of expert observers is limited. Our current dataset used 2 experts per anatomic subsite; we hope to expand this number substantially in order to improve the quality of the reference data.
We will continue to add anatomical locations and additional patient studies as we acquire more data. Additionally, as users participate in utilizing this tool, we will expand our set of contours, enabling us to provide statistical data with increased sample sizes. Also, prospective studies allowing model/software validation as a training and feedback tool will assist in optimizing user interface features. These planned studies will formally evaluate usability, utility, and ease of implementation in an educational setting.
Finally, we plan to create treatment plans using the more extreme contours to better understand the impact of the inter-rater variability in the contours on the final dose profiles in the tumors as well as in nearby organs.
J.K.C. was supported by a K99-R00 grant from the National Library of Medicine (1K99LM009889-01A1).
C.D.F. was supported by a T32 Training Grant from the National Institutes of Health/National Institute of Biomedical Imaging and Bioengineering (“Multidisciplinary Training Program in Human Imaging”, 5T32EB000817-04), a Technology Transfer Grant from the European Society for Therapeutic Radiology and Oncology, and the Product Support Development Grant from the Society for Imaging Informatics in Medicine. The funder(s) played no role in study design, in the collection, analysis and interpretation of data, in the writing of the manuscript, or in the decision to submit the manuscript.