3D ultrasound (US) images of the uterus may be used to adapt radiotherapy (RT) for cervical cancer patients based on changes in daily anatomy. This requires accurate on‐line segmentation of the uterus. The aim of this work was to assess the accuracy of Elekta's “Assisted Gyne Segmentation” (AGS) algorithm in semi‐automatically segmenting the uterus on 3D transabdominal ultrasound images by comparison with manual contours.
Nine patients receiving RT for cervical cancer were imaged with the 3D Clarity® transabdominal probe at RT planning, and 1 to 7 times during treatment. Image quality was rated from unusable (0) to excellent (3). Four experts segmented the uterus (defined as the uterine body and cervix) manually and using AGS on images with a rating > 0. Pairwise agreement between manual contours was evaluated to determine interobserver variability. The accuracy of the AGS method was assessed by measuring its agreement with manual contours via pairwise analysis.
35/44 images acquired (79.5%) received a rating > 0. For the manual contour variation, the median [interquartile range (IQR)] distance between centroids (DC) was 5.41 [5.0] mm, the Dice similarity coefficient (DSC) was 0.78 [0.11], the mean surface‐to‐surface distance (MSSD) was 3.20 [1.8] mm, and the uniform margin of 95% (UM95) was 4.04 [5.8] mm. There was no correlation between image quality and manual contour agreement. AGS failed to give a result in 19.3% of cases. For the remaining cases, the level of agreement between AGS contours and manual contours depended on image quality. There were no significant differences between the AGS segmentations and the manual segmentations on the images that received a quality rating of 3. However, the AGS algorithm had significantly worse agreement with manual contours on images with quality ratings of 1 and 2 compared with the corresponding interobserver manual variation. The overall median [IQR] DC, DSC, MSSD, and UM95 between AGS and manual contours were 5.48 [5.45] mm, 0.77 [0.14], 3.62 [2.7] mm, and 5.19 [8.1] mm, respectively.
The AGS tool was able to represent uterine shape of cervical cancer patients in agreement with manual contouring in cases where the image quality was excellent, but not in cases where image quality was degraded by common artifacts such as shadowing and signal attenuation. The AGS tool should be used with caution for adaptive RT purposes, as it is not reliable in accurately segmenting the uterus on ‘good’ or ‘poor’ quality images. The interobserver agreement between manual contours of the uterus drawn on 3D US was consistent with results of similar studies performed on CT and MRI images.
Uterine motion reduces the accuracy of external beam radiotherapy (RT) for cervical cancer,1, 2 with positional changes ranging from 2 to 60 mm between treatments.2, 3, 4, 5 To compensate for this positional uncertainty of the uterus, the planning target volume (PTV) for the primary tumor site (i.e., excluding nodal disease) is commonly generated by expanding the clinical target volume (CTV) by 6–40 mm.6 This leads to increased dose to surrounding normal tissues and incidence of adverse effects (such as both chronic and acute bladder, gastrointestinal, and hematological toxicities) and in addition, may not be sufficient for adequate uterus coverage in some cases.2, 7, 8, 9, 10, 11.
At present, most verification schedules rely on either megavoltage portal imaging or cone beam CT (CBCT) imaging of the bony anatomy. These images are commonly reviewed immediately prior to radiation delivery, and are used to correct for random errors by shifting the couch to align the patient's bony anatomy position during treatment with its position during planning (i.e., position in the CT simulation [SIM] image).12 However, a perfect bone‐match does not guarantee correspondence between the soft‐tissues; residual uncertainty regarding the shape and position of the uterus remains.1, 2 One approach to correct for this uncertainty uses fiducial markers as a surrogate for soft‐tissue imaging. Markers can be inserted into the uterus and imaged with x‐ray‐based modalities, though this is invasive and not always reliable as the fiducials can migrate.6, 13, 14
The Clarity® ultrasound‐guided RT (USGRT) system (Elekta Ltd., Stockholm, Sweden) has been developed to provide soft‐tissue imaging to improve the accuracy of RT for gynecological cancer compared with bony anatomy‐based image guidance. Briefly, the Clarity® system may be used to acquire ultrasound images in the planning CT room (US‐SIM) and treatment room (US‐Tx) frame of reference using an infrared‐tracked transducer that is spatially calibrated to the treatment co‐ordinate system.15 In the context of cervical cancer RT, this technology allows the user to localize the uterus on US with respect to the isocenter of the RT treatment room. This could enable: (a) soft‐tissue‐based couch shifts and/or (b) adaptive RT, where the uterine shape at the time of treatment is explicitly taken into account. Although soft‐tissue‐based couch shifts resulting from USGRT may improve the alignment of the uterine centroid with the treatment room isocenter, they do not address the issue of healthy‐tissue sparing because large margins to account for organ deformation are still required. Adaptive RT is therefore an attractive alternative because the RT beam aperture can be modified according to the shape and position of the target at the time of RT delivery to ensure adequate target coverage while minimizing the organ at risk (OAR) radiation exposure. Segmentation of the uterus could allow for automated selection of the plan‐of‐the‐day from a library of predefined treatment plans, or for online treatment replanning according to the patient's anatomy at each treatment fraction.5, 16, 17.
Manual contouring by an expert can be considered a gold standard for organ segmentation, though this is too time consuming to be a feasible option for online adaptive RT.18, 19 Online segmentation must be achieved on a timescale of minutes so that the additional time that the patient spends on the treatment couch during segmentation does not result in patient discomfort and/or movement, a delay in the clinical workflow, or significant natural changes in internal anatomy (such as bladder filling) that would displace the uterus from its position when it was first imaged. For such applications, a rapid method of capturing the 3D uterine outline at treatment time is greatly needed.
One method of localizing regions of interest (ROIs) at treatment is to incorporate a priori knowledge of ROI shape and size, which can be obtained from US‐SIM. The Clarity® system implements this approach by requiring a user to manually shift a Reference Positional Volume (RPV; the set of rigid manual ROI contours drawn on the US‐SIM image) to best match the apparent position of the ROI as visualized by US‐Tx. This allows for estimation of the ROI centroid position for soft‐tissue‐based patient setup. However, in the context of adaptive RT, this approach requires that the ROI undergo little or no deformation throughout the course of treatment so that the RPV is still a valid representation of the patient's anatomy at the time of radiotherapy delivery. As the large amount of deformation occurring in the uterus violates this constraint, rigid registration‐based techniques (including Clarity's® RPV method) for localizing the uterus at the time of treatment are not suitable for adaptive radiotherapy, as shown in Fig. 1.28
Example of rigid registration ROI localization technique, where a Reference Positional Volume (RPV) from SIM is used to localize the uterus as visualized by US at treatment (US‐Tx). Note that the RPV is often a poor representation of the uterus ...
An alternative to manual contouring is to use a segmentation algorithm to automatically or semiautomatically (i.e., where user‐interaction is required) contour the uterus in 3D in place of an expert. To our knowledge, Elekta is the first to develop an automated solution for segmenting the uterus on 3D transabdominal US images via the “Assisted Gyne Segmentation” (AGS) tool.20 However, similar to the RPV method, the AGS tool is currently only used to guide soft‐tissue‐based couch shifts according to the apparent centroid position at treatment.
There may be considerable patient benefit in adaptive RT from employing a method that can automatically, and hence rapidly, segment the 3D uterine shape on 3D US images. However, neither the AGS tool nor any other method for automatically segmenting the uterus has yet been assessed for its accuracy and hence potential for application in adaptive RT. In this work, the following research questions were addressed:
All analyses were performed on 3D transabdominal US images acquired from nine cervical cancer patients.
Nine patients receiving radiotherapy for cervical cancer were included in this study: six from Herlev Hospital, Copenhagen, Denmark (23 US images acquired) and three from the Royal Marsden NHS Foundation Trust, London, UK (21 US images acquired). Ethics approval for these studies was obtained from the ‘De Videnskabsetiske Komiteer’ and the ‘NHS Research Ethics Committees (reference: 15/LO/1438)’, respectively. Median patient age was 49.5 yr (range 36–65 yr), median body mass index (BMI) was 27.6 (range 21.5–40.7), and median FIGO cervical cancer stage was IIB (range IIB–IIIB). The six patients from Herlev were instructed not to pass urine for approximately 1 hour prior to RT treatment. The three patients from the Royal Marsden Hospital were asked to drink 200 mL of liquid and to refrain from passing urine in the hour prior to treatment. After each patient was positioned on the couch, 3D transabdominal US images of the uterus were acquired 2 to 8 times (once at US‐SIM and 1–7 times at US‐Tx) during the course of treatment. All scans were acquired with the Clarity® USGRT system (Clarity® Model 310C00, Elekta, Montreal, Canada), using a 3D mechanically swept convex 5 MHz transducer (m4DC7‐3/40), with the pressure between the US transducer and the patient's skin as low as possible to minimize soft‐tissue displacement.
Manual Segmentation: Four experts [two clinical oncologists (IMW and SL), one radiologist (KD), and one researcher trained by an oncologist (SAM)] manually contoured the uterus in the sagittal plane on a RayStation 5.0 workstation (RaySearch Laboratories, Stockholm, Sweden) for all US‐SIM and US‐Tx images analyzed. In this study, the ‘uterus’ is referred to as a single structure containing both the uterine body and cervix.
AGS segmentation: The core of the AGS tool is a discrete dynamic contouring (DDC) algorithm, which is a gradient‐based segmentation technique commonly used in prostate segmentation applications.22 Elekta have adapted the methods employed by Ladak et al.,18 Hu et al.,23 and Ghanei et al.,24 such that the algorithm semiautomatically segmented the uterus on US. The same four experts who performed the manual uterine segmentations used the AGS tool to segment the uterus on all US image volumes. This required an initialization step where four hint points were placed on uterine features (the uterine fundus, both isthmus points, and base of the cervix) on a central sagittal slice (Fig. 2).
(a) Snapshot of AGS tool user interface and (b) resulting segmentation, where the placement of four anatomical hint points is required to initialize AGS segmentation. The hint points are the uterine fundus and base of the cervix (squares), and the anterior ...
Each 3D US image was rated twice on a 4‐point scale according to the criteria listed in Table 1 by one observer (SAM), with at least 10 days in between ratings of the same image. Any image receiving a rating of 0 at least once was excluded from further analysis. The final rating for the remaining images was calculated as the mean rating for each image, rounded to the nearest integer.
Scale used to rate image quality of transabdominal US uterine scans
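The rating procedure described above can be sketched in a few lines. This is only an illustration: the function name `final_rating` and the example ratings are hypothetical, and the text does not state how a mean ending in .5 was rounded, so rounding halves upward is an assumption made here.

```python
import math

def final_rating(first, second):
    """Combine two ratings (0-3) of the same image.

    Returns None when the image is excluded, i.e. when it was
    rated 0 (unusable) on at least one of the two occasions.
    """
    if first == 0 or second == 0:
        return None
    mean = (first + second) / 2
    # Assumption: a mean ending in .5 is rounded up (not stated in the text).
    return math.floor(mean + 0.5)

# Made-up example ratings, not study data.
ratings = {"img_a": (2, 3), "img_b": (0, 1), "img_c": (1, 1)}
final = {name: final_rating(*pair) for name, pair in ratings.items()}
print(final)
```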
Interobserver manual contouring variation was assessed by measuring the pairwise agreement between the four manual contours drawn on each US image; i.e., each observer contour was compared with the other three observers’ contours giving 12 pairwise comparisons per image. The accuracy of the AGS tool was quantified by measuring its agreement with manual contours via pairwise analysis; i.e., each AGS contour was compared with each manual contour, giving 16 pairwise comparisons per image. In some instances, the AGS algorithm did not produce a contour at all; these cases were referred to as failed segmentation attempts, and were excluded from the quantitative analyses. The AGS segmentation attempts that failed were reported as a percentage of all AGS segmentations attempted. In all cases, ‘contour agreement’ was assessed using the following four metrics, where A and B represent hypothetical 3D contours:
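As a rough illustration of how such agreement metrics can be computed, the sketch below evaluates DC, DSC, and a symmetric mean surface distance on contours stored as sets of voxel coordinates, and counts the pairwise comparisons described above. It rests on several assumptions: the voxel-set representation, the function names, the toy box contours, and the particular symmetric form of the surface distance are all illustrative choices, and distances come out in voxel units rather than mm. UM95 is left out because its definition is not given in this text.

```python
import math
from itertools import permutations, product

def centroid(voxels):
    """Mean (x, y, z) position of a set of voxel coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))

def distance_between_centroids(a, b):
    return math.dist(centroid(a), centroid(b))

def dice(a, b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

def surface(voxels):
    """Voxels with at least one of their six face neighbours outside the set."""
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        v for v in voxels
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in voxels for dx, dy, dz in steps)
    }

def mean_surface_distance(a, b):
    """Symmetric mean of nearest-surface distances (one common definition)."""
    sa, sb = surface(a), surface(b)
    d_ab = [min(math.dist(p, q) for q in sb) for p in sa]
    d_ba = [min(math.dist(q, p) for p in sa) for q in sb]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))

def box(x0, x1):
    """Toy contour: a 6 x 6 cross-section box spanning x0 <= x < x1."""
    return set(product(range(x0, x1), range(6), range(6)))

a, b = box(0, 6), box(1, 7)  # the same box shifted by one voxel
print(distance_between_centroids(a, b), dice(a, b), mean_surface_distance(a, b))

observers = ["obs1", "obs2", "obs3", "obs4"]  # placeholder labels
manual_pairs = list(permutations(observers, 2))  # each observer vs the other three
ags_pairs = list(product(observers, observers))  # each AGS contour vs each manual contour
print(len(manual_pairs), len(ags_pairs))
```

The brute-force nearest-neighbour search is quadratic in the number of surface voxels, which is fine for a toy example but would need a spatial index for clinical image volumes.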
Interobserver manual contour agreement: A Wilcoxon rank sum test with Bonferroni correction was used to test for differences in DC, DSC, MSSD, and UM95 between manual contours in each image quality rating group (1, 2, and 3) to see whether agreement between observers increased with improving image quality.
AGS segmentation accuracy: A Wilcoxon rank sum test was used to test for differences in DC, DSC, MSSD, and UM95 between AGS and manual contours for all images, and when the images were grouped according to image quality (ratings 1, 2, and 3). The interobserver manual contour agreement was used as a benchmark to gauge the performance of semiautomatic segmentation methods; ideally, the agreement between an algorithmically derived contour and a manually derived contour should be the same as the variation in agreement between manual contours. To investigate whether better image quality improved AGS segmentation performance, a Wilcoxon rank sum test with Bonferroni correction was used to test for differences in DC, DSC, MSSD, and UM95 within each group (image quality ratings of 1, 2, and 3).
Image quality rating: 35 of the 44 US images acquired had an image quality rating of 1 or higher, and were included in subsequent quantitative analyses: 6/35, 18/35, and 11/35 US images received ratings of 1, 2, and 3, respectively. Interobserver manual contour agreement: The median [interquartile range (IQR)] DC, DSC, MSSD, and UM95 results for the interobserver manual contouring variation are given in Table 2 and Fig. 3. The overall medians [IQR] for the DC, DSC, MSSD, and UM95 were 5.41 [5.0] mm, 0.78 [0.11], 3.20 [1.8] mm, and 4.04 [5.8] mm, respectively. Images with a quality rating of 2 had significantly lower (P < 0.05) DC, DSC, and MSSD values than images with quality ratings of 1 or 3; UM95 was the only metric that did not differ (Table 2). There was no statistically significant difference between images with ratings of 1 and 3 in any of the agreement metrics considered.
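For readers unfamiliar with the statistical procedure, the sketch below shows one way a two-sided Wilcoxon rank sum test and a Bonferroni correction can be computed. It is not the authors' analysis code: the function names are hypothetical, the p-value uses the large-sample normal approximation without tie or continuity corrections, and the DSC values are made up for illustration only.

```python
from itertools import combinations
from statistics import NormalDist

def midranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided rank sum p-value via the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up DSC values per image quality rating (illustration only).
dsc_by_rating = {
    1: [0.70, 0.72, 0.68, 0.75, 0.71, 0.69],
    2: [0.73, 0.71, 0.74, 0.70, 0.76, 0.72],
    3: [0.80, 0.82, 0.79, 0.85, 0.81, 0.83],
}
pairs = list(combinations(sorted(dsc_by_rating), 2))
raw = [rank_sum_p(dsc_by_rating[g1], dsc_by_rating[g2]) for g1, g2 in pairs]
for pair, p in zip(pairs, bonferroni(raw)):
    print(pair, round(p, 4))
```

With three rating groups there are three pairwise comparisons, so each raw p-value is multiplied by three before being compared with the 0.05 threshold.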
Boxplot showing interobserver variability between manual contours (shaded boxes) and the accuracy of the AGS algorithm as measured by agreement with manual contours (white boxes). The asterisks denote statistical differences between manual and AGS segmentations ...
Col 3: Agreement (median [IQR]) between manual contours from different observers and Col 4: accuracy of the AGS tool, measured by pairwise analysis with manual contours. Symbols indicate statistical differences (P < 0.05) between image ratings ...
AGS contours acquired: Out of 140 attempts at using the AGS tool to segment the uterus (35 US images * 4 observers), 113 AGS contours were successfully obtained (80.7%), whereas the algorithm failed to return a result in 27 cases (19.3%). The 27 cases with no result were excluded from the quantitative analysis.
AGS segmentation accuracy: The median [IQR] DC, DSC, MSSD, and UM95 results for the AGS segmentation accuracy are given in Table 2. The AGS segmentations had a significantly better accuracy (i.e., agreement with manual contours) on images with a rating of 3 than images rated 1 or 2. However, there was no difference in segmentation performance between rating 1 images and rating 2 images. The AGS algorithm agreed with manual contours on images that received a rating of 3, as there was no significant difference between them in all metrics considered (Fig. 3). However, the AGS algorithm was less accurate in segmenting the uterus on rating 1 images according to all metrics considered, and also less accurate on rating 2 images according to DSC, MSSD, and UM95. Overall, the AGS algorithm was statistically equivalent to manual contouring in terms of DC and DSC, but not in terms of MSSD and UM95.
Image quality rating: Low bladder volume and high BMI can increase the attenuation of US and reduce image quality.32, 33 Not only does a full bladder help with tissue sparing in RT treatment for cervical cancer but it also has the added benefit of providing an acoustic “window” to the uterus, as urine has a low US attenuation coefficient compared with surrounding tissues. Patients with a high BMI are likely to have a greater amount of adipose tissue through which the US must travel, which may be important because fat has a relatively low speed of sound and its presence can cause image aberrations due to acoustic refraction, wave aberration, reverberations, steering errors, focusing errors, and spatially dependent image scale miscalibration. These factors may explain why eight of the nine unusable images (i.e., those that received an image rating of ‘0’) were acquired from patients who did not follow a stringent drinking protocol (the Herlev cohort), and why four of the nine unusable images were obtained from the same patient, who had the highest BMI (36.5) of the patients included in this study. Additionally, care was taken to apply low pressure to the abdomen when acquiring the US images to avoid internal soft‐tissue displacement; though this is crucial for RT applications, it comes at the cost of poorer image quality as contact between the transducer and the skin surface is decreased.34, 35 A larger study is needed to investigate methods of overcoming these challenges associated with implementing US guidance in adaptive RT to reduce the risk of obtaining an unusable image. One potential solution could be to ensure an adequate level of bladder filling at the time of treatment by enforcing a stringent drinking protocol, or by finding ways to compensate for variables such as poor hydration over the 24 hours prior to treatment or the reduced bladder capacity that often occurs during treatment.
Another solution could be establishing inclusion/exclusion criteria to identify good candidates for transabdominal US scanning. However, it should be noted that even without such measures in place, approximately 80% of the US images acquired in this study were used to successfully identify the position and shape of the uterus at the time of RT treatment.
Interobserver manual contour agreement: The DC, DSC, and MSSD values reported here (medians of 5.41 mm, 0.78, and 3.20 mm, respectively) are consistent with those reported in similar studies, though a direct comparison was not possible due to differences in the imaging modalities used, the disease status of the cohort investigated, the anatomical site contoured, and the number of observers. Baker et al. reported a median DC of 6.0 mm between contours of two observers in manually delineating the uterus on 3D US on a healthy volunteer cohort.36 In the literature, reported values of the DSC between manual contours drawn on CT and MRI images for a variety of anatomical sites ranged from ~0.7 to 0.98,25, 26, 37, 38 with ~0.7–0.8 generally considered acceptable.25, 26, 27 The MSSD between manual contours drawn on US, CT, and MRI images reported in the literature for a variety of anatomical sites ranged from ~1–5 mm.26, 31, 39, 40 The fact that the UM95 required to overcome interobserver contouring variability in this study (median [IQR] of 4.04 [5.8] mm) was much smaller than the interfractional uterine motion commonly observed (which can be as much as 60 mm) supports the idea that USGRT could reduce the size of the margins needed to compensate for organ motion, even in the presence of contouring uncertainties.5
As shown in Fig. 4, common areas of disagreement between manual contours observed in this study arose from determining the left–right extent of the uterus, and distinguishing the base of the cervix from the top of the vagina. This may be attributed to problems associated with contouring in the sagittal plane. The agreement between manual contours did not correlate with improving image quality, despite the uterine boundary becoming sharper in higher quality images. This may be due to the experts’ abilities to infer the boundary of the uterus in places where it was obscured using prior knowledge of uterine shape and/or relative orientation of other anatomical landmarks in the US field of view. Even in the presence of these sources of disagreement, the manual contour agreement reported here is comparable with previous contouring variability studies, indicating that the uterus can be visualized with 3D transabdominal US at the time of RT treatment. Furthermore, USGRT could be dosimetrically beneficial to cervical cancer patients as the component of the margin needed to compensate for contouring variability (represented by the UM95) is still much smaller than the margin that is needed to compensate for uterine motion without any form of soft‐tissue guidance.
Heatmaps showing interobserver manual contour variability displayed on the uterine isosurface. Dark blue (black in print) indicates 0 mm deviation, and yellow (white in print) indicates > 10 mm deviation. The scale of the heatmap is different ...
AGS tool performance: When applied to images acquired from cervical cancer patients at RT treatment, the AGS tool failed to return a result in nearly 20% of segmentation attempts, which is unacceptable for use in adaptive RT considering that an ideal segmentation method should produce a result in 100% of segmentation attempts. This occurred in cases where the image quality rating was 2 or lower, indicating that a clearly defined boundary in all three anatomical planes is required to ensure that the AGS tool functions. Potential solutions for improving the image quality such that the probability of AGS returning a result is increased may include introducing selection criteria at baseline to identify patients who have characteristics conducive to obtaining excellent US images (e.g., low BMI), or applying US image processing/acquisition techniques such as speckle reduction or image compounding to improve the contrast to noise ratio between the uterus and background tissues.42, 43, 44
In the 80% of cases where a result was returned, the values of DC, DSC, and MSSD between AGS and manual contours were dependent on image quality. The agreement between the AGS algorithm and manual contours was statistically equivalent to the interobserver agreement between manual contours for images with a rating of 3; this indicates that the AGS algorithm can accurately segment the uterus on US images containing virtually no imaging artifacts/imperfections. This is shown in column 1 of Fig. 5, where the AGS (red) segmentations agree well with the manual (green) segmentations on the US images with distinct, continuous uterine boundaries. Note that in these cases, the patients all had full bladders extending across the length of the uterus. However, the majority of the US images acquired in this study had some form of image artifact partially obscuring the true uterine boundary (image quality ratings 1 and 2). In these cases, the AGS algorithm performance was significantly poorer than its manual segmentation counterpart on all metrics considered (with the exception of the DC on rating 2 images), which may be attributed to the fact that gradient‐based algorithms are susceptible to errors due to the speckle, shadowing, and signal variation with ultrasound beam angle commonly present in US images taken of cervical cancer patients during RT treatment,18 as shown in columns 2 and 3 of Fig. 5. In these examples, the image artifacts either caused the AGS contour to deviate from the true uterine boundary (as defined by the manual segmentations), or confounded the US image to the extent that the resulting shape of the uterus defined by the AGS tool was either corrupted or unobtainable, despite good agreement between the corresponding manual contours.
Furthermore, the statistical analyses performed to check for differences in AGS algorithm accuracy between image rating groups showed that AGS segmentations on images with a rating of 3 were significantly better than AGS segmentations on images with ratings 1 or 2.
Examples of manual (green online, light gray in print) and AGS (red online, dark gray in print) contours superimposed over a central slice of corresponding 3D US images. Images are grouped by column according to image quality rating. Labels in bottom ...
When comparing the overall performance of the AGS algorithm with the interobserver manual contours, there were significant differences in MSSD and UM95, but no significant differences in DC or DSC. Note that (a) the DC does not take shape into account; (b) the DSC is only sensitive to changes in shape if they are accompanied by changes in the volume of overlap (for example, thin extrusions of the contour produced by the AGS algorithm in the presence of shadowing or speckle had little effect on the DSC); (c) the MSSD is a direct measure of the distance between contour surfaces, and is therefore much more sensitive to local deviations in shape; and (d) the UM95 represents the volume expansion needed to account for contouring errors. Taking this into account, the statistical results were interpreted to mean that even though the AGS tool may be sufficient in terms of centroid position and volume, its overall shape was often incorrect. This is of great concern when considering adaptive RT, which aims to modify the beam aperture such that it conforms to the boundary of the target. Furthermore, this difference in shape manifested itself in an increase in the UM95, suggesting that AGS segmentation errors would likely have a dosimetric effect.
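The point about the DSC and thin extrusions can be made concrete with a toy example. The sketch below is an illustration only, using an assumed voxel-set representation and made-up shapes (a cube standing in for a reference contour, plus a one-voxel-wide spike standing in for an extrusion caused by shadowing or speckle); it is not derived from the study data.

```python
import math
from itertools import product

def dice(a, b):
    """Dice similarity coefficient: 2|A n B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

# Reference contour: a 20 x 20 x 20 voxel cube (8000 voxels).
reference = set(product(range(20), repeat=3))

# Test contour: the same cube plus a one-voxel-wide spike,
# 20 voxels long, protruding from one face.
spike = {(x, 10, 10) for x in range(20, 40)}
with_spike = reference | spike

dsc = dice(reference, with_spike)

# Largest distance from any spike voxel to the nearest reference voxel.
worst = max(min(math.dist(p, q) for q in reference) for p in spike)

print(f"DSC = {dsc:.4f}, largest local deviation = {worst:.0f} voxels")
```

Here the spike adds only 20 voxels to an 8000-voxel volume, so the DSC stays close to 1, while the tip of the spike lies 20 voxels from the reference contour. A surface-distance or margin-based metric registers this deviation, whereas an overlap-based metric barely changes.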
Future work: This work highlights that there remains a need for a segmentation technique that is capable of conforming to the uterine boundaries at the time of treatment to accurately represent the position and shape of the RT target. Although the AGS tool is capable of achieving this in US images with excellent image quality, it is inaccurate and unreliable in images where the uterine boundary is blurred or partially obstructed. To overcome some of the pitfalls of the AGS tool, a new algorithm is being developed that is less dependent on image gradient to semiautomatically segment the uterus; one potential solution includes incorporating shape models into a gradient‐based segmentation framework to overcome errors associated with US shadowing.29, 45 Additional work will investigate methods of improving US image quality, image processing techniques to further distinguish the uterus from surrounding tissues, quantitative methods of directly comparing other imaging modalities (such as MRI, CT, and CBCT) with US in the ability to accurately represent the uterus, and dosimetric studies assessing the relationship between uterine segmentation accuracy and target coverage and OAR sparing.41
The good agreement between manual contours when compared with results from other imaging modalities such as CT and MRI supports the use of transabdominal US to visualize the uterus prior to RT treatment for cervical cancer patients. The AGS tool was able to accurately determine the uterine shape of cervical cancer patients as well as manual contouring in cases where the image quality was excellent, but not in cases where image quality was degraded by common artifacts such as shadowing and signal attenuation. The AGS tool should be used with caution for adaptive RT purposes, as it is not reliable in accurately segmenting the uterus on ‘good’ or ‘poor’ quality images. However, there may be potential to improve the performance of the AGS algorithm if the US image quality is improved. The unreliable performance of the AGS tool highlights a continuing need for a rapid method of segmenting the uterus at treatment to obtain both uterine position and shape; this is a critical step in implementing US‐guided adaptive RT for patients with cervical cancer.
The data collected at the Herlev Hospital, University of Copenhagen site were part of a 3‐year PhD research project, which was funded by a grant from Elekta Inc.
The authors thank Helen McNair and Naomi Hopkins for their help in coordinating the patient trial at the Royal Marsden Hospital site. Research at The Institute of Cancer Research is also supported by Cancer Research UK under Programme Grant C33389/A19727. We acknowledge NHS funding to the NIHR Biomedical Research Centre at The Royal Marsden and The Institute of Cancer Research. We also thank Gullander L, Pedersen S K, Zarp T, Jacobsson S, Pazhang S, and Lynnerup VK for helping to conduct the study at the Herlev Hospital, University of Copenhagen site. We also acknowledge Elekta's support for this study.