Objectives To prospectively assess the diagnostic performance of simple ultrasound rules to predict whether an adnexal mass is benign or malignant, and to test the performance of the risk of malignancy index, two logistic regression models, and subjective assessment of ultrasonic findings by an experienced ultrasound examiner in adnexal masses for which the simple rules yield an inconclusive result.
Design Prospective temporal and external validation of simple ultrasound rules to distinguish benign from malignant adnexal masses. The rules comprised five ultrasonic features (including shape, size, solidity, and results of colour Doppler examination) to predict a malignant tumour (M features) and five to predict a benign tumour (B features). If one or more M features were present in the absence of a B feature, the mass was classified as malignant. If one or more B features were present in the absence of an M feature, it was classified as benign. If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive.
Setting 19 ultrasound centres in eight countries.
Participants 1938 women with an adnexal mass examined with ultrasound by the principal investigator at each centre with a standardised research protocol.
Reference standard Histological classification of the excised adnexal mass as benign or malignant.
Main outcome measures Diagnostic sensitivity and specificity.
Results Of the 1938 patients with an adnexal mass, 1396 (72.0%) had benign tumours, 373 (19.2%) had primary invasive tumours, 111 (5.7%) had borderline malignant tumours, and 58 (3.0%) had metastatic tumours in the ovary. The simple rules yielded a conclusive result in 1501 (77%) masses, for which they resulted in a sensitivity of 92% (95% confidence interval 89% to 94%) and a specificity of 96% (94% to 97%). The corresponding sensitivity and specificity of subjective assessment were 91% (88% to 94%) and 96% (94% to 97%). In the 357 masses for which the simple rules yielded an inconclusive result and CA 125 results were available, the sensitivities were 89% (83% to 93%) for subjective assessment, 50% (42% to 58%) for the risk of malignancy index, 89% (83% to 93%) for logistic regression model 1, and 82% (75% to 87%) for logistic regression model 2; the corresponding specificities were 78% (72% to 83%), 84% (78% to 88%), 44% (38% to 51%), and 48% (42% to 55%). Using the simple rules as a triage test, followed by subjective assessment for masses in which the simple rules yielded an inconclusive result, gave a sensitivity of 91% (88% to 93%) and a specificity of 93% (91% to 94%), compared with a sensitivity of 90% (88% to 93%) and a specificity of 93% (91% to 94%) when subjective assessment was used in all masses.
Conclusions The use of the simple rules has the potential to improve the management of women with adnexal masses. In adnexal masses for which the rules yielded an inconclusive result, subjective assessment of ultrasonic findings by an experienced ultrasound examiner was the most accurate diagnostic test; the risk of malignancy index and the two regression models were not useful.
When deciding on the type of surgery for a patient with an adnexal mass, estimating the risk of malignancy is essential. Benign masses can be managed conservatively or with laparoscopy, avoiding unnecessary costs and morbidity. On the other hand, peri-operative rupture of a stage I ovarian cancer may worsen the prognosis.1 When malignancy is suspected, referral to a gynaecological oncologist is needed for proper staging and debulking surgery.
Transvaginal ultrasonography is an excellent tool for discriminating between benign and malignant adnexal masses. Several studies have shown that the risk of malignancy is very low in unilocular ovarian cysts.2 3 4 5 Morphological features other than a unilocular cyst, such as papillary structures and solid areas, and increased vascularity on Doppler ultrasound are associated with an increased risk of malignancy, to a varying degree.2 5 Attempts have been made to optimise the diagnostic performance of transvaginal sonography by building predictive models based on scoring systems, logistic regression analysis, neural networks, and support vector machines. However, when these models were tested prospectively, they performed less well than originally reported.6 7 8 9 10 11 The risk of malignancy index is the test recommended by the Royal College of Obstetricians and Gynaecologists and by a recent review by Geomini et al.11 12 In contrast, two logistic regression models (logistic regression model 1 and logistic regression model 2) developed in the International Ovarian Tumour Analysis (IOTA) study performed as well in new centres as in the units where they were first developed.13 Nevertheless, no model or biochemical marker of ovarian malignancy has been shown to be superior to subjective assessment of grey scale and colour Doppler ultrasonic findings by an experienced ultrasound examiner.8 14 15 Unfortunately, the expertise of experienced ultrasound examiners is not easily transferred to less experienced examiners. Less experienced examiners might be helped by scoring systems and risk calculation models, but a criticism has been that the ultrasonic information required by some ultrasound based risk calculation models is too difficult to obtain outside specialist centres.16
In a previous report, we used data collected in the first phase of the IOTA study to develop simple and clinically useful ultrasound based rules for discriminating between benign and malignant adnexal masses.5 We developed simple rules and then temporally validated them prospectively in a small group of patients (n=507). On temporal validation, the simple rules yielded a conclusive result in 76% of all tumours, for which they resulted in a sensitivity with regard to malignancy of 95%, a specificity of 91%, a positive likelihood ratio of 10.5 and negative likelihood ratio of 0.06.5 We concluded that “most adnexal tumours in an ordinary tumour population can be correctly classified as benign or malignant using simple ultrasound-based rules. For tumours that cannot be classified using simple rules, ultrasound examination by an expert examiner might be useful.”5
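The likelihood ratios quoted above follow from sensitivity and specificity as LR+ = sensitivity/(1 − specificity) and LR− = (1 − sensitivity)/specificity. The sketch below is an illustration, not the authors' calculation, and the function name is hypothetical. Recomputing from the rounded 95% and 91% gives about 10.6 and 0.055, close to the reported 10.5 and 0.06; the small gap presumably reflects rounding of the published sensitivity and specificity.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive, negative) likelihood ratios for a binary test.

    Both inputs are proportions strictly between 0 and 1.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)   # factor by which a positive result raises the odds
    lr_negative = (1.0 - sensitivity) / specificity   # factor by which a negative result lowers the odds
    return lr_positive, lr_negative


# Rounded values from the earlier temporal validation of the simple rules
lr_pos, lr_neg = likelihood_ratios(0.95, 0.91)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```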
The aim of the study reported here was to do a prospective temporal and external validation in a large study population to assess the ability of the previously published simple ultrasound based rules to distinguish between benign and malignant adnexal masses before surgery. A secondary aim was to determine the diagnostic performance of subjective assessment of ultrasonic findings by an experienced ultrasound examiner, the risk of malignancy index,12 and the logistic regression models 1 and 2 when used in tumours for which the simple rules yield an inconclusive result.
In this prospective study, the IOTA phase 2 study, we examined the performance of the simple rules in a population of women who had surgery for an adnexal mass. Local clinicians made the decision to operate on the basis of local rules and clinical judgment. We followed the guidelines of the Standards for Reporting of Diagnostic Accuracy (STARD) initiative.17
We tested the rules both in the same seven centres where they had been developed (old centres5 18) and in a further 12 centres that had not participated in any IOTA study before (new centres). In total, 19 centres from eight countries participated.
We included patients who presented with at least one adnexal mass and who had an ultrasound examination by a principal investigator at one of the participating centres. In the case of bilateral adnexal masses, we included the mass with the most complex ultrasonic morphology in our statistical analysis. If both masses had similar ultrasonic morphology, we included the largest one or the one most easily accessible by transvaginal ultrasound. We excluded patients who were pregnant or refused transvaginal ultrasonography and those who did not have surgical removal of the mass within 120 days after the ultrasound examination.
A dedicated, secure data collection system was developed for the study (IOTA 2 study screen, astraia, Munich, Germany). A unique identifier was generated automatically for each patient’s record. Clinicians at each centre could view or update only patients’ records from their own centre. We ensured data security by not transferring the patient’s name and by encrypting all data communication. Data integrity and completeness were ensured by client side checks in the astraia system and manual checks by one biostatistician and two expert ultrasound examiners.
A standardised history was taken in the same manner as in the IOTA phase 1 study.18 It included information on personal history of ovarian cancer and breast cancer; the number of first degree relatives with ovarian cancer or breast cancer; and the patient’s age, menopausal status, and current hormone treatment. Women aged 50 years or more who had undergone hysterectomy before menopause were defined as postmenopausal.
In all cases, a principal investigator at the participating centres did a transvaginal ultrasound scan in the same standardised manner as in the IOTA phase 1 study.18 The principal investigators were fully trained gynaecologists or radiologists with a special interest in gynaecological ultrasound and more than five years' experience in this field. They used a variety of ultrasound machines with transvaginal probe frequencies ranging between 5 and 12 MHz. The investigators also used transabdominal ultrasonography to examine large masses that could not be seen in their entirety with a transvaginal probe. They used grey scale and colour Doppler ultrasound images to obtain morphological and blood flow variables to characterise each adnexal mass. Details of the ultrasound examination technique and the ultrasound terms and definitions used have been described elsewhere.18 19 Finally, the investigator stated whether the mass was likely to be malignant or benign on the basis of subjective evaluation of ultrasonic findings ("subjective assessment"). The ultrasonic information was recorded prospectively and locked at the time of the examination, so it could not be changed after surgery. We calculated the risk of malignancy with the IOTA logistic regression models 1 and 2 centrally after the study had ended, so these models played no role in clinical decision making. The simple rules were likewise applied centrally after the study and played no role in clinical decisions.
The reference standard was the histological diagnosis and, in case of malignancy, the surgical stage. Surgery was carried out by laparoscopy or laparotomy, according to the surgeon’s judgment. The excised tissues were examined histologically at the local centre. The pathologists had no knowledge of the ultrasound results. We classified tumours according to the criteria recommended by the International Federation of Gynaecology and Obstetrics.20
We applied to the tumours the simple ultrasound based rules that have been described in detail in a previous report.5 Briefly, we used five ultrasonic features to predict a malignant tumour (M features): irregular solid tumour (M1), ascites (M2), at least four papillary structures (M3), irregular multilocular solid tumour with a largest diameter of at least 100 mm (M4), and very high colour content on colour Doppler examination (M5). We used five ultrasonic features to predict a benign tumour (B features): unilocular cyst (B1), presence of solid components for which the largest solid component is <7 mm in largest diameter (B2), acoustic shadows (B3), smooth multilocular tumour (B4), and no detectable blood flow on Doppler examination (B5). If one or more M features were present in the absence of a B feature, we classified the mass as malignant (rule 1). If one or more B features were present in the absence of an M feature, we classified the mass as benign (rule 2). If both M features and B features were present, or if none of the features was present, the simple rules were inconclusive (rule 3).
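Because the rules are purely combinatorial, they can be expressed in a few lines. The sketch below is a hypothetical encoding, not software used in the study: it assumes the examiner has already recorded which of the ten features are present, using the labels M1 to M5 and B1 to B5 from the text, and the name `apply_simple_rules` is illustrative.

```python
from typing import Iterable

# Feature codes as defined in the text (M = suggests malignancy, B = suggests benignity).
M_FEATURES = frozenset({
    "M1",  # irregular solid tumour
    "M2",  # ascites
    "M3",  # at least four papillary structures
    "M4",  # irregular multilocular solid tumour, largest diameter >= 100 mm
    "M5",  # very high colour content on colour Doppler
})
B_FEATURES = frozenset({
    "B1",  # unilocular cyst
    "B2",  # solid components present, largest solid component < 7 mm
    "B3",  # acoustic shadows
    "B4",  # smooth multilocular tumour
    "B5",  # no detectable blood flow on Doppler
})


def apply_simple_rules(features: Iterable[str]) -> str:
    """Classify a mass from the set of M and B feature codes observed on ultrasound."""
    present = set(features)
    unknown = present - M_FEATURES - B_FEATURES
    if unknown:
        raise ValueError(f"unrecognised feature codes: {sorted(unknown)}")
    has_m = bool(present & M_FEATURES)
    has_b = bool(present & B_FEATURES)
    if has_m and not has_b:
        return "malignant"      # rule 1
    if has_b and not has_m:
        return "benign"         # rule 2
    return "inconclusive"       # rule 3: both kinds present, or none at all


print(apply_simple_rules({"M1", "M5"}))
print(apply_simple_rules({"B1"}))
print(apply_simple_rules({"M2", "B3"}))
```

The three calls print `malignant`, `benign`, and `inconclusive` in turn. An empty feature set also returns `inconclusive`, matching the "none of the features was present" branch of rule 3.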
Logistic regression model 1 was based on the age of the patient (in years), the presence of ascites, the presence of blood flow within a papillary projection, the largest diameter of the solid component (in millimetres, with values above 50 mm truncated at 50 mm), the presence of irregular internal cyst walls, the presence of acoustic shadows, personal history of ovarian cancer, current hormonal treatment, the largest diameter of the lesion (mm), tenderness of the lesion during the examination, the presence of a purely solid tumour, and the colour score (1, 2, 3, or 4). The simpler logistic regression model 2 used only the first six of these variables. As suggested in the original publication, an estimated probability of malignancy above 0.10 with logistic regression model 1 or 2 classified the mass as malignant.18
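The published coefficients of models 1 and 2 are not given in this section, so the sketch below is generic: a standard logistic function, the 50 mm truncation of the solid component, and the 0.10 cut-off. The function names are hypothetical, and all coefficient values in the usage example are placeholders invented for illustration; they are not the IOTA values and must not be used clinically.

```python
import math
from typing import Mapping

SOLID_COMPONENT_CAP_MM = 50.0  # largest solid component is truncated at 50 mm
MALIGNANCY_CUTOFF = 0.10       # probability above which a mass is classed as malignant


def cap_solid_component(diameter_mm: float) -> float:
    """Truncate the largest solid-component diameter at 50 mm."""
    return min(diameter_mm, SOLID_COMPONENT_CAP_MM)


def logistic_probability(intercept: float, coefficients: Mapping[str, float],
                         values: Mapping[str, float]) -> float:
    """Probability of malignancy 1 / (1 + exp(-z)), z = intercept + sum(beta * x)."""
    z = intercept + sum(beta * values[name] for name, beta in coefficients.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # equivalent form that avoids overflow for large negative z
    return e / (1.0 + e)


def classify_probability(probability: float, cutoff: float = MALIGNANCY_CUTOFF) -> str:
    """Probability strictly above the cut-off classes the mass as malignant."""
    return "malignant" if probability > cutoff else "benign"


# PLACEHOLDER coefficients for illustration only; NOT the published IOTA model.
placeholder_coefs = {"age_years": 0.05, "solid_component_mm": 0.04}
patient = {"age_years": 60, "solid_component_mm": cap_solid_component(80)}
p = logistic_probability(-6.0, placeholder_coefs, patient)
print(round(p, 3), classify_probability(p))
```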
We determined the risk of malignancy index by using the ultrasonic findings, the menopausal status, and the serum CA 125 concentration.12 We assessed five ultrasonic features suggestive of cancer in an ultrasound score (U): multilocularity, solid areas, bilateral masses, ascites, and evidence of metastases. U was 0 when none of these features was present, 1 if one feature was present, and 3 if two or more features were present. We assigned a score (M) of 1 to premenopausal women and a score of 3 to postmenopausal women. We defined the risk of malignancy index as U×M×serum CA 125 concentration (U/mL). As suggested in the original publication, a risk of malignancy index of more than 200 classified the mass as malignant.12
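The index is a simple product, so it can be computed directly. The sketch below assumes the number of suspicious ultrasonic features (0 to 5) has already been counted; the function names are hypothetical and the patient values in the example are invented. Note that M here is the menopausal score, not the M features of the simple rules.

```python
def ultrasound_score(n_features: int) -> int:
    """U: 0 if none of the five suspicious features is present, 1 if exactly one, 3 if two or more."""
    if not 0 <= n_features <= 5:
        raise ValueError("n_features must be between 0 and 5")
    if n_features == 0:
        return 0
    return 1 if n_features == 1 else 3


def menopausal_score(postmenopausal: bool) -> int:
    """M: 1 for premenopausal women, 3 for postmenopausal women."""
    return 3 if postmenopausal else 1


def risk_of_malignancy_index(n_features: int, postmenopausal: bool, ca125_u_per_ml: float) -> float:
    """RMI = U x M x serum CA 125 concentration (U/mL)."""
    return ultrasound_score(n_features) * menopausal_score(postmenopausal) * ca125_u_per_ml


def rmi_classification(rmi: float, cutoff: float = 200.0) -> str:
    """An index strictly above the cut-off classes the mass as malignant."""
    return "malignant" if rmi > cutoff else "benign"


# Invented example: postmenopausal woman, two suspicious features, CA 125 of 35 U/mL
rmi = risk_of_malignancy_index(2, True, 35.0)
print(rmi, rmi_classification(rmi))
```

Here U = 3 and M = 3, so the index is 3 × 3 × 35 = 315, which exceeds 200. Because U is 0 when no suspicious feature is present, the index is then 0 whatever the CA 125 concentration.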
We compared the simple rules with subjective assessment by an experienced ultrasound examiner, the risk of malignancy index, and the logistic regression models 1 and 2 in cases in which the simple rules yielded a conclusive result. We also assessed the performance of a strategy in which the simple rules were used as a triage test,21 with a second stage test (subjective assessment, risk of malignancy index, or logistic regression model 1 or 2) being used for masses for which the simple rules yielded an inconclusive result.
We expressed diagnostic performance in terms of sensitivity and specificity. We used Wilson’s method to calculate the 95% confidence limits of binomial proportions. We used McNemar’s test to determine the statistical significance of differences in paired binomial proportions: sensitivity and specificity. We determined the statistical significance of differences in categorical data for unpaired comparisons by using the χ² test. We used SAS system release 9.2 for statistical analyses.
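The two statistical procedures named above can be sketched with the Python standard library alone; the study itself used SAS, and the function names here are hypothetical. The Wilson interval for the sensitivity of the simple rules (340/369) comes to about 88.9% to 94.5%, which rounds to the 89% to 94% reported. The paper does not state which variant of McNemar's test was used; this sketch uses the uncorrected χ² form. The discordant counts it needs are not reported in the text, but when two tests classify the same number of cases correctly the discordant counts must be equal, so the statistic is zero and P=1.0, consistent with the specificity comparison in the results.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def mcnemar_test(b: int, c: int) -> Tuple[float, float]:
    """McNemar chi-squared statistic (1 df, no continuity correction) and its P value.

    b and c are the discordant counts: cases classified correctly by test A only
    and by test B only.
    """
    if b + c == 0:
        return 0.0, 1.0
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-squared with 1 df
    return statistic, p_value


lo, hi = wilson_interval(340, 369)  # sensitivity of the simple rules, conclusive masses
print(f"sensitivity 95% CI: {lo:.1%} to {hi:.1%}")
```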
We enrolled 1970 patients between November 2005 and October 2007. Of these, we excluded 32 (1.6%) for the following reasons: no surgical removal of the mass within 120 days after the ultrasound examination (n=15), pregnant at the time of the examination (n=12), errors in data entry (n=4), and protocol violation (n=1) (figure). We thus included data from 1938 patients. The mean age was 46 (range 11-94) years, 38% (742) of the patients were postmenopausal, 41% (793) were nulliparous, and 11% (214) were receiving hormonal treatment. Of the tumours, 542 (28%) were malignant, including 111 (20%) borderline masses, 373 (69%) primary invasive masses, and 58 (11%) metastatic masses.
Table 1 shows the predictive value of each ultrasonic feature used in the simple rules. In total, the simple rules yielded a conclusive result (rule 1=malignant, rule 2=benign) in 1501 of the 1938 masses (77%). The malignancy rate was 25% (369/1501) in masses for which the simple rules yielded a conclusive result, compared with 40% (173/437) in the remainder (P<0.001). In 456 cases at least one M feature was present, and in 389 (85%) of these no B feature was present. Of the 389 masses predicted to be malignant by the simple rules, 87% (340) were malignant according to histology. In 1179 cases at least one B feature was present, and in 1112 (94%) of these no M feature was present. Of the 1112 masses predicted to be benign by the simple rules, 97% (1083) were benign according to histology.
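Unpaired comparisons such as the malignancy rate of 25% (369/1501) v 40% (173/437) use a χ² test. The sketch below is a minimal Pearson χ² test for a 2×2 table using only the standard library; the function name is hypothetical, and because the paper does not say whether a continuity correction was applied, this version uses none. With the counts above the statistic is roughly 38 on 1 degree of freedom, consistent with the reported P<0.001.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared test (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its P value on 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-squared with 1 df
    return statistic, p_value


# Rows: conclusive v inconclusive simple-rules result; columns: malignant v benign
stat, p = chi_squared_2x2(369, 1501 - 369, 173, 437 - 173)
print(f"chi2 = {stat:.1f}, P = {p:.1e}")
```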
Among the tumours for which the simple rules yielded a conclusive result, they had a sensitivity of 92% (340/369) and a specificity of 96% (1083/1132) (table 2). Among these tumours, the sensitivity and specificity of subjective assessment were similar to those of the simple rules: 91% (336/369) (P=0.35) and 96% (1083/1132) (P=1.0). Subjective assessment missed 33 cancers (false negative) and gave 49 false positive diagnoses. The simple rules missed 29 cancers and gave 49 false positive diagnoses. The simple rules performed similarly in "old" and "new" centres: the sensitivity was 93% (179/192) in the old centres and 91% (161/177) in the new centres (P=0.42), and the specificity was 95% (487/513) in the old centres and 96% (596/619) in the new centres (P=0.27). The sensitivity of the simple rules was similar in premenopausal and postmenopausal patients (91% (102/112) v 93% (238/257); P=0.62) but the specificity was higher in the premenopausal patients (97% (829/857) v 92% (254/275); P=0.004). The simple rules yielded a conclusive result more often in premenopausal patients than in postmenopausal patients (81% (969/1196) v 72% (532/742); P<0.001).
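The sensitivity, specificity, and predictive values above all come from one 2×2 table of the simple rules against histology in the 1501 conclusive masses: 340 true positives, 29 false negatives, 1083 true negatives, and 49 false positives. The sketch below (hypothetical class name, counts from the text) recomputes them; the results round to the 92%, 96%, 87%, and 97% reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a binary test against the reference standard (histology)."""
    tp: int  # malignant, classified malignant
    fp: int  # benign, classified malignant
    fn: int  # malignant, classified benign
    tn: int  # benign, classified benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:  # positive predictive value
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:  # negative predictive value
        return self.tn / (self.tn + self.fn)


# Simple rules in the masses with a conclusive result (counts from the text)
rules = TwoByTwo(tp=340, fp=49, fn=29, tn=1083)
for name in ("sensitivity", "specificity", "ppv", "npv"):
    print(f"{name}: {getattr(rules, name):.1%}")
```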
Table 3 shows the diagnostic performance of subjective assessment, logistic regression models 1 and 2, and the risk of malignancy index among the tumours for which the simple rules yielded an inconclusive result (rule 3). Among these tumours, the diagnostic performance of subjective assessment by the ultrasound examiner was superior to that of logistic regression model 1, logistic regression model 2, and the risk of malignancy index in both premenopausal and postmenopausal patients. Logistic regression model 1 and logistic regression model 2 had low specificity, whereas the risk of malignancy index had low sensitivity. The specificity of logistic regression models 1 and 2 was significantly lower than that of subjective assessment (47% and 50% v 80%, P<0.001 for both comparisons among all patients; 64% and 63% v 85%, P<0.001 for both comparisons among premenopausal patients; 21% and 30% v 73%, P<0.001 for both comparisons among postmenopausal patients). The sensitivity of the risk of malignancy index was significantly lower than that of subjective assessment (50% v 89% among all patients, P<0.001; 32% v 84%, P<0.001 among premenopausal patients; 63% v 92%, P<0.001 among postmenopausal patients).
If the simple rules alone were applied to all tumours, with inconclusive results counted as not correctly classified, the sensitivity was 63% (340/542) and the specificity was 78% (1083/1396). If the simple rules were used as a triage test and subjective assessment of ultrasonic findings was used for those masses for which the simple rules yielded an inconclusive result (figure), the sensitivity was 91% (494/542) and the specificity was 93% (1294/1396). Of the 542 malignant masses, 340 (63%) were correctly classified by the simple rules and 154 (28%) by subjective assessment; of the 1396 benign masses, 1083 (78%) were correctly classified by the simple rules and 211 (15%) by subjective assessment. This performance was similar to that of using subjective assessment in all tumours, which had a sensitivity of 90% (490/542) (P=0.35) and a specificity of 93% (1294/1396) (P=1.0) (table 4).
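The triage figures are additive because the two stages act on disjoint sets of masses: every mass is classified either by the simple rules or, when they are inconclusive, by the second stage test, and both contributions are divided by the totals for the whole cohort. The sketch below (hypothetical function name, counts from the text) reproduces the overall 91% sensitivity and 93% specificity.

```python
from typing import Tuple


def two_stage_performance(tp_first: int, tp_second: int, n_malignant: int,
                          tn_first: int, tn_second: int, n_benign: int) -> Tuple[float, float]:
    """Overall sensitivity and specificity of a triage strategy.

    First-stage counts come from masses the triage test classifies conclusively;
    second-stage counts come from masses referred on because the first stage
    was inconclusive. Both are divided by the totals for the whole cohort.
    """
    if tp_first + tp_second > n_malignant or tn_first + tn_second > n_benign:
        raise ValueError("correct classifications cannot exceed the group totals")
    sensitivity = (tp_first + tp_second) / n_malignant
    specificity = (tn_first + tn_second) / n_benign
    return sensitivity, specificity


# Simple rules first, subjective assessment for inconclusive masses
sens, spec = two_stage_performance(340, 154, 542, 1083, 211, 1396)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}")
```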
The simple rules yielded a conclusive result in most benign tumours (81%, 1132/1396) and in most primary invasive tumours (74%, 275/373) but in only half of the borderline tumours (50%, 56/111) (table 5). The performance of the simple rules was poor for abscesses, fibromas, and serous borderline stage I tumours.
In this study, we have prospectively validated the ability of the IOTA simple ultrasound rules to discriminate between benign and malignant adnexal masses. The results of this study confirmed that when the rules yielded a conclusive result, they reliably discriminated between benign and malignant adnexal masses. They did so just as well as did subjective assessment by an experienced ultrasound examiner. The rules worked well both on temporal validation in the centres where they had been developed and on external validation in the new centres. This confirms that the rules are generalisable. The test performance of a strategy in which the simple rules were used as a triage test and subjective assessment of ultrasonic findings was used as a second stage test in those masses for which the rules yielded an inconclusive result (sensitivity 91% and specificity 93%) was similar to that of using subjective assessment by an experienced examiner in all tumours (sensitivity 90% and specificity 93%). Because few clinicians have special skills in the ultrasound examination of ovarian pathology, a reliable test that can be used effectively by all ultrasound examiners is needed. The simple rules have the potential to become that test.
A strength of this study was its prospective and multicentre design. As the data were collected in different countries with patients with different characteristics, the simple rules are likely to prove applicable and to perform well in other populations. Another strength is the large number of patients studied with a detailed predefined protocol with agreed terms, measurement technique, and definitions. We did both a temporal and an external validation. Because the results were virtually identical in the old and new centres, we can justify reporting the results for the old and new centres together to take advantage of a larger study population and be able to estimate the measures of performance, such as sensitivity and specificity, with greater precision.
A limitation of the study is that all the examinations were done by experienced ultrasound examiners. Validation of the simple rules by less experienced examiners is needed, because the rules are intended to help such examiners triage patients for referral to an examiner specialised in gynaecological ultrasound.
Previous studies on pre-operative characterisation of adnexal masses as benign or malignant were mostly small and single centre.7 8 9 12 We have previously developed and tested the simple rules in a multicentre study.5 However, Altman and colleagues wrote that neither internal nor temporal validation examines the generalisability of a model, for which using new data collected from an appropriate patient population in a different centre is necessary.22 This study contains the first prospective temporal and external validation of the simple rules, and the results of our external validation confirm the generalisability of these rules.
The main advantage of the simple rules is their simplicity. The ultrasonic variables are straightforward to obtain, and the rules are easier to use in clinical practice than are many mathematical models. With a simple tick box system, a result can be produced rapidly without the need for computer software. Moreover, unlike the risk of malignancy index, the rules do not require a blood sample for serum CA 125 measurement. The simple rules are therefore likely to be an ideal tool to help less experienced ultrasound examiners to differentiate between benign and malignant tumours. Their disadvantage is that they yield an inconclusive result in about 25% of all tumours (23% in this study), whereas mathematical models can be applied to all masses.18 When the simple rules are used, it is therefore important to establish referral pathways to a specialist in gynaecological ultrasound for cases in which the rules yield an inconclusive result. Neither the risk of malignancy index nor the IOTA logistic regression models 1 and 2 discriminated well enough between benign and malignant tumours when the simple rules yielded an inconclusive result. The rules seem to work less well for abscesses, fibromas, and stage I serous borderline tumours. These conditions are also difficult to classify with subjective assessment of ultrasonic findings.23 Future research needs to determine the performance of the simple rules when used by less experienced ultrasound examiners and whether their use will improve patient care and reduce costs.
Because the simple rules offer a straightforward approach to correctly characterise about 75% of adnexal masses, their use should enable all sonographers and general gynaecologists to reliably distinguish between benign and malignant adnexal masses in most cases. Where the rules yield an inconclusive result, we propose referring the patient for subjective assessment of ultrasonic findings by an experienced ultrasound examiner, because this provides the most accurate diagnosis. If we use the simple rules as a triage test and subjective assessment by an experienced ultrasound examiner as a second stage test in those masses for which the simple rules yield an inconclusive result, we obtain the same diagnostic performance as when subjective assessment is used in all masses. In this way, the use of the simple rules has the potential to improve the management of women with adnexal masses.
Contributors: DT, LA, and TB developed the idea for this study. CVH, JV, LA, and DT did multiple manual and automated quality checks on the dataset, and LA and SVH did the statistical analysis. DT, JV, TB, LA, and LV wrote the paper. DT, LA, TB, and LV are the guarantors.
Funding: This research was supported by the Research Council KU Leuven: GOA-MANET, CoE EF/05/006 Optimization in Engineering (OPTEC); FWO: G.0302.07 (SVM), research communities (ICCoS, ANMMM); IWT-TBM 070706 (IOTA); Belgian Federal Science Policy Office: IUAP P6/04 (DYSCO); EU: BIOPATTERN (FP6-2002-IST 508803); Swedish Medical Research Council (grants K2001-72X 11605-06A, K2002-72X-11605-07B, K2004-73X-11605-09A, and K2006-73X-11605-11-3); funds administered by Malmö University Hospital; and two Swedish governmental grants (ALF-medel and Landstingsfinansierad Regional Forskning).
Competing interests: None declared.
Ethical approval: The study protocol was approved by the central ethical committee for clinical studies at the University Hospitals Leuven, Belgium, and by the local ethics committee at each recruitment centre.
Data sharing: No additional data available.
Recruitment centres: University Hospitals Leuven (Belgium), Ospedale S Gerardo, Università di Milano Bicocca, Monza (Italy), Ziekenhuis Oost-Limburg (ZOL), Genk (Belgium), Medical University in Lublin (Poland), University of Cagliari, Ospedale San Giovanni di Dio, Cagliari (Italy), Malmö University Hospital, Lund University (Sweden), University of Bologna (Italy), Università Cattolica del Sacro Cuore Rome (Italy), DCS Sacco University of Milan (Milan A) (Italy), General Faculty Hospital of Charles University, Prague (Czech Republic), Chinese PLA General Hospital, Beijing (P.R. of China), King's College Hospital London (UK), Università degli Studi di Napoli, Napoli (Naples A) (Italy), IEO, Milano (Milan B) (Italy), Lund University Hospital, Lund (Sweden), Macedonio Melloni Hospital, University of Milan (Milan C) (Italy), Università degli Studi di Udine (Italy), McMaster University, St Joseph's Hospital, Hamilton, Ontario (Canada), Istituto Nazionale dei Tumori, Fondazione Pascale, Napoli (Naples B) (Italy)
IOTA Steering Committee: Dirk Timmerman, Leuven, Belgium; Lil Valentin, Malmö, Sweden; Tom Bourne, London, UK; Antonia C Testa, Rome, Italy; Sabine Van Huffel, Leuven, Belgium; Ignace Vergote, Leuven, Belgium.
IOTA principal investigators (alphabetical order): Artur Czekierdowski, Lublin, Poland; Elisabeth Epstein, Lund, Sweden; Daniela Fischerová, Prague, Czech Republic; Dorella Franchi, Milano, Italy; Robert Fruscio, Monza, Italy; Stefano Greggi, Napoli, Italy; Stefano Guerriero, Cagliari, Italy; Jingzhang, Beijing, People's Republic of China; Davor Jurkovic, London, UK; Francesco P G Leone, Milano, Italy; Andrea A Lissoni, Monza, Italy; Henry Muggah, Hamilton, ON, Canada; Dario Paladini, Napoli, Italy; Alberto Rossi, Udine, Italy; Luca Savelli, Bologna, Italy; Antonia Carla Testa, Roma, Italy; Dirk Timmerman, Leuven, Belgium; Diego Trio, Milano, Italy; Lil Valentin, Malmö, Sweden; Caroline Van Holsbeke, Genk, Belgium.
Cite this as: BMJ 2010;341:c6839