Published online Sep 08, 2017.
https://doi.org/10.3802/jgo.2017.28.e86
Transvaginal ultrasound versus magnetic resonance imaging for preoperative assessment of myometrial infiltration in patients with endometrial cancer: a systematic review and meta-analysis
Abstract
Objective
To compare the diagnostic accuracy of transvaginal ultrasound (TVS) and magnetic resonance imaging (MRI) for detecting myometrial infiltration (MI) in endometrial carcinoma.
Methods
An extensive search of papers comparing TVS and MRI in assessing MI in endometrial cancer was performed in MEDLINE (PubMed), Web of Science, and Cochrane Database from January 1989 to January 2017. Quality was assessed using Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool.
Results
Our search identified 747 citations; after exclusions, 8 articles were included in the meta-analysis. The risk of bias was low for most studies in most of the 4 domains assessed in QUADAS-2. Overall, pooled estimated sensitivity and specificity for diagnosing deep MI were 75% (95% confidence interval [CI]=67%–82%) and 86% (95% CI=75%–93%) for TVS, and 83% (95% CI=76%–89%) and 82% (95% CI=72%–89%) for MRI, respectively. No statistically significant difference was found when comparing both methods (p=0.314). Heterogeneity was low for sensitivity and high for specificity for both TVS and MRI.
Conclusion
MRI showed a better sensitivity than TVS for detecting deep MI in women with endometrial cancer. However, the difference observed was not statistically significant.
INTRODUCTION
Endometrial cancer is the most common gynecologic malignancy in developed countries [1]. In 1988, the International Federation of Gynecology and Obstetrics (FIGO) established that endometrial cancer should be surgically staged [2]. Comprehensive surgical staging comprises total hysterectomy, bilateral salpingo-oophorectomy, cytologic washings and pelvic and para-aortic lymphadenectomy [2]. However, the therapeutic role of systematic lymph node dissection is still a matter of debate in low risk endometrial cancers [3]. Risk classification is mainly based on tumor histology, tumor grade, and myometrial infiltration (MI) depth [4]. Tumor histology and histological grade may be assessed preoperatively by endometrial biopsy. Thus, selecting low risk cases preoperatively, based on MI assessment, may help to better plan surgical procedures and to avoid unnecessary lymph node dissections [5]. Currently, transvaginal ultrasound (TVS) and magnetic resonance imaging (MRI) [6, 7, 8] are the techniques most commonly used for preoperative assessment of the depth of MI.
Recent meta-analyses have shown that TVS has 78%–85% sensitivity and 82%–84% specificity for detecting deep MI [9], whereas MRI offers sensitivity ranging from 81% to 90% and specificity ranging from 82% to 89%, depending on the technique used [10, 11]. These meta-analyses included studies analyzing either ultrasound or MRI in different sets of patients. However, to the best of our knowledge, there is no meta-analysis including only studies that used both techniques on the same set of women. Such a meta-analysis would allow a more appropriate comparison of the diagnostic performance of both techniques from the meta-analytic point of view.
The objective of the present meta-analysis was to compare the diagnostic accuracy of TVS and MRI for detecting MI in endometrial carcinoma, analyzing only studies that used both techniques in the same set of patients.
MATERIALS AND METHODS
1. Protocol and registration
We performed this systematic review and meta-analysis according to the Synthesizing Evidence from Diagnostic Accuracy TEsts (SEDATE) guidelines [12]. All methods for inclusion/exclusion criteria, data extraction and quality assessment were specified in advance. The protocol did not require registration.
2. Data sources and searches
Three of the authors (JLA, BG, BN) searched 3 electronic databases (PubMed/MEDLINE, Cochrane, and Web of Science) to identify potentially eligible studies published between 1989 and January 2017. For ongoing clinical trials, we searched websites such as www.ClinicalTrials.gov and www.who.int/trialsearch.
We did not use methodological filters in database searches to avoid possible omission of relevant studies, according to the recommendations of Leeflang et al. [13]. The search terms captured the concepts of “endometrial,” “cancer,” “carcinoma,” “transvaginal ultrasound,” “sonography,” “myometrial,” and “magnetic resonance imaging.” The search was restricted to papers published in English.
3. Study selection and data collection
One author (JLA) screened the titles and abstracts identified by the searches to exclude obviously irrelevant articles, i.e., those not strictly related to the topic under review. Full-text articles were obtained to identify potentially eligible studies, and 3 reviewers (JLA, RS, BN) independently applied the following inclusion criteria:
1) Prospective or retrospective cohort study including patients who underwent both techniques, MRI and TVS, for evaluating MI in endometrial carcinoma as index tests.
2) Surgical assessment of the presence of MI according to histopathological permanent frozen section as reference standard.
3) Presence of results sufficient to construct the 2×2 table of diagnostic performance as minimum data requirement.
To avoid inclusion of duplicate cohorts in the meta-analysis in the case of 2 studies from the same authors, the study period of each study was examined; if dates overlapped, we chose the latest study according to the publication date, considering that patients from the first study were also included in the latest one. We used a “snowball” strategy to identify potentially relevant papers by reading the reference lists of the papers selected for full-text reading. No attempts were made to contact the authors.
The Patients, Intervention, Comparator, Outcomes, Study design (PICOS) criteria were used for describing the studies included.
Diagnostic accuracy results and additional useful information on patients and procedures were retrieved from selected primary studies independently by 3 of the authors (JLA, RS, BN). Disagreements arising during the process of study selection and data collection were resolved by consensus between 2 of the authors (JLA, BN).
4. Risk of bias in individual studies
Quality assessment was conducted using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [14], adapted to this particular review. The QUADAS-2 format includes 4 domains: 1) patient selection, 2) index test, 3) reference standard, and 4) flow and timing. For each domain, the risk of bias and concerns about applicability (the latter not applying to the domain of flow and timing) were analyzed and rated as low, high or unclear risk. The results of quality assessment were used for descriptive purposes to provide an evaluation of the overall quality of the included studies and to investigate potential sources of heterogeneity. Two authors (JLA, BN) independently evaluated the methodological quality, using a standard form with quality assessment criteria and a flow diagram; they resolved disagreements by discussion.
The quality criteria for each domain were as follows: for the patient selection domain, description of inclusion and exclusion criteria; for the index test domain, description of how the index test (TVS/MRI) was performed and interpreted; for the reference standard domain, description of the reference standard used and whether pathologists were blinded to the index test; and for the flow and timing domain, description of the time elapsed from index test assessment to reference standard result.
5. Statistical analysis
We extracted or derived information on diagnostic performance of TVS and MRI. A random-effects model was used to determine overall pooled sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR−). Positive and negative likelihood ratios (LRs) were used to characterize the clinical utility of a test and to estimate the post-test probability of disease. An LR of 0.2–5.0 provides weak evidence for either ruling out or confirming the disease. An LR of 5.0–10.0 or 0.1–0.2 provides moderate evidence to either confirm or rule out the disease, respectively. An LR >10 or <0.1 provides strong evidence to either confirm or rule out the disease, respectively [15].
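As an illustration of how these measures follow from a single study's 2×2 table, a minimal sketch is given here. The counts are hypothetical and are not taken from any included study, and the class and method names are our own. The pooled estimates reported in this review come from a random-effects model, not from this simple per-study calculation.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2x2 table of index test result versus reference standard (deep MI)."""
    tp: int  # test positive, deep MI on histology
    fp: int  # test positive, no deep MI on histology
    fn: int  # test negative, deep MI on histology
    tn: int  # test negative, no deep MI on histology

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def lr_positive(self) -> float:
        # LR+ = sensitivity / (1 - specificity)
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self) -> float:
        # LR- = (1 - sensitivity) / specificity
        return (1 - self.sensitivity()) / self.specificity()


# Hypothetical counts for one study (illustration only).
table = TwoByTwo(tp=30, fp=10, fn=10, tn=50)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
print(f"LR+ = {table.lr_positive():.2f}")
print(f"LR- = {table.lr_negative():.2f}")
```

The sketch does not guard against empty cells; a table with no false positives or no true negatives would need a continuity correction or separate handling.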
Using the mean prevalence of deep (>50%) MI (pretest probability) in each subset, depending upon the technique assessed and LRs, post-test probabilities were calculated and plotted on Fagan nomograms.
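The calculation that a Fagan nomogram performs graphically can be sketched as follows: the pretest probability is converted to odds, multiplied by the LR, and converted back to a probability. The function name is our own; the inputs are the rounded pretest probability (34%) and the pooled LRs for TVS reported in the Results, used here only as an example.

```python
def post_test_probability(pretest: float, lr: float) -> float:
    """Convert a pretest probability to a post-test probability via odds."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * lr
    return post_odds / (1 + post_odds)


# Example inputs: pretest probability 0.34, LR+ 5.6 and LR- 0.28 (TVS).
pretest = 0.34
after_positive = post_test_probability(pretest, 5.6)
after_negative = post_test_probability(pretest, 0.28)
print(f"positive test: {after_positive:.2f}")  # 0.74
print(f"negative test: {after_negative:.2f}")  # 0.13
```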
We assessed the presence of heterogeneity for sensitivity and specificity using Cochran's Q statistic and the I2 index [16]. A p-value<0.1 indicates heterogeneity. The I2 index describes the percentage of total variation across studies that is due to heterogeneity rather than chance. According to Higgins et al. [16], I2 values of 25%, 50%, and 75% would be considered to indicate low, moderate and high heterogeneity, respectively. Forest plots of sensitivity and specificity of all studies were plotted.
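For readers unfamiliar with the I2 index, the sketch below shows its usual derivation from Cochran's Q and the degrees of freedom (number of studies minus 1), truncated at zero. The function name is our own, and this is the standard textbook formula rather than a reproduction of the software routine used for the analysis. The example inputs are the Q values for TVS reported in the Results with 8 studies.

```python
def i_squared(q: float, n_studies: int) -> float:
    """I2 (%) from Cochran's Q: share of variation beyond that expected by chance."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Example inputs: Q for TVS sensitivity and specificity, 8 studies.
i2_sensitivity = i_squared(9.24, 8)
i2_specificity = i_squared(36.14, 8)
print(f"I2 sensitivity = {i2_sensitivity:.1f}%")  # 24.2%
print(f"I2 specificity = {i2_specificity:.1f}%")  # 80.6%
```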
Summary receiver-operating characteristics (sROC) curves were plotted to illustrate the relationship between sensitivity and specificity. Comparison of diagnostic performance between TVS and MRI for detecting deep MI was done using the bivariate method [15]. Meta-regression was used if heterogeneity existed to assess covariates that could explain this heterogeneity. The covariates analyzed were sample size, prevalence, mean patient age and number of observers (single/multiple). Publication bias was assessed by a regression of diagnostic log odds ratio against 1/√(effective sample size), weighted by effective sample size, with p<0.10 for the slope coefficient indicating significant asymmetry [17].
All analyses were performed using Meta-analytical Integration of Diagnostic Accuracy Studies (MIDAS) and METANDI commands in STATA version 12.0 for Windows (Stata Corporation, College Station, TX, USA). A p-value<0.05 was considered as statistically significant.
RESULTS
1. Search results
The electronic search provided a total of 747 citations. We found 19 citations in websites for ongoing trials, but none of them were studies specifically related to the topic of the present meta-analysis. After removal of 371 duplicate records and 33 papers published in languages other than English, 343 citations remained. Of these, 325 were excluded because it was clear from the title or abstract that they were not relevant to the review (papers assessing TVS but not MRI [n=54], papers assessing MRI but not TVS [n=124], papers not assessing diagnostic performance or not related to the topic [n=92], reviews [n=54] or letters to the editor [n=1]). We examined the full text of the remaining 18 articles. Finally, 10 studies were discarded because they did not meet inclusion criteria (studies using 3D ultrasound [n=5], studies using transabdominal ultrasound [n=2], studies from which the data for a 2×2 table could not be obtained [n=2], and one study that analyzed TVS and MRI but in different sets of patients). Thus, the remaining 8 studies were included in the review and meta-analysis [18, 19, 20, 21, 22, 23, 24, 25]. No additional relevant studies were found from references cited in the papers included in the review. A flowchart summarizing literature identification and selection is given in Fig. 1.
Fig. 1
Flow chart showing studies selection process.
2. Characteristics of included studies
A total of 8 studies [18, 19, 20, 21, 22, 23, 24, 25] published between January 1992 and February 2013 reporting on 560 patients were included in the final analyses. Among these 560 women, 192 had deep MI. Mean prevalence of deep MI across studies was 33.7%, ranging from 7.1% to 52.4% [18, 19, 20, 21, 22, 23, 24, 25]. All studies reported the clinical characteristics of the cohort to some extent. Mean patient age was reported in 6 out of 8 studies and ranged from 55 to 65 years. Table 1 shows PICOS features of the studies included.
Table 1
Characteristics of included studies in this systematic review according to PICOS criteria
3. Methodological quality of included studies
Study design was clearly stated as prospective in 6 studies [20, 21, 22, 23, 24, 25]. In 2 studies, design was not clear [18, 19]. A graphical display of the evaluation of the risk of bias and concerns regarding applicability of the selected studies is shown in Fig. 2.
Fig. 2
MRI, magnetic resonance imaging; QUADAS-2, Quality Assessment of Diagnostic Accuracy Studies-2; TVS, transvaginal ultrasound.
Histogram plot showing quality assessment (risk of bias and concerns about applicability) for all studies included in the meta-analysis.
Regarding risk of bias in the patient selection domain, 1 study was not clear regarding patient inclusion criteria [18] and 3 were considered as high risk since only patients with “conclusive” or “unequivocal” results for TVS and/or MRI were included [23, 24, 25].
Concerning the index test domain, with regard to TVS, 5 studies adequately described the index test as well as how it was performed and interpreted, 2 studies were unclear [18, 22] and one was considered as high risk since MI was estimated “subjectively” by the examiner [25]. With regard to MRI, 5 studies adequately described the index test as well as how it was performed and interpreted, and 3 studies were unclear [18, 22, 25].
Concerning the domain flow and timing, the time elapsed between the index test and reference standard was unclear in 2 studies [22, 23].
For the domain reference standard, all studies were likely to correctly classify the target condition by the reference standard. However, in 3 studies it was not clearly specified if the results of the reference standard were interpreted using gross evaluation of the uterus or permanent frozen section [21, 23, 25]. Only 2 studies [18, 19] reported specifically that pathologists were blinded to imaging results; in the rest of the studies this was unclear.
Regarding applicability, for the domain patient selection, all studies were deemed to include patients that matched the review question. For the domain index test, most studies were considered as having low concerns for applicability as the index tests were described well enough for study replication, as was the reference standard domain.
4. Diagnostic performance of TVS and MRI for detection of deep MI
Overall, pooled sensitivity, specificity, LR+, and LR− of TVS for detecting deep MI were 75% (95% confidence interval [CI]=67%–82%), 86% (95% CI=75%–93%), 5.6 (95% CI=3.0–10.2), and 0.28 (95% CI=0.22–0.37), respectively. Low heterogeneity was found for sensitivity (I2=24.2%; Cochran Q=9.24; p=0.240) but significant heterogeneity was found for specificity (I2=80.6%; Cochran Q=36.14; p<0.001). On the other hand, pooled sensitivity, specificity, LR+, and LR− of MRI for detecting deep MI were 83% (95% CI=76%–89%), 82% (95% CI=72%–89%), 4.7 (95% CI=3.0–7.2), and 0.20 (95% CI=0.14–0.29), respectively. Low heterogeneity was found for sensitivity (I2=5.7%; Cochran Q=7.42; p=0.390) but significant heterogeneity was found for specificity (I2=83.4%; Cochran Q=42.21; p<0.001). No statistical differences were found when comparing both methods (p=0.314).
Fig. 3 shows forest plots for both methods. These plots show that most studies had acceptable CIs for both sensitivity and specificity for MRI and TVS. sROC curves are shown in Fig. 4. Both techniques had similar areas under the sROC curve, but the 95% prediction contour is narrower for MRI than for TVS. Fagan nomograms show that a positive test for TVS or MRI significantly increases the pretest probability of deep MI, from 34% to 74% in the case of TVS and from 34% to 71% in the case of MRI, while a negative test significantly decreases the pretest probability, from 34% to 13% in the case of TVS and from 34% to 9% in the case of MRI (Fig. 5). Meta-regression showed that sample size, prevalence, mean patient age, and number of observers (single/multiple) did not explain the heterogeneity observed for specificity. No publication bias was found for either TVS (p=0.650) or MRI (p=0.090).
Fig. 3
CI, confidence interval; MRI, magnetic resonance imaging; TVS, transvaginal ultrasound.
Forest plot for sensitivity and specificity for each study and pooled sensitivity and specificity for TVS (A) and MRI (B).
Fig. 4
AUC, area under the curve; MRI, magnetic resonance imaging; SENS, sensitivity; SPEC, specificity; sROC, summary receiver-operating characteristics; TVS, transvaginal ultrasound.
sROC curve for TVS (A) and MRI (B).
Fig. 5
LR, likelihood ratio; LR+, positive likelihood ratio; LR−, negative likelihood ratio; MRI, magnetic resonance imaging; TVS, transvaginal ultrasound.
Fagan nomograms showing how pre-test probability change after the test is performed (post-test probability) depending on a positive or negative result for TVS (A) and MRI (B).
DISCUSSION
In the present meta-analysis, we evaluated and compared the pooled diagnostic accuracy of TVS and MRI for detecting deep MI in women with endometrial cancer undergoing surgical staging. We found that sensitivity was higher for MRI than for TVS, but this difference did not reach statistical significance. Pooled specificity was quite similar for both techniques.
These findings might be of clinical relevance since MRI is currently recommended for preoperative imaging in some guidelines [26, 27]. Taking into account the cost of MRI and the results of this meta-analysis, we believe that TVS may have a role as the first imaging technique for assessing MI in women with endometrial cancer, especially in low risk cases.
We observed low heterogeneity for sensitivity across studies regardless of the method used. However, we found significant heterogeneity for specificity. We were not able to identify any factor that could explain this heterogeneity.
The main strength of our meta-analysis is that we only included studies in which both TVS and MRI were used in the same set of patients. This allows a more reliable comparison between both techniques.
The main limitation of this systematic review is the small number of papers included. Furthermore, most papers reported a small series. As a result, the analysis is based on data from only 560 women, which is certainly a small sample size, and the results should be interpreted with caution.
This review provides an idea of the methodological quality of studies using TVS and MRI for assessment of deep MI in endometrial cancer. It is clear that quality could be improved in many studies, especially concerning index test description, reference standard, and flow and timing.
We also observed that all included studies enrolled both high- and low-risk patients for deep MI. This may affect the clinical applicability of both techniques because, from the point of view of gynecological oncologists, preoperative assessment is appropriate in women with preoperative histological data indicating potential low risk, i.e., women with well or moderately differentiated endometrioid cancer, or in some high-risk cases with clinical suspicion of metastatic disease, especially when considering the use of MRI. Therefore, we cannot rule out that the diagnostic performance for assessing MI could be overestimated because of inclusion of high-risk cases, in whom the probability of deep MI is higher. This could also explain the heterogeneity observed among studies for pooled specificity. Additionally, the publication dates of the included papers range from 1992 to 2013. This implies that technological advances in both TVS and MRI may also explain heterogeneity among the studies. Another potential source of heterogeneity is the use of different protocols for assessing MI among the studies, for both TVS and MRI.
In conclusion, our meta-analysis shows that MRI had better sensitivity than TVS for detecting deep MI in women with endometrial cancer. However, the difference observed was not statistically significant. Therefore, TVS should be considered good enough for use in clinical settings with limited resources. There is a need for more studies focusing only on grade 1 and grade 2 endometrioid carcinomas as per preoperative diagnosis to better define the actual role of intraoperative evaluation of MI in these a priori low-risk cases.
Conflict of Interest: No potential conflict of interest relevant to this article was reported.
Author Contributions:
Conceptualization: A.J.L., G.B., G.S.
Data curation: A.J.L., N.B., A.J.
Formal analysis: A.J.L., N.B., A.J.
Investigation: A.J.L., G.B.
Methodology: A.J.L., G.B., N.B., S.R., A.J.
Project administration: A.J.L.
Resources: A.J.L., G.B., N.B., S.R.
Software: A.J.L., N.B., S.R., A.J.
Supervision: A.J.L., G.S.
Validation: A.J.L., A.J.
Visualization: A.J.L.
Writing - original draft: A.J.L.
Writing - review & editing: G.B., N.B., S.R., A.J., G.S.
References
Alcázar JL, Pineda L, Caparrós M, Utrilla-Layna J, Juez L, Mínguez JA, et al. Transvaginal/transrectal ultrasound for preoperative identification of high-risk cases in well- or moderately differentiated endometrioid carcinoma. Ultrasound Obstet Gynecol 2016;47:374–379.
Alcázar JL, Orozco R, Martinez-Astorquiza Corral T, Juez L, Utrilla-Layna J, Mínguez JA, et al. Transvaginal ultrasound for preoperative assessment of myometrial invasion in patients with endometrial cancer: a systematic review and meta-analysis. Ultrasound Obstet Gynecol 2015;46:405–413.
European Network for Health Technology Assessment (EUnetHTA). EUnetHTA guideline: meta-analysis of diagnostic test accuracy studies [Internet]. Diemen: European Network for Health Technology Assessment; 2014 [cited 2017 Feb]. Available from: http://www.eunethta.eu/eunethta-guidelines.
Yahata T, Aoki Y, Tanaka K. Prediction of myometrial invasion in patients with endometrial carcinoma: comparison of magnetic resonance imaging, transvaginal ultrasonography, and gross visual inspection. Eur J Gynaecol Oncol 2007;28:193–195.
Özdemir S, Celik C, Emlik D, Kiresi D, Esen H. Assessment of myometrial invasion in endometrial cancer by transvaginal sonography, Doppler ultrasonography, magnetic resonance imaging and frozen section. Int J Gynecol Cancer 2009;19:1085–1090.
SGO Clinical Practice Endometrial Cancer Working Group. Endometrial cancer: a review and current management strategies: part I. Gynecol Oncol 2014;134:385–392.