Pretherapeutic Imaging for Axillary Staging in Breast Cancer: A Systematic Review and Meta-Analysis of Ultrasound, MRI and FDG PET

Background: This systematic review aimed to compare the performance of ultrasonography (US), magnetic resonance imaging (MRI), and fluorodeoxyglucose positron emission tomography (PET) for axillary staging, with a focus on micro- or macrometastases. Methods: A search for relevant studies published between January 2002 and March 2018 was conducted in the MEDLINE database. Study quality was assessed using the QUality Assessment of Diagnostic Accuracy Studies checklist. Sensitivity and specificity were meta-analyzed using a bivariate random effects approach. Results: Across 62 studies (n = 10,374 patients), sensitivity and specificity to detect metastatic axillary lymph nodes (ALN) were, respectively, 51% (95% CI: 43–59%) and 100% (95% CI: 99–100%) for US, 83% (95% CI: 72–91%) and 85% (95% CI: 72–92%) for MRI, and 49% (95% CI: 39–59%) and 94% (95% CI: 91–96%) for PET. Interestingly, US detects a significant proportion of macrometastases (false negative rate was 0.28 (0.22, 0.34) for more than 2 metastatic ALN and 0.96 (0.86, 0.99) for micrometastases). In contrast, PET tends to detect a significant proportion of micrometastases (true positive rate = 0.41 (0.29, 0.54)). Corresponding data were not available for MRI. Conclusions: In comparison with MRI and FDG PET, US is an effective technique for axillary triage, especially to detect high metastatic burden without upstaging the majority of micrometastases.


Introduction
Breast cancer is the most commonly diagnosed cancer among women worldwide [1], accounting for 25% of cancer cases and 15% of cancer-related deaths [2]. Axillary lymph node (ALN) metastases are detected in 30 to 40% of women with breast cancer and are associated with a less favorable prognosis [3,4]. Sentinel lymph node biopsy (SLNB) is the classical staging procedure for breast cancer patients with a clinically and radiologically negative axilla [5][6][7][8]. Preoperative detection of ALN involvement by imaging may change management in several ways, from first-line ALN dissection to neoadjuvant chemotherapy [9]. However, it is now well established that axillary micro- and macrometastases do not have the same prognostic and therapeutic impact, and the detection of micrometastasis should not lead to ALN dissection or inappropriate chemotherapy. Consequently, axillary staging by imaging should help distinguish patients with macrometastatic ALN from patients with negative or micrometastatic ALN.
To our knowledge, no study has systematically evaluated the performance of each of the 3 main imaging techniques as a triage test for axillary staging in breast cancer patients, especially those without palpable ALN, with a focus on the type of nodal involvement (micro- or macrometastases). Many of the previous analyses of axillary staging did not include nodal ultrastaging and were performed in populations in which a significant proportion of patients had palpable ALN. Palpable ALN constitute a contraindication for SLNB, as grossly involved nodes may not retain the dye or the radio-colloid agent due to the replacement of macrophages by cancer cells [10][11][12][13]. Moreover, inclusion of a significant proportion of patients undergoing neoadjuvant chemotherapy may not allow an accurate evaluation, as node staging may change during neoadjuvant chemotherapy (false negative).
Hence, the role and performance of imaging (including ultrastaging) remain to be clarified for breast cancer patients without palpable ALN, as does the choice of the appropriate imaging modality.
This systematic review aimed to evaluate the performance of US (with or without fine-needle aspiration or core needle biopsy), MRI, and fluorodeoxyglucose PET for axillary staging, with a focus on micro- or macrometastases in breast cancer patients without palpable axillary nodes, and to discuss their use in different clinical settings.

Search Strategy
This systematic review followed the recommendations of the PRISMA statement [17,18]. Two reviewers independently searched for relevant studies that assessed the accuracy and utility of US, MRI, and PET in staging the axilla in patients with breast cancer. The MEDLINE database was searched for all in vivo human studies. Discrepancies were resolved by consensus.

Inclusion and Exclusion Criteria
Studies meeting the following inclusion criteria were reviewed: (1) published in English; (2) cohort studies (prospective or retrospective); (3) published between 1 January 2002 and 15 March 2018; (4) imaging was done to detect ALN involvement in patients with breast cancer; (5) imaging procedures were US, MRI, or PET; (6) histopathological analysis of ALN obtained by SLNB or ALN dissection was used as the reference standard; and (7) true positive (TP), false positive (FP), true negative (TN), and false negative (FN) values were reported or could be calculated from the available data.
We excluded studies meeting any of the following criteria: (1) neoadjuvant chemotherapy was administered between imaging and axillary surgery; (2) patients had palpable ALN ipsilateral to the breast cancer; (3) no histopathological reference standard; (4) patients without breast cancer; (5) insufficient data available to calculate the TP, FP, TN, and FN values; (6) imaging was performed for the sole purpose of detecting sentinel ALN; (7) patients were shared with another study previously included; (8) the study was conducted in animals or ex vivo; (9) under 18 analyzable patients in the study; (10) the study was a case control study, review, case report, or letter to the editor; and (11) we were unable to obtain the full text.
Some studies were also included if we could manually exclude patients meeting exclusion criteria (such as patients treated with neoadjuvant chemotherapy, patients with palpable nodes, or patients without breast cancer) and if we could calculate TP, FP, TN, and FN in the resulting population.

Data Extraction and Quality Assessment
Data were extracted by one reviewer, checked by a second, and discrepancies resolved by discussion. Study quality was assessed using the QUality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist [19]. All the 14 items in the checklist were used.

Data Synthesis and Statistical Analysis
Patients were classified as TP when both the imaging technique and the reference standard (e.g., ALN dissection or SLNB) detected axillary metastases; TN when neither the imaging technique nor the reference standard detected metastasis; FN when the imaging technique failed to detect metastasis identified by the reference standard; and FP when the imaging technique incorrectly suggested metastasis not detected by the reference standard. Sensitivity (SE) was defined as TP/(TP + FN) and specificity (SP) as TN/(TN + FP). The diagnostic odds ratio (DOR), which can be obtained from different combinations of SE and SP, was used as a single summary measure. It was defined as the ratio of the odds of a positive test in diseased patients relative to the odds of a positive test in non-diseased patients. The DOR ranges from 0 to infinity, and a higher value means better diagnostic performance. A value of 1 indicates that a test cannot distinguish between patients with or without the disease, and values of <1 correspond to more negative (FN) results among the diseased [20].
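As an illustration of the definitions above, the following minimal Python sketch computes SE, SP, and the DOR from one study's 2x2 counts. The class name `TwoByTwo`, the example counts, and the 0.5 continuity correction for zero cells are assumptions made for this sketch; the review does not state how zero cells were handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """2x2 counts for one study: index test vs. histopathological reference."""
    tp: int  # imaging positive, reference positive
    fp: int  # imaging positive, reference negative
    fn: int  # imaging negative, reference positive
    tn: int  # imaging negative, reference negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def dor(self, correction: float = 0.5) -> float:
        """Diagnostic odds ratio, (TP/FN) / (FP/TN).

        Assumption: if any cell is zero, `correction` is added to all four
        cells so that the ratio stays finite.
        """
        cells = [float(self.tp), float(self.fp), float(self.fn), float(self.tn)]
        if 0.0 in cells:
            cells = [c + correction for c in cells]
        tp, fp, fn, tn = cells
        return (tp / fn) / (fp / tn)


# Hypothetical counts, not taken from any included study.
study = TwoByTwo(tp=40, fp=5, fn=20, tn=135)
print(f"SE  = {study.sensitivity():.3f}")
print(f"SP  = {study.specificity():.3f}")
print(f"DOR = {study.dor():.1f}")
```

Because the DOR equals (TP x TN)/(FP x FN), a test with very few false positives, as reported here for US with needle sampling, can reach a high DOR even when its sensitivity is moderate.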
Considering the correlation between sensitivity and specificity, a bivariate random effects model was used to summarize performance estimates and their 95% confidence intervals (CI) [21]. Heterogeneity was assessed using the quantity I², which lies between 0 and 100% (a value of 0% indicates no observed heterogeneity; values lower than 50% were considered an acceptable level of heterogeneity) [22]. When no significant heterogeneity was observed between studies, or when the number of considered studies was too small, a pooled analysis was undertaken. For all statistical tests, differences were considered significant at the 0.05 level. All statistical analyses were conducted using Stata 13.0® software (StataCorp LP, College Station, TX).
Forest plots were generated with Review Manager 5® (The Nordic Cochrane Centre, The Cochrane Collaboration, Copenhagen).
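The heterogeneity statistic I² mentioned above is commonly derived from Cochran's Q as I² = max(0, (Q - df)/Q) x 100%, where df is the number of studies minus one. The sketch below applies this standard definition to per-study log DOR values with inverse-variance weights. This is an assumption made for illustration: the review does not specify on which effect measure I² was computed, and the bivariate model itself was fitted in Stata, not with this code. Function names and counts are hypothetical.

```python
import math


def log_dor_and_variance(tp, fp, fn, tn):
    """Log DOR and its large-sample variance (all cells assumed non-zero)."""
    log_dor = math.log((tp * tn) / (fp * fn))
    variance = 1 / tp + 1 / fp + 1 / fn + 1 / tn
    return log_dor, variance


def i_squared(effects, variances):
    """Return (Q, I2 in percent) for per-study effects and their variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Hypothetical 2x2 counts (tp, fp, fn, tn) for three studies.
studies = [(40, 5, 20, 135), (25, 10, 30, 90), (60, 2, 35, 150)]
effects, variances = zip(*(log_dor_and_variance(*s) for s in studies))
q, i2 = i_squared(effects, variances)
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

With the threshold used in this review, an I² below 50% would be read as an acceptable level of heterogeneity.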

Subgroup Analyses
Subgroup analyses were undertaken according to the US technique used: US grayscale, US + fine needle aspiration/core needle biopsy, fine needle aspiration, and elastosonography. Subgroup analyses were conducted according to the MRI technique used: MRI without diffusion weighted imaging (DWI), MRI with DWI, and DWI alone. Subgroup analyses were also conducted according to the PET technique used: PET without computed tomography (CT) and PET with CT.
In some studies, several results were available for one imaging technique, for example, one for each MRI subgroup (MRI without DWI, MRI with DWI, DWI alone). As these results came from the same population, only one result could be considered for the pooled estimates. In such cases, the subgroup with the best accuracy, defined as (TP + TN)/(TP + FP + FN + TN), was retained.
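The selection rule described above can be sketched in a few lines of Python. The function names and the counts are hypothetical and serve only to show how one result per study would be retained for pooling.

```python
def accuracy(tp, fp, fn, tn):
    """Overall accuracy: (TP + TN) / (TP + FP + FN + TN)."""
    return (tp + tn) / (tp + fp + fn + tn)


def best_subgroup(results):
    """Return the name of the subgroup with the highest accuracy.

    `results` maps a subgroup name to its (tp, fp, fn, tn) counts.
    On a tie, the first subgroup in insertion order is returned.
    """
    return max(results, key=lambda name: accuracy(*results[name]))


# Hypothetical counts from a single study reporting three MRI subgroups.
mri_results = {
    "MRI without DWI": (30, 12, 10, 48),
    "MRI with DWI": (32, 8, 8, 52),
    "DWI alone": (28, 6, 12, 54),
}
print(best_subgroup(mri_results))
```

Choosing the best-performing subgroup per study avoids counting the same patients twice, at the cost of a possible optimistic bias in the pooled estimate.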
For US studies, the US + fine needle aspiration/core needle biopsy criterion was preferred over US grayscale, because in routine clinical practice, any suspicious ALN in breast cancer undergoes ultrasound-guided fine needle aspiration or core needle biopsy. In studies evaluating elastosonography, nodes were considered abnormal if either US grayscale, elastosonography, or both were abnormal (disjunctive method).
Subgroup analyses were undertaken according to ALN involvement (micrometastases versus macrometastases, and less than 3 ALN metastases versus 3 or more ALN metastases) in patients with T1-T2 breast cancer.

Number and Characteristics of Included Studies
The search identified 569 citations from the MEDLINE database, of which 95 were examined in full text after primary screening of titles and abstracts. Study characteristics of each subgroup are described in Table 1A-D.
In total, 62 studies were suitable for inclusion (Figure 1). There were 30 studies assessing US with or without fine needle aspiration/core needle biopsy, including 7546 patients of which 2668 had ALN metastases (prevalence = 35.4%), 10 studies assessing MRI, including 652 patients of which 211 had ALN metastases (prevalence = 32.4%), and 24 studies assessing PET, including 2388 patients of which 909 had ALN metastases (prevalence = 38.1%).
Figure 2 summarizes the methodological quality of the 62 included studies. In general, the reference standard was adequate, but was not the same for all patients (either SLNB or ALN dissection), and the choice of the reference standard depended on the index test results (for instance, ALN dissection was performed for biopsy-proven metastatic nodes). The reference standard and the index test were well described in every study.
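The prevalence figures quoted above for each modality follow directly from the reported patient counts, as the short check below shows. The variable names are illustrative; the counts are those given in the text.

```python
# (patients with ALN metastases, total patients) per modality, from the text.
counts = {
    "US": (2668, 7546),
    "MRI": (211, 652),
    "PET": (909, 2388),
}

# Prevalence in percent, rounded to one decimal as in the text.
prevalence = {
    modality: round(100 * positive / total, 1)
    for modality, (positive, total) in counts.items()
}

for modality, value in prevalence.items():
    print(f"{modality}: {value}%")
```

The per-modality study counts (30 + 10 + 24) add up to more than 62, which presumably reflects studies that assessed more than one modality.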

Quality of Included Studies
The index test was interpreted by reviewers blinded to reference standard results in all studies. It was often interpreted by reviewers blinded to other clinical data: this was the case in most MRI and PET studies, but rarely in US studies. Uninterpretable results were discussed in only 5 studies.
Results are presented in Table 2 and Figure 3A-C. The diagnostic odds ratio (DOR), which can be obtained from different combinations of sensitivity and specificity, can be used as a single summary measure. It is defined as the ratio of the odds of a positive test in diseased patients relative to the odds of a positive test in non-diseased patients. The DOR ranges from 0 to infinity, and a higher value signifies better diagnostic performance. A value of 1 indicates that a test cannot distinguish between patients with or without the disease, and values of <1 correspond to more negative (FN) results among the diseased [22]. Confidence intervals account for the heterogeneity beyond chance between studies (random effects models). The impact of unobserved heterogeneity is traditionally assessed statistically using the quantity I². It describes the percentage of total variation across studies that is attributable to heterogeneity rather than chance [22]. Magnetic resonance imaging (MRI) had a significantly higher sensitivity than the other imaging modalities, whereas ultrasonography (US) had a significantly higher specificity than MRI and, to a lesser extent, than fluorodeoxyglucose positron emission tomography (PET). The DOR estimated for US was significantly greater than that of MRI, which in turn was significantly greater than that of FDG PET. Further analysis revealed that, across all imaging modalities, US + fine needle aspiration (FNA) or core needle biopsy (CNB) had the highest DOR value. For MRI studies, MRI with diffusion weighted imaging (DWI) had the highest DOR value; for PET studies, PET with and without computed tomography (CT) had the same DOR value.

Discussion
In this meta-analysis assessing the diagnostic performance of US, MRI, and PET for pretherapeutic ALN staging, we found that while MRI had a significantly higher sensitivity than the other imaging modalities, the performance of US improved significantly for macrometastases in more than 2 ALN. The combination of US and fine needle aspiration had the highest diagnostic odds ratio, in part because of a specificity close to 100%.
Unlike other published meta-analyses, we chose to assess all 3 techniques in order to contrast their respective strengths and weaknesses and to offer an overview of the role of imaging in nodal staging and ultrastaging.
We did not include patients with clinically positive ALN, for which preoperative imaging is unlikely to change treatment plan [12]. We also chose not to include patients undergoing neoadjuvant chemotherapy, in order to have a gold-standard reference test available for every patient.
While previously published meta-analyses had a high prevalence of ALN metastasis [3,11], the metastasis rate in our study was in line with the commonly described rate of ALN metastasis in invasive breast cancer, between 30 and 40% [3,4].
Management of the axilla has evolved with the increased use of neoadjuvant treatment. Furthermore, the ACOSOG Z0011 trial showed that women with micrometastases or fewer than 3 metastatic ALN and clinical T1-2 tumors, undergoing lumpectomy and breast radiation therapy followed by systemic therapy, did not benefit from ALN dissection in terms of local control and 10-year overall survival [13]. An ideal preoperative axillary staging method should therefore be able not only to detect macrometastasis with high accuracy, but also to evaluate the global axillary burden, in order to avoid unnecessary ALN dissection in patients with low axillary burden.
We found that axillary US has a very high specificity (99%, 95% CI: 97-100%), in contrast with its much lower overall sensitivity [85,86], which depends on the axillary burden: the FN rate of US drops to 0.28 when more than 2 ALN are involved, while micrometastases are almost never detected. These data are fundamental to avoiding overtreatment, as micrometastasis should not lead to ALN dissection or the prescription of chemotherapy. A recent study on interobserver variability showed that the discrimination between low and high axillary burden on US is reliable and reproducible [87]. US should be used for first-line axillary triage, to detect high metastatic burden that could benefit from neoadjuvant chemotherapy, without diverting low-burden patients from the SLNB procedure. Technical improvements, such as elastosonography [23,25] or the use of intradermal microbubbles to locate and biopsy the sentinel lymph node under ultrasound guidance [88], may further increase US sensitivity.
We found that MRI has a better sensitivity than US for detection of nodal metastasis. This is in line with the results of other meta-analyses; for example, Liang et al. [7] found a sensitivity of 82% (95% CI: 78–86%). The main drawback of MRI is its relatively low specificity compared to other imaging modalities, which makes it unsuitable for surgical or oncological planning. The addition of diffusion-weighted imaging seems to significantly increase its specificity while only slightly decreasing its sensitivity. In one study by Hieken et al. [89], second-look US after abnormal axillary findings on MRI allowed detection of abnormal nodes not previously detected by US in only 10% of the cases. In the clinical situation of a positive MRI with negative US, there is a significant risk of an axillary false positive.
In our study, PET shows a lower sensitivity than in Cooper's less recent meta-analysis (49% vs. 63%) [4]. Indeed, the performance of PET may vary depending on breast cancer histological subtype, with higher performance in basal than in luminal subtypes [90,91], and also depending on the histological gold standard (e.g., high rate of micrometastases in recent studies [15]). As a functional, high-sensitivity imaging technique, PET has a much higher detection rate of micrometastases than US, which can theoretically lead to unnecessary ALN dissection or neoadjuvant chemotherapy. Yet, PET has the unique ability to detect extranodal distant metastasis and should be used preferentially in patients at high risk for extranodal disease. Further technical improvements, especially new markers for hormone-positive or HER2-positive breast tumors, may redefine the role of PET imaging in axillary staging.
Our study has some limitations. A relatively low number of MRI studies were included in our meta-analysis, as this imaging modality has only been studied more recently for axillary staging. Likewise, probably due to the lower availability of MRI and PET, these modalities are more widely used for T3-T4 than for T1-T2 stages. This may explain why MRI and PET studies include fewer T1-T2 breast cancers than US studies. However, the prevalence of ALN metastases for each of the 3 modalities was roughly the same, between 30 and 40%. The high heterogeneity of the MRI subgroup analysis was probably due to the lack of consensus on the criteria used to define a suspicious ALN on MRI, as well as differences in imaging protocol between centers (MRI field strength, imaging parameters). Finally, information about axillary burden was not widely available in MRI and PET studies.
Thus, future imaging studies should systematically report parameters such as the number of metastatic ALN, the presence of micrometastases versus macrometastases, and the presence of capsular rupture, in order to avoid overdiagnosis and overtreatment.

Conclusions
US is an effective technique for axillary triage, especially to detect high metastatic burden that could benefit from neoadjuvant chemotherapy or axillary clearance, without upstaging the majority of micrometastases.