Diagnostic Performance of Digital Breast Tomosynthesis for Breast Suspicious Calcifications From Various Populations: A Comparison With Full-field Digital Mammography

The difference in diagnostic performance between digital breast tomosynthesis (DBT) and conventional full-field digital mammography (FFDM) for suspicious breast calcifications in different populations is unclear. The objective of this study was to determine whether DBT offers a diagnostic advantage over FFDM for suspicious breast calcifications across various populations. Three hundred and five patients (seven with bilateral lesions) were enrolled, and images of 312 breasts were retrospectively analyzed independently by three radiologists. Postoperative pathology of the calcifications served as the gold standard. For the diagnosis of breast cancer, DBT and FFDM had sensitivities of 92.9% and 88.8%, specificities of 87.9% and 75.2%, positive predictive values of 77.8% and 62.1%, and negative predictive values of 96.4% and 93.6%, respectively. DBT showed significantly higher diagnostic accuracy than FFDM for benign calcifications (87.9% vs 75.2%) but no advantage for malignant calcifications. The diagnostic accuracy of DBT was significantly higher than that of FFDM in premenopausal (88.4% vs 78.8%), postmenopausal (90.2% vs 77.2%), and dense breast cases (89.4% vs 81.9%), whereas there was no significant difference in non-dense breast cases. In our study, DBT outperformed FFDM in dense breasts and benign calcifications, while no advantage was observed in non-dense breasts or malignant calcifications. Thus, DBT may be recommended for accurate diagnosis in breast cancer screening of young women with dense breasts. Our findings may help clinicians choose the optimal technique for different patients and provide a theoretical basis for updating breast cancer screening guidelines.


Introduction
Breast calcification is a common finding in breast cancer screening. Most calcifications are benign, but some may indicate early-stage breast cancer [1]. Distinguishing malignant from benign calcifications with current imaging techniques is challenging. The Breast Imaging Reporting and Data System (BI-RADS) of the American College of Radiology (ACR) describes the morphology, distribution, and categories of breast calcifications, including typically benign calcifications and calcifications of suspicious morphology [2,3]. Some suspicious calcifications, especially clustered, sand-like microcalcifications, often indicate a malignant disease such as ductal carcinoma in situ (DCIS) [4]. Hence, identifying suspicious calcifications is crucial.
Conventional full-field digital mammography (FFDM) is a two-dimensional (2D) imaging technique widely used to screen for early breast cancer and to diagnose breast lesions [5]. However, it has inherent limitations because it cannot reliably separate suspicious lesions from overlapping adjacent tissue [6]. For instance, FFDM has been criticized for its high false-positive rate and low sensitivity, especially in women with dense breast tissue [7].
Digital breast tomosynthesis (DBT) is a three-dimensional (3D) imaging technique developed to overcome some of the limitations of conventional FFDM [5], and it has been increasingly employed in breast cancer screening and assessment [8][9][10]. By reducing the overlap of breast tissue that can mask abnormalities, it can potentially raise sensitivity for breast cancer and decrease the false-positive rate [11]. DBT also significantly improves preoperative breast cancer staging in patients with dense breasts [12]. In addition, DBT shows a higher detection rate and diagnostic accuracy for both benign and malignant masses, with better sensitivity and specificity and lower recall rates [13]. Our previous study also showed that DBT had a significant advantage over FFDM in diagnostic accuracy for breast cancer [14], especially for mass-like lesions. However, this technology is not yet widely used in China.
Multiple breast imaging studies have combined FFDM and DBT [15][16][17]. Compared with FFDM alone, the combination improved diagnostic accuracy for breast cancer, decreased the false-negative rate, and increased sensitivity. The primary limitations of FFDM plus DBT for screening and clinical diagnosis are the increased interpretation time [18] and radiation dose [19]. To address these limitations, synthetic 2D images can be reconstructed from the data acquired during DBT [19]. Compared with FFDM, DBT plus synthetic 2D imaging increases cancer detection rates and image reading times, with comparable recall rates [20]. For the evaluation of microcalcifications, however, the diagnostic performance of synthetic 2D imaging and FFDM did not differ significantly, whether each was used with DBT or alone [21]. The TOMMY trial found that synthetic 2D imaging plus DBT performed similarly to standard FFDM plus DBT, and that adding DBT increased the sensitivity of FFDM in patients with dense breasts and its specificity in all subgroups [22]. Other studies used multiple projections and the DBT geometry to estimate the 3D positions of the microcalcifications in each cluster and to reconstruct the clusters as ellipsoids, demonstrating a possible approach to 3D shape analysis with DBT that could make diagnosis more accurate [23]. These studies indicate that DBT may be beneficial as a stand-alone modality for screening and diagnosis, rather than only as an adjunct to 2D imaging.
However, the ability of stand-alone DBT to detect breast microcalcifications is still controversial. Some studies demonstrated that DBT enabled the detection and characterization of microcalcifications with no significant differences from FFDM [24,25]. Kopans et al. found that the clarity of DBT images was equal to or better than that of conventional mammography in 92% of cases, and better in almost half of the cases [26]. In contrast, other studies found FFDM to be more sensitive than DBT for the detection of calcifications [27,28]. To our knowledge, few studies have compared DBT and FFDM for suspicious breast calcifications across different breast densities and menopausal statuses. The aim of our analysis was to determine whether DBT offers a diagnostic advantage over FFDM for suspicious breast calcifications in various populations, and thereby to help clinicians choose the appropriate method for different patients.

Patients
This retrospective study was approved by the Ethics Committee of the Affiliated Cancer Hospital of Zhengzhou University (Henan Cancer Hospital), and the patients provided written informed consent for surgical biopsy and imaging. The study was conducted between 03/2015 and 03/2018. Women were eligible if they met the following inclusion criteria: (a) they underwent both FFDM and DBT examinations; (b) calcifications were found on either FFDM or DBT and classified as BI-RADS category 4A or above according to the ACR BI-RADS [2]; and (c) the lesions were finally confirmed histopathologically by surgical biopsy in our hospital. The exclusion criteria were: (a) typically benign calcifications on mammography [2,3]; (b) pregnancy or lactation; and (c) prior breast surgery or breast treatment. The final cohort comprised 312 suspicious calcified breast lesions from 305 women (seven with bilateral lesions; age range 31-72 years; mean age 48.7 years).

Image Acquisition
Before surgery, the patients underwent FFDM and DBT imaging of both breasts in the craniocaudal (CC) and mediolateral oblique (MLO) views using a standard DBT system (Selenia Dimensions 5000, Hologic, USA). The system specifications are: detector matrix 3328 × 4096 pixels; resolution 7.1 lp/mm; pixel pitch 70 μm. With the standard imaging phantom in combo mode (DBT plus FFDM), the average glandular doses per view for FFDM, DBT, and combo mode are approximately 1.25, 1.65, and 2.90 mGy, respectively. DBT was performed immediately after FFDM under the same compression (combo mode), using automatic exposure control, by the same designated technician. For DBT, a series of low-dose 2D projection images was acquired while the x-ray tube rotated through an arc of −7.5° to +7.5° with the breast compressed in a fixed position. These projections were automatically reconstructed into a series of 1-mm-thick slices using filtered back projection.

Image Analysis
Three radiologists (Hengwei Zhang, Hui Jiang, and Xuhui Guo) with 7-18 years of experience in breast imaging participated in the image analysis. All readers had participated in prior reader studies of tomosynthesis interpretation and had been trained to interpret DBT images. A standard hanging protocol was used to display both the FFDM and DBT images. Images were interpreted per breast, not per patient, in a retrospective, double-blind manner. Each radiologist first evaluated the FFDM images in random order, blinded to the DBT images and the patient's clinical information, and assigned a BI-RADS category. The DBT images were then evaluated in the same way, in random order and blinded to the FFDM images and clinical information. To minimize learning and memory bias, the DBT images were read and assigned BI-RADS categories at least one month after the FFDM reading [24,29]. When the BI-RADS categories assigned within the same imaging modality were inconsistent among the radiologists, a consensus was reached through discussion [15].
Suspicious breast calcifications were characterized using BI-RADS categories [2] 2 to 5 to express the probability of malignancy: 2, benign; 3, probably benign (0-2% likelihood of malignancy), with an initial short-interval (6-month) follow-up suggested; 4, suspicious abnormality, further stratified as 4A (>2% to ≤10%), 4B (>10% to ≤50%), and 4C (>50% to <95%); and 5, highly suggestive of malignancy (≥95%), with appropriate action to be taken. Categories 0 (additional imaging evaluation or prior mammograms needed for comparison), 1 (negative), and 6 (known biopsy-proven malignancy) were not applicable because this was a retrospective study. BI-RADS categories 2, 3, and 4A were considered negative, while categories 4B, 4C, and 5 were considered positive [30]. Histopathological findings from surgery were used as the gold standard for the diagnosis of breast cancer. Category 4A indicates a low suspicion of malignancy according to the ACR BI-RADS [2]: although histological diagnosis is necessary, a benign lesion is much more likely than a malignant one. Therefore, we regarded BI-RADS 4A as negative in our analysis to avoid diluting the criteria for malignancy with a large number of benign lesions.
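The category ranges and the positive/negative rule described above can be sketched in Python as follows. The ranges mirror those listed in this section; the names `BIRADS_MALIGNANCY_RANGE`, `POSITIVE_CATEGORIES`, and `is_positive` are hypothetical, chosen only for illustration.

```python
# Likelihood-of-malignancy ranges (percent) for the BI-RADS categories used here.
# Bounds are (lower, upper); which ends are inclusive follows the text, not this tuple.
BIRADS_MALIGNANCY_RANGE = {
    "2": (0.0, 0.0),     # benign
    "3": (0.0, 2.0),     # probably benign; short-interval (6-month) follow-up
    "4A": (2.0, 10.0),   # low suspicion
    "4B": (10.0, 50.0),  # moderate suspicion
    "4C": (50.0, 95.0),  # high suspicion
    "5": (95.0, 100.0),  # highly suggestive of malignancy
}

# Dichotomization used in the analysis: 2, 3, and 4A count as negative.
POSITIVE_CATEGORIES = frozenset({"4B", "4C", "5"})


def is_positive(category: str) -> bool:
    """Return True if a BI-RADS category counts as a positive (malignant) call."""
    if category not in BIRADS_MALIGNANCY_RANGE:
        # Categories 0, 1, and 6 were not applicable in this retrospective design
        raise ValueError(f"category {category!r} is not used in this study")
    return category in POSITIVE_CATEGORIES


print([is_positive(c) for c in ["3", "4A", "4B", "5"]])  # [False, False, True, True]
```

Treating 4A as negative is what moves the decision threshold above the low-suspicion band; changing `POSITIVE_CATEGORIES` is the only edit needed to test a different cut-off.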

Statistical Analysis
Statistical analysis was performed using GraphPad Prism (version 6.02; GraphPad Software Inc., CA, USA) and MedCalc (version 15.2.2; MedCalc Software bvba). Overall clinical performance was compared using the difference in mean area under the receiver operating characteristic (ROC) curve. The χ2 test was used to compare the distributions of final BI-RADS categories assigned to suspicious breast calcifications with the two methods. Fisher's exact test was used to compare sensitivity and specificity and to assess differences in performance by breast density and menopausal status. All statistical tests were two-sided; p < .05 was considered statistically significant.
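Fisher's exact test on a 2×2 table can be sketched with the Python standard library alone, as a minimal illustration of the comparison described above. The example uses specificity counts back-calculated from the reported percentages (188/214 for DBT and 161/214 for FFDM); these counts are an assumption inferred from the rounded figures, not values taken from the study tables, and the function name `fisher_exact_two_sided` is hypothetical. GraphPad Prism and MedCalc may define the two-sided p-value slightly differently.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups compared; columns are the two outcomes.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_observed * (1 + 1e-7))


# Back-calculated specificity counts (assumption): rows = modality,
# columns = (true negative, false positive) among 214 benign sites
dbt_tn, dbt_fp = 188, 26    # 188/214 = 87.9%
ffdm_tn, ffdm_fp = 161, 53  # 161/214 = 75.2%
p_spec = fisher_exact_two_sided(dbt_tn, dbt_fp, ffdm_tn, ffdm_fp)
print(f"two-sided p = {p_spec:.4f}")
```

Because each breast was read with both modalities, the readings are paired; McNemar's test is a common paired alternative, whereas this sketch, like the analysis described here, treats the two modalities as independent samples.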

Calcification Characterization
Suspicious breast calcifications were characterized using BI-RADS categories 2 to 5 to express the probability of malignancy. The BI-RADS categories assigned by DBT and FFDM are shown in Table 2. The distribution of BI-RADS categories for benign calcifications differed significantly between DBT and FFDM, whereas no significant difference was observed for malignant calcifications. In the benign group, FFDM classified more cases than DBT into the positive categories BI-RADS 4B, 4C, and 5 (44 vs 25 cases).

Comparisons of the Diagnostic Accuracy of DBT and FFDM in Patients with Different Menopausal Status and Breast Densities
The ROC curves of DBT and FFDM in premenopausal, postmenopausal, dense breast, and non-dense breast cases are shown in Fig. 1 and Table 4.
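For ordinal ratings such as BI-RADS categories, the empirical (nonparametric) area under the ROC curve equals the probability that a randomly chosen malignant site is rated higher than a randomly chosen benign one, with ties counted as one half (the Mann-Whitney formulation). The sketch below assumes the ordering 2 < 3 < 4A < 4B < 4C < 5; the ratings are hypothetical toy values, not study data, and the sketch does not reproduce the ROC comparison performed in MedCalc.

```python
# Ordinal ranks for the BI-RADS categories used in this study (assumed ordering)
RANK = {"2": 0, "3": 1, "4A": 2, "4B": 3, "4C": 4, "5": 5}


def empirical_auc(malignant: list[str], benign: list[str]) -> float:
    """Empirical ROC AUC: P(malignant rated higher) + 0.5 * P(tie)."""
    score = 0.0
    for m in malignant:
        for b in benign:
            rm, rb = RANK[m], RANK[b]
            if rm > rb:
                score += 1.0   # malignant site ranked above benign site
            elif rm == rb:
                score += 0.5   # tie contributes half
    return score / (len(malignant) * len(benign))


# Hypothetical readings for illustration only
malignant_sites = ["5", "4C", "4A"]
benign_sites = ["3", "4A", "2"]
print(empirical_auc(malignant_sites, benign_sites))  # 8.5 / 9, about 0.944
```

Because BI-RADS has only six levels here, ties are common, which is why the half-credit rule matters for the estimate.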
Of the 305 patients, 187 were premenopausal and 118 were postmenopausal. The diagnostic accuracy of DBT and FFDM by menopausal status is shown in Table 4. In premenopausal patients, the diagnostic accuracy of DBT was significantly higher than that of FFDM (88.4% vs 78.8%, χ2 = 5.576, p = .0182). In postmenopausal patients, the diagnostic accuracy of DBT and FFDM was 90.2% and 77.2%, respectively, and the difference was also significant (χ2 = 6.717, p = .0095). Of the 312 breasts with calcification clusters, 226 were classified as dense (ACR 3-4) and the remaining 86 as non-dense (ACR 1-2). In dense breasts, the diagnostic accuracy of DBT was significantly higher than that of FFDM (89.4% vs 81.9%, χ2 = 4.600, p = .0320). In non-dense breasts, the diagnostic accuracy of DBT was slightly higher than that of FFDM, but the difference was not statistically significant (89.5% vs 84.9%, χ2 = 0.469, p = .4934).
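The χ2 values above appear consistent with a Pearson χ2 test with Yates' continuity correction applied to a 2×2 table of correct versus incorrect diagnoses for each modality. As a check, the sketch below back-calculates counts for the dense-breast subgroup from the reported percentages (202/226 = 89.4% for DBT and 185/226 = 81.9% for FFDM); these counts are inferred, not taken from Table 4, and the function name `yates_chi2_2x2` is hypothetical. For one degree of freedom, the p-value follows from the identity P(χ2 > x) = erfc(√(x/2)).

```python
from math import erfc, sqrt


def yates_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square with Yates' continuity correction for [[a, b], [c, d]].

    Returns (chi2, two-sided p) for 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            # Continuity correction: shrink |O - E| by 0.5, never below zero
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)
            chi2 += diff * diff / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


# Dense breasts: rows = modality, columns = (correct, incorrect); back-calculated counts
chi2, p = yates_chi2_2x2(202, 226 - 202, 185, 226 - 185)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # chi2 = 4.600, p = 0.0320
```

Like the analysis described in this section, this treats the DBT and FFDM readings as independent samples even though the same breasts were read with both.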

Discussion
In the present study, we evaluated the diagnostic accuracy of 3D DBT relative to that of 2D FFDM for suspicious breast calcifications and found that DBT increased the sensitivity and specificity of diagnosis, which is particularly helpful for identifying benign calcifications, especially in young women with dense breasts. The diagnostic sensitivities of DBT and FFDM for breast calcifications were 92.9% and 88.8%, and the specificities were 87.9% and 75.2%, respectively, and the differences were statistically significant.
DBT significantly increased the overall diagnostic accuracy, from 79.5% for FFDM to 89.4%. The difference in diagnostic accuracy between DBT and FFDM was significant for benign cases but not for malignant cases. Malignancy rates reported in previous studies ranged from 10% (8/78) to 39% (41/105) [31][32][33], consistent with our result: 98 (31%) sites of suspicious calcifications were proven malignant and 214 (69%) benign. Our study therefore did not include similar numbers of benign and malignant calcifications, and the smaller number of malignant cases may have obscured a difference between DBT and FFDM.
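The reported sensitivity, specificity, predictive values, and accuracy all follow from a 2×2 table of reading versus pathology. Working backward from the 98 malignant and 214 benign sites, the reported percentages are reproduced by the counts used below (DBT: 91 true positives and 188 true negatives; FFDM: 87 and 161). These counts are inferred from the rounded percentages and are an assumption for illustration, not values taken from the study tables; the helper `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Sensitivity, specificity, PPV, NPV, and accuracy (percent) from 2x2 counts."""
    return {
        "sensitivity": 100 * tp / (tp + fn),  # malignant sites called positive
        "specificity": 100 * tn / (tn + fp),  # benign sites called negative
        "ppv": 100 * tp / (tp + fp),          # positive calls that were malignant
        "npv": 100 * tn / (tn + fn),          # negative calls that were benign
        "accuracy": 100 * (tp + tn) / (tp + fn + tn + fp),
    }


MALIGNANT, BENIGN = 98, 214
# Back-calculated counts (assumption): (true positives, true negatives) per modality
counts = {"DBT": (91, 188), "FFDM": (87, 161)}

results = {}
for modality, (tp, tn) in counts.items():
    metrics = diagnostic_metrics(tp, MALIGNANT - tp, tn, BENIGN - tn)
    results[modality] = {k: round(v, 1) for k, v in metrics.items()}
    print(modality, results[modality])
```

Note how the imbalance between malignant and benign sites affects the predictive values: with 214 benign sites, FFDM's 53 false positives pull its PPV down to 62.1% even though its sensitivity is close to that of DBT.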
Our results seem to contradict several previous studies. Clauser et al. found that the diagnostic performance of DBT was as good as that of FFDM, although a notable inter-reader difference was observed [24]; they concluded that DBT enabled the detection and characterization of microcalcifications with no notable differences from FFDM. Some investigators even found FFDM to be slightly more sensitive than DBT for the detection of calcifications [27]. Several factors may explain the inconsistency between their findings and ours. First, these studies included fewer cases of calcifications, which increases the possibility of inaccuracy. Second, Clauser's study focused on comparing readers and found high inter-reader variability in the use of descriptors [24], which may have affected its results. However, other studies indicated that inter-reader variability did not affect the accuracy, sensitivity, or specificity of different methods for predicting the probability of malignancy [21,29], so whether inter-reader variability influences diagnosis is still controversial. Dibble et al. found that DBT decreased inter-reader variability, increased readers' confidence, and improved sensitivity for detecting architectural distortion [29]. In clinical practice, accurate interpretation of mammographic findings depends mainly on the reader's experience. We did not evaluate inter-reader variability or reading times. However, to minimize inter-reader variability and ensure relatively accurate BI-RADS categories, the three experienced radiologists had undergone prior training in DBT interpretation, and a consensus diagnosis among the three readers was used, consistent with Ohashi's study [15].
In our study, all DBT images were acquired in narrow-angle mode (15°/15 projections). The detectability of lesions on DBT depends on the tomographic scan angle, the number of projections, the radiation dose, and the reconstruction method. In wide-angle mode, because X-rays traverse more tissue and the dose per projection is lower, the signal received by the detector decreases and the relative noise increases, which may reduce the visibility of small structures, including microcalcifications [28]. The use of wide-angle (50°/25 projections) DBT may also explain the results of Clauser's study [24].
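The acquisition trade-off described above can be made concrete with a small sketch that lists projection angles for the two modes and the share of a fixed total dose each projection receives. It assumes projections are evenly spaced across the arc (so 15 projections over 15° run from −7.5° to +7.5° in steps of 15/14, about 1.07°) and that the total dose is split equally among projections; both are simplifying assumptions, not vendor specifications, and the function names are hypothetical.

```python
def projection_angles(arc_deg: float, n_projections: int) -> list[float]:
    """Evenly spaced tube angles (degrees) spanning -arc/2 to +arc/2 inclusive."""
    if n_projections < 2:
        raise ValueError("need at least two projections to span an arc")
    step = arc_deg / (n_projections - 1)
    return [-arc_deg / 2 + i * step for i in range(n_projections)]


def dose_share_per_projection(n_projections: int) -> float:
    """Fraction of a fixed total dose delivered in each projection (equal split)."""
    return 1.0 / n_projections


for label, arc, n in [("narrow-angle", 15.0, 15), ("wide-angle", 50.0, 25)]:
    angles = projection_angles(arc, n)
    step = angles[1] - angles[0]
    share = dose_share_per_projection(n)
    print(f"{label}: {n} views, {angles[0]:+.1f} to {angles[-1]:+.1f} deg, "
          f"step {step:.2f} deg, {share:.1%} of total dose per view")
```

Under these assumptions, each wide-angle projection receives 4.0% of the total dose versus about 6.7% for narrow-angle, which is the lower per-projection signal the text associates with increased relative noise.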
Because of pixel binning in DBT, its pixel pitch is larger than that of 2D mammography, which makes DBT images look less sharp than FFDM images. Furthermore, the tube movement in DBT and the increased relative noise of each projection in wide-angle DBT may contribute to geometric blurring [28]. Nevertheless, mild blurring of DBT images does not negate its advantages in breast screening and diagnosis [8][9][10][11][12][13]. Within the Mammography Quality Standards Act (MQSA) limit (a breast dose restriction of 3 mGy per acquisition) [34], slightly increasing the radiation dose of DBT may reduce the blurring of DBT images. For the system used in our study, the average glandular doses per view with the standard imaging phantom in combo mode were approximately 1.25 mGy for FFDM, 1.65 mGy for DBT, and 2.90 mGy for combo mode. Every patient in our study underwent FFDM and DBT imaging in combo mode. Although the average glandular dose for combo mode was more than twice that for FFDM, the overall dose to the breast was within the MQSA limit. Hence, the use of this FDA-approved technique is not expected to harm patients [35]. Osteras et al. investigated the average glandular dose in paired FFDM and DBT acquisitions in a population-based screening program (3819 women) and found mean doses of 1.72, 2.09, and 3.81 mGy for FFDM, DBT, and combo mode, respectively [36], similar to our observations. In clinical practice, some acquisition parameters (including tube loading and voltage) are set by automatic exposure control according to breast features (compressed breast thickness and glandular composition), so actual breast doses vary between acquisitions.
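The dose comparison above reduces to simple arithmetic. This sketch checks the phantom-based per-view doses quoted for our system against the 3 mGy MQSA limit cited in this section and computes the combo-to-FFDM ratio; the variable names are illustrative only.

```python
MQSA_LIMIT_MGY = 3.0  # per-acquisition breast dose limit cited in the text [34]

# Phantom-based average glandular dose per view for the system used (mGy)
dose = {"FFDM": 1.25, "DBT": 1.65, "combo": 2.90}

# Combo mode should be roughly the sum of its FFDM and DBT components
component_sum = dose["FFDM"] + dose["DBT"]
ratio_to_ffdm = dose["combo"] / dose["FFDM"]
within_limit = all(v <= MQSA_LIMIT_MGY for v in dose.values())

print(f"FFDM + DBT = {component_sum:.2f} mGy (combo reported: {dose['combo']:.2f} mGy)")
print(f"combo / FFDM = {ratio_to_ffdm:.2f}")
print("within MQSA limit:", within_limit)
```

The ratio of 2.32 is why the combo dose is described as more than twice the FFDM dose, while still leaving about 0.1 mGy of headroom below the per-acquisition limit for the phantom.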
We also characterized the suspicious calcifications using BI-RADS categories 2 to 5 to express the probability of malignancy. The distribution of BI-RADS categories for benign calcifications differed significantly between DBT and FFDM, whereas that for malignant calcifications did not; thus, DBT and FFDM are both acceptable methods for malignant cases. Our results suggest that DBT classifies benign calcifications significantly more accurately than FFDM and assigns more of them to BI-RADS categories 3 and 4A, probably because DBT reduces the influence of overlapping tissue and allows radiologists to better assess the 3D characteristics of a lesion in various planes [37,38]. Furthermore, both DBT and FFDM classified most malignant calcifications into BI-RADS categories 4B, 4C, and 5, which avoided delays in diagnosis. In patients with benign calcifications, FFDM classified a substantial proportion into BI-RADS category 4B (Fig. 2), which may lead to unnecessary tests or even biopsies. This further suggests that DBT can help avoid unnecessary biopsies in patients with benign conditions manifesting as microcalcifications, consistent with Tagliafico's study [25].
In premenopausal, postmenopausal, and dense breast cases, the diagnostic accuracy of DBT was significantly higher than that of FFDM, but there was no significant difference in non-dense breast cases. This indicates a diagnostic advantage of DBT in premenopausal, postmenopausal, and dense breast populations. In Asian countries, including China, women's breasts tend to be small and dense [39]. Compared with FFDM, DBT offers potential benefits in patients with dense breasts, such as reducing recall rates in screening mammography [6] and improving preoperative cancer staging [12], cancer detection, and mammographic sensitivity [17]; by eliminating overlapping tissue, it can reduce false-positive rates and the number of biopsies.
According to updated breast cancer screening guidelines [40], the American Cancer Society (ACS) currently recommends that women undergo regular screening mammography starting at age 45 years. Women younger than 45 years may choose to be screened based on individual risk factors, particularly a family history of breast cancer [41]. Conventional mammography is less useful in younger women because dense breast tissue makes potential cancers harder to see. DBT may be a better screening choice for younger women, especially those with microcalcifications.
This study had several limitations. First, we mainly recruited patients who underwent surgical biopsy for suspicious calcifications (BI-RADS category 4A or above) and who agreed to participate. Patients whose calcifications were classified as BI-RADS category 2 or 3 by both DBT and FFDM were excluded, so population selection bias was inevitable. Second, this was a retrospective study and the patients were not randomized, so the results may not fully represent the clinical problem. Third, the images and pathological results were reviewed mainly at our own workstation, and all three readers were from our hospital, so the findings may not generalize as those of a multicenter study would. Nonetheless, the results of this study provide clues not only about diagnostic performance but also about clinical procedures, such as hookwire localization of suspicious breast calcifications. Preoperative hookwire localization is an essential tool in the surgical management of non-palpable breast lesions, especially suspicious microcalcifications [42]. It is generally performed under 2D mammographic guidance at the discretion of the radiologist, and a computer then calculates the skin entry site and the path to the lesion. Because microcalcifications are scattered in three dimensions and overlap on 2D mammography, localization is frequently inaccurate and must be repeated. DBT images can reveal the spatial distribution of calcifications [23], which may make hookwire localization more accurate. Further studies on how to improve this technique are needed.

Conclusions
Our data show that, compared with conventional FFDM, DBT increased the sensitivity and specificity of the diagnosis of suspicious breast calcifications, which was beneficial for identifying benign calcifications, especially in young women with dense breasts. DBT showed a clear advantage in dense breasts and benign calcifications, whereas no advantage was observed in non-dense breasts or malignant calcifications. DBT may therefore be recommended for accurate diagnosis in breast cancer screening of young women with dense breasts. Our findings may help clinicians choose the optimal technique for different patients and provide a theoretical basis for updating breast cancer screening guidelines.

Ethical Approval
This retrospective study was approved by the Ethics Committee of the Affiliated Cancer Hospital of Zhengzhou University (Henan Cancer Hospital), and the patients provided written informed consent for surgical biopsy and imaging. The study was conducted between 03/2015 and 03/2018.

Declaration of Interest
The sponsors had no role in the design and conduct of the study; the collection, management, analysis, or interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication. The views expressed are those of the authors and not necessarily those of the sponsors. The authors declare that they have no competing interests.

Submission declaration and verification
We declare that this manuscript has not been published elsewhere and is not under consideration by other journals. All authors have approved the manuscript and agree to its submission to Computational and Structural Biotechnology Journal. If accepted, it will not be published elsewhere in the same form, in English or in any other language, including electronically, without the written consent of the copyright holder.