Artificial-intelligence-enhanced synthetic thick slabs versus standard slices in digital breast tomosynthesis

Objectives: Digital breast tomosynthesis (DBT) can provide additional information over mammography, albeit at the cost of prolonged reading time. This study retrospectively investigated the impact of reading enhanced synthetic 6 mm slabs instead of standard 1 mm slices on interpretation time and reader performance in a diagnostic assessment centre. Methods: Three radiologists (R1-3; 6/4/2 years of breast imaging experience) reviewed 111 diagnostic DBT examinations. Two datasets were interpreted independently for each patient, with one set containing artificial-intelligence-enhanced synthetic 6 mm slabs with 3 mm overlap, while the other set comprised standard 1 mm slices. Blinded to histology and follow-up, readers noted individual BIRADS categories and diagnostic confidence while reading time was recorded. Among the 111 examinations, 70 findings were histopathologically correlated including 56 malignancies. Results: No significant difference was found between BIRADS categories assigned based on 6 mm vs 1 mm datasets (p ≥ 0.317). Diagnostic accuracy was comparable for 6 mm and 1 mm readings (R1: 87.0% vs 87.0%; R2: 86.1% vs 87.0%; R3: 80.0% vs 84.4%; p ≥ 0.125) with high interrater agreement (intraclass correlation coefficient 0.848 vs 0.865). One reader reported higher confidence with 1 mm slices (R1: p = 0.033). Reading time was substantially shorter when interpreting 6 mm slabs compared to 1 mm slices (R1: 33.5 vs 46.2; R2: 49.1 vs 64.8; R3: 39.5 vs 67.2 sec; all p < 0.001). Conclusions: Artificial-intelligence-enhanced synthetic 6 mm slabs allow for substantial interpretation time reduction in diagnostic DBT without a decrease in reader accuracy. Advances in knowledge: A simplified slab-only protocol instead of 1 mm slices may offset the higher reading time without a loss of diagnosis-relevant image information in first and second readings. Further evaluations are required regarding workflow implications, particularly in screening settings.


INTRODUCTION
Digital breast tomosynthesis (DBT) constitutes a widely adopted technique for superior visualisation of pathologies in mammography, which in standard projections are often obscured due to superimposed parenchymal structures. DBT can reduce summation artifacts compared to full-field digital mammography (FFDM) by acquiring a series of low-dose mammographic images from different angles. Subsequent postprocessing of projection data allows for visualisation of the full breast thickness over multiple slices that can be analysed manually and with the help of computer-aided detection (CAD) systems. Furthermore, a synthetic mammography image can be reconstructed (synthetic 2D) without additional radiation exposure. 1 Tomosynthesis combined with mammography (FFDM or synthetic 2D) has been shown to improve cancer detection rates in different population-based screening settings [2][3][4][5][6][7][8][9][10][11] and is incorporated in the current European Guidelines. 12 While higher cancer detection rates are well established, contradictory results remain for recall rates. [13][14][15] However, a recently published meta-analysis 16 suggested fewer recalls when employing DBT including synthetic 2D compared to FFDM alone, as well as higher cancer detection rates among recalls.
On the other hand, studies have demonstrated a substantial increase in reading time, as much as doubling time per study when adding DBT to standard FFDM. 6,17 This is particularly important in the screening setting, where the number of patients is considerably higher than in a diagnostic assessment setting. Standard DBT protocols usually rely on reconstructed slices with 1 mm thickness, hence, depending on breast thickness, up to a few hundred slices are generated per patient vs four standard projections in conventional mammography. In addition to increased reading time, considerably more storage space in picture archiving and communication systems is necessitated. 18 Above all, however, concerns regarding overall workload for reading radiologists and accompanying fatigue resulting in false assessment ought to be addressed. 19 With significantly more examinations being read by a radiologist per day, the number of images per DBT is especially impactful in the screening setting. Considering that diagnostic assessment demands a precise characterisation of lesions and includes a larger proportion of pathologic findings, however, the challenges for radiologists in this setting are not to be underestimated either.
Few studies have investigated the issue of optimal slab thickness in DBT with varying recommendations. 20,21 Since these prior analyses derived their populations mostly from screening centres, the purpose of this investigation was to analyse the interpretation performance and reading time in DBT by comparing standard 1 mm slices vs artificial-intelligence-enhanced synthetic thick slabs of 6 mm in a diagnostic assessment setting.

METHODS AND MATERIALS
This single-centre study was performed after gaining permission from the local institutional review board (IRB number: 20220425 02). The need for additional written informed consent was waived due to the retrospective study design. The Standards for Reporting of Diagnostic Accuracy (STARD) guidelines were adhered to.

Population
Patients undergoing DBT from September 2020 through December 2021 for further assessment of mammographic findings in a diagnostic setting at a tertiary-care university hospital were included in this retrospective evaluation. Since our institution is not a dedicated screening centre, all patients were either symptomatic (20 cases, 19.8%), had a conspicuous finding in a previous examination (42 cases, 41.6%), were evaluated preoperatively for biopsy-proven malignancy (29 cases, 28.7%), or received further assessment during/after cancer therapy (10 cases, 9.9%). The centre assesses 2500 cases per annum and the number of primary breast cancer treatments performed in our hospital in 2021 was approximately 300. Inclusion criteria for this study were defined as presence of one or more lesions with either histopathological correlation or benign result as per one-year consistency, definite benign correlate in ultrasound or pure summation artefact (asymmetry). Exclusion criteria comprised incomplete datasets, lack of follow-up or prior biopsy with clip insertion. Adhering to these standards, a total of 101 patients were eligible for study inclusion. The population consisted of 100 women and one man with a mean age of 59.7 ± 12.1 years. The left breast was examined in 64 cases (57.7%). When stratified by fibroglandular tissue type according to the American College of Radiology classification system, parenchymal density was "a" in 7 patients (6.3%), "b" in 57 patients (51.4%), "c" in 37 patients (33.3%), and "d" in 10 patients (9.0%).

Imaging
All DBT scans were performed either in mediolateral oblique (MLO: 78 patients; 70.3%), craniocaudal (CC: 21; 18.9%), or lateromedial orientation (LM: 12; 10.8%) with a dedicated mammography system (3Dimensions, Hologic Inc., Marlborough, Massachusetts, USA). Mean compression thickness was calculated at 54.1 ± 14.2 mm for MLO, at 60.0 ± 14.2 mm for CC, and at 47.3 ± 12.8 mm for LM. From each scan, two datasets were reconstructed, consisting either of synthetic 6 mm slabs with 3 mm overlap ("thick slabs"), or standard 1 mm slices ("thin slices"). In order to offset potential blurring and impaired delineation of regions of interest in maximum intensity projections, a vendor-specific algorithm enhanced by artificial intelligence (AI) was employed for generation of thick slabs (3DQuorum, Hologic). This machine-learning algorithm analyses the standard 1 mm slices for various findings such as bright spots, circumscribed densities, or radiating lines and emphasises these in the process of merging the thin slices in order to maintain conspicuity, especially with regard to minute lesions. While a CAD system is also integrated in the software, only the unmarked thick slabs were part of the reader analysis. An example depicting conventional slices vs AI-enhanced slabs in a case with several mass lesions and associated clustered microcalcifications representing a multifocal invasive breast carcinoma is provided in Figure 1. Synthetic 6 mm datasets included 22 ± 5 (MLO), 20 ± 5 (CC), and 18 ± 4 (LM) thick slabs, while standard 1 mm stacks comprised 66 ± 14 (MLO), 60 ± 14 (CC), and 66 ± 12 (LM) thin slices. Effectively, thick slab datasets were reduced by two-thirds of the standard file size.
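The relationship between slice count and slab count can be illustrated with a short calculation. The sketch below assumes that a new 6 mm slab begins every 3 mm (slab thickness minus overlap) and that the final slab may be truncated at the edge of the stack. The vendor's actual reconstruction scheme is not described in this study, so the function name `slab_count` and its stride rule are illustrative assumptions only.

```python
import math


def slab_count(n_slices, slab_mm=6, overlap_mm=3, slice_mm=1):
    """Estimate how many overlapping slabs cover a stack of thin slices.

    Assumption (not taken from vendor documentation): a new slab starts
    every (slab_mm - overlap_mm) millimetres and the last slab may be
    thinner than slab_mm at the edge of the stack.
    """
    stride_mm = slab_mm - overlap_mm
    if stride_mm <= 0:
        raise ValueError("overlap must be smaller than slab thickness")
    return math.ceil(n_slices * slice_mm / stride_mm)


# Mean slice counts reported for the MLO and CC stacks
for view, n_slices in {"MLO": 66, "CC": 60}.items():
    n_slabs = slab_count(n_slices)
    reduction = 1 - n_slabs / n_slices
    print(f"{view}: {n_slices} slices -> {n_slabs} slabs "
          f"({reduction:.0%} fewer images)")
```

Under this assumption, mean stacks of 66 and 60 thin slices map to 22 and 20 slabs, which agrees with the reported mean slab counts for MLO and CC and with a reduction of about two-thirds.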

Image analysis
Three radiologists with 6 (R1), 4 (R2), and 2 years (R3) of clinical work experience in breast imaging and DBT reviewed all datasets separately, independent from one another, and in randomised order. A 30-day washout period between cases of the same patient was ensured to minimise recall bias. The readers were provided with identical viewing setups on certified diagnostic monitors and a standardised assortment of individual datasets including the FFDM that warranted further assessment via DBT. Prior to commencement of study reading sessions, one author met with the readers and individually reviewed five training cases not included in the study population in order to accustom readers to the study workflow. Readers were blinded to any clinical and histopathological information, as well as to previous imaging studies. Lesions could be characterised as masses, asymmetries, microcalcifications, and architectural distortions. Masses associated with microcalcifications were classified as mass findings. Additional lesions within the same dataset were only evaluated if clearly not associated with the primary findings. For each lesion, readers noted Breast Imaging Reporting and Data System (BIRADS) categories and diagnostic confidence based on an equidistant 5-point-scale (1 = very low confidence, 5 = total confidence). For the calculation of diagnostic accuracy, BIRADS 4 and 5 categories were deemed indicative of malignancy suspicion, whereas BIRADS 1, 2, and 3 categories were considered to represent a primarily non-malignant assessment. As readers were blinded to histopathological information, BIRADS 6 categories could not be assigned for any finding. Furthermore, since all findings were unanimously proven to be either benign or malignant, readers were not allowed to attribute a BIRADS 0 category. Individual reading times, which included interpreting the case and opening/closing the file, were measured manually. 
Notably, time recordings were started by an external observer when the interpreting radiologist opened a case file and stopped when the radiologist closed the case file in the PACS. In order to provide consistent measurements and since loading time is negligible on the diagnostic workstations positioned in our department (latency below 2 s), these intervals were not captured separately and the time from file opening to closing was considered representative of the actual interpretation time.
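The dichotomisation rule described above (BIRADS 4 and 5 as suspicious, BIRADS 1 to 3 as non-malignant) can be sketched as follows. The function name `classification_metrics` and the toy readings are hypothetical and serve only to illustrate the calculation; they are not study data.

```python
def classification_metrics(birads, malignant):
    """Return accuracy, sensitivity and specificity for one reader.

    birads    -- assigned BIRADS categories (1-5) per finding
    malignant -- reference standard per finding (True = malignant)
    """
    if len(birads) != len(malignant):
        raise ValueError("inputs must have the same length")
    tp = fp = tn = fn = 0
    for category, truth in zip(birads, malignant):
        suspicious = category >= 4  # BIRADS 4 and 5 count as test-positive
        if suspicious and truth:
            tp += 1
        elif suspicious:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(birads),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Hypothetical toy readings, not study data
toy_birads = [5, 4, 2, 3, 4, 1, 5, 2]
toy_truth = [True, True, False, False, False, False, True, True]
print(classification_metrics(toy_birads, toy_truth))
```

Applying the same function separately to the thick-slab and thin-slice readings of each reader would yield the paired classifications that enter the accuracy comparison.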

Data analysis
Data analysis was supported by dedicated software (SPSS Statistics 28, IBM, Armonk, USA). Presentation of non-parametric variables includes absolute numbers and frequencies with median values and interquartile ranges (IQR). Wilcoxon signed rank tests were employed for comparison of such data. With regard to normally distributed parametric data, mean values ± standard deviation were reported and results were compared with paired Student's t-tests. Differences in classification functions of diagnostic accuracy between thick slabs and thin slices were assessed by means of the McNemar test. For interrater reliability, the intraclass correlation coefficient was calculated in a two-way random effects model that analysed the absolute agreement of single measures. An α level of 0.05 was deemed representative of statistical significance.
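For readers unfamiliar with the McNemar test, the following minimal sketch shows its exact binomial form, which depends only on the discordant pairs (findings classified correctly with one dataset but not the other). Whether the statistical software applied this exact variant or a chi-square approximation is an assumption here; the function name `mcnemar_exact` and the example counts are hypothetical.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b -- findings classified correctly with thick slabs only
    c -- findings classified correctly with thin slices only
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Binomial tail probability under H0: each discordant pair is a fair coin
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, not taken from the study tables
print(mcnemar_exact(1, 5))
```

Concordant pairs do not influence the result, which is why two readings with high overall agreement can only differ significantly if the few disagreements point predominantly in one direction.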

Reading results and diagnostic confidence
Frequencies of individual BIRADS categories attributed by each reader for thick slabs and thin slices are presented in Table 2. No significant difference was found between BIRADS categories assigned based on 6 mm vs 1 mm datasets (p ≥ 0.317). Diagnostic accuracy was comparable for readings of slabs and slices (R1: 87.0% vs 87.0%; R2: 86.1% vs 87.0%; R3: 80.0% vs 84.4%; p ≥ 0.125). In 1 mm slices, readers missed one malignant lesion each. In 6 mm synthetic thick slabs, all readers missed the same carcinoma, while R3 did not detect an additional malignancy. Diagnostic sensitivity among readers ranged between 96.43 and 98.21%. A comprehensive display of classification functions of diagnostic accuracy is provided in Table 3. Good interrater reliability was found for both slabs (ICC 0.848; 95% confidence interval 0.800-0.888; p < 0.001) and slices (ICC 0.865; 95% confidence interval 0.822-0.901; p < 0.001).
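The intraclass correlation coefficient for a two-way random effects model with absolute agreement of single measures, commonly denoted ICC(2,1), can be computed from the mean squares of a two-way analysis of variance. The sketch below is a plain-Python illustration of that standard formula; the function name `icc_2_1` and the toy ratings are hypothetical and do not reproduce the study data or the confidence intervals reported above.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    ratings -- list of rows, one row per finding, one column per reader
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for findings (rows), readers (columns) and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical BIRADS categories from three readers for five findings
toy = [[4, 4, 5], [2, 2, 2], [5, 5, 5], [3, 2, 3], [4, 5, 4]]
print(round(icc_2_1(toy), 3))
```

Because the reader mean square appears in the denominator, systematic offsets between readers lower this coefficient, which distinguishes absolute agreement from mere consistency.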

DISCUSSION
In this retrospective multireader study, we could demonstrate a significant reduction in reading time for synthetic 6 mm slabs enhanced by artificial intelligence vs 1 mm slices for digital breast tomosynthesis in a diagnostic setting without a decrease in diagnostic accuracy. All readers stated at least good confidence levels regarding interpretation of the synthetic slabs.
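To put the reading time reduction into perspective, the absolute and relative savings per reader can be derived from the mean reading times given in the abstract. The following sketch uses those rounded means; the helper name `relative_reduction` is an illustrative assumption.

```python
# Mean reading times in seconds (6 mm slabs, 1 mm slices) as reported
# in the abstract of this article
mean_times = {"R1": (33.5, 46.2), "R2": (49.1, 64.8), "R3": (39.5, 67.2)}


def relative_reduction(slab_s, slice_s):
    """Fraction of reading time saved by slabs relative to slices."""
    return (slice_s - slab_s) / slice_s


for reader, (slab_s, slice_s) in mean_times.items():
    saved = slice_s - slab_s
    print(f"{reader}: {saved:.1f} s saved "
          f"({relative_reduction(slab_s, slice_s):.1%})")
```

With the rounded means, the relative saving is roughly one quarter for R1 and R2 and about 41% for R3; small deviations from percentages derived from unrounded data are to be expected.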
Our findings are in line with the results of Pujara et al., who found reduced reading time in three-quarters of all examined cases when employing a DBT protocol with 6 mm slabs while maintaining diagnostic performance. 21 Notably, the population of this investigation consisted of asymptomatic females who were referred for breast cancer screening, diagnostic evaluation, or biopsy. Furthermore, different mammography equipment and PACS software/image hangings were used, suggesting a certain degree of transferability of results to other study settings. Figure 3 illustrates a case of multicentric invasive disease and clustered microcalcifications from our population, where the complete extent of the aberration is depicted within one of the 6 mm slabs. Although the lesion was correctly identified by all three readers irrespective of slice thickness, superior delineation in the synthetic slabs and less scrolling may have contributed to the marked reduction in reading time by as much as 41.3% in the case of the most inexperienced reader. The diagnostic sensitivity achieved in our study with the use of 6-mm-thick synthetic   20 Of note, the authors of this publication relied on a mostly asymptomatic population derived from a screening trial, which was divided into a "specificity set" (12 cancers vs 882 negative DBT scans) and a "sensitivity set" (28 cancers vs 276 negative DBT scans), resulting in a high overall proportion of negative exams in their population. Despite the different study samples and higher slab thickness used by Iotti et al., the absolute and relative interpretation time reduction were comparable to our study. With that being said, we agree with the deduction of Pujara et al that 6 mm may be the 'sweet spot' for slab thickness, since the time saving effect with thicker slabs appears to be negligible compared to the ensuing loss of diagnostic sensitivity. 
However, we believe that the lesion size must be taken into consideration when designing a thick slab protocol.
Since the mean pathologic lesion size in this study was 30.8 ± 33.7 mm and therefore significantly larger than what would be expected in a dedicated screening centre, different specifications regarding slab thickness and overlap may produce superior results in other settings.
Partly in contrast to the results of the present study, Pujara et al reported reader confidence to be inferior for their experimental protocol with 6 mm slabs compared to a standard protocol with 1 mm slices despite their readers' longstanding experience in breast imaging which ranged from 10 to 30 years. 21 Possibly, a certain learning or accommodation period for new approaches ought to be considered, especially for users with long-term expertise in the field. Supporting this line of argument, i.e. that habituation to the commonly utilised 1 mm slices leads to inferior assessment of the newer thick slab technique, may also be reflected in our findings, as only the most experienced reader reported (slightly) lower confidence levels for reading 6 mm slabs.
Friedewald et al. described an approach to interpreting DBT images by scrolling through the reconstructed stack multiple times and focusing on different compartments of the breast during each pass. 22 It may be assumed that this technique synergises particularly well with the use of thick slabs due to fewer images being reviewed each time. In addition to a standardised viewing setup, toggling between prior and current examinations may also decrease the required reading time of radiologists. 23 A more advanced technique that can reduce interpretation time by up to 30% is employment of CAD systems. [24][25][26] While the algorithm responsible for AI-enhanced postprocessing of 1 mm slices to create the synthetic thick slabs in our study does provide additional CAD, the results of these analyses were not accessible to the readers in order to maintain comparability of interpretation time and confidence. It remains to be seen whether the combination of CAD systems and AI-enhanced thick slabs brings further benefits compared to each technique on its own. Further investigations are also warranted with direct comparison of manually reconstructed 6 mm slabs as maximum intensity projections vs AI-enhanced synthetic 6 mm slabs.

Figure 3. 37-year-old female with biopsy-proven invasive breast cancer of no special type and associated ductal carcinoma in situ. Reading time of three radiologists was substantially shorter for 6 mm slabs compared to 1 mm slices, presumably due to superior visualisation of regional microcalcifications.

LIMITATIONS
Some limitations inherent to the current study need to be addressed. For one, the retrospective study design removes any consequences of faulty interpretations, and the resulting lack of clinical pressure may have affected interpretation times in general. Because of the nature of the study, no randomised negative control group was included. The proportions of microcalcifications (13.9%) and DCIS (7.1%) within the group of histopathologically proven malignancies were underrepresented in our dataset and do not reflect the typical frequency in a screening centre, where approximately 25% of cancers are DCIS. Furthermore, the mean pathologic lesion size in our patient sample was considerably larger than the vast majority of findings encountered in a dedicated screening setup.
Breast screening programmes such as the one in the UK stipulate minimal standards for breast cancer size detection under 15 mm (concerns more than 55% of both prevalent and incident round cancers). Therefore, limited generalisability of the presented results must be presumed for a dedicated screening setting. Of note, since tomosynthesis is considered to be less useful for assessment of microcalcifications than of masses, additional technical and software developments are required for the display of calcified lesions in DBT in general. Particularly, the current tendency to replace FFDM with synthetic mammograms from DBT examinations emphasises the need for proper solutions in that regard. 27 Lastly, our study was a single-institution and single-manufacturer investigation, hence the actual benefit of the AI-based reconstruction method cannot be quantified, since no conventional maximum intensity projections were available for comparison. Future studies should aim to replicate findings in larger datasets with additional unenhanced standard 6 mm slabs.

CONCLUSIONS
Employing a reconstruction protocol with artificial-intelligence-enhanced synthetic 6 mm slabs instead of 1 mm slices for digital breast tomosynthesis allowed for substantial interpretation time reduction without a decrease in interpretation accuracy in the presented diagnostic setting.

FUNDING
Open Access funding enabled and organized by Projekt DEAL. Jan-Peter Grunz was supported by the Interdisciplinary Centre of Clinical Research Würzburg, Germany (Z-3BC/02).