Strengths and challenges of the artificial intelligence in the assessment of dense breasts

Objectives: High breast density is a risk factor for breast cancer, and overlapping glandular tissue can mask lesions, lowering mammographic sensitivity. Dense breasts are also prone to higher recall rates and more false-positive results. New generations of artificial intelligence (AI) have been introduced into mammography. We aimed to assess the strengths and challenges of adopting AI in reading mammograms of dense breasts.
Methods: This study included 6600 mammograms of dense patterns "c" and "d" that showed 4061 breast abnormalities. All patients underwent full-field digital mammography and breast ultrasound, and their mammographic images were scanned by AI software (Lunit INSIGHT MMG).
Results: Sono-mammography had a sensitivity of 98.71%, a specificity of 88.04%, a positive predictive value of 80.16%, a negative predictive value of 99.29%, and a diagnostic accuracy of 91.5%. In reading dense mammograms, AI-aided mammograms had a sensitivity of 88.29%, a specificity of 96.34%, a positive predictive value of 92.2%, a negative predictive value of 94.4%, and a diagnostic accuracy of 94.5%.
Conclusion: Scanning dense breasts with AI notably reduced mammographic misdiagnosis. Knowledge of the software's challenges would enhance its application as a decision support tool for mammography in cancer diagnosis.
Advances in knowledge: Dense breasts are challenging for radiologists and lower the sensitivity of mammography. AI scanning of mammograms could be used to overcome this limitation, to improve discrimination between benign and malignant breast abnormalities, and to support the early detection of breast cancer.


INTRODUCTION
Breast cancer is the most common cancer in females and a primary cause of cancer death, but early detection and treatment can considerably improve outcomes.1,2 Breast density is a radiologic term for the proportion of radio-opaque parenchymal tissue relative to radiolucent fatty tissue on a mammogram.3 Breast density is mostly inherited, but other factors can influence it throughout a female's life. Multiple studies of dense breasts have found that age, hormone use, body mass index, and reproductive status can influence breast tissue density.4 High breast density is an independent risk factor and the strongest marker of breast cancer in both premenopausal and postmenopausal females.3 Females with extremely dense breasts were 17 times more likely to develop interval cancer than females with fatty breasts.5 Dense breasts produce more false-negative mammograms than less dense patterns. This is attributed to the masking effect, which lowers mammographic sensitivity and is accompanied by a threefold increase in the recall rate.6 Breast ultrasound has been used to characterize breast lesions for many years, and it has the advantages of being readily available, affordable, and well tolerated.7 It improves the sensitivity and specificity of mammography in dense breasts, raises breast cancer detection rates, and reduces unnecessary biopsies.8 In the last couple of years, mature AI products for computer interpretation of mammographic images have been developed. They overcome the main limitation of conventional computer-aided diagnosis (CAD) systems, which is the need for programmer-specified characteristics of malignancy.9,10

BJR|Open
Mansour et al

High-density breasts are challenging, and cancers missed on mammograms in this setting are not uncommon. We therefore aimed to assess the impact of integrating AI with mammography on the diagnostic performance for cancer in high-density breasts (i.e., ACR c and d), compared with the performance of mammography combined with breast ultrasound. We also discuss the strengths and challenges a reader may face in this setting, to support radiologist-AI verification of case diagnoses.

METHODS
This prospective study included 3300 patients with 6600 mammograms, examined between February 2020 and February 2022. Patient age ranged from 29 to 83 years (mean 47.89 ± 10.99 years).
All included patients underwent full-field digital mammography and breast ultrasound. The mammographic images were exported and processed by diagnostic support AI software (Lunit INSIGHT MMG).
The final diagnosis was confirmed by one of two methods:
(1) True-cut core-needle biopsy with a 14 G needle. This was done for all suspicious or malignant abnormalities, and for benign-looking lesions at the request of the patient or the referring physician.
(2) At least 1 year of imaging follow-up (mammography, ultrasound, and AI) showing stable findings. This applied to benign-looking abnormalities and normal-looking mammograms.
The study was approved by the ethics committee of the Radiology Department, and written informed consent was obtained from all patients.

Inclusion criteria
• Patients with dense breasts on a mammogram (ACR c or d) presenting for either screening or diagnostic imaging.

Exclusion criteria
• Patients with low breast density on a mammogram (ACR a or b).
• Patients with induced increased breast density, whether due to chemotherapy or recent post-operative changes.
• Patients with a contraindication to mammography, e.g., pregnancy.

Equipment
• Full-field digital mammography machine (Amulet Innovality, Fujifilm Global Company, Japan)
• Ultrasound device (LOGIQ S8-GE) with a high-frequency linear probe (7-12 MHz) for breast scanning
Mammograms and ultrasound examinations were interpreted according to the standard reporting system of the ACR BI-RADS atlas, fifth edition (2013).12 This applied to the 4061 mammograms that showed benign and malignant breast abnormalities. A breast with multiple masses was assigned the category of the mass with the highest BI-RADS score.
The AI algorithm used was developed on deep convolutional neural networks (CNNs), with ResNet-34, one of the most popular CNN architectures, as the backbone network.13 Training had two stages. Stage 1 was patch-level training from scratch to learn low-level features. Stage 2 was image-level fine-tuning, starting from the Stage 1 model, to learn high-level context. For each mammogram view (i.e., each of the four standard bilateral cranio-caudal and medio-lateral oblique views), the algorithm provides pixel-level abnormality scores as a heatmap. It also provides a representative numerical abnormality score, a floating-point value between 0 and 1.
The heatmap highlighted the breast abnormality and provided location information. It also generated a score percentage reflecting the lesion's probability of malignancy, ranging from <10% to 100% (100% represents the highest level of suspicion).
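The study states only that the software outputs a pixel-level heatmap and a representative score between 0 and 1 for each view. These are displayed as a percentage from <10% to 100%. The sketch below illustrates that output format by reducing per-view heatmaps to one per-breast percentage. Three things in it are assumptions made for illustration only: the max-pooling at both levels, the toy 2 × 2 heatmaps, and the function names. The vendor's actual aggregation is not described in the study.

```python
# Hypothetical illustration only: the study does not describe how the software
# aggregates pixel-level scores. Max-pooling is ASSUMED at both levels.
from typing import Dict, List

Heatmap = List[List[float]]  # pixel-level abnormality scores in [0, 1]


def view_score(heatmap: Heatmap) -> float:
    """Representative score of one view, assumed to be its highest pixel value."""
    return max(max(row) for row in heatmap)


def breast_score_percent(views: Dict[str, Heatmap]) -> float:
    """Per-breast percentage from the CC and MLO views (assumed: the maximum view score)."""
    return 100.0 * max(view_score(h) for h in views.values())


def display_label(percent: float) -> str:
    """Scores below 10% are reported as "Low", as in the study's figures."""
    return "Low" if percent < 10 else f"{round(percent)}%"


# Toy 2 x 2 "heatmaps" with made-up values.
left_breast = {
    "CC": [[0.01, 0.03], [0.02, 0.99]],
    "MLO": [[0.02, 0.85], [0.01, 0.04]],
}
right_breast = {
    "CC": [[0.01, 0.02], [0.03, 0.04]],
    "MLO": [[0.02, 0.05], [0.01, 0.03]],
}
print(display_label(breast_score_percent(left_breast)))   # 99%
print(display_label(breast_score_percent(right_breast)))  # Low
```

With max-pooling, a single intense focus in either view drives the whole breast's score. This fits the focal color hue described for malignant cases, but other aggregation rules are equally possible.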
The AI category of each breast was determined from its probability-of-malignancy score as follows:
• 0-9: definite non-cancer
• 10-25: probably non-cancer
• 26-50: possibly non-cancer
• 51-75: possibly cancer
• 76-99: probably cancer
• 100: definite cancer.14
The sono-mammographic BI-RADS category and the AI category of each breast were correlated with the histopathological results or follow-up. Data were summarized as mean ± standard deviation for quantitative data and as frequency (count) and relative frequency (percentage) for categorical data. Standard diagnostic indices were calculated: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy. Categorical data were compared with the chi-square (χ2) test, and an exact test was used instead when an expected frequency was less than 5.
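The score-to-category bands listed above can be written as a simple lookup. This sketch assumes the score is an integer percentage from 0 to 100, as the software displays it. The function name `ai_category` and the `BANDS` table are illustrative, not part of the vendor software.

```python
# Sketch of the per-breast AI category bands described in the Methods.
# ASSUMPTION: the score is an integer percentage between 0 and 100.
BANDS = [
    (9, "definite non-cancer"),
    (25, "probably non-cancer"),
    (50, "possibly non-cancer"),
    (75, "possibly cancer"),
    (99, "probably cancer"),
    (100, "definite cancer"),
]


def ai_category(score: int) -> str:
    """Return the label of the first band whose upper bound is >= score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper, label in BANDS:
        if score <= upper:
            return label
    raise AssertionError("unreachable: 100 is the last upper bound")


for s in (5, 39, 60, 96, 100):
    print(s, ai_category(s))
```

The bands are contiguous, so checking upper bounds in ascending order is enough. For example, a score of 10 falls in "probably non-cancer".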

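The test-selection rule in the Methods (chi-square unless an expected frequency is below 5, then an exact test) can be sketched for a 2 × 2 table. The study does not name its exact test, so a two-sided Fisher's exact test is assumed here. The chi-square test shown is the uncorrected Pearson version, which is also an assumption. The function names are hypothetical, and only the Python standard library is used.

```python
import math
from typing import Tuple


def _margins(a: int, b: int, c: int, d: int):
    """Row totals, column totals, and grand total of the table [[a, b], [c, d]]."""
    rows, cols = (a + b, c + d), (a + c, b + d)
    n = a + b + c + d
    if n == 0 or 0 in rows or 0 in cols:
        raise ValueError("every row and column total must be positive")
    return rows, cols, n


def pearson_chi2_p(a: int, b: int, c: int, d: int) -> float:
    """Uncorrected Pearson chi-square p-value for a 2x2 table (1 degree of freedom)."""
    rows, cols, n = _margins(a, b, c, d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def fisher_exact_p(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher p-value: sum over tables no more likely than the observed one."""
    rows, cols, n = _margins(a, b, c, d)

    def prob(x: int) -> float:  # hypergeometric probability that the top-left cell is x
        return math.comb(rows[0], x) * math.comb(rows[1], cols[0] - x) / math.comb(n, cols[0])

    p_obs = prob(a)
    lo, hi = max(0, cols[0] - rows[1]), min(rows[0], cols[0])
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(total, 1.0)


def compare_2x2(a: int, b: int, c: int, d: int) -> Tuple[str, float]:
    """Chi-square unless any expected frequency is below 5; then the exact test."""
    rows, cols, n = _margins(a, b, c, d)
    if min(r * k / n for r in rows for k in cols) < 5:
        return "Fisher exact", fisher_exact_p(a, b, c, d)
    return "chi-square", pearson_chi2_p(a, b, c, d)


print(compare_2x2(3, 1, 1, 3))      # small expected counts -> exact test
print(compare_2x2(30, 20, 20, 30))  # large expected counts -> chi-square
```
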
Statistical analysis
Interobserver variability was measured with κ indices, to quantify the measurement error intrinsic to differences between observers. Confidence intervals (CI %) were calculated for the range of abnormality scores produced by the AI software. The narrower the interval between the upper and lower values, the more precise the AI estimate.
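A κ index compares the observed agreement between readers with the agreement expected by chance. The study does not specify which variant it used. The sketch below assumes Cohen's kappa for two readers assigning categorical labels, such as BI-RADS categories. The reader data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(reader1: Sequence[Hashable], reader2: Sequence[Hashable]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two readers rating the same cases."""
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("both readers must rate the same, non-empty set of cases")
    n = len(reader1)
    # Observed agreement: share of cases given the same label by both readers.
    p_o = sum(x == y for x, y in zip(reader1, reader2)) / n
    # Chance agreement: sum over labels of the product of both readers' marginal proportions.
    counts1, counts2 = Counter(reader1), Counter(reader2)
    p_e = sum(counts1[label] * counts2[label] for label in counts1) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both readers used one identical label for every case.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical BI-RADS categories assigned by two readers to six cases.
reader_a = [2, 3, 4, 5, 5, 2]
reader_b = [2, 3, 5, 5, 5, 2]
print(round(cohens_kappa(reader_a, reader_b), 3))  # 0.76
```

In this toy example the readers agree on 5 of 6 cases (p_o = 5/6) and would agree on 11/36 by chance, giving κ = 19/25 = 0.76.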
A ROC curve was plotted for the AI performance when added to the mammogram, its area under the curve (AUC) was calculated, and an optimal cut-off value for the abnormality score percentage was derived.
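The ROC analysis and cut-off derivation can be sketched in plain Python. The study does not state how its optimal cut-off was chosen, so this sketch assumes the threshold that maximizes Youden's index (sensitivity + specificity - 1). The AUC is computed as the Mann-Whitney probability that a malignant case scores higher than a benign one. The scores and labels are made up, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random malignant case (label 1) scores higher
    than a random benign case (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one malignant and one benign case")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Threshold t (malignant if score >= t) maximizing sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one malignant and one benign case")
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j + 1e-12:  # on ties, keep the lower threshold
            best_t, best_j = t, j
    return best_t, best_j


# Made-up AI abnormality scores (%) with final diagnosis (1 = malignant, 0 = benign).
scores = [5, 12, 20, 35, 42, 45, 60, 80, 96, 99]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(roc_auc(scores, labels))        # 0.96
print(youden_cutoff(scores, labels))  # (42, 0.8)
```

The toy data only show the mechanics and are unrelated to the AUC and cut-off reported in the Results.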

RESULTS
In the current study, 3135/3300 cases (95%) were assigned breast density "c" and 165/3300 cases (5%) breast density "d", according to the ACR breast density classification.
The remaining benign lesions (32.6%, n = 1327) and the mammograms assigned as normal were confirmed by a stable course over one year of follow-up. Stability was judged on conventional imaging (mammography and ultrasound) and by an AI score of <10%. The distribution of the different pathological entities is shown in Table 1. The included benign and malignant abnormalities were either detected on mammography or were mammographically occult behind overlapping glandular tissue and detected on ultrasound. The BI-RADS category was assigned from the combined evaluation of mammography and ultrasound: 1544 lesions (23.4%) were BI-RADS 5, and 1125 lesions (17.1%) were BI-RADS 4.
The mammography and ultrasound findings of the studied breast abnormalities (total = 4061) are shown in Table 2.
Pathology results and close follow-up (for benign-looking or normal cases) were reviewed. On that basis, 2141 mammograms were true positive, 3901 true negative, 530 false positive, and 28 false negative.
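The diagnostic indices follow directly from these four counts. As a check, the sketch below recomputes the sono-mammography indices from the true-positive, false-positive, true-negative, and false-negative counts just listed. The class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # malignant, classified as positive
    fp: int  # benign or normal, classified as positive
    tn: int  # benign or normal, classified as negative
    fn: int  # malignant, classified as negative


def diagnostic_indices(m: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic indices as proportions (multiply by 100 for %)."""
    return {
        "sensitivity": m.tp / (m.tp + m.fn),
        "specificity": m.tn / (m.tn + m.fp),
        "PPV": m.tp / (m.tp + m.fp),
        "NPV": m.tn / (m.tn + m.fn),
        "accuracy": (m.tp + m.tn) / (m.tp + m.fp + m.tn + m.fn),
    }


# Sono-mammography counts reported in the Results (6600 mammograms in total).
sono_mammography = ConfusionCounts(tp=2141, fp=530, tn=3901, fn=28)
for name, value in diagnostic_indices(sono_mammography).items():
    print(f"{name}: {100 * value:.2f}%")
# sensitivity: 98.71%
# specificity: 88.04%
# PPV: 80.16%
# NPV: 99.29%
# accuracy: 91.55%
```

The output matches the reported sensitivity (98.71%), specificity (88.04%), PPV (80.16%), and NPV (99.29%). Accuracy prints as 91.55%, which the text reports to one decimal place as 91.5%.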
Without consideration of the AI scoring, the diagnostic indices of mammography combined with ultrasound were a sensitivity of 98.71% (95% CI: 93. … The diagnostic indices of mammography and ultrasound with and without the AI scoring are summarized in Table 4. The ROC curve for AI added to mammography and ultrasound gave the highest area under the curve, 0.934 (95% CI: 0.854 to 0.966). This corresponded to an optimal abnormality score cut-off of 39% for distinguishing benign from malignant breast abnormalities in dense breasts.

DISCUSSION
Higher mammographic breast density may adversely affect the sensitivity and specificity of mammography.11,15 Expert screening radiologists are relatively few, so expanding breast cancer screening programs requires assistance with mammogram evaluation to increase the detection rate and reduce workload. AI in breast cancer screening may meet these needs and can act as a second reader of mammographic images.16,17 The diagnostic performance of AI was less affected by breast density than that of radiologists, which significantly improved radiologists' AI-aided performance in dense breasts.3 This study discusses the performance of AI in detecting and correctly diagnosing abnormalities on mammograms with high breast density, i.e., ACR "c" and "d".
The findings of sono-mammography and the AI system were analyzed in 3300 patients aged 29 to 83 years (mean 47.89 ± 10.99 years), all of whom had primary dense breasts.
A previous study found that females with a mean age of 44 ± 7 years had the highest breast density, with a significant negative correlation between age and breast density category.18 In the malignant group, invasive ductal carcinoma (IDC) was the most common histopathological type (76%, n = 1646). Simple fibroadenoma (64.8%, n = 1226) was the most common benign type.
False negatives occurred in 28 lesions. Pathology proved these to be ten non-calcified DCIS, seven lobular carcinomas, five mucinous carcinomas, two papillary carcinomas, two malignant phyllodes tumors, one metaplastic carcinoma, and one medullary carcinoma.
Thus, sono-mammography had a sensitivity of 98.71% and a specificity of 88.04%. Chen et al found that the sensitivity of mammography was significantly lower in patients with high breast density: 84.5% in low-density breasts versus 65.8% in dense breasts (p < .001). They also reported that ultrasound was more sensitive than mammography (p < .001), and that combining mammography with ultrasound gave better sensitivity than either modality alone (p < .001).19
Previous studies have shown that AI supports radiologists' decisions when assessing breast lesions.17 AI-based CAD characterizes the detected lesion and stratifies the risk for biopsy. This would increase the positive predictive value of biopsy. It would also decrease intra- and interobserver variation by reducing the impact of differences in reader training and experience.20 In the present experience, the κ value for mammogram reading improved from 0.86 to 0.91 (95% CI, 0.90 to 0.92) with the support of the AI algorithm.
In our study, the overall accuracy of AI (94.5%) was higher than that of combined mammography and ultrasound (91.5%). On correlation with pathology or interval follow-up, AI yielded 1915 true-positive abnormalities (Figure 1) and 1730 true-negative benign abnormalities (Figure 2).
Our results were comparable to those of a multireader study of 2652 examinations, interpreted both by 101 radiologists and by the AI software. That study found the diagnostic accuracy of the AI system statistically non-inferior to that of the radiologists: 84% for the AI system versus an average of 81.4% for the radiologists.21
Radiologists could use AI-CAD to characterize suspicious microcalcifications on mammography, because its performance was similar to theirs. The estimated areas under the curve (AUCs) of the radiologists and AI-CAD were not significantly different (0.722 vs 0.745). This suggests that AI-CAD could support clinical decisions on breast microcalcifications and thereby help avoid unnecessary biopsies.22 In our experience, the extent of the color hue and the abnormality score could help distinguish suspicious microcalcifications from overlapping grouped benign ones (Figure 3).
We also observed that, against the final diagnosis, AI had lower sensitivity but higher specificity than sono-mammography. These results are comparable to those of Roela et al (2021).23
In their study, the sensitivity of AI alone, AI with a radiologist, and a radiologist alone was 76.08%, 84.02%, and 80.91%, respectively. The corresponding specificities were 96.62%, 85.67%, and 84.89%.23 Mansour and co-authors24 reported that AI-aided mammograms discriminated benign from malignant breast lesions with a sensitivity of 96.8% and a specificity of 90.1%. In the same study, mammography combined with ultrasound had a sensitivity of 98.6% and a specificity of 91.6%.

Original Artificial intelligence and dense breasts
Some work has focused on the stand-alone performance of AI and compared it with that of radiologists. For instance, Rodriguez-Ruiz et al21 found that the AI system had a higher AUC than 61.4% of the radiologists. The AUC of the AI system was 0.840 (95% CI: 0.820 to 0.860), versus an average of 0.814 for the radiologists.
Further studies reported a performance level of 0.940 for AI at the mammogram-reader level.25 When AI algorithms were combined with ultrasound performed by the radiologist, the area under the curve rose to 0.942, with a sensitivity and specificity of 92.0%.
The current work agrees with these studies. Adding AI to mammography and ultrasound gave an area under the ROC curve of 0.934 (95% CI: 0.854 to 0.966), corresponding to a cut-off of 39% for discriminating benign from malignant abnormalities in dense breasts. At this setting, specificity (96.34%) was higher than with the imaging modalities alone (88.04%).
AI produced 162 false-positive lesions in the current work. This relative decrease in false positives compared with sono-mammography is attributed to the AI system performing two separate tasks: it not only detects lesions but also classifies them. For example, masses are classified by density (gray level), shape, texture, and relation to the surroundings to reach a final diagnosis.16 Falsely high scores for benign conditions occurred in mammograms with heterogeneous and/or indistinct abnormalities. Examples include extensive fibrocystic changes and diffuse, scattered, heavy powdery secretory calcifications. Also, a small localized cluster of small cysts may mimic the AI …
On the other hand, the AI algorithm missed 254 lesions (false negatives) because they were small, had near-fat density, or both (Figure 5). Other false-negative challenges were masses showing breakdown with relatively decreased mammographic density. A further challenge was pathology presenting only as an asymmetry or increased density that extended in plane with the breast tissue, without obvious distortion on the mammogram (Figure 6).
In general, false-negative AI results may stem from its dependence on extracted quantitative features. These features relate to tumor size, shape, intensity, and texture, and they describe the pixel distribution within the image to characterize the findings comprehensively.26,27 As AI algorithms become more complex, AI may begin to serve as an assistant rather than a tool. Occasionally it may act independently, with frequent periods of supervision by an expert radiologist.28 Mammography (MG) has limited diagnostic sensitivity for small breast cancers, especially in dense breast tissue. Ultrasound (US) is better than MG at detecting small breast cancers, regardless of breast density.20 In the current experience, AI detected small carcinomas and even tiny satellite lesions, which can be challenging in high-density breasts (Figure 7).
The AI algorithm used does not score typically benign-looking masses. It sometimes underestimates lesions that lack most mammographic criteria of malignancy. In such situations, no focal area is highlighted by color hue on the heatmap, and the abnormality score for the breast as a whole is reported as "Low".
With AI assistance, breast abnormalities in densely glandular tissue are easily targeted and properly categorized for tissue diagnosis, with less time and effort. However, ultrasound examination is recommended alongside the AI-mammogram findings in dense breasts (i.e., ACR c and d). This gives AI-aided mammograms their best performance and guards against misleadingly low scores.24 Also, diffuse breast disease or proven malignancy calls for a contrast-based study in addition to traditional imaging (i.e., contrast-enhanced MR imaging or contrast-enhanced mammography). This allows precise demonstration of the disease, i.e., diagnosis and extent (Figures 2 and 7).

CONCLUSION
Applying the AI algorithm to mammograms of dense breasts notably reduced sono-mammographic misdiagnosis. Knowing the software's challenges would enhance its use as a decision support tool for screening and diagnostic mammography of breasts with heavy glandular tissue.

Figure 1. Figure 2.
Figure 1. Left breast invasive ductal carcinoma grade III in a 37-year-old female patient with extremely dense breasts (ACR d). A: Bilateral digital mammogram (cranio-caudal and medio-lateral oblique views). The breast density decreased mammographic sensitivity, and no obvious mass was seen at first sight. However, the left breast showed upper inner tissue peaking (arrow), an indirect sign suspicious of malignancy. B: The AI-scanned mammogram presented an intense color hue that specifically targeted the left breast mass. It confirmed the suspicion of malignancy with an abnormality score of 99% and displayed the full extent of the disease. C: The ultrasound image displayed information comparable to that of AI: a purely solid irregular mass at 11 o'clock in the left breast (BI-RADS 5) (arrows). The case was a true positive for both conventional imaging and the AI algorithm.
…-positive pathologies were proliferative fibrocystic mastopathy with microcalcifications (n = 45), adenosis with calcifications (n = 40), fibrocystic changes without calcifications (n = 30), granulomatous mastitis (n = 21), benign phyllodes (n = 10), fat necrosis (n = 7), intraductal papilloma (n = 3), sclerosing adenosis (n = 3), pseudoangiomatous stromal hyperplasia (n = 2), and adenosis without calcifications (n = 1).

Figure 4. Figure 3.
Figure 4. Two cases. First case (4A, 4B, and 4C): right breast atypical ductal hyperplasia in a 37-year-old female with heterogeneously dense breasts (ACR c). A: Bilateral mammograms showed an indistinct, high-density abnormality, likely a mass, in the upper outer quadrant of the right breast (arrow). B: Mammograms scanned by the AI algorithm displayed localized intense red color over the detected right breast abnormality. The abnormality score was very high (96%), suggesting carcinoma. C: The ultrasound image of the right breast abnormality showed a cluster of small, slightly turbid cysts (BI-RADS 3). With the aid of ultrasound, the case was a true negative, whereas it was a false positive for AI. Second case (4D and 4E): a 50-year-old female patient with dense breasts (ACR d) and a large complicated cyst in the upper outer quadrant of the left breast. D: AI-aided mammograms displayed "Low" scoring despite an obvious mass in the upper outer quadrant of the left breast, with a partly circumscribed, partly obscured border (arrow). E: The ultrasound image of the left breast showed a large cystic mass with turbid echogenic content and a mural-based non-soft tissue solid component.

Figure 5. Figure 6.
Figure 5. A 50-year-old female patient with heterogeneously dense breasts (ACR c), diagnosed with grade I mucinous carcinoma of the left breast. A: Bilateral digital mammogram (cranio-caudal and medio-lateral oblique views). A suspected mass in the upper outer left breast was predominantly obscured by glandular tissue (arrow). B: The ultrasound image displayed an irregular solid mass with a near-fat echo pattern, lying in plane with the breast tissue. C: The AI-scanned four-view mammograms displayed no color overlay and a "Low" (i.e., <10%) abnormality score for malignancy.

Figure 7. Multifocal invasive tubular/cribriform carcinoma of the left breast. A: Upper row, digital mammogram (cranio-caudal and medio-lateral oblique views) of the left breast showing a small suspicious lesion (arrow). Lower row, contrast-enhanced mammogram showing multiple malignant-looking masses rather than a single mass. B: Ultrasound image showing the index malignant mass, which measured less than 2 cm, and an adherent satellite lesion. C: AI-scanned mammograms showing a wide area of hot color demarcation and an abnormality score of 99%. In this case, the AI algorithm correctly diagnosed and targeted the carcinoma. Moreover, it showed an extent of the small index mass and its satellites similar to that on the contrast-enhanced mammogram.

Table 2.
Mammography and ultrasound findings among the studied breast abnormalities (note: each mammogram or ultrasound may include more than one finding related to the same disease)

Table 3.
Abnormality score percentage and category of each AI-scanned mammogram for the included benign and malignant lesions

Table 4.
Diagnostic indices of mammography and ultrasound with and without consideration of the AI scoring. LHR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.
birpublications.org/bjro BJR Open;4:20220018