Clinical Significance of Combined Density and Deep-Learning-Based Texture Analysis for Stratifying the Risk of Short-Term and Long-Term Breast Cancer in Screening

Assessing a woman’s risk of breast cancer is important for personalized screening. Mammographic density is a strong risk factor for breast cancer, but parenchymal texture patterns offer additional information which cannot be captured by density. We aimed to combine BI-RADS density score 4th Edition and a deep-learning-based texture score to stratify women in screening and compare rates among the combinations. This retrospective study cohort study included 216,564 women from a Danish populations-based screening program. Baseline mammograms were evaluated using BI-RADS density scores (1–4) and a deep-learning texture risk model, with scores categorized into four quartiles (1–4). The incidence rate ratio (IRR) for screen-detected, interval, and long-term cancer were adjusted for age, year of screening and screening clinic. Compared with subgroup B1-T1, the highest IRR for screen-detected cancer were within the T4 category (3.44 (95% CI: 2.43–4.82)−4.57 (95% CI: 3.66–5.76)). IRR for interval cancer was highest in the BI-RADS 4 category (95% CI: 5.36 (1.77–13.45)−16.94 (95% CI: 9.93–30.15)). IRR for long-term cancer increased both with increasing BI-RADS and increasing texture reaching 5.15 (4.31–6.16) for the combination of B4-T4 compared with B1-T1. Deep-learning-based texture analysis combined with BI-RADS density categories can reveal subgroups with increased rates beyond what density alone can ascertain, suggesting the potential of combining texture and density to improve risk stratification in breast cancer screening.


Introduction
Mammographic density is an independent risk factor for breast cancer and is also known to mask malignancies [1,2].Consequently, the sensitivity of mammography screening is lower, and the interval cancer rate is higher for women with dense versus non-dense breast tissue [3][4][5][6].Women with high mammographic density have a higher risk of developing breast cancer, compared to those with low density [7].Therefore, the current one-size-fits-all policy in mammography screening is not equitable [8].Developing a more personalized screening program that considers breast density is discussed widely [9].In the US, legislation mandates that physicians inform women of their mammographic density if categorized as heterogeneously dense or extremely dense, according to Breast Imaging-Reporting and Data system (BI-RADS) density score, to discuss the possibility of supplemental imaging.In 2022, the European Society of Breast Imaging began recommending supplemental screening for women with high density, and it is now also recommended by the European Commission Initiative on Breast Cancer [10,11].However, studies have indicated that relying solely on mammographic density as the primary risk factor when tailoring screening strategies for women may not be sufficient [5,12].
Other than mammographic density, mammographically derived risk factors, such as assessment of parenchymal texture patterns, have been shown to be associated with a risk of breast cancer independently of density [13][14][15][16][17]. Deep learning, a subfield of artificial intelligence, uses multiple hidden layers of neural networks to process and analyze complex datasets [18].This method provides an alternative to standard breast cancer risk models, potentially improving breast cancer prediction accuracy [19].Previous studies have focused on the discriminatory accuracy, but their clinical utility and potential consequences in a clinical setting are rarely examined.Lauritzen et al. developed a deep-learning-based texture model for long-term breast cancer risk, which led to the identification of a subgroup of high-risk women from a screening cohort [20].Breast density and parenchymal texture are correlated, but nevertheless, combining BI-RADS density score 4th Edition and a texture score, Euler-Chelpin et al. identified subgroups of women for whom the screening sensitivity was lower than indicated by the BI-RADS score alone [21].
Moving toward personalizing screening strategies, we should investigate the possibility of using different mammographic characteristics available at the screening site to better identify and understand the subgroups of women who could benefit from new screening schemes.
The aim of this study was to investigate the clinical impact of stratifying women according to their different mammographic types of breast tissue, stratified into subgroups by combining BI-RADS density score 4th Edition and a deep-learning-based texture risk score, both assessed in a baseline screening mammogram in a population-based screening program.The objectives are to compare the rate of short-term breast cancer (STC) and longterm breast cancer (LTC) among combinations of BI-RADS density score and texture score.

Study Population
The Capital Region Mammography Screening Program in Denmark offers populationbased biennial screening, free of charge, to all women aged 50-69 years at five public screening clinics.The study participants were women attending routine mammography screening between 1 November 2012, and 15 April 2021.A woman will enter the study at her first available mammography screening within the study period, available for texture scoring (referred to as the baseline screening).Women with breast cancer diagnosed before the baseline screening were excluded from the study.For screenings with suspicious findings (positive screening), diagnostic assessment was performed using the triple test, including clinical mammography, ultrasound, and biopsy.A cancer diagnosis was confirmed histologically.Women with a negative screening result were invited to be screened again in two years.Screen-detected cancer was defined as invasive carcinoma or ductal carcinoma in situ (DCIS) diagnosed within six months of a positive baseline screening.Interval cancer was defined as invasive carcinoma or DCIS diagnosed within 24 months after a negative baseline screening or 6-24 months after a positive baseline screening and negative diagnostic assessment, or before subsequent screening, whichever came first.Screen-detected and interval cancers were defined as STCs, see Figure 1.LTCs were defined as all breast cancers (invasive and DCIS) diagnosed more than 24 months after baseline screening, or diagnosed at a subsequent screening, whichever came first.Histological data were retrieved up until 15 April 2023, such that all women had at least 24 months of follow-up time.
screening, whichever came first.Screen-detected and interval cancers were defined as STCs, see Figure 1.LTCs were defined as all breast cancers (invasive and DCIS) diagnosed more than 24 months after baseline screening, or diagnosed at a subsequent screening, whichever came first.Histological data were retrieved up until 15 April 2023, such that all women had at least 24 months of follow-up time.For the analysis of LTCs, the study population was restricted to women without a STC diagnosis.In cases where a woman was diagnosed with both invasive carcinoma and DCIS, we classified the cancer as invasive carcinoma.Information on screening outcomes was retrieved by linkage based on unique personal identification numbers from the Radiology Information System (RIS), the Danish Civil Registration System, the Danish Breast Cancer Group, and the Danish Pathology Register.For the analysis of LTCs, the study population was restricted to women without a STC diagnosis.In cases where a woman was diagnosed with both invasive carcinoma and DCIS, we classified the cancer as invasive carcinoma.Information on screening outcomes was retrieved by linkage based on unique personal identification numbers from the Radiology Information System (RIS), the Danish Civil Registration System, the Danish Breast Cancer Group, and the Danish Pathology Register.

Image Technique and Interpretation
All women in the cohort received standard two-view (craniocaudal (CC) and mediolateral oblique (MLO)) FFDM.Women with subpectoral breast implants received additional views after pushback.Screening examinations were performed using Mammomat Inspiration or Revelation systems, Siemens AG, Germany.FFDM examinations were centrally and independently double read by two full-time breast radiologists.From November 2021 and forth AI was implemented as first reader in the low-risk group (stratified by AI).If the two readers disagreed on the need for recall, the final result was determined by consensus or arbitration with a third senior radiologist.At least one of the readers is a senior breast radiologist reading more than 3000 mammograms yearly.

Deep-Learning-Based Texture Model
The mammographic texture deep-learning model was trained to estimate the longterm risk of breast cancer cross-vendor.In-depth methodological and technical details were previously reported by Lauritzen et al. [20].The texture model was trained on a Dutch dataset consisting of 41,242 women with 1326 cancers sampled from the Dutch screening program in Utrecht, the Netherlands, from 1 January 2003, to 1 January 2015.The Dutch screening program offers biennial mammography screening to women between the ages of 50 and 74, using Hologic Lorad Selenia Systems.The model was trained to distinguish between women who remained healthy throughout the follow-up period of at least five years and those who developed breast cancer.Training images with a diagnosed cancer were excluded, so the model was trained on images of the cancer contra-lateral breast.The texture model therefore relies on features of healthy breast tissue to learn long-term risk.

Statistical Analysis
We combined the BI-RADS categories with the texture categories to create 16 subgroups.The screen-detected cancer rate and interval cancer rate were calculated as the number of screen-detected or interval cancers per 1000 women within each subgroup.Incidence rates of SDCs, ICs, and LTCs were calculated as number of BC cases divided by person-years.For STCs, person-years were accumulated from baseline screening to BC diagnosis, 6 (for SDCs) or 24 (ICs) months after baseline, or subsequent screening, whichever came first.For LTCs, person-years were accumulated from the end date of STC follow-up period to BC diagnosis or 15 April 2023, whichever came first.Number of person-years in each group are presented in the Appendix A, Table A1.Incidence rates in different BI-RADS/texture subgroups were compared by computing incidence rate ratios (IRRs) and corresponding profile likelihood 95% confidence intervals (CIs), through Poisson regression models.For each outcome (SDC, IC, LTC), we fitted 3 different Poisson regression models with increasing number of covariate adjustments: Model 1 represents the crude model (no adjustment); Model 2 is adjusted for age at baseline, with age included in the model as a natural cubic spline with 3 degrees of freedom (this number was selected to minimize the Akaike Information Criterion among different choices of d.o.f.from 1 to 5); Model 3 is further adjusted for year of screening (categorical) and screening location (categorical: Bispebjerg Hospital, Bornholm Hospital, Herlev Hospital/Gentofte Hospital, Hillerød Hospital, Hvidovre Hospital).Furthermore, mixed effect models with random intercept were run to investigate the possible clustering effect of the screening locations, finding however no variability among them (data not reported).Number of women in each age category are presented in the Appendix Table A2.In the following, we only reported results from Model 3, whereas results from Model 1 and 2 can be found in the Appendix Table A3.All analyses and plots were made using R version 4.3.2 with package tidyverse [23].

Short-Term Cancer
The screen-detected cancer rate ranged from 1.63 to 13.68 per 1000 women, with the highest rate observed in the B2 + T4, see Table 2.The T4 subgroups consistently showed the highest rates ranging from 9.36 to 13.68 per 1000 women, whereas the lowest rates were observed mainly in B4 subgroups.Within each BI-RADS category, the screen-detected cancer rate increased by increasing level of texture.
The interval cancer rate ranged from 0.54 to 8.4 per 1000 women across the subgroups, increasing with increasing BI-RADS and texture score, Table 3.The lowest interval cancer rate was observed in B1 + T1 (0.54 per 1000 women), while the highest was in B4 + T4 (8.4 per 1000 women).The highest rates occurred in the BI-RADS 4 subgroups, ranging from 2.71 to 8.4 per 1000 women.Our main results from the regression Model 3 are reported in Figures 2 and 3.The adjusted IRR for screen-detected cancer was highest in the T4 subgroups across the BI-RADS categories, ranging from 3.44 (95% CI: 2.43-4.82) to 4.57 (95% CI: 3.66-5.76),which were statistically significantly higher than the reference group, see Figure 2. The three subgroups with the lowest IRR (0.60 (95% CI: 0.15-1.61)-0.77(95% CI: 0.38-1.4))were observed in the B3 and B4 subgroups with low texture (non-significantly compared with the reference group).
The adjusted IRR for interval cancer shows a different pattern than for SDC, with increasing risk with increasing BI-RADS and texture.Notably, the highest IRR for IC (14.50 (95% CI: 7.81-27.35)-16.94(95% CI: 9.93-30.15))were observed in the top three subgroups all in the B4 category, indicating a distinct risk within these subgroups (statistically significantly higher than the reference group).Conversely, the three lowest IRR were observed in B1 and B2 categories (1.33 (95% CI: 0.59-2.83)-2.18(95% CI: 1.17-4.12)).The rate in the B3 + T1 subgroup was similar to that of B1 + T4, showing the potential of considering both texture and BI-RADS categories when stratifying for breast cancer risk.

Long-Term Cancer
The adjusted IRR increased for LTC with increasing BI-RADS and texture, with the subgroups exhibiting the highest IRR mainly being in the T4 and B4 subgroups (2.95 (95% CI: 2.07-4.08))-5.15(95% CI: 4.31-6.16),Figure 4.When comparing the subgroups with the reference group B1 + T1, all subgroups except B3 + T1, showed statistically significantly increased IRR.Selecting subgroups with a threshold of IRR at 4.0 and above (B3 + T4 and B4 + T4), meaning a group of women with more than four times higher rate of LTC compared to the B1 + T1 group, comprise 12.48% of our screening cohort.The adjusted IRR for interval cancer shows a different pattern than for SDC, with increasing risk with increasing BI-RADS and texture.Notably, the highest IRR for IC (14.50 (95% CI: 7.81-27.35)-16.94(95% CI: 9.93-30.15))were observed in the top three subgroups all in the B4 category, indicating a distinct risk within these subgroups (statistically significantly higher than the reference group).Conversely, the three lowest IRR were observed in B1 and B2 categories (1.33 (95% CI: 0.59-2.83)-2.18(95% CI: 1.17-4.12)).The rate in the B3 + T1 subgroup was similar to that of B1 + T4, showing the potential of considering both texture and BI-RADS categories when stratifying for breast cancer risk.

Discussion
In our study, we combined BI-RADS density score with a deep-learning-based texture score to risk stratify women aged between 50-69 at their baseline screening in a populations-based screening program.Our findings showed that texture enhanced breast cancer risk assessment beyond density alone, for both short-term and long-term risk.While one might anticipate the highest risk to be in BI-RADS 4, the use of texture revealed additional subgroups of high risk within lower density categories.Additionally, we identified subgroups within BI-RADS 3 and 4 categories combined with low texture scores, exhibiting a lower rate of breast cancer.
Many studies have consistently demonstrated the significance of breast density as an independent risk factor and the potential to improve risk assessment and mammographic density is the leading measure for risk assessment [2,24,25].The increased attention informing women of their breast density aims to highlight the risk associated with high density and the potential benefit of additional screening [26].However, there is growing evidence that texture not only reflects the inherent biological risk for cancer development, but also parenchymal changes associated with breast cancer development [12,27].A systematic review has shown that the addition of texture features to risk prediction models improves model performance beyond density [28], highlighting texture's contribution to risk assessment [12,13,17,28,29].Moreover, studies have shown texture to be able to distinguish BRCA1 and BRCA2 carriers from women of low risk of breast cancer, which is not the case for density [30].
Using a deep-learning-based texture model enables capturing important mammographic features associated with risk, instead of feeding the model prior assumptions [31].Lauritzen et al. [20] developed and validated the deep-learning texture model used in this study, yielding a moderate AUC for interval cancers of 0.69/0.70 and 0.64 for LTC's in their

Discussion
In our study, we combined BI-RADS density score with a deep-learning-based texture score to risk stratify women aged between 50-69 at their baseline screening in a populationsbased screening program.Our findings showed that texture enhanced breast cancer risk assessment beyond density alone, for both short-term and long-term risk.While one might anticipate the highest risk to be in BI-RADS 4, the use of texture revealed additional subgroups of high risk within lower density categories.Additionally, we identified subgroups within BI-RADS 3 and 4 categories combined with low texture scores, exhibiting a lower rate of breast cancer.
Many studies have consistently demonstrated the significance of breast density as an independent risk factor and the potential to improve risk assessment and mammographic density is the leading measure for risk assessment [2,24,25].The increased attention informing women of their breast density aims to highlight the risk associated with high density and the potential benefit of additional screening [26].However, there is growing evidence that texture not only reflects the inherent biological risk for cancer development, but also parenchymal changes associated with breast cancer development [12,27].A systematic review has shown that the addition of texture features to risk prediction models improves model performance beyond density [28], highlighting texture's contribution to risk assessment [12,13,17,28,29].Moreover, studies have shown texture to be able to distinguish BRCA1 and BRCA2 carriers from women of low risk of breast cancer, which is not the case for density [30].
Using a deep-learning-based texture model enables capturing important mammographic features associated with risk, instead of feeding the model prior assumptions [31].Lauritzen et al. [20] developed and validated the deep-learning texture model used in this study, yielding a moderate AUC for interval cancers of 0.69/0.70 and 0.64 for LTC's in their study, showing the potential of deep-learning texture for risk assessment for both short-and long-term breast cancer risk.Additionally, a systematic review by Schopf et al. showed that AI image-only risk models yielded a median AUC of 0.72 (0.61-0.90), outperforming model performance of density only or clinical risk factors (0.61 (0.54-0.69)) [32].Using automatic mammographic biomarkers can facilitate swift risk assessment at the screening site.
Identifying women for whom mammography will not provide adequate detection of breast cancer and with a high risk of being diagnosed with interval cancer in the near future could shorten the time to diagnosis, enabling a more efficient screening program through tailored screening.Increased breast density is associated with masking, as well as risk, complicating cancer detection in women with high density.Both density and texture have more impact on interval cancers, than on the screen-detected and LTC's.We found that the lowest IRR for screen-detected cancer, and the highest IRR for interval cancer, were within the BI-RADS 4 category, probably reflecting the masking potential of high density, especially for BI-RADS category 4. In the BI-RADS 2-3 categories combined with high texture, the difference between the risk of a screen-detected and interval cancer was more balanced.However, studies have shown that masking is not limited to higher BI-RADS categories, and that textural features can predict masking beyond density [33][34][35].Short-term risk prediction is useful for guiding tailored screening strategies.The IRR for interval cancer in the B3 + T1 subgroup was similar to that of B1 + T4, showing the diverse risk patterns across the BI-RADS categories, see Figure 3. Identifying women at their baseline screening according to the risk of interval cancer and masking, enables the possibility of offering them a shorter screening interval or supplemental screening modalities, including digital breast tomosynthesis, ultrasound, or MRI, targeting masking and potentially increasing cancer detection in a screening program [8,36].
Long-term risk assessment can guide primary prevention efforts, in addition to tailoring screening strategies acknowledging that masking diminishes its impact on long-term breast cancer risk [17,37,38].Selecting women with BI-RADS density 3 and 4 for supplemental screening would in our study encompass approximately 150,000 women or 35% of our cohort, with an IRR for LTC ranging from 1.29 (95% CI: 0.91-1.77) to 5.15 (95% CI: 4.31-6.16)through the subgroups.This would impose a considerable burden on a screening program and relying solely on BI-RADS density would lead to unnecessary or shortened screening intervals for low-risk, high density women, whereas women with low-density and high-risk would miss out on relevant screening [5].Our results support further stratification beyond density, showing the highest rates for LTC where found primarily in the subgroups of the highest texture quartile.
This study has some limitations, including the small size of some subgroups, limiting the reliability of our estimates.Additionally, the use of baseline screening may not accurately represent the true baseline screening for all women.Lack of data on immigration status or death could introduce bias due to loss of follow-up.However, the strength of our study was the large cohort from a real-life population-based screening program with long follow-up.Furthermore, the deep-learning-based texture model was trained cross-vendor, improving the reliability of our findings across different vendors.

Conclusions
Our results have shown a broad range of cancer detection across subgroups, emphasizing the importance of tailored screening strategies.The rates for screen-detected cancer were highest in the T4 subgroups, whereas the lowest rates were observed in the B4 subgroups.For interval cancer, the highest rates were observed in the three B4 + T2-4 subgroups.When analyzing LTC, the highest rates were observed in the T4 and B4 subgroups combined.In our study, high density does not necessarily indicate a high rate of breast cancer, and conversely, subgroups of women with low density does not equate to a low rate of breast cancer.Using deep-learning-based texture analysis within BI-RADS density categories improves risk stratification in breast cancer screening beyond density

Figure 2 .
Figure 2. Matrix plots of the adjusted incidence rate ratio (IRR) with 95% confidence intervals for screen-detected cancers stratified by BI-RADS density score and texture score, the Capital Mammography Screening Program, Denmark, 2012-2023.IRR is adjusted by age, year of screening, and hospital of screening (model 3).

Figure 2 . 18 Figure 3 .
Figure 2. Matrix plots of the adjusted incidence rate ratio (IRR) with 95% confidence intervals for screen-detected cancers stratified by BI-RADS density score and texture score, the Capital Mammography Screening Program, Denmark, 2012-2023.IRR is adjusted by age, year of screening, and hospital of screening (model 3).Diagnostics 2024, 14, x FOR PEER REVIEW 8 of 18

Figure 3 .
Figure 3. Matrix plots of the adjusted incidence rate ratio (IRR) with 95% confidence intervals for interval cancers stratified by BI-RADS density score and texture score, the Capital Mammography Screening Program, Denmark, 2012-2023.IRR is adjusted by age, year of screening, and hospital of screening (model 3).

Figure 4 .
Figure 4. Matrix plots of the adjusted incidence rate ratio (IRR) with 95% confidence intervals for long-term cancer stratified by BI-RADS density score and texture score, Capital Mammography Screening Program, Denmark, 2012-2023.IRR is adjusted by age, year of screening, and hospital of screening.

Figure 4 .
Figure 4. Matrix plots of the adjusted incidence rate ratio (IRR) with 95% confidence intervals for long-term cancer stratified by BI-RADS density score and texture score, Capital Mammography Screening Program, Denmark, 2012-2023.IRR is adjusted by age, year of screening, and hospital of screening.

Table A1 .
Person-years and rate per 1000 person-years for short-term cancers (STC) stratified by combining BI-RADS density score, deep-learning-based texture score, and by combination, the Capital Region Mammography Screening Program, Denmark, 2012-2023.IC = interval cancer, SDC = screendetected cancer.B = BI-RADS density score 4th edition corresponding to each of the four categories, T = texture quartile corresponding to each of the four quartiles.

Table 2 .
Screen-detected cancer rate in subgroups stratified by combining BI-RADS density score and texture score.The Capital Mammography Screening Program, Denmark, 2012-2023.B = BI-RADS density score 4th edition corresponding to each of the four categories, T = texture quartile corresponding to each of the four quartiles.

Table 3 .
Interval cancer rate in subgroups stratified by combining BI-RADS density score and texture score.The Capital Mammography Screening Program, Denmark, 2012-2023.B = BI-RADS density score 4th edition corresponding to each of the four categories, T = texture quartile corresponding to each of the four quartiles.

Table A2 .
Number of women included in this study, stratified by BI-RADS density score and deeplearning-based texture score, and by combination, and further stratified by age.

Table A3 .
Model 1 interval and long-term cancer.Model 1 represents the crude model (no adjustment); Model 2 is adjusted for age at baseline, with age included in the model as a natural cubic spline with 3 degrees of freedom (this number was selected to minimize the Akaike Information Criterion among different choices of d.o.f.from 1 to 5).

Table A4 .
Characteristics for short-term cancer (STC) stratified by BI-RADS density score, deeplearning-based texture score, and by combination of BI-RADS density score and deep-learningbased texture score, the Capital Region Mammography Screening Program, Denmark, 2012-2023.B = BI-RADS density score 4th edition corresponding to each of the four categories, T = texture quartile corresponding to each of the four quartiles.

Table A5 .
Characteristics for long-term cancer (LTC) stratified by BI-RADS density score, deeplearning-based texture score, and by combination of BI-RADS density score and deep-learningbased texture score, the Capital Region Mammography Screening Program, Denmark, 2012-2023.DCIS = ductal carcinoma in situ, Q1-3 = interquartile 1-3, B = BI-RADS density score 4th edition corresponding to each of the four categories, T = texture quartile corresponding to each of the four quartiles.Flowchart describing sample selection, the Capital Mammography Screening Program, Denmark, 2012-2021.Code 3 refers to "Technical recall".Code Z is when there is no examination.BC = Breast Cancer.