Accurate risk assessment is essential for the early detection and prevention of breast cancer. With the foresight offered by risk models, high-risk patients can benefit from supplemental imaging, more frequent screening, and chemoprevention to improve their outcomes. Similarly, low-risk patients can be guided toward longer screening intervals and avoid overtreatment. As such, there have been considerable investments in the development of risk-based guidelines for supplemental imaging, personalized screening frequency, and chemoprevention.1,2,3,4 However, the risk models underlying these national efforts give gross, generalized risk estimates that are inaccurate at the individual level, limiting the efficacy of existing guidelines. For instance, current National Comprehensive Cancer Network (NCCN) guidelines recommend supplemental magnetic resonance imaging (MRI) for patients with 20% or greater lifetime risk of breast cancer.5 However, under these guidelines, more than 97% of supplemental screening MRIs will not detect cancer,6 indicating that most of these patients did not need MRIs. Conversely, only 25% of patients with breast cancer will be eligible for MRI before their diagnosis, indicating a missed opportunity for the remaining 75% of patients with cancer.7 Guidelines for chemoprevention and screening frequency are similarly inefficient. These challenges stem from the limitations of the guidelines’ underlying risk models. Improving predictors of individual cancer risk remains essential to improving the systematic early detection and prevention of breast cancer.

In a recent Journal of Clinical Oncology article, Eriksson et al.8 demonstrated that image-derived risk models, which were previously shown to outperform the Tyrer–Cuzick model in short-term risk estimation (2-year risk),9 also outperform that baseline in long-term risk estimation (10-year risk). Specifically, the study demonstrated that an image-based risk model, based on prespecified mammographic features, outperformed the Tyrer–Cuzick v8 model on a case-control cohort across a 10-year period. Throughout their 10-year follow-up window, they found that 20% of all women with breast cancer were deemed high risk by their image model, compared with 7.1% by Tyrer–Cuzick v8. While this result may be confounded by good performance on cancers diagnosed earlier in the observation window, the study remains promising. This improved capacity to predict long-term risk is especially important to support the primary prevention of breast cancer, as tumor development is estimated to take 5–20 years. By extending the successes of image-based risk modeling9,10,11,12,13,14 to long-term risk estimation, Eriksson et al. contribute to a larger paradigm shift in risk modeling.

The traditional approach for developing risk predictors, as exemplified by the Tyrer–Cuzick model,15 relies on expert knowledge to identify key risk factors. These curated risk factors (e.g., patient age, family history, and mammographic density) are then combined in statistical models to estimate breast cancer risk. While traditional tools such as the Tyrer–Cuzick model are widely adopted, the models demonstrate limited performance at the individual level. Moreover, it has proven difficult for experts to improve these tools with new risk factors, suggesting that this approach may have already reached its limits. For instance, investigators have extensively explored mammographic breast density as a marker of risk; however, the performance of the Gail and Tyrer–Cuzick models improved only marginally after the introduction of this factor, obtaining areas under the curve (AUCs) of 0.61 and 0.59 compared with 0.57 and 0.55, respectively.16 The manual identification of risk factors remains a critical bottleneck in the improvement of risk models.
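To put the AUC figures above in context, an AUC can be read as the probability that a randomly chosen woman who developed cancer received a higher risk score than a randomly chosen woman who did not, so 0.5 corresponds to chance. The following is a minimal sketch of that reading; the function name `pairwise_auc` and every score in it are invented for illustration and are not data from the cited studies.

```python
from itertools import product


def pairwise_auc(case_scores, control_scores):
    """Fraction of (case, control) pairs in which the case has the higher
    risk score; tied pairs count as half a win."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, assumed purely for illustration.
cases = [0.30, 0.12, 0.08, 0.05]            # women later diagnosed with cancer
controls = [0.10, 0.06, 0.04, 0.03, 0.02]   # women who remained cancer free

auc = pairwise_auc(cases, controls)
print(f"AUC = {auc:.2f}")
```

Under this reading, an AUC of 0.61 means that the model ranks a case above a control in roughly 61 of every 100 such pairs, which is only modestly better than a coin flip.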

Recently, radiomics approaches,17,18,19 which evaluate combinations of texture or shape features, have emerged as a popular direction in medical imaging. Radiomics models promise to improve the flexibility of traditional risk models by adding expert-defined, yet automatically measured, imaging features. While a significant improvement, this approach is still fundamentally limited to features selected by the investigator, restricting the ability of the developed tools to discover new features directly from the patient data.

Artificial intelligence (AI) methods, which can operate directly on full-resolution images and leverage any predictive pattern available, offer a paradigm shift in the development of risk models. Instead of relying on investigator ingenuity, AI approaches allow us to view risk modeling as an optimization problem. In doing so, AI tools can uncover risk cues unknown to human readers directly from the data, allowing them to realize the full potential of the modality.
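One generic way to write down this optimization view, offered as a sketch rather than as the formulation used by any particular model discussed here, is empirical risk minimization. The symbols are illustrative assumptions: $x_i$ is the mammogram of patient $i$, $y_i$ is her observed outcome, $f_{\theta}$ is a model with learnable parameters $\theta$, and $\ell$ is a loss that penalizes disagreement between predicted risk and outcome.

```latex
\hat{\theta} \;=\; \arg\min_{\theta} \; \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(f_{\theta}(x_i),\, y_i\bigr)
```

In this framing, the features that drive the risk estimate are not specified in advance; they are whatever patterns in the images reduce the loss over the training population.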

This AI approach has already transformed short-term risk prediction, with models such as Mirai7 and Sybil10 achieving state-of-the-art performance in 5-year breast cancer and 6-year lung cancer risk prediction, respectively, in large international external validation studies.10,11 In this study, Eriksson et al. further this general paradigm shift, extending these successes to long-term risk estimation. As in prior studies of AI-derived risk assessment, the model in this paper did not outperform the Tyrer–Cuzick model because it had access to more data, as both methods used mammograms in some fashion. Instead, the methods differ in how they leveraged the mammograms. By detecting and combining subtle features of the mammogram, the image-based model was able to significantly outperform the Tyrer–Cuzick model, which only benefits from breast density. These results reinforce the exciting promise of AI methods to transform risk assessment. Moreover, while progress in traditional risk models has stagnated, AI-driven methods have the potential for further dramatic improvement. Current AI methods only leverage a single episode of mammography to assess cancer risk, which only touches a fraction of the rich multimodal and longitudinal imaging available. Ever larger multimodal datasets, improved algorithms, and increased computing power all have the promise to further advance AI risk assessment.

To realize the promise of AI-driven risk models, future work requires strengthened standards of rigor to ensure methodological progress. Specifically, studies should compare their proposed methods to published prior work, establishing means to gauge technical improvements. While Eriksson et al. focused on a single commercial model for their study, other image-based risk models, including Mirai, are publicly available. We believe the study would have benefited from wider benchmarking. Similarly, the study would have benefited from validating its results on more diverse datasets, such as EMBED.20 Currently, the study only evaluates its risk model on the same screening cohort used to develop the model; as a result, it remains difficult to gauge how well the current results generalize to more diverse populations. Finally, AI model developers should make their tools easily available to other researchers to enable new work to compare against their approaches. Open benchmarking and globally diverse validation efforts are critical to ensuring that AI methods for cancer risk assessment are actually improving.

More clinical research is also needed to translate the advancements in cancer risk modeling into tangible advances in care. In both this study and prior work validating Mirai, AI risk scores show higher accuracies in the near term (within 2 years) than in the far term (5–10 years), suggesting the models are partially capturing cancers already present within the mammogram but not detected by the radiologist. These trends, in addition to the overall improved accuracy, suggest that leveraging these models to guide decisions about supplemental imaging, instead of current lifetime risk measures,5 would be more effective in reducing interval cancers. Moreover, while lifetime risk measures disproportionately exclude older women, image-based risk tools are not inherently skewed by patient age,21 and they can identify the near-term risk signal most relevant for personalized screening. Given that AI risk scores remain more accurate than traditional measures after 10 years,8 chemoprevention guidelines could also benefit from AI risk tools. While the broad opportunities for AI tools to improve guidelines for cancer prevention and screening are clear, the optimal clinical protocols for each of these use cases remain unclear. Prospective studies with diverse populations are needed to advance clinical guidelines and to achieve the broad promise22,23,24 of AI in medicine. The future is AI-derived risk estimation and AI-powered clinical guidelines, and the faster we get there, the better our patients will be cared for.