Breast Density Transformations Using CycleGANs for Revealing Undetected Findings in Mammograms

Breast cancer is the most common cancer in women, a leading cause of morbidity and mortality, and a significant health issue worldwide. According to the World Health Organization’s cancer awareness recommendations, mammographic screening should be regularly performed on middle-aged or older women to increase the chances of early cancer detection. Breast density is widely known to be related to the risk of cancer development. The American College of Radiology Breast Imaging Reporting and Data System categorizes mammograms into four levels based on breast density, ranging from ACR-A (least dense) to ACR-D (most dense). Computer-aided diagnostic (CAD) systems can now detect suspicious regions in mammograms and identify abnormalities more quickly and accurately than human readers. However, their performance is still influenced by the tissue density level, which must be considered when designing such systems. In this paper, we propose a novel method that uses CycleGANs to transform suspicious regions of mammograms from ACR-B, -C, and -D levels to ACR-A level. This transformation aims to reduce the masking effect caused by dense tissue and separate cancerous regions from surrounding tissue. Our proposed system significantly enhances the performance of conventional CNN-based classifiers by focusing on regions of interest that would otherwise be misidentified due to tissue masking. Extensive testing on different types of mammograms (digital and scanned X-ray film) demonstrates the effectiveness of our system in identifying normal, benign, and malignant regions of interest.


Introduction
Breast density is a measure of the amount of fibrous and glandular tissue (together known as fibroglandular tissue) in the whole breast relative to fat tissue. It has no direct relation to breast size or firmness. There are three basic components of a breast: connective tissue, ducts, and lobules. The connective tissue, which is formed of fatty and fibrous tissue, envelops and holds everything in place. Lobules are the small glands that produce milk, while ducts are the tiny transport tubes that carry milk from the lobules to the nipple. Together, lobules and ducts are known as glandular tissue (Figure 1). Fibrous tissue and fat give breasts their size and shape and hold the rest of the breast components in place. Most breast cancers begin in the ducts or lobules. Breast density is an important measurement mainly for two reasons. The first is that women who have dense breast tissue present a higher risk of developing breast cancer than women with less dense tissue. It is still unclear why high density is associated with higher breast cancer risk; it may be because dense breast tissue contains a larger proportion of cells that could become abnormal under some conditions. The relationship between increasing mammographic breast density and the risk of breast cancer development has been reported in many relevant studies [1]. The second reason is that dense breast tissue (fibrous and glandular) makes it harder for radiologists to detect cancerous regions because it appears white or opaque on a mammogram. Breast masses and cancerous regions share the same predominantly white appearance as the surrounding healthy dense tissue, so density makes it harder for abnormalities to be traced by radiologists or computer-aided diagnosis (CAD) systems.
In contrast, fatty tissue looks almost black, so it is easier to see abnormalities that have large intensity values against a low-intensity background. In screening mammography, according to the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS), there are four levels of density [2]. Almost entirely fatty indicates that breasts are almost entirely composed of fat (ACR-A). Scattered areas of fibroglandular density indicates that there are some scattered areas of density, but most of the breast tissue is non-dense (ACR-B). Heterogeneously dense indicates that there are some areas of non-dense tissue, but most of the tissue is dense (ACR-C). Finally, extremely dense indicates that nearly all breast tissue is dense (ACR-D).
Breast density cannot be detected through physical examination but only through mammography, and it is an important variable that affects the sensitivity of mammography [3][4][5][6]. Over 40% of women with dense breast tissue are characterized as heterogeneously dense (ACR-C) or extremely dense (ACR-D). Dense breast tissue is an independent risk factor for the development of abnormalities and decreases the likelihood of breast cancer being detected successfully on screening mammography, leading potentially to delayed diagnosis, which can have detrimental results.
In order to automatically identify and categorize breast lesions in mammograms using traditional machine learning models and to bring these results to doctors' attention, computer-aided detection (CAD) systems were developed in the 1990s [7][8][9]. Yet, due to their low specificity, current conventional CAD systems are unable to considerably increase screening performance. The success of these algorithms in identifying and categorizing abnormalities in mammograms depends on their specificity. Detection differs from diagnosis, which draws conclusions about the cause of an abnormality. It is crucial to find irregularities in mammograms that might otherwise be missed because of reading mistakes or observer fatigue. Convolutional neural networks (CNNs) have gained popularity in recent years for a variety of image classification tasks. CNN-based CAD systems have proven to be quite effective, with high rates of breast cancer diagnosis. However, there are still some open issues in automatic breast cancer detection, one of the most important being breast density, as described earlier.
In the literature, the use of generative adversarial networks (GANs) for numerous medical challenges, including data synthesis and augmentation, is constantly growing. GANs can help in the synthesis of a variety of plausible-looking mammography images either in full size [10][11][12][13] or in ROI-based approaches [14,15]. However, these models may produce artifacts (e.g., checkerboard artifacts) that affect the quality of the final synthesized images, especially when working with full-size mammograms.
Due to its impact on the accurate detection of cancer in mammograms, the problem of automatic breast tissue recognition has been extensively studied over the last decade, with a large number of papers published in this area proposing systems that use either traditional machine learning techniques or, more recently, deep learning networks and architectures [4][5][6][7][13,15,16]. However, to our knowledge, none of them proposes a method to "transform" breast density to lower density levels and thus enhance the diagnostic accuracy of CAD systems.
Breast tissue density plays a crucial role in the detection of breast cancer, as it makes it more challenging for radiologists to accurately detect cancerous regions in mammograms. Motivated by this, we sought to investigate to what extent computer-assisted diagnosis systems are affected by the same challenge, using different types of mammograms, ranging from scanned film to fully digital images, in our experiments. To address this, we propose a novel breast tissue transformation using a CycleGAN network topology that can be applied to any region of interest (ROI) to adjust its density to match the characteristics of ACR-A class tissue, which is easier to diagnose successfully. CycleGAN was chosen due to its widespread applications and accomplishments in the field of cancer imaging [16]. A crucial methodological characteristic of CycleGAN is that it can train on unpaired data, without the need for matching image pairs in the source and target domains. Our datasets lack image pairs, since the same patient's breast cannot belong to both the high and low breast density domains, so the ability to train on unpaired data makes CycleGAN suitable for them. The main contribution of our system is that it takes breast density into account and, by using a CycleGAN model, transforms the density of the ROI's tissue to match the characteristics of ACR-A class tissue. This process significantly improves recognition accuracy while reducing the number of ROIs that go undetected because of dense breast tissue.
The structure of the paper is as follows: In Section 2, we give a detailed description of the proposed CAD system. We present the modules that it consists of and the datasets that were used to test the efficacy of the system. In Section 3, we present the experimental setup and results, which are further discussed in Section 4. Finally, in Section 5, we give some conclusions and remarks concerning the limitations and further research.

The Proposed System Overview
The main objectives of this work are summarized as follows:
• Development of a novel method to detect hidden suspicious abnormalities in ROIs that are partially or completely masked by surrounding tissue by taking into account the local breast density as recognized by a CNN classifier that assigns the tissue to one of four ACR levels (A, B, C, D).
• Elimination of the masking effect due to the surrounding tissue on the examined ROI by transforming the ROI's ACR level into the A category. This procedure is achieved using a GAN/CycleGAN topology that cycles through the ACR levels.
• Improvement of the overall abnormal region detection performance using a CNN network architecture based on a fine-tuned VGG16 network and extended tests on five of the most well-known datasets in the field.
The proposed CAD system used in this work is illustrated in Figure 2. It consists of three main modules: (a) the data preparation module where image preprocessing and data augmentation take place, (b) the deep learning module where breast tissue segmentation, breast density transformation, and suspicious region detection are performed, and finally, (c) the evaluation module where annotation of abnormal regions on the given mammogram under examination is performed. To make our system more robust and to consider the different types of mammograms acquired during examination (fully digital, as well as scanned X-ray film), we thoroughly tested our system using five different datasets containing digital and film-scanned mammograms.

A. The VinDr-Mammo dataset (digital)
The VinDr-Mammo [17] dataset is a large-scale dataset of full-field digital mammograms consisting of 5000 four-view examinations accompanied by breast-level assessments and findings annotations. To the best of our knowledge, VinDr-Mammo is currently the largest public dataset (containing approximately 20,000 scans) of full-field digital mammograms that also provides breast-level BI-RADS assessment categorization together with suspicious or possible benign findings that require follow-up examination as well as ACR breast tissue/finding level annotations.

B. The SuReMaPP dataset
The SuReMaPP dataset [18], published recently, consists of 343 mammograms that have been hand-labeled by expert radiologists to identify suspicious regions, such as abnormalities (benign and malignant) and calcifications. SuReMaPP contains mammograms with ACR keyword descriptions that correspond to the ACR BI-RADS specification.

C. The MIAS dataset (film)
The Mammographic Image Analysis Society (MIAS) [19] dataset consists of 322 film mammograms (106 fatty and 216 dense images). Annotations are given in a separate file containing the background tissue type, the class and the severity of the abnormality, x and y coordinates of the center of the irregularities, and the approximate radius of a circle enclosing the abnormal region in pixels. For this dataset, there is no annotation for ACR breast tissue level.

D. The DDSM dataset (film)
The digital database for screening mammography (DDSM) [20][21][22] is provided by the University of South Florida. It contains film mammograms, which are digitized using four different types of digitizers. The database contains approximately 2500 studies. Each study includes two images (views) of each breast, as well as some associated patient information (age at time of study, ACR breast density rating, subtlety rating for abnormalities, and ACR keyword description of abnormalities) and image information (scanner used for the digitization, scanner spatial resolution, etc.). The ACR keyword description of the database was matched to the ACR BI-RADS categorization.

E. The INbreast dataset (digital)
The INbreast dataset [23] was used in the training phase of our CNN-based CAD system (patch extraction-based approach) and as a gold standard in all our experiments. The other datasets were used to evaluate the performance of the proposed system on different types of acquired mammograms.

Input Image Normalization
To eliminate the differences in the intensity levels of the mammograms used in the databases, the histogram transfer method was applied from the INbreast dataset to all other images. This normalization preprocessing step in CAD systems is crucial as it can account for large intensity variations that are typically attributed to the use of different scanners with varying parameters in the image-capturing process. These intensity variations can also severely affect the performance of processing and analysis steps, such as image registration, segmentation, and tissue volume estimation. To ensure objective image comparison between different mammograms, a normalization algorithm is performed in advance to modify the distribution of intensity values of each scan and match the selected baseline image. This preprocessing step was adopted from [24].
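The histogram transfer step can be illustrated with a minimal NumPy sketch of histogram matching. This is not the implementation from [24]; the function name `match_histogram` and the use of a single reference image are assumptions made for illustration only.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap source intensities so their CDF follows the reference CDF.

    Hypothetical helper: `reference` stands in for a baseline image
    (for example, one taken from the INbreast dataset).
    """
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx.ravel()].reshape(source.shape)
```

The mapping is monotonic, so the relative ordering of pixel intensities within a scan is preserved while its overall distribution is moved toward that of the baseline image.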
In order to help radiologists detect abnormalities, the adaptive histogram equalization (AHE) [25,26] method is typically applied as a preprocessing step in CAD systems. The AHE method is a contrast-boosting technique that enhances local contrast and image details. Medical images can benefit greatly from this preprocessing step, but it can also amplify noise as a side effect. To increase the image contrast while limiting this noise amplification, a variation known as the contrast-limited adaptive histogram equalization (CLAHE) technique [27][28][29] is used, as proposed in the literature.

Image/Breast Tissue Segmentation
To perform breast tissue segmentation, we estimate the tissue masks using a VGG-UNET network. The UNET architecture was initially proposed by Ronneberger et al. [30] for biomedical image data segmentation. We trained the VGG-UNET network using the images from the INbreast dataset and then applied the learned model to images from other datasets. For the segmentation step, we replaced the UNET encoder with a pre-trained VGG16 encoder, as depicted in Figure 3. The reason for this is that the VGG16 is already pretrained on the ImageNet dataset, whereas the UNET encoder would have to be trained from scratch to learn the features and the breast tissue area characteristics with significantly lower performance. Finally, the VGG16 encoder is converted into a symmetrical UNET architecture. We applied the constructed model to all datasets (except INbreast), to extract the tissue regions that will be used in the following analysis steps (Figure 3).

Feature Extraction/Classification
In the literature, many techniques for feature extraction have been proposed. In recent years, deep convolutional neural networks (DCNNs) have attracted great attention due to their outstanding performance. CNNs have proven successful in image classification tasks, including image analysis as in [31,32]. A convolutional neural network (CNN) is made up of a series of trainable stages stacked on top of one another, which produce feature maps, followed by a supervised classifier [33].
Transfer learning is used on our first model, which is based on the VGG16 architecture. The model was pre-trained on ImageNet, and the first four blocks of convolutional layers were kept frozen, except for the batch normalization (BN) layers, which required retraining to achieve better convergence. By applying transfer learning, a model can be trained using smaller sets of training data while still being capable of accurate predictions, mostly due to the parameters learned by the source model (in our case, the model pre-trained on ImageNet). An additional fully connected (FC) layer with a size of 1024 is added to the overall architecture, followed by a dropout regularization layer to improve generalization performance. For the output layer, a final FC layer is added. Our model has three output classes: normal, benign, and malignant. This model, which we will refer to as VGG16-NBM, is used to categorize each image patch from the entire mammogram as either normal, benign, or malignant. A similar VGG16 model was also constructed with four outputs corresponding to the four categories of breast density: ACR-A, ACR-B, ACR-C, and ACR-D. Similar approaches that perform very well have also been reported in the literature [34]. We will refer to this model as VGG16-ACR, and its task is to recognize the density class of each image patch.
For the training of the two VGG16-based models (VGG16-NBM and VGG16-ACR), the INbreast database was used. The input to the VGG16 models consists of patches of size 256 × 256 pixels. To augment our training set, we exploited the capability of GAN topologies to produce artificial samples of a specific domain, as presented in the following section.

Generative Adversarial Networks (GANs)
Recently, the idea of adversarial training has gained popularity, and deep learning research has advanced significantly. Since their initial presentation, generative adversarial networks (GANs) have attracted attention worldwide, and every year, even more studies are published and presented in different research areas, especially in medical image analysis. GANs have been used for data augmentation in several recent works [35][36][37][38][39], including medical image analysis.
Generally, training on a set with a large number of samples performs well and gives high accuracy rates. However, biomedical datasets usually contain only a relatively small number of samples due to the limited number of patients that can be involved in different studies. To solve this problem, data augmentation can be used to increase the size of the input data by generating new data from the original input data. Given the rapid progress of generative models in synthesizing realistic images and the known effectiveness of simple data augmentation techniques (e.g., horizontal flipping, rotation, shifting, brightness adjustments), we have integrated two GAN models in our CAD system to synthetically augment the extracted patches from the training database. In this way, we can balance the class ratio of normal, benign, and malignant samples in the training set.
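The simple augmentation techniques mentioned above (flipping, rotation, shifting, brightness adjustment) can be sketched in NumPy as follows. This is an illustrative sketch only and not the pipeline used in this work: the function names, the restriction to 90-degree rotations, the zero-filled shift, and the parameter ranges are all assumptions, and patches are assumed to be square grayscale arrays scaled to [0, 1].

```python
import numpy as np

def shift_with_zero_fill(img, dy, dx):
    """Translate a 2D image by (dy, dx) pixels, filling exposed borders with zeros."""
    shifted = np.zeros_like(img)
    h, w = img.shape
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    top, left = max(0, dy), max(0, dx)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted

def augment_patch(patch, rng, max_shift=16, brightness_range=(0.9, 1.1)):
    """Apply a random flip, 90-degree rotation, shift, and brightness change."""
    out = patch
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # horizontal flip
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotation by 0/90/180/270 degrees
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    out = shift_with_zero_fill(out, dy, dx)        # small translation
    factor = rng.uniform(*brightness_range)        # brightness adjustment
    return np.clip(out * factor, 0.0, 1.0)
```

Calling `augment_patch(patch, np.random.default_rng(0))` repeatedly on patches of an under-represented class is one simple way to move the class ratio toward balance before any GAN-based synthesis is applied.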
The first GAN (GAN-NBM) was used to produce synthetic patches from the normal/benign/malignant classes to ensure the robustness of the CNN-based classifier and the variability of the samples. The second one (GAN-ACR) was implemented to produce synthetically augmented patches belonging to the four different tissue ACR categories. Figures 4 and 5 depict patches generated by the GAN models that belong to the specified classes. In Figure 4, the GAN-NBM produces ROI patches from the normal, benign, and malignant classes. In Figure 5, ROI patches representing the ACR breast density classes are generated. The role of both GANs is to augment the training dataset in an unsupervised manner. For both cases, the INbreast annotation, which refers to the annotation of masses and microcalcifications in mammograms, was used, imposing at least an 85% ratio of overlap with the breast tissue mask that was estimated in the previous image segmentation step, especially for the patches of classes ACR-A, -B, -C, and -D.
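The 85% overlap constraint with the tissue mask can be illustrated with a short sketch. The grid-based sampling, the stride, and the function name `extract_tissue_patches` are assumptions for illustration; the text above specifies only the patch size (256 × 256) and the minimum overlap ratio.

```python
import numpy as np

def extract_tissue_patches(image, tissue_mask, patch_size=256, stride=256,
                           min_overlap=0.85):
    """Return patches whose area is at least `min_overlap` breast tissue.

    `tissue_mask` is a boolean array of the same shape as `image`
    (True where the segmentation step found breast tissue).
    """
    patches, coords = [], []
    height, width = image.shape
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            window = tissue_mask[top:top + patch_size, left:left + patch_size]
            # The mean of a boolean window is the fraction of tissue pixels.
            if window.mean() >= min_overlap:
                patches.append(image[top:top + patch_size, left:left + patch_size])
                coords.append((top, left))
    return patches, coords
```

Patches that fall mostly on the image background are discarded, so only windows dominated by breast tissue are passed on to the classifiers and GANs.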

CycleGANs
CycleGANs are used to train an image-to-image translation model, which does not depend on paired datasets to learn the mapping between the input and the output images [40].
The key to CycleGAN's success is the idea of an adversarial loss that forces the generated images to be, in principle, indistinguishable from real images. In our work, we adopt for the CycleGAN generators the architecture proposed by Johnson et al. [41], which has shown impressive results for neural style transfer and super-resolution. Formally, given a source domain X and a target domain Y, CycleGAN aims to learn a mapping G: X → Y between input and output images such that G(x) is the translation of an image x from domain X to domain Y. Additionally, it aims to learn a reverse mapping F: Y → X such that F(y) is the translation of an image y from domain Y to domain X.
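For reference, the objective commonly given for CycleGAN can be written as below. The discriminators D_X and D_Y and the weight λ are notation assumed here for illustration; the text does not name them, and the value of λ used in this work is not stated.

```latex
\begin{align}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y) &=
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\log D_Y(y)\right]
  + \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log\left(1 - D_Y(G(x))\right)\right] \\
\mathcal{L}_{\mathrm{cyc}}(G, F) &=
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right] \\
\mathcal{L}(G, F, D_X, D_Y) &=
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{align}
```

The cycle-consistency term penalizes a translation whose round trip F(G(x)) does not return to the original image x, which is what allows training without paired images in the two domains.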
In our work, the CycleGAN model is used to transform patches from classes (domains) ACR-B, -C, and -D to class ACR-A. The main purpose of this model is to remove the effect of tissue masking from the breast patches and transform their tissue into class ACR-A. The training of the CycleGAN was performed after creating data belonging to two categories: one containing image patches from class ACR-A and the other constructed by considering all the remaining patches from classes ACR-B, ACR-C, and ACR-D (Figure 6).
In Figures 7 and 8, patches from the unpaired categories ACR-A, ACR-B, -C, and -D are shown. The ROI patches in the first row of Figure 7 (belonging to classes ACR-B, ACR-C, and ACR-D) are transformed to the corresponding class ACR-A patches in the second row using the CycleGAN model. In Figure 8, the opposite transformation is depicted. Due to the cycle consistency characteristic of CycleGAN, we can also transform ACR-A patches back to the ACR-A class, thereby examining the convergence of the model (Figure 9).
Since data from this categorization are highly unbalanced, the GAN-ACR model described in the previous section was used to produce synthetic data up to a total of 10 million patch images. The CycleGAN was left to run for several epochs (in each epoch the network uses a pair of 10 million image patches, which are randomly shuffled at the end of each epoch).
In Figure 10, we present some examples of ROI patch transformations to the ACR-A density class with the CycleGAN topology along with the changes in the heatmap between the density transformations. For these cases, even after their change in density, the classification of the patch remains unaltered. We have also noted that in a small number of cases, some artifacts appeared in the lower right corner of the resulting patches, but they do not affect the system's performance in any significant way.
An advantage of the utilized translation model is that consecutive applications of density transformations of the input patches do not alter the ACR classification when the input patch falls in the ACR-A category. In Figure 10, we depict the results of two successive density transformations of a malignant ACR-A ROI to the ACR-A class via the CycleGAN. It can be seen that the ROI's density is not altered visually, while the network classifies it again as malignant.
All experiments in this work were carried out on a Linux workstation equipped with an NVIDIA RTX 3090 24 GB, GDDR6X. The deep learning models were all implemented in Python 3.8, in Ubuntu 20.04, with TensorFlow 2.8 and Keras 2.8 API. The CycleGAN model used comes from the original implementations [41]. Training time for CycleGAN was 25,500 min/epoch for the initially constructed dataset. The model was on average trained for 10 epochs, and the training set was gradually increased via artificially generated images produced by the acGAN model. The augmentation of the datasets was performed via the albumentations library.

Experimental Results
To evaluate the performance of the proposed CAD system on each of the previously presented mammographic databases, we used the precision, recall, accuracy, and F1-score metrics, which are computed from the following counts. True positive (TP) is the number of positive cases that have been correctly classified as positive. True negative (TN) is the number of negative cases that have been correctly classified as negative. False positive (FP) is the number of negative cases that have been misclassified as positive. False negative (FN) is the number of positive cases that have been misclassified as negative. For each experiment, a confusion matrix reporting these cases was also generated.
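The four metrics follow the standard definitions in terms of these counts. A minimal sketch in Python is given below, assuming a one-vs-rest reading of the three-class problem in which one class at a time is treated as "positive". The function name and the example counts are illustrative and are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return precision, recall, accuracy and F1-score from confusion-matrix counts.

    Standard definitions; a zero denominator yields 0.0 instead of raising.
    """
    def safe_div(num, den):
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)             # TP / (TP + FP)
    recall = safe_div(tp, tp + fn)                # TP / (TP + FN)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Hypothetical counts for one class treated as "positive" (illustration only).
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

For a multi-class report, the same function can be applied once per class and the results averaged; which averaging scheme was used here is not stated, so that choice is left open.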
The VinDR-Mammo dataset contains 20,000 total images, of which 988 (4.94%) are malignant and 5606 (28.03%) are benign. The ACR density distribution is 0.5%, 9.54%, 76.46%, and 13.5% for the four ACR classes, respectively. To measure the overall improvement of the proposed CAD system, all ROI patches are classified before and after changing their density to ACR-A using the CycleGAN network. The performance of the CNN model based on VGG16 before and after utilizing the CycleGAN transformations is depicted in Figure 11, which shows a significant improvement in the metrics. The recognition results on the left of each dotted line correspond to the initial CNN performance, while the results on the right correspond to the performance after the application of the CycleGAN density transformations. The overall accuracy increases from 85% to 91%.
The SuReMaPP dataset contains 343 images, with 0 (0%) malignant and 132 (38.48%) benign cases. The results for the SuReMaPP dataset are shown in Figure 12. The overall accuracy is further improved from 96% to 98%.
The MIAS dataset contains 322 total images, with 54 (16.77%) malignant and 69 (21.43%) benign cases. The results for the MIAS dataset are shown in Figure 13. The overall accuracy is increased from 96% to 97%.
The DDSM (digital database for screening mammography) dataset contains 10,480 images, with 1936 (18.7%) malignant and 2628 (25.4%) benign cases. The results for the DDSM dataset are shown in Figure 14. The overall accuracy is improved from 67% to 79%.
For comparison, we also report the proposed system's performance on the INbreast dataset, which was used for training in all the relevant CNN/GAN-based topologies. The INbreast dataset contains 410 images, with 100 (24.39%) malignant, 243 (59.27%) benign, and 67 (16.34%) normal cases. The percentage of images in each ACR category is 36%, 35%, 22%, and 7%, respectively. In Table 1, we present the evaluation of the VGG16 classification model (normal-benign-malignant). For the incorrectly classified ROI patches, the classification results obtained after using the CycleGAN model to transform their density to the ACR-A class are shown in Table 2. From the above table, we see an F1-score of 60%. Only the patches falsely classified as normal, benign, or malignant in Table 1 are processed by the CycleGAN model, which transforms their ACR densities to ACR-A, after which the classification outcome is recalculated. The combination of the data in Tables 1 and 2 gives a total classification accuracy of 99.77% for the INbreast dataset.
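The way the two tables combine can be sketched as follows, assuming the overall accuracy counts a patch as correct if it is classified correctly either in the first pass or after its density has been transformed to ACR-A. The function name and the counts are hypothetical and do not reproduce the values of Tables 1 and 2.

```python
def combined_accuracy(total, correct_first_pass, recovered_after_transform):
    """Accuracy of the two-stage scheme.

    total: number of ROI patches evaluated
    correct_first_pass: patches the CNN classified correctly as-is
    recovered_after_transform: initially misclassified patches that were
        classified correctly after the density transformation
    """
    if total <= 0:
        raise ValueError("total must be positive")
    misclassified = total - correct_first_pass
    if not 0 <= recovered_after_transform <= misclassified:
        raise ValueError("recovered patches cannot exceed first-pass errors")
    return (correct_first_pass + recovered_after_transform) / total


# Hypothetical counts, for illustration only.
acc = combined_accuracy(total=1000, correct_first_pass=950, recovered_after_transform=40)
```

Under this reading, the second stage can only add correct decisions, since patches that were already classified correctly are never passed through the CycleGAN.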

Discussion
Breast tissue density is a known risk factor for cancer development, as women with denser breasts have a higher likelihood of developing cancerous regions compared to women with less dense tissue. However, abnormalities in the breast, whether malignant or benign, can often be concealed by the glandular and connective tissue, making it difficult for both radiologists and computer-assisted diagnosis systems to identify them early or during follow-up screening. Because connective tissue, glandular tissue, and malignancies all appear as white regions on a mammogram, cancerous regions may be hidden by healthy tissue. Our approach considers the density of the examined region of interest (ROI) by attempting to "uncover" and reveal what is masked by the tissue effect.
In this study, we used two types of public screening mammography datasets, film and digital, to demonstrate the effectiveness of our proposed method. It is important to note that our proposed reverse transformation process based on CycleGANs only operates within the breast density mask and applies solely to those ROI patches that are incorrectly classified by classical CNN-based CAD systems as normal, benign, or malignant. Consistent with our experimental findings, the CycleGAN model successfully learned to translate ROI breast density from low to high (and vice versa) while preserving all domain features necessary for accurate type classification.
In the film mammography datasets (MIAS, DDSM), the accuracy improvement is up to 12 percentage points, which can be attributed to the CycleGAN model's ACR reverse transformation process acting as an image enhancement step. The image quality of these datasets is inferior to that of the fully digital ones (VinDR, SuReMaPP, INbreast), where the accuracy improvement is at most 6 percentage points, as those mammograms have better quality with no extreme intensity variations. When dealing with datasets acquired with different hardware and at different times, the histogram transfer technique produces a good common reference and makes film mammography usable in mixed CNN-based solutions.
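As an illustration of histogram transfer, the sketch below matches the intensity distribution of a source image to that of a reference image through their cumulative histograms. It is a generic CDF-matching routine for 8-bit grayscale values written with the standard library only; the function names are illustrative, and the exact variant used in this work may differ.

```python
from bisect import bisect_left
from itertools import accumulate


def cumulative_histogram(pixels, levels=256):
    """Normalized cumulative histogram (CDF) of integer intensities in [0, levels)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    total = len(pixels)
    return [c / total for c in accumulate(counts)]


def match_histogram(source, reference, levels=256):
    """Remap `source` intensities so their distribution approximates `reference`.

    Both inputs are flat lists of integer intensities.
    """
    src_cdf = cumulative_histogram(source, levels)
    ref_cdf = cumulative_histogram(reference, levels)
    # For each source level, pick the first reference level whose CDF reaches
    # the source CDF value at that level.
    lookup = [min(bisect_left(ref_cdf, p), levels - 1) for p in src_cdf]
    return [lookup[value] for value in source]


# Toy example: a dark patch remapped toward a brighter reference.
dark = [10, 10, 20, 30]
bright = [100, 150, 150, 200]
matched = match_histogram(dark, bright)
```

Applying the same reference to every dataset is what gives scanned-film and digital images a shared intensity scale before they reach the CNN.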
Although CycleGANs often introduce artifacts in the output images (as seen in the lower right corner of patches in Figures 7 and 10), this behavior is expected and does not affect the validity of the artificial patches. These artifacts can be resolved with more training epochs and more unpaired patch samples. In this work, we used 18K total unpaired ACR class patches (9K from class ACR-A, and 9K from classes ACR-B, -C, and -D, both real and artificial ones produced by the acGAN model); producing valid ACR class patches from them took approximately six months on an RTX 3090 (24 GB) system. The ACR heatmaps demonstrate that the density changes of the transformed patches do not affect a patch's classification into the appropriate ACR class, providing macroscopic evidence of the validity of the process.
We did not optimize the hyperparameters of the CNN/GAN topologies; instead, we attempted to keep our system's design relatively simple while enhancing its accuracy. More sophisticated and precise models can be deployed. Furthermore, our task, which involves transforming the ACR density of the examined ROI to reveal underlying findings, works particularly well for ACR density classes B and C, which account for 80% of all female breast cancer cases. For ACR-A, the density transformation back to the same class A contributes very little, as expected. In contrast, findings in class D density ROIs are much harder to identify, and the density transformation process should be augmented by extra information (such as BI-RADS characterization) for better diagnostic outcomes.

Conclusions
Our study presents a new computer-aided diagnosis (CAD) system for breast cancer that classifies suspicious regions of mammograms into three categories: normal, benign, and malignant. Our aim is to improve the accuracy of this classification by taking into account the density of the patient's breast tissue. Dense breasts are more likely to develop invasive ductal carcinoma due to the increased amount of glandular tissue, and the same tissue makes abnormalities harder to detect. Our proposed CAD system addresses this "masking" problem in mammograms, where dense breast tissue can hide abnormalities, by using a process that reverses the effects of breast density on mammograms.
As we hypothesized, the CycleGAN models not only learned to translate from low to high breast density but also preserved the domain characteristics during translation. However, the present study is not without limitations. One limitation involves the availability of healthy ACR-D mammograms to train the generative models. Few annotated, fully digital datasets are publicly available, so one must resort to a closed set of mammograms. Another limitation is the imbalanced nature of the problem: the ACR classes are far from equally represented. In our approach, this was partially resolved by using image manipulation methods (i.e., GANs) that produced artificial patch images via domain adaptation. However, it is known that this process does not extend the feature space of the problem but rather produces structurally similar images, which in many cases results in overfitting. Moreover, although the mathematics underlying deep learning training algorithms is conceptually understandable, the trained models behave largely as a "black box". In the case of the breast density CycleGAN, the learned mapping must be understood via post hoc explainability [42], a task that will be performed in future work. We also plan to conduct further testing of our proposed system using real clinical images and to interpret the transformations performed by the CycleGAN with the assistance of a team of radiologists.
Although more data are needed to fully examine the extent of this reverse process, the overall percentage of successful recognition for normal/benign/malignant ROIs improved in all the datasets that we tested. In future work, we plan to extend this method to the whole breast region by using more advanced GANs and CNNs that can analyze the relationships between neighboring areas. Our approach could help doctors and radiologists identify suspicious regions and plan treatments at an early stage, potentially avoiding adverse consequences and treatment difficulties.