GARL-Net: Graph Based Adaptive Regularized Learning Deep Network for Breast Cancer Classification

Breast cancer is a fatal disease that affects women across the globe. Its incidence continues to rise, partly due to a lack of awareness and limited access to diagnostic systems. Many computer-aided diagnostic systems have been developed for cancer detection, but their performance remains relatively low, so a more accurate diagnosis is needed to save lives. For large and imbalanced image datasets, efficient learning of the network is essential to detect and classify breast cancer accurately. In this paper, a graph based adaptive regularized learning deep network (GARL-Net) is proposed for more accurate breast cancer classification. Transfer learning is used to train the backbone network, DenseNet121, and fine-tuning of the backbone is then driven by an improved loss function: a graph based adaptively regularized complement cross entropy loss. Because SoftMax cross entropy alone is not sufficient to classify image samples accurately, the complement entropy technique is incorporated with the cross-entropy loss to mitigate misclassification. Further, a regularization term built on the spatial graph Laplacian basis is adaptively scaled to penalize the complement cross entropy loss and improve the learning of the network. The performance of the proposed method is evaluated on the BreakHis and BACH 2018 histopathology image datasets; it outperforms existing state-of-the-art methods and achieves 99.00% precision, 99.40% recall, 99.20% F1-score, and 99.49% accuracy for binary classification of breast cancer image samples from the BreakHis dataset.

Batch Normalization (BN) was suggested by Sergey Ioffe in 2015, and dropout by Hinton in 2012, for Fully Connected Networks (FCNs) to overcome overfitting and covariate shift during training of the network [24], [25], [26], [27], [28]. Global average pooling also helps to avoid overfitting because it introduces no parameters [24]. Cross entropy (CE) is the most commonly used loss, but it suffers from sample misclassification. The complement cross entropy (CCE) cost function is capable of addressing sample misclassification for both balanced and imbalanced training datasets [53].
Graph based convolutional neural networks have proven their capability in diverse fields of application such as image processing, video processing, protein-protein structure recognition, and genome sequencing. In image processing especially, they have wide applications because of their robust structural discrimination within an image [54], [55], [56]. To handle multiple multivariate time-series forecasting problems, the Recurrent Graph Evolution Neural Network (REGENN), a time-aware graph network, is used [57]. Adaptive graph regularization based prediction of drug side effects uses sparse structure learning to acknowledge side-effect interrelations and to explore the local structure in drug data [58]. Optimization of the loss function is key to obtaining better results from deep networks [60], [61].
For BC diagnosis, HIs provide more comprehensive information than mammography, ultrasound, and MRI images. Diseases are analysed through lesion-based tissue and cell detection from histopathology images [63]. Another aspect relates to the grading of BC, which can be done more efficiently from histopathology images and can provide better sensitivity toward benign and malignant cancer subtypes [64].
Many efforts have been undertaken for the binary classification of BC from histopathological images. However, the classification accuracy reported by these methods remains limited due to inefficient learning of the networks.

A. CONTRIBUTIONS
To address the above-mentioned limitations, a novel graph based adaptive regularized learning deep network is proposed to achieve optimal learning and efficient binary classification of BC. The following points summarize the contributions of the proposed work: • We propose an innovative GARL-Net that adaptively regularizes the loss function to improve the training of the network.
• Graph convolution is used to acknowledge the spatial information of the extracted features, and the loss is improved by adding a Laplacian basis matrix to the regularization term.
• The complement cross entropy (CCE) concept is used as the loss function to overcome the problem of sample misclassification.
• The reported results demonstrate the effectiveness of the proposed method for binary classification of BC.
• The performance of the proposed model is compared with other state-of-the-art (SOTA) methods using histopathological images of the BreakHis and BACH-2018 datasets. The remainder of the paper is organized as follows: Section II presents the literature survey of related work on HI classification. Section III explains the proposed algorithm. Section IV presents the intensive experimental evaluations. Section V verifies the model performance on the BACH-2018 and mixed datasets. Section VI briefly discusses the limitations of the proposed work, while Section VII concludes with some future directions for the current work.

II. RELATED WORK
In today's environment, rapid growth is seen in cancer cases. Breast Cancer Histopathological Image Analysis (BCHIA) is a procedure in which ML approaches have a huge impact [4]. BC image segmentation, detection, and classification are carried out with Machine Learning (ML) approaches and are widely in practice [5], [6], [7], [8].
Designing a CNN architecture is challenging, and recent innovations in DCNN architectures fall into seven categories: spatial exploitation, depth, multipath, width, feature-map exploitation, channel boosting, and attention [10]. CNN models are essentially black boxes, which leaves plenty of room for layer-wise modification. An efficient CNN model designed for the classification of histopathological images reports a low error rate of 22% [12]. Ensembling the outputs of models trained via TL outperforms other methods, giving an overall accuracy of 95.29% [37]. Pre-trained networks perform well in several applications, and layer-wise fine-tuning can be a practical approach to achieve better results [13], [15], [17], [18], [36], [45], [46], [47], [48]. A CNN model with an augmented dataset is the most adopted technique and gives 94.56% average accuracy [18]. Classification using a pre-trained DenseNet121 network reports an accuracy of 97.14% on the 200X magnification subset of the BreakHis dataset [8]. A Deep Belief Network (DBN) based BC classification framework reports an accuracy of 86% using patch-based deep learning modelling [69]. Deep feature fusion with enhanced routing (FE-BkCapsNet) reports an average accuracy of 93.7% on the BreaKHis histopathology image dataset [71].
HI-based BC classification has reported average image-level classification accuracies on the BreaKHis dataset of 84.89% using AlexNet [65], 93.5% using GoogLeNet, 94.13% using VGGNets, and 94.35% using ResNet [66]. Binary classification of BC with existing deep networks such as ResNet101 and DenseNet121 has reported classification accuracies of 91.43% and 96.74%, respectively, on the BreaKHis dataset, and 91.53% and 96.38%, respectively, on the BACH-2018 dataset [67]. DenseNet121+GAN has reported 99.13% accuracy [68]. In view of the above results, DenseNets have a state-of-the-art architecture that provides better feature extraction capability and fast training, owing to dense residual connections and weight sharing, respectively.

B. SUPPORT VECTOR MACHINE (SVM) BASED APPROACH
The Support Vector Machine (SVM) is used for classification and regression problems. A method using HI patches reports 91% accuracy with SVM as the classifier [5]. Cascaded ensembles of SVM or CNN classifiers provide a noteworthy reduction in error rate and, at the same time, have a positive impact on performance by managing the accuracy-rejection trade-off [7]. Bayesian deep learning is used to estimate the uncertainty in measured quantities [15], [29]. Distinctive feature classification using a combination of a pre-trained CNN and an SVM classifier, followed by a belief theory based classifier fusion (BCF) technique, achieves a good average accuracy of 96.91% [36].

C. AUTO-ENCODER BASED APPROACH
Re-encoding is done by a spatial attention module to select fine mitosis features, and multi-branch subnets classify them into the specified class of mitosis objects [20]. Deep manifold preserving learning with an autoencoder is used to classify the BreakHis dataset; manifold learning is used to preserve the intrinsic structure of the data [21]. A Network in Network (NIN) approach was implemented to classify the MNIST dataset, in which a combination of convolution with multi-layer perceptron (MLP-Conv) was designed [24].

D. HYBRID LEARNING APPROACH
Hybrid learning based approaches use a combination of DL and ML techniques. These approaches are used in different applications such as image enhancement, image denoising, and color normalization, and in the feature extraction process to ensure feature-wise sample discrimination. Combining classical and DL-based approaches is a better technique for extracting and classifying image samples, but these hybrid methods take more time to train and yield bulky models. Mammography image detection uses segmentation and the wavelet transform as a combined preprocessing approach for BC detection with transfer learning [19]. I. Hirra et al. suggested a probabilistic deep belief network (DBN) approach using multiple restricted Boltzmann machines (RBMs) and reported high accuracy [69]. FE-BkCapsNet, used in [71], proposes deep feature fusion with enhanced routing. Recently, a hybrid CNN framework of ShuffleNet and ResNet using a mammogram image dataset has been proposed; it is an efficient model and reports an accuracy of 99.17% [70].

E. LIMITATIONS OF EXISTING WORKS
A wide variety of techniques for BC diagnosis have been proposed by various researchers, but the existing techniques in this domain have some limitations. Technique-wise limitations of the existing works are discussed below: A cascade classifier ensemble system has been developed, but it is bulky and requires significant time to run [7]. Hyperparameter tuning with different settings for different CNN models for binary classification gives good accuracy but consumes much time due to a cumbersome architecture [8]. A wavelet transform based CNN (WCNN) has been developed, but its performance is limited in terms of accuracy and time because training a wavelet transform based CNN is harder than training a conventional CNN [19]. The attention based multi-branch networks suggested in [20] are bulky, complex, and computationally inefficient. A deep manifold preserving autoencoder based classifier is a good choice because it is lightweight and computationally fast, but its performance is inadequate [21]. The Inception Recurrent Residual Convolutional Neural Network (IRRCNN) model is a combination of inception, recurrent, and residual concepts; its complexity is high, but its performance is not proportionally efficient [36]. The multi-network based frameworks suggested in [37] and [66] can be a good choice for accuracy, but their physical implementation is quite difficult due to huge training time. The lightweight network AlexNet is used in [45] for the classification of BC but yields insufficient performance. Most of the existing methods return biased classification in the case of an unbalanced dataset and require additional effort for data balancing. The data augmentation approach is adopted by many researchers for data balancing [48]. An unsupervised learning based approach for mislabelled samples is suggested by Y. Zhou et al., where a GAN and DenseNet121 are combined to perform anomaly detection; the combination makes the system bulky and computationally inefficient and reports low accuracy [67].
In light of the above, mainly four drawbacks of existing works have been observed: (1) lower performance, (2) higher complexity, (3) inferior learning, and (4) data imbalance. In the proposed work, the learning of the network has been improved by using the proposed graph based adaptive regularization of the estimated loss function. The Shannon entropy evaluated over the misclassified (incorrect) classes is called the complement entropy. It is combined with the cross entropy (CE) and termed the complement cross entropy (CCE) loss function, which yields a robust system even with an imbalanced image dataset. Learning with graph based adaptive regularization of the estimated loss also improves performance.

III. PROPOSED WORK
A deep convolutional network performs the convolution of image data with a specified kernel in Euclidean space. The output of each convolutional operation depends on the chosen stride, padding, image size, and kernel size. TL is the most widely adopted and time-efficient technique: the base model is trained by transferring weights from a network trained on the large ImageNet dataset. Figure 2 illustrates the proposed model for the classification of BC using histopathology images.

A. DENSENET121 BASE MODEL
Let $X = \{x_1, x_2, \cdots, x_n\} \in \mathbb{R}^{n \times d}$ be the unlabeled training set and $Z = \{z_1, z_2, \cdots, z_n\} \in \mathbb{R}^{n \times p}$ be the image label matrix, where $n$ is the number of image samples, $d$ is the feature dimension, and $p$ is the number of class labels; for binary classification, $p = 2$. The vector $z_k = \{z_{k,1}, z_{k,2}, \cdots, z_{k,l}\} \in \{0, 1\}^p$ is the class label of the $k$-th image sample $x_k$, with $k = 1, \ldots, n$. Let $w_i = [w_{i,1}, w_{i,2}, \cdots, w_{i,l}]^T \in \mathbb{R}^{m \times d}$ be the weight matrix between the feature vector and the convolutional filter (kernel), $b_i \in \mathbb{R}^m$ the bias corresponding to the weights, and $\rho$ the nonlinear activation function. The feature vector obtained by a linear mapping followed by the nonlinear activation function is the $k$-th perceptron convolutional output, given in Eqn. (1).
where $\odot$ denotes the Hadamard product and $X_i$ is the feature map of the $i$-th convolutional layer; there are $f$ convolutional layers in the deep network, $i = 1, \ldots, f$. Let $X_f$ be the features of the last deep layer; weight sharing proceeds from the previous layer.
where $H_f$ is the composite function of batch normalization (BN) followed by a ReLU function and a $(3 \times 3)$ convolution layer. In the backbone network, the layers of a dense block are tightly connected; the backbone network is shown in Figure 2(a). Unlike in a general convolutional network, the number of skip connections between the layers is $P(P-1)/2$, where $P$ is the number of layers in the dense block. The feature-map growth in the dense block follows the relation $g_P = g_0 + (P-1)\,g$, where $g_0$ is the number of channels in the input layer and $g$ is the growth rate, i.e., the number of feature maps generated each time by $H_f$. In the dense block, $1 \times 1$ bottleneck layers are placed before each $3 \times 3$ convolution layer, followed by transition layers that control the network size by limiting the number of output feature maps. Global Average Pooling (GAP) is more native and more robust to spatial translations and works as a structural regularizer by enforcing correspondences between feature maps and categories [24]. It is suggested that a dropout rate of nearly 50% can be an efficient way to maximally reduce variance [27], [28].
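As an illustration of the composite function $H_f$ and the dense-block connectivity just described, the following is a minimal Keras-style sketch; it assumes standard `tensorflow.keras` layers, and the layer count, growth rate, and bottleneck width are illustrative values rather than the paper's exact DenseNet121 configuration.

```python
from tensorflow.keras import layers

def h_f(x, growth_rate=32):
    """Composite function H_f: BN -> ReLU -> 1x1 bottleneck -> BN -> ReLU -> 3x3 conv."""
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * growth_rate, 1, padding="same", use_bias=False)(y)  # bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    return y

def dense_block(x, num_layers=6, growth_rate=32):
    """Dense block: each layer receives the concatenation of all preceding feature
    maps, giving P(P-1)/2 skip connections; the input to layer P carries
    g_P = g_0 + (P-1)*g channels."""
    for _ in range(num_layers):
        y = h_f(x, growth_rate)
        x = layers.Concatenate()([x, y])
    return x
```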

B. MODELLING OF LOSS
The SoftMax function yields the predicted class probabilities of a sample, which are finally interpreted as a particular class label (0 or 1) for binary classification. The SoftMax function is mathematically expressed in Eqn. (3).
Here $\phi_i$ is the input feature vector after dropout, $r$ is the number of classes (for binary classification $r = 2$), and $\hat{y}_j$ is the predicted value of the respective class, elaborated in Eqn. (4). Let $y_j$ be the true label or ground truth; the cross-entropy loss then follows. To overcome the problem of sample misclassification, the complement entropy concept is adopted. This is an entropy loss modelled as the mean Shannon entropy over the incorrect classes [32], expressed in Eqn. (5).
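As a hedged reconstruction (the displayed equations are not reproduced in this text), the standard forms below, consistent with the definitions above, are assumed for Eqns. (3)-(5); $g_k$ denotes the ground-truth class of the $k$-th sample.

```latex
% Assumed standard forms for Eqns. (3)-(5), written from the surrounding definitions.
\begin{align}
\hat{y}_j &= \frac{e^{\phi_j}}{\sum_{i=1}^{r} e^{\phi_i}}
  && \text{(SoftMax, Eqns. 3--4)} \\
\mathcal{L}_{CE} &= -\sum_{j=1}^{r} y_j \log \hat{y}_j
  && \text{(cross entropy)} \\
\mathcal{C}(\hat{y}) &= -\frac{1}{n}\sum_{k=1}^{n} \sum_{j \neq g_k}
  \frac{\hat{y}_{k,j}}{1-\hat{y}_{k,g_k}} \,\log \frac{\hat{y}_{k,j}}{1-\hat{y}_{k,g_k}}
  && \text{(complement entropy, Eqn. 5)}
\end{align}
```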
To match the scale between the cross entropy and the complement entropy, a scale balancing factor $\gamma/(c-1)$ is added, as expressed in Eqn. (6).
In Eqn. (7), $c$ is the number of classes and $\gamma$ is the modulating factor, set so as to complement the cross entropy by tuning it.
where $d$ is the number of channels. In general, $\gamma = -1$ is selected, but any $\gamma < 0$ may be used. The complement cross entropy (CCE) loss is then regularized and assumed for multi-channel training of the system, mathematically expressed in Eqn. (8).
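A minimal NumPy sketch of the complement cross entropy loss described above is given below; it assumes one-hot labels and softmax outputs, and the function name, epsilon handling, and default $\gamma = -1$ follow the discussion above rather than the paper's exact implementation of Eqn. (8).

```python
import numpy as np

def complement_cross_entropy(y_true, y_pred, gamma=-1.0, eps=1e-7):
    """Sketch of CCE: cross entropy plus the scaled complement (Shannon) entropy
    computed over the incorrect classes.

    y_true: one-hot labels, shape (n, c); y_pred: softmax outputs, shape (n, c).
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    # Standard cross entropy over the true classes.
    ce = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
    # Probability assigned to the true class, per sample.
    p_true = np.sum(y_true * y_pred, axis=1, keepdims=True)
    # Probabilities of the complement (incorrect) classes, normalized by (1 - p_true).
    p_comp = (1.0 - y_true) * y_pred / np.clip(1.0 - p_true, eps, 1.0)
    # Shannon entropy of the complement distribution, averaged over samples.
    comp_entropy = -np.mean(np.sum(p_comp * np.log(p_comp + eps), axis=1))
    c = y_true.shape[1]
    # Scale balancing factor gamma / (c - 1); with gamma = -1, minimizing the total
    # loss maximizes the complement entropy, flattening wrong-class probabilities.
    return ce + (gamma / (c - 1)) * comp_entropy
```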

C. SPATIAL GRAPH LAPLACIAN BASIS REGULARIZATION
The features $X_f$ extracted layer by layer from the backbone deep CNN (DenseNet121) are mapped into graph node vectors, and graph-aware computation is then adopted to capture the class-related feature maps using a graph $G$. Let $G = (V, E)$ be an undirected weighted graph. Each node contains class-related feature values and corresponds to a 'big pixel' in the mapped feature vector $x$.
where $n_i \in \mathbb{R}^{C \times 1}$ and $n_j \in \mathbb{R}^{C \times 1}$ are the $i$-th and $j$-th node vectors, respectively, while $e_{ij}$ is the entry of the edge-aware adjacency matrix. The spatial graph method generates similar structures for all images. The spatial feature structure of the relative classes is discriminated by evaluating the nodes' closeness [54], [55], i.e., the Gaussian of the Euclidean distance between vertices $n_i$ and $n_j$ is computed [33], [34], as shown in Eqn. (9).
where $e_{ij}$ ensures the structural information and $\sigma_s$ is the softening parameter, set to the average distance between the vertices of graph $G$.
Node connection: $e_{ij} > 0$ if nodes $i$ and $j$ are connected, and $e_{ij} = 0$ otherwise. The adjacency matrix $A = [e_{ij}]_{|N| \times |N|}$ is a symmetric semidefinite matrix, and the number of nodes in the graph is determined by the spatial size of the extracted feature maps. The Laplacian matrix is $L = D - A$, where $D \in \mathbb{R}^{n \times n}$ is the degree matrix, which is diagonal and symmetric with elements $d_{ii} = \sum_j e_{ij}$, i.e., the $p$-th diagonal element equals the sum of all elements of the $p$-th row of $A$. The eigenvectors of the Laplacian matrix $L_p$ act as the Laplacian basis $B_p$ and encode the spatial information of the features. The symmetric normalized Laplacian considered in the graph convolution network (GCN) is $\tilde{L} = I - D^{-1/2} A D^{-1/2}$, where $I \in \mathbb{R}^{n \times n}$ is the identity matrix; in Eqn. (10), $\tilde{L}$ is a symmetric positive semidefinite matrix with nonnegative eigenvalues. The GCN uses this representation of the graph through the orthogonal decomposition $\tilde{L} = U \Lambda U^{T}$, where $\Lambda$ holds the eigenvalues of $\tilde{L}$ and $U$ is an orthogonal matrix, i.e., $U U^{T} = I$. In graph convolution, the convolutional operator $g_{\theta}$ is applied in this spectral domain; in Eqn. (11), $g_{\theta}$ is the convolutional parameter, and mathematically it is a univariate function $g(x)$ expressed as a group of affine (basis) functions.
[Figure caption residue: Step (c) shows the graph based adaptive regularization term estimation module; Step (d) shows the complement cross entropy loss calculation; Step (e) shows the calculated adaptive regularized total loss or cost function.]
To avoid the explicit calculation of $U$ and $\Lambda$ in the preceding layers, the GCN uses Chebyshev polynomials $T_k(\cdot)$ of order $k$ to approximate the function $g_{\theta}$ for fast filtering. A truncated expansion of order $K-1$ is adopted for the filter parameterization, where $\theta \in \mathbb{R}^{K}$ is a vector of Chebyshev polynomial coefficients, as expressed in Eqn. (12). A second order truncated expansion is assumed here, where $\tilde{\Lambda} = \frac{2}{\lambda_{max}}\Lambda - I$ is a rescaling of the eigenvalue matrix into $[-1, 1]$ that also satisfies the parameter constraints. The layer-wise graph convolutional operation then gives the result in Eqn. (13), $x^{l} = H x W_{gcn}$, where $l$ indexes the graph convolution layers; in this experimentation $l = 3$ is considered. The number of nodes does not vary; only feature aggregation and message passing take place from layer to layer, and a node can be regarded as a receptive field. By using the spatial graph-aware function to capture the appearance variation of the feature maps, the deep feature maps are constrained to be orthogonal to the Laplacian basis $B_p$; the Laplacian matrix encodes the class-related spatial structural information of the feature maps and is calculated as in [33] and [34]. Graph based adaptive regularization (GAR) is then used to modify the loss function and penalize it during the training of the network. All the fixed parameters are set before training the model; the values of parameters such as the weight decay, batch size, $\lambda$, $\eta$, $\gamma$, $\beta_1$, and $\beta_2$ are given in the algorithms. The proposed objective function for minimization of the complement cross entropy loss is given in Eqn. (15). The weight update rule follows the learning optimization and convergence of the learning process [61], [62]. In this work, the Adam optimizer [56] is used for learning optimization of the top layers. In Eqn. (16), $\eta$ is the learning rate, which belongs to $(0, 1)$, and $\epsilon$ is a stabilization parameter; $m^{(t)}$ and $\psi^{(t)}$ are the first and second moment estimates, defined as follows.
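To illustrate the adjacency and Laplacian construction above, here is a minimal NumPy sketch assuming one graph node per spatial location of the extracted feature maps; the function name and the eigendecomposition route are illustrative, and the exact GAR penalty of Eqns. (14)-(15) is not reproduced since its closed form is not shown in this text.

```python
import numpy as np

def spatial_graph_laplacian_basis(features, sigma_s=None, eps=1e-8):
    """Build the edge-aware adjacency (Eqn. 9), the symmetric normalized Laplacian
    (Eqn. 10), and its eigenvector (Laplacian basis) matrix.

    features: node feature matrix of shape (n_nodes, channels), one node per
    spatial location of the extracted feature maps.
    """
    # Pairwise Euclidean distances between node vectors n_i and n_j.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Softening parameter: average distance between vertices, as described above.
    if sigma_s is None:
        sigma_s = dist.mean() + eps
    # Gaussian of the Euclidean distance gives the edge weights e_ij (Eqn. 9).
    A = np.exp(-(dist ** 2) / (2.0 * sigma_s ** 2))
    np.fill_diagonal(A, 0.0)
    # Degree matrix D with d_ii = sum_j e_ij.
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d + eps))
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
    L = np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt
    # Eigenvectors of L act as the Laplacian basis B_p used by the GAR term.
    eigvals, basis = np.linalg.eigh(L)
    return A, L, basis
```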

IV. EXPERIMENTS AND RESULTS
The proposed modified loss based algorithm, spatial graph based adaptive regularization of the CCE loss (GAR-CCE), is evaluated quantitatively as well as qualitatively on the BreakHis histology image dataset.

A. DATA SETS
For the evaluation of the model, the widely accepted and easily available BreakHis dataset is considered [62]. As shown in Table 2, the BACH-2018 dataset is available at https://iciar2018challenge.grand-challenge.org/Dataset/. The mixed dataset is the combination of the BreaKHis and BACH-2018 datasets and is used for evaluating model performance. These datasets are split into a training set (70%), test set (20%), and validation set (10%) used to train, test, and validate the proposed model, respectively.
Before the dataset image samples are used, all the image samples are labeled, and then augmentation is applied with zoom range = 0.2, horizontal flip and vertical flip, and a rotation range of 90°.
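As a concrete sketch of the augmentation settings quoted above, assuming the Keras `ImageDataGenerator` API and a hypothetical directory layout:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings quoted above: zoom range 0.2, horizontal and vertical
# flips, rotations within 90 degrees. Rescaling and the directory layout are assumptions.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=90,
)
train_flow = augmenter.flow_from_directory(
    "BreakHis/train",           # hypothetical path
    target_size=(224, 224),     # assumed DenseNet121 input size
    batch_size=64,
    class_mode="binary",
)
```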

B. EXPERIMENTAL DETAILS
The proposed model is designed exclusively to classify BC into the Benign and Malignant classes. The proposed model was implemented on the TensorFlow platform with the Keras interface, using Python programming, to train and perform the experimentation. System configuration: Intel Core i7 HP machine with a 2.7 GHz NVIDIA GeForce GTX 1080 GPU, 16 GB RAM, and the Windows 10 (64-bit) operating system. Hyperparameters play an important role in fine-tuning the neural network; from that point of view, in this work we use a batch size of 64, weight decay = 10^-4, initial learning rate (η) = 1 × 10^-4, regularization parameter (λ) = 0.0004, β1 = 0.9, and β2 = 0.999.
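The optimizer settings listed above can be instantiated as in the hedged sketch below; the weight-decay placement and the compile call are assumptions, and `garl_cce_loss` is a hypothetical callable standing in for the GAR+CCE objective of Section III.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

# Adam settings quoted above: eta = 1e-4, beta1 = 0.9, beta2 = 0.999.
optimizer = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999)

# One possible reading (an assumption): weight decay of 1e-4 applied as L2
# regularization on the trainable top layers, e.g.
#   layers.Dense(2, activation="softmax", kernel_regularizer=l2(1e-4))

# The compiled loss would be the GAR+CCE objective; garl_cce_loss is hypothetical.
# model.compile(optimizer=optimizer, loss=garl_cce_loss, metrics=["accuracy"])
```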

1) PERFORMANCE METRICS
To quantitatively evaluate the classification performance of the system, metrics such as accuracy, recall (sensitivity), specificity, and F1-score are used [14], [23]. These are mathematically expressed in Eqns. (17)-(23), shown at the bottom of the page. The Matthews Correlation Coefficient (MCC) is also a statistical rate that is more reliable than the F1-score; it takes a high value only when the model performance is high, so it is very easy to differentiate the performance of the methods used. The Area Under the Curve (AUC) is a common evaluation metric, which is useful for choosing optimal models.
where $I_b$ and $I_m$ denote the number of Benign and Malignant BC images, respectively, and $R_i$ is the rank of the $i$-th Benign image in the ranked list.
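Since Eqns. (17)-(23) are set at the bottom of the page and not reproduced in this text, the standard definitions below are assumed, in terms of the confusion-matrix counts TP, TN, FP, FN and the rank-based quantities defined above.

```latex
% Assumed standard forms for the performance metrics and the rank-based AUC.
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Recall (Sensitivity)} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F1-score}    &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \\
MCC &= \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \\
AUC &= \frac{\sum_{i=1}^{I_b} R_i - \tfrac{1}{2} I_b (I_b + 1)}{I_b \, I_m}
\end{align}
```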

2) PROPOSED METHOD EVALUATION a: VISUAL ASSESSMENT
The feature maps are the extracted features of the samples of the respective class, produced by the TL based base model (CNN network), as shown in Figure 4. The feature maps are depicted for DenseNet121, in which different spatial locations can be seen, with the receptive field varying from activation to activation. Layer-wise features are progressively extracted by the backbone network, and feature selection is progressively completed by the network. Some of the feature maps for layer 2, layer 50, and layer 107 are shown in Figure 3. Some of the samples classified by the proposed network are shown in Figure 5. Labeled samples are classified accurately: benign-labeled samples are classified as benign and malignant-labeled samples as malignant. Only nine samples are shown as classified image samples after experimentation with the fine-tuned proposed GARL-Net model.

b: QUANTITATIVE ANALYSIS
The BC classification results for the proposed GARL-Net model with the adaptively regularized loss verify the performance of the model. Figure 6 shows the validation accuracy for the model trained with the graph based adaptive regularization (GAR) based CCE loss. Overall, the proposed GARL-Net model gives an accuracy of 98.80%, which is 0.72% better than [8] and 0.08% better than [18]. The receiver operating characteristic (ROC) is a performance evaluation metric obtained by trading off the true positive rate against the false positive rate as the probability threshold of the model varies. The ROC plot for the proposed technique is shown in Figure 7; here the model is trained using transfer learning, with only the top layers trained. With this learning strategy, the AUC score is 0.9886 for GAR+CCE loss based learning. In Figure 7, the ROC curves for the three loss based learning schemes are plotted to see the effect of learning. Performance metrics for model training without fine-tuning are shown in Table 3. The validation accuracy achieved is 96.40%, 97.79%, and 98.80% for CE loss, CCE loss, and GAR+CCE loss based learning, respectively. Training the model with the proposed technique is more appropriate than training the network with the simple, basic cross-entropy loss. In Figure 8, confusion matrices are plotted to show the number of correctly classified and misclassified samples for the corresponding losses; each is plotted between the predicted and the actual labeled samples, and the details of the per-class sample classification are given in the figure caption. The validation accuracy plot for fine-tuning with the estimated loss is shown in Figure 9; with fine-tuning, the proposed model improves its feature extraction, and the overall classification improves.
In Figure 9, the accuracy plots for the unified (GAR+CCE) loss based learning are shown. With fine-tuning, the model achieves an accuracy of 97.22%, 98.52%, and 99.49% for CE loss, CCE loss, and GAR+CCE loss, respectively. These results verify the effectiveness of the proposed technique. In Figure 10, the ROC plot for these three loss based learning schemes is shown, and it can be seen that the proposed technique also addresses the problem of dataset imbalance, which signifies its importance.
In Table 4, the performance metrics are presented for the experimentation with the proposed techniques when the model is trained with fine-tuning. With fine-tuning of the model, training is better than with general transfer learning: fine-tuning takes the model one step beyond transfer learning, in that the model continues learning and adjusts the weights on top of those transferred from the pre-trained network. Hence, the proposed method gives a substantial boost in performance over the state-of-the-art (SOTA) methods. In Figure 11, confusion matrices are plotted to show the correctly classified and misclassified samples for the corresponding losses; each is plotted between the predicted and the actual labeled samples.

V. ABLATION STUDY
In order to verify the effectiveness of the proposed technique, we conducted an ablation study. The BACH-2018 dataset is used to verify the performance of the technique; the dataset distribution is shown in Table 2. The performance measures are evaluated and quantitatively reported in Table 5, which shows the performance for general training of the model. The proposed model performs well, achieving 95.00% precision, 95.00% recall, 94.99% AUC, and 95.00% accuracy. This is 5.00% higher accuracy than CCE loss based learning of the model and 7.5% higher than CE loss based learning. The ROC curves for the different estimated loss based trainings of the model are shown in Figure 12.
From the ROC plot shown in Figure 12, it is clear that the proposed method performs better than the other two loss based learning schemes; the AUC values for the different approaches are 0.8750, 0.9000, and 0.9499 for CE, CCE, and the proposed (GAR+CCE) loss based learning, respectively.
Fine-tuning of the model with the proposed technique on the BACH-2018 dataset also performs better than the other techniques. The quantitative performance is shown in Table 6, and the ROC curves for the different estimated loss based trainings are shown in Figure 14. From this plot, it is clear that the proposed method performs better than the other two loss based learning schemes; the AUC values for the different approaches are 0.9000, 0.920, and 0.9750 for CE, CCE, and the proposed (GAR+CCE) loss based learning, respectively. Confusion matrices for the respective techniques are shown in Figure 15, with the correctly classified and misclassified samples clearly indicated after experimentation. A comparison of the performance of the proposed technique on the BACH-2018 dataset in terms of accuracy is presented in Table 7; it is clear that the proposed technique performs better than the state-of-the-art methods.
To further analyse and verify the performance of the proposed technique, it has also been evaluated on the mixed (BreaKHis and BACH-2018) dataset. Details of the dataset are given in Table 2, and the ROC curves for the different loss based trainings on the mixed dataset are shown in Figure 16.
The AUC values for these approaches are 0.9577, 0.9672, and 0.9775 for CE, CCE, and the proposed (GAR+CCE) loss based learning, respectively. Confusion matrices for the respective techniques on the mixed dataset are shown in Figure 17, with the correctly classified and misclassified samples clearly indicated after experimentation. The ROC plot for the different estimated loss based fine-tuning of the model is shown in Figure 18; the AUC values for these approaches are 0.9651, 0.9728, and 0.9878 for CE, CCE, and the proposed (GAR+CCE) loss based learning, respectively. Confusion matrices for the respective techniques on the mixed dataset are shown in Figure 19, with the correctly classified and misclassified samples clearly indicated after experimentation. This ablation study, performed on two different datasets, shows that the model achieves very good performance and validates that the proposed technique performs robustly.
In Table 10, the accuracies of SOTA methods with the same split ratio are compared. It can be observed that the proposed method outperforms the existing state-of-the-art approaches, with an overall accuracy of 99.49% at this data split ratio. To establish a fair comparison of the proposed technique against existing works, three aspects are considered: (1) the dataset, (2) the learning technique, and (3) the network used. The comparison of accuracy on the BreaKHis dataset according to these three points is presented in Table 11. Many researchers have suggested different methods and techniques, both learning from scratch and transfer learning based approaches; our proposed model performs with higher accuracy and demonstrates its performance convincingly. 5-fold cross validation of the proposed technique is performed on three datasets to validate the model performance for BC classification. The average performance of the proposed technique is presented in Table 12: average accuracies of 99.36%, 96.50%, and 98.80% are achieved for the BreaKHis, BACH, and mixed datasets, respectively.
The BreaKHis and mixed datasets are highly imbalanced because of the large difference between the numbers of benign and malignant samples. Classification of BC from these imbalanced datasets leads to biased classification in most cases. With the CE loss based technique, the classification accuracies of the benign and malignant classes show a significant difference. The performance on the imbalanced datasets has been improved by the proposed (GAR+CCE) technique, and the limitation of biased classification is overcome. Confusion matrices, along with explanations, are shown in Figure 8, Figure 11, Figure 13, Figure 15, Figure 17, and Figure 19 for the respective learning techniques and datasets. Table 13 shows the quantitative values of the benign and malignant sample classification results for the BreaKHis, mixed, and BACH-2018 datasets. On the BACH-2018 dataset, the classification accuracies of the benign and malignant classes are mostly equal for the model with CE, but for the BreaKHis and mixed datasets the accuracy is biased towards the malignant class due to dataset imbalance. As shown in the table, the accuracies of the benign and malignant classes for all three datasets are almost balanced with the proposed technique. Computational time analysis is done for all three datasets, as shown in Table 14. The computational time (in seconds) per sample of the different datasets is reported on the basis of the experimentation performed on all three datasets; the computational time is comparatively higher for a BACH-2018 dataset sample because its pixel count is higher than that of a sample in the BreaKHis dataset.

VI. LIMITATIONS OF THE PROPOSED WORK
Although our work gives good performance, improvements can still be made. (1) We have used the pre-trained DenseNet121 deep network for feature extraction, which is a fairly bulky network; a lightweight, attention based, efficient, and powerful feature-extraction network could be designed in place of this backbone. (2) A graph convolution network (GCN) with time complexity of O(n^3) is used in the proposed work to compute the Laplacian basis matrix from the extracted features and adaptively regularize the estimated loss function; a more efficient GCN layer could be designed for more effective regularization of the loss. (3) The proposed network performs only binary classification of BC histopathology images.

VII. CONCLUSION AND FUTURE DIRECTIONS
In this work, GARL-Net, a graph based adaptive regularization learning network, is proposed to improve the learning of the deep network and thereby the performance of BC binary classification. The experimental results demonstrate that the proposed technique outperforms the SOTA methods, reporting an accuracy of 99.49%, 99.00% precision, 99.40% recall, 99.20% F1-score, and a 98.83% MCC score with fine-tuning of the network. Here, transfer learning is adopted to reduce the training time, and fine-tuning is performed to further improve the model's learning. Performance is analysed by computing the performance metrics to verify the effectiveness of the technique used. The proposed technique also overcomes the data imbalance issue and improves performance. To further verify the performance of the proposed technique, an ablation study is carried out on the BACH-2018 and mixed datasets, achieving accuracies of 97.50% and 98.83%, respectively. In summary, this technique is robust, reliable, and high performing with minimal error. We believe this technique is not only for BC classification but may also help improve the learning of other neural networks for image processing applications.
In the future, different types of cancers and tumours could be classified using the proposed model. This work can be extended to grading of BC, and the transfer learning of the model can also be improved further. The model may also be extended to multi-class classification. A lightweight attention based CNN may be used as the backbone network instead of DenseNet121.

ACKNOWLEDGMENT
Vivek Patel would like to thank the Ministry of Education (MoE), Government of India, for providing research assistantships to carry out the study. Shashikant P. Patole would like to thank Khalifa University for its financial support through the internal fund for high-quality publication.