XAI hybrid multi-staged algorithm for routine & quantum boosted oncological medical imaging

Medical imaging is the process of visualizing diseased regions inside the patient’s body with the aid of images. The field draws on several disciplines of science and technology, including physics, the biological sciences, engineering, artificial intelligence and mathematics. These disciplines contribute to the design and installation of imaging devices and to the collection and analysis of images, for a better understanding of the disease, its prognosis and its prevention. In this manuscript, medical images are analyzed with a new hybrid machine learning approach: breast cancer images are studied with a newly devised algorithm that is conceptually sounder than existing algorithms. The algorithm proceeds in stages to process, filter, segment, statistically analyze and classify the medical images. The results from different classification tools are compared in a novel manner, inspired by the classification tools of explainable artificial intelligence. The algorithm devised during this research can serve as a useful tool in the evolving field of particle-physics imaging.

Cancer can be diagnosed from blood samples, biomarkers and other laboratory approaches, or through medical imaging. There are several types of cancer, and thus the diagnostic measures vary from case to case and from stage to stage. Researchers from all around the world have contributed to developing cancer diagnostic measures and standards. The Commission on Cancer (CoC) of the American College of Surgeons (ACS) developed the "Facility Oncology Registry Data Standards" (FORDS) in 2003. FORDS ensures consistency in "code set use", and it has been updated as oncology research has progressed. According to the 2016 FORDS manual, both cancer identification and cancer staging require medical imaging.
Cancer can be diagnosed with one or several of the relevant imaging modalities, including computerized tomography, magnetic resonance imaging, X-ray, ultrasound and positron emission tomography.

Quantum mechanics and imaging
Cancer is an unpredictably invasive disease, and its course is hard to identify both radiologically and through other tests. Medical images sometimes cannot be interpreted even by experts; therefore, designing a smart AI tool for such images is a great challenge. For this reason, in this manuscript, we have added filters to the algorithm to improve the image quality, although this carries a risk, since the filters might remove useful information about the cancer pattern.
To overcome such challenges, radiologists and physicists have worked on several options, one of which is "quantum mechanics" enabled solutions [1][2][3][4].
Physicists have proposed that quantum mechanics can help to improve magnetic resonance imaging by lighting up the molecules inside the body.
Laboratories across the world are motivated to work in this direction. One such laboratory is the "European Organization for Nuclear Research", which is dedicated to particle physics research, from ionizing radiation to positron emission, for improved imaging.
Similarly, the German research center "Institute of Medical Engineering" is dedicated to exploring and improving the physics and mathematics needed to improve medical imaging, especially with the aid of quantum mechanics. Its active research domains include elementary particles, nonlinear wave dynamics, experimental fluid mechanics and proton-ion therapy.
Thus, with quantum mechanics, medical imaging can be improved and better images can be collected for improved modeling and algorithms. It is anticipated that such hybrid algorithms can be devised in radiology laboratories, such as centers for computerized axial tomography (CAT), fluoroscopy (with upper GI and barium enema), magnetic resonance imaging (MRI), magnetic resonance angiography (MRA) and mammography.
Very recently, another important sub-field of medical imaging has gained attention. Preserving medical images for future reference is an important task, and several quantum physics approaches have been proposed in the recent literature for this purpose [5,6]. The research idea is based on a delicate scheme that masks a reversible watermark in the quantum noise of the image. This noise is dominant in radiographic images.
Another success of the quantum-physics-linked imaging approach is "Ghost Imaging" [7]. This approach has been a topic of debate for two decades. It has also been named the "quantum-spookiness-manifestation". The approach, based on single and multiple photons, can provide novel attributes in the field of medical imaging. Details about ghost imaging, quantum-secured imaging and the open problems can be found in [7][8][9] and the references therein.
In the next section, the importance of artificial intelligence in the field of imaging and the current research strategy are outlined.

Artificial intelligence and medical imaging
Artificial intelligence, after its success in other disciplines such as smart navigation and forecasting tools, has made medical imaging faster. Several success stories are available in the literature in which AI tools are used to improve the results of diagnostic as well as interventional radiology [10,11].
Basically, artificial intelligence tools can be used to classify images. These images can be collected from the same patient at different times to monitor the cancer size, or from different patients at the same time to compare the size and other morphological features (benign or malignant).
The images cannot be classified directly, i.e. the user cannot apply the classification tools from explainable artificial intelligence to the raw images; the images need to be pre-processed well before they are classified, to avoid errors and to improve accuracy. Thus, pre-processing is the most important part of obtaining accurate classification results.

Explanation of XAI
In the field of artificial intelligence and data science, observations and computational experiments are mostly visualized by data analysts, bioinformaticians and machine learning experts. When these results are used to develop ready-to-use models for practical use, a clear understanding of the "ins" and "outs" of the machines and the model is required. A clear understanding is also required by users from other fields who aim to apply the ready-to-use model in their statistical analysis.
Research in the field of artificial intelligence is therefore twofold: the first step is to design an algorithm that is more accurate for complex datasets and for forecasting, and the next step is to make it interpretable [12][13][14].
For this purpose, three types of interpretation are desired for an AI tool to be implemented successfully in other domains: (1) the global interpretation, which deals with the models and the attributes in depth, (2) the cohort interpretation, which deals with the level to which the attributes contribute to predictions for a specific subset, and (3) the local interpretation, which deals with the self-explanatory behavior of the algorithm and helps to justify the model's decision for a particular problem.
Explainable artificial intelligence tools have served positively in the field of imaging, and specifically in medical imaging, where the complexity of the images requires reasonable resolution via smart tools. For example, the segmentation of images obtained from optical coherence tomography was improved with the aid of an improved XAI approach [15]. Similarly, breast cancer images were analyzed by the research group [16] using an explainable artificial intelligence approach based on case-based reasoning.
During this research, the medical images (taken from the Cambridge University data repository) are first pre-processed in a novel manner; the pre-processed images are then analyzed for classification; the classification analysis is inspired by recent XAI classification tools [17][18][19].
In the next section, we discuss the materials and methods used; next, some important results are discussed and noteworthy conclusions are drawn.

Materials and methods
Computational approaches have long been productive in solving scientific problems [20][21][22][23][24][25][26][27][28][29][30][31][32][33][34]. Since most scientific research projects require several computational programs, models and algorithms, the field of computational research is evolving according to the needs of almost all disciplines, including biophysics, quantum physics and the joint projects of these two fields.
The computational framework adopted during this research is stepwise: a step-by-step procedure is followed to analyze the breast cancer medical images. The images were taken from the Cambridge data repository and had a resolution of 50 microns in "Portable Gray Map" (PGM) format (see acknowledgement). There were 322 images in total: 64 benign images, 51 malignant images and 207 normal images.
The preliminary steps included the pre-processing of the medical images. Next, the important features and the statistical parameters were extracted; after extraction, the statistical data were again pre-processed to deal with outliers.

Image pre-processing
Pre-processing of cancer images is necessary, especially for the analysis of breast cancer images, and it is important that the DICOM images be pre-processed for the clear identification of the affected region. Researchers in medical imaging and oncology therefore regard pre-processing as one of the primary steps.
For example, it was observed by the research group [35] that pre-processing can help to derive useful results and can further improve the accuracy of classification.
Inspired by similar research strategies, the images were first converted to black and white with the command "im2bw" at an intensity level of 0.1; the command sets every pixel whose value is greater than the level to 1 (white) and every other pixel to 0 (black). After that, the background was eliminated and only the region of interest was studied.
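The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the command used in the study; the function name `binarize`, the 8-bit maximum of 255 and the sample values are assumptions made for the example.

```python
def binarize(image, level=0.1, max_value=255):
    """Set pixels brighter than level * max_value to 1 (white), all others to 0 (black)."""
    threshold = level * max_value
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Hypothetical 8-bit sample: dark background on the left, brighter tissue on the right.
sample = [
    [0, 10, 30, 200],
    [5, 20, 120, 180],
    [0, 25, 26, 90],
]
mask = binarize(sample)  # threshold is 25.5 for level 0.1
```

With a level of 0.1 the threshold is low, so nearly all tissue is kept and only the near-black background is mapped to 0.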
The area, centroid and bounding box measurements of the breast tissue were obtained with the "regionprops" command. This procedure multiplied two images, i.e. the original image and the one built from these measurements, and the resulting image comprised the breast and the muscles. Next, the image was cropped: the empty rows and columns of black background pixels around the breast region were detected and then deleted.
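A minimal Python sketch of the measurements and the cropping step follows. It assumes that the whole foreground of the binary mask forms a single region (a full region-properties routine would first label connected components); the function names and the small sample arrays are hypothetical.

```python
def region_properties(mask):
    """Return area, centroid (row, col) and bounding box of the foreground pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    area = len(coords)
    centroid = (sum(rows) / area, sum(cols) / area)
    bbox = (min(rows), min(cols), max(rows), max(cols))  # inclusive corners
    return area, centroid, bbox

def mask_and_crop(image, mask):
    """Multiply image by mask, then drop rows and columns that are entirely black."""
    masked = [[p * m for p, m in zip(irow, mrow)] for irow, mrow in zip(image, mask)]
    keep_rows = [r for r, row in enumerate(masked) if any(row)]
    keep_cols = [c for c in range(len(masked[0])) if any(row[c] for row in masked)]
    return [[masked[r][c] for c in keep_cols] for r in keep_rows]

# Hypothetical sample: the mask keeps three bright pixels and removes faint background.
image = [
    [0, 0, 0, 0],
    [3, 0, 90, 80],
    [2, 0, 70, 0],
]
mask = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
area, centroid, bbox = region_properties(mask)
cropped = mask_and_crop(image, mask)
```

Multiplying by the mask first ensures that faint background pixels do not prevent a row or column from being recognized as empty.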
Next, the pectoral muscles were deleted: the muscle region was selected from the left or the right corner pixel of the image (depending on the image), and the unnecessary region was then selected and deleted [36]. Removing this region helps to detect the breast cancer more reliably.
After this, the image was enhanced by applying different filters, which reduced the blur and eliminated the noise. For this purpose, the Wiener filter, based on the pioneering work of N. Wiener (1942), was used. This filter is also termed the "Minimum-Mean-Square-Error-Filter". The filter provides a linear estimate of the actual image. Its performance is assessed with the help of important parameters, including the PSNR and the RMSE, defined in detail by Jaglan et al. [37].
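For orientation, the two quality measures can be computed as in the Python sketch below. It uses the common textbook definitions (root mean square error, and PSNR as 20 log10 of the peak value over the RMSE), which may differ in detail from those in [37]; the peak value of 255 and the sample arrays are assumptions.

```python
import math

def rmse(reference, estimate):
    """Root mean square error between two images of equal size."""
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, estimate) for a, b in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = rmse(reference, estimate)
    if error == 0:
        return math.inf
    return 20.0 * math.log10(max_value / error)

# Hypothetical 2 x 2 images: two pixels are off by 10 gray levels.
clean = [[100, 100], [100, 100]]
noisy = [[110, 90], [100, 100]]
error = rmse(clean, noisy)
quality = psnr(clean, noisy)
```

A lower RMSE and a higher PSNR both indicate that the filtered image is closer to the reference.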
The Wiener filter algorithm calculates the local mean and variance around each pixel by

$$\mu = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a(n_1, n_2), \qquad \sigma^2 = \frac{1}{NM} \sum_{n_1, n_2 \in \eta} a^2(n_1, n_2) - \mu^2,$$

where $\eta$ is the N-by-M local neighborhood of each pixel and $a(n_1, n_2)$ is the pixel value. The filter then replaces each pixel by the estimate

$$b(n_1, n_2) = \mu + \frac{\sigma^2 - \nu^2}{\sigma^2} \left( a(n_1, n_2) - \mu \right),$$

where $\nu^2$ is the noise variance. After filter-1, filter-2 is applied to enhance the image and to make the picture clearer. Filter-2 is the "Contrast-Limited Adaptive Histogram Equalization (CLAHE)", which improves low-contrast medical images quickly [38].
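The pixelwise Wiener estimate from local mean and variance can be sketched in plain Python as follows. This is an illustrative sketch, not the routine used in the study: truncating the neighborhood at the image border, estimating a missing noise variance as the mean of the local variances, and clamping the local variance from below by the noise variance are all assumptions of this example.

```python
def adaptive_wiener(image, window=(3, 3), noise_variance=None):
    """Pixelwise Wiener estimate: mu + (sigma^2 - nu^2) / sigma^2 * (a - mu)."""
    rows, cols = len(image), len(image[0])
    half_r, half_c = window[0] // 2, window[1] // 2
    means = [[0.0] * cols for _ in range(rows)]
    variances = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Local neighborhood, truncated at the image border (assumption).
            block = [image[i][j]
                     for i in range(max(0, r - half_r), min(rows, r + half_r + 1))
                     for j in range(max(0, c - half_c), min(cols, c + half_c + 1))]
            mu = sum(block) / len(block)
            means[r][c] = mu
            variances[r][c] = max(0.0, sum(v * v for v in block) / len(block) - mu * mu)
    if noise_variance is None:
        # Assumption: use the average local variance as the noise estimate.
        noise_variance = sum(map(sum, variances)) / (rows * cols)
    output = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Clamp so that the gain stays between 0 and 1.
            sigma2 = max(variances[r][c], noise_variance)
            gain = (sigma2 - noise_variance) / sigma2 if sigma2 > 0 else 0.0
            out_row.append(means[r][c] + gain * (image[r][c] - means[r][c]))
        output.append(out_row)
    return output

# Hypothetical sample: a single bright pixel on a dark background.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = adaptive_wiener(spike, noise_variance=1e9)   # strong noise: local mean
unchanged = adaptive_wiener(spike, noise_variance=0.0)  # no noise: original values
```

The two calls show the limiting behavior of the estimate: when the noise variance dominates, each pixel is replaced by its local mean, and when the noise variance is zero, the image is left as it is.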
CLAHE uses a clip limit to solve the noise amplification problem. It limits the height of the histogram computed from the image and reconstructs the image to improve the contrast. CLAHE has two parameters, the block size (BS) and the clip limit (CL), which together control the contrast enhancement. Increasing CL increases the brightness of the image, and increasing BS increases the contrast of the image because the dynamic range becomes larger.
The CLAHE has the following steps:
1. Divide the image into non-overlapping sub-images (contextual regions). The total number of sub-images is Q × R, and 8 × 8 is the best value to preserve the data in the image.
2. Calculate the histogram of each sub-image according to the gray levels present in the image array.
3. Compute the clipped histogram of each sub-image from the CL value. The average number of pixels per gray level is

$$R_{avg} = \frac{R_{rX} \cdot R_{rY}}{R_{gray}},$$

where $R_{gray}$ is the number of gray levels in the sub-region, and $R_{rX}$ and $R_{rY}$ are the numbers of pixels in the X and Y dimensions of the contextual region, respectively. The CL expression is

$$R_{CL} = R_{clip} \cdot R_{avg},$$

where $R_{CL}$ is the actual CL and $R_{clip}$ is the normalized CL in the range [0, 1]. If the number of pixels at a gray level is greater than $R_{CL}$, the excess pixels are clipped. The total number of clipped pixels is $R_{\Sigma clip}$, and the average number of clipped pixels to distribute to each gray level is

$$R_{avggray} = \frac{R_{\Sigma clip}}{R_{gray}}.$$

For the clipped histogram we have the following rules:

$$P_{region\,clip}(i) = \begin{cases} R_{CL}, & P_{region}(i) > R_{CL}, \\ R_{CL}, & P_{region}(i) + R_{avggray} > R_{CL}, \\ P_{region}(i) + R_{avggray}, & \text{otherwise}, \end{cases}$$

where $P_{region}(i)$ is the original histogram of each region at the i-th gray level and $P_{region\,clip}(i)$ is the clipped histogram of each region at the i-th gray level.
4. Redistribute the remaining pixels until all of them have been distributed. The distribution step is given by

$$\text{Step} = \frac{R_{gray}}{R_{remains}},$$

where $R_{remains}$ is the remaining number of clipped pixels. The step is a positive integer, i.e. Step ≥ 1. If the number of pixels at a gray level is less than $R_{CL}$, the program distributes one pixel to that gray level.
5. Increase the intensity values in each region by the Rayleigh transformation. The cumulative probability $P_{input}(i)$ is calculated from the clipped histogram and is used to create the transfer function. The Rayleigh distribution makes the image look more natural. The Rayleigh forward transformation is

$$y(i) = x_{min} + \sqrt{2 \beta^2 \ln\left( \frac{1}{1 - P_{input}(i)} \right)},$$

where $x_{min}$ is the lower bound of the pixel value and $\beta$ is the scaling parameter of the Rayleigh distribution, which is defined for each input image. Here $\beta = 0.04$. The output probability density can be calculated as

$$p(y(i)) = \frac{y(i) - x_{min}}{\beta^2} \exp\left( -\frac{(y(i) - x_{min})^2}{2 \beta^2} \right), \qquad y(i) \geq x_{min}.$$

The higher the value of $\beta$, the stronger the contrast enhancement in the image, along with an increase in the saturation value and an amplification of the noise level.
6. Rescale the output of the transfer function in the previous equation using a linear contrast stretch,

$$\hat{r}(i) = \frac{r(i) - r_{min}}{r_{max} - r_{min}},$$

where $r(i)$ is the input value from the transfer function, $r_{min}$ is the minimum value of the transfer function and $r_{max}$ is the maximum value of the transfer function.
7. Eliminate boundary artifacts by calculating the new gray level value of the pixels in the sub-matrix contextual region.
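The clipping, redistribution, Rayleigh mapping and contrast stretch for a single contextual region can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the implementation used in the study: the clip limit is passed directly as a pixel count, the per-level share is rounded down to an integer, the leftover pixels go only to gray levels that are still below the limit, and the cumulative probability is capped just below 1 to keep the logarithm finite. All function names and sample values are hypothetical.

```python
import math

def clip_and_redistribute(hist, r_cl):
    """Clip a region histogram at r_cl and spread the clipped pixels over the gray levels."""
    n_gray = len(hist)
    total_clipped = sum(max(0, h - r_cl) for h in hist)
    avg_gray = total_clipped // n_gray  # integer share per gray level (assumption)
    clipped, remaining = [], total_clipped
    for h in hist:
        if h > r_cl:
            clipped.append(r_cl)
        elif h + avg_gray > r_cl:
            remaining -= r_cl - h
            clipped.append(r_cl)
        else:
            remaining -= avg_gray
            clipped.append(h + avg_gray)
    # Hand out the leftovers one pixel at a time to levels still below the limit.
    while remaining > 0:
        open_levels = [i for i, h in enumerate(clipped) if h < r_cl]
        if not open_levels:
            break  # no room left under the limit
        step = max(1, len(open_levels) // remaining)
        for i in open_levels[::step]:
            if remaining == 0:
                break
            clipped[i] += 1
            remaining -= 1
    return clipped

def rayleigh_transfer(clipped, x_min=0.0, beta=0.04):
    """Map the cumulative probability of each gray level through the Rayleigh transform."""
    total = sum(clipped)
    cumulative, mapped = 0, []
    for count in clipped:
        cumulative += count
        p = min(cumulative / total, 1.0 - 1e-12)  # keep the logarithm finite
        mapped.append(x_min + math.sqrt(2.0 * beta ** 2 * math.log(1.0 / (1.0 - p))))
    return mapped

def linear_stretch(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical 4-level histogram of one region, clipped at 5 pixels per level.
clipped_hist = clip_and_redistribute([10, 2, 0, 4], r_cl=5)
transfer = linear_stretch(rayleigh_transfer(clipped_hist))
```

In this example the five pixels above the limit are moved to the less populated gray levels, so the total pixel count is preserved while no level exceeds the limit.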

Extracting parameters
After the pre-processing of the images, statistical data were extracted from each image, including the contrast, entropy and other metrics. Extracting these features enables us to use them as parameters in the classification part. After the extraction, each feature is stored. These features are also used by other researchers to classify medical images [39,40]. Below, we provide the definitions of some important statistical measures, for a better understanding of the hypothesis of "image-classification".
• Contrast: In radiology, a contrast agent is used to improve the contrast between the target organ and the surrounding tissue in order to obtain clearer images, which radiologists can then interpret more accurately, leading to a more accurate diagnosis and better care. As a texture feature, contrast returns a measure of the intensity contrast between a pixel and its neighbor over the whole image. Range = [0, (size(GLCM,1) − 1)^2]. A constant image has zero contrast. Variance and inertia are other terms for the contrast property. The formula is given as follows, where $p(i,j)$ is the normalized GLCM:

$$\sum_{i,j} |i - j|^2 \, p(i,j)$$

• Homogeneity: Homogeneity returns a value that indicates how near the GLCM's element distribution is to the GLCM diagonal. Range = [0, 1]. For a diagonal GLCM, homogeneity is 1. The formula is given as follows:

$$\sum_{i,j} \frac{p(i,j)}{1 + |i - j|}$$

• Energy: Energy returns the sum of squared elements in the GLCM. Range = [0, 1]. Energy is 1 for a constant image. The energy property is also known as uniformity, uniformity of energy, and angular second moment. (The correlation property, by contrast, is NaN for a constant image.) The formula is:

$$\sum_{i,j} p(i,j)^2$$

• Entropy: The entropy of the information processed in machine learning is a measure of disorder or impurity. It determines how data are split by a decision tree.
• Kurtosis: Kurtosis refers to the degree to which outliers are present in a distribution. It is a statistical measure that determines whether the data are heavy-tailed or light-tailed relative to a normal distribution.
• Inverse Difference Moment & Histogram Maximum: The histogram maximum is the maximum value of the histogram built from the pixels of the image.
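The GLCM-based measures in the list above can be illustrated with a short Python sketch. It builds a normalized co-occurrence matrix for a single pixel offset and evaluates contrast, homogeneity and energy with the usual definitions; the horizontal offset (0, 1), the number of gray levels and the sample images are assumptions, since the settings used in the study are not stated.

```python
def glcm(image, levels, offset=(0, 1)):
    """Normalized gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    dr, dc = offset
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, homogeneity and energy of a normalized GLCM p."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "contrast": sum(abs(i - j) ** 2 * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
    }

# Hypothetical two-level images: a constant patch and a checkerboard patch.
constant = texture_features(glcm([[1, 1], [1, 1]], levels=2))
checker = texture_features(glcm([[0, 1], [1, 0]], levels=2))
```

The constant patch reproduces the limiting values quoted in the definitions (zero contrast, homogeneity 1, energy 1), while the checkerboard patch gives a higher contrast and lower homogeneity and energy.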

Pre-processing of the imaging data
An important step of the current XAI approach is the pre-processing of the dataset(s). It is of great significance, since without this step the data are not ready for reliable and accurate classification. The data pre-processing part here includes two important steps:
• Standardization: Each parameter of the images has a different scale, so in the standardization part we bring them to one scale [41]. The transformation used scales the data to the range between -1 and 1.
• Filling Outliers: After detection, the outliers are averaged rather than deleted directly [42]; a functional approach is then used to remove the outliers after identification and averaging.
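The two steps can be sketched in Python as follows. The scaling formula is not preserved in the text, so min-max rescaling to [-1, 1] is assumed here; likewise, the outlier rule (more than a chosen number of standard deviations from the mean) and the replacement by the mean of the remaining values are assumptions that only illustrate the idea of averaging instead of deleting.

```python
import statistics

def scale_to_unit_range(values):
    """Min-max rescaling to [-1, 1] (assumed form of the standardization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

def fill_outliers(values, n_sigmas=2.0):
    """Replace values far from the mean by the mean of the remaining values."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    inliers = [v for v in values if abs(v - mean) <= n_sigmas * spread]
    fill = statistics.fmean(inliers)
    return [v if abs(v - mean) <= n_sigmas * spread else fill for v in values]

# Hypothetical feature column with one extreme value.
feature = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 50.0]
filled = fill_outliers(feature)
scaled = scale_to_unit_range(filled)
```

Filling the outlier before scaling matters: a single extreme value would otherwise compress all other values into a narrow band near -1.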

Classification
The datasets extracted from the medical images are not necessarily normally distributed; datasets based on complex features mostly follow skewed distributions. The classification tools recommended for such datasets are the RUSBoost algorithms (where data sampling and boosting are combined for learning from imbalanced datasets) and/or the Gaussian support vector machine learning algorithms, based on the optimal support vector machine learning parameters with a Gaussian kernel.
Three types of numerical experiments were conducted. In the first experiment, the protected mode was included; with this inclusion, the classification algorithm avoids over-fitting. This is achieved by partitioning the data into folds and estimating the accuracy for each fold. The results from this experiment are presented in the top panel of Fig. 2. The classifier that produced the best results was the fine Gaussian SVM classifier. Next, holdout validation, which is better suited to large datasets, was conducted. The results are presented in the middle panel of Fig. 2; here, the fine Gaussian SVM classifier achieved the highest accuracy among the classifiers, 61%.
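The data handling behind these experiments can be sketched in Python as below: a partition into folds for the protected mode, a single holdout split, and the random undersampling step on which RUSBoost builds. The number of folds, the holdout fraction and the random seeds are assumptions, and the undersampling function shows only the sampling idea, not the boosting itself; the class counts are those reported for the dataset.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def holdout_split(n_samples, test_fraction=0.25, seed=0):
    """Return (training indices, held-out indices) for a single split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

def random_undersample(labels, seed=0):
    """Keep as many samples of each class as the rarest class has."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    smallest = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, smallest))
    return sorted(kept)

# Class counts of the image set: 64 benign, 51 malignant, 207 normal.
labels = ["benign"] * 64 + ["malignant"] * 51 + ["normal"] * 207
folds = k_fold_indices(len(labels), k=5)   # k = 5 is an assumption
train, test = holdout_split(len(labels))
balanced = random_undersample(labels)
```

In the protected mode each fold is held out once while the classifier is trained on the others, and the fold accuracies are averaged; undersampling reduces the dominance of the 207 normal images during training.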

Results and discussion
In the last experiment, the protection against over-fitting was removed. This led to better accuracy at the cost of over-fitting. The results are presented in the bottom panel of Fig. 2. The RUSBoost algorithm provided the best results, with an accuracy of 97%.
The same set of experiments was repeated after pre-processing of the statistical data. Figure 3 presents the results from the protected mode (fine Gaussian SVM provided the best results, with an accuracy of 61%), the holdout-validation mode (fine Gaussian SVM, 58%) and the unprotected mode (61.7%).

Conclusions
Based on these numerical results, the following important conclusions are drawn:
• The data need to be filtered without losing important metrics/features.
• For the classification of the numerical data extracted from the imaging data, the most accurate classifier is desired.
• The accuracy of the medical image analysis varies with the cancer staging. For benign cancer, the solver provided more accurate results. The reason is that the image quality is better and the cancer region is more distinguishable in the case of benign cancer, whereas the images of malignant cancer were hard to classify because of the inherent complexity linked with the spread of the cancer.