Estimation of Impedance Features and Classification of Carcinoma Breast Cancer Using Optimization Techniques

Abstract: Breast cancer is the most prevalent form of cancer and the primary cause of cancer-related mortality among women globally. Breast cancer diagnosis involves multiple variables, making it a complex process. Therefore, the accurate estimation of features for diagnosing breast cancer is of great importance. The present study used a dataset of 21 patients with carcinoma breast cancer. Polynomial regression analysis was used to non-invasively estimate six impedance features for the diagnosis of breast cancer: the phase angle at 500 kHz (PA500), the impedance distance between spectral ends (DA), the area normalized by DA (A/DA), the maximum of the spectrum (Max IP), the distance between impedivity (ohm) at zero frequency and the real part of the maximum frequency point (DR), and the length of the spectral curve (P). The results indicated that the polynomial degrees needed to estimate the PA500, DA, A/DA, Max IP, DR, and P features based on tumor size were 2, 2, 3, 3, 2, and 2, respectively. Additionally, we utilized a nonlinear constrained optimization (NCO) analysis to calculate the eight threshold levels for the classification of the impedance features. The deduction of eight classifications for each feature may also be an effective tool for decision-making in breast cancer. These findings may help oncologists to estimate the impedance features for breast cancer diagnosis non-invasively.


Introduction
Breast cancer is the most common type of cancer and the leading cause of cancer-related death for women worldwide [1]. In the United States, breast cancer is the second leading cause of cancer death after lung cancer [2]. The most effective way to reduce the mortality rate of breast cancer is through correct and early diagnosis. Therefore, clarifying any ambiguities in the screening and diagnosis of breast cancer is of great importance. Various factors are involved in the diagnosis and development of breast cancer, including patient characteristics, the presence of proliferative breast lesions with atypia, genetic factors, and lifestyle [3]. These multivariable parameters have a prominent role in the complexities of breast cancer diagnosis. There are various diagnostic tools available to clarify these complexities for accurate decision-making about breast cancer.
Medical imaging is a common diagnostic tool for healthcare professionals. Computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), single-photon emission computed tomography (SPECT), ultrasonography (US), and X-ray mammography (XRM) are the most common imaging techniques used to diagnose breast cancer. Physicians primarily diagnose breast cancer by interpreting medical images, an assessment that is qualitative and visual in nature. However, image processing has opened new windows for the quantitative diagnosis of breast cancer [4]. There are still many contradictions and conflicts about the accuracy of breast cancer diagnosis based on medical images. For example, the XRM method's efficiency in breast cancer diagnosis is only 4-10% [5,6]. Molecular biotechnology examinations are an additional diagnostic tool for breast cancer, utilizing classifications at the molecular level. These examinations include a real-time fluorescence quantitative polymerase chain reaction system, acid hybridization system, protein hybridization system, needle biopsy, flow cytometer, and immunohistochemistry. They can detect breast cancer earlier than medical images [7]. However, given the valuable clinical data that medical images provide, molecular biotechnology examinations can only be an auxiliary method for breast cancer diagnosis. Circulating tumor cells (CTCs) and circulating tumor deoxyribonucleic acid (ctDNA) are further parameters that can serve as diagnostic biomarkers for early cancer screening and for establishing cancer staging [8]. Long noncoding ribonucleic acid (lncRNA) and circular RNAs (circRNAs) are also emerging biomarkers for breast cancer diagnosis. However, there are many ambiguities about the actual function and effectiveness of these new biomarkers for breast cancer diagnosis.
BioMedInformatics 2023, 3 370

Related Work
Bioinformatics methods, such as machine learning and deep learning, have been shown to be powerful tools for accurate classification across various diseases, including cancer, and have been extensively applied in recent studies. Hrizi et al. demonstrated that an optimized machine learning model outperformed traditional diagnostic methods and has the potential to be an effective tool for tuberculosis diagnosis [9]. Ammar et al. presented a hybrid optimal deep learning-based model for tuberculosis disease recognition using MRI images [10]. Yao et al. used a machine learning technique to predict survival in pancreatic cancer patients [11]. The model was trained using gene expression data and clinical features and achieved high accuracy in predicting patient survival. Shan presented a machine-learning approach for predicting lymph node metastasis in patients with early-stage cervical cancer [12], using a random forest model to improve the performance of the neutrophil-to-lymphocyte ratio.
Given the importance of bioinformatic methods in the diagnosis and management of cancer, several studies have focused on utilizing these methods for breast cancer. Some studies identified crucial genes associated with breast cancer using integrated bioinformatic analysis and suggested novel genes for breast cancer diagnosis [13][14][15]. Wu et al. used a machine learning algorithm to classify triple negative and non-triple negative breast cancer types [16]. Omondiagbe et al. classified breast cancer with support vector machines applied to a feature set reduced by linear discriminant analysis [17]. Amrane et al. compared the efficiency of Naive Bayes and k-nearest neighbor to find a more accurate classifier for breast cancer [18]. Assiri et al. proposed a novel ensemble classification method for breast cancer using various machine-learning algorithms [19]. They utilized three classifiers, examined five unweighted voting mechanisms, and found that majority-based voting outperformed the others. Islam et al. found that artificial neural networks are the most accurate machine-learning modeling method for diagnosing breast cancer [19].
In bioinformatics, deep learning has emerged as a powerful tool for the diagnosis of breast cancer in recent years. This approach uses artificial neural networks with multiple layers to automatically learn and extract relevant features from complex data [20][21][22]. Zhou et al. provided a comprehensive evaluation of the methods employed in breast cancer diagnosis through histological image analysis based on different designs of convolutional neural networks (CNNs) [20]. Their findings suggest that CNNs are highly beneficial for the early identification and treatment of breast cancer, resulting in more successful therapy. Jiang et al. evaluated the effectiveness of a CNN-based deep learning model in determining molecular subtypes of breast cancer using US images [22]. The CNN model achieved an acceptable accuracy in determining breast cancer molecular subtypes, comparable to the accuracy of human radiologists. Allugunti et al. utilized deep learning techniques trained end-to-end to achieve high-accuracy diagnosis and screening of breast cancer [21]. Ghiasi et al. explored the use of deep learning algorithms to classify breast cancer based on uniformity of cell size, bland chromatin, mitoses, and clump thickness [23]. Their results indicate that the proposed methods offer accurate classification and diagnostic performance compared to previous methods. In addition, some studies have focused on biostatistical methods to extract effective features of breast cancer [24]. Terry et al. used concordance statistics to predict breast cancer risk [25]. Despite significant progress in computer-aided diagnosis methods, many points remain unknown in the diagnosis of breast cancer using these methods [26][27][28]. Hence, breast cancer diagnosis using these methods is still controversial and challenging.

Aims and Objectives
Many researchers have attempted to improve bioinformatic and biostatistical methods to suggest more accurate biomarkers. This study aims to develop diagnostic tools for the management of breast cancer. The approach involves using polynomial regression analysis to estimate six impedance features for diagnosis and nonlinear constrained optimization (NCO) to determine the threshold level of each feature.

Materials and Methods
We utilized the database of a study conducted by Thirumalai et al., which includes 21 patients with breast cancer (Table 1) [29]. The database belongs to the carcinoma category, one of the most common categories, in which cancer develops from the epithelial cell lining [22]. We employed polynomial regression analysis to estimate six impedance features for the diagnosis of breast cancer without minimally invasive electrical impedance spectroscopy. In addition, we used the NCO method to classify breast cancer into eight categories. The workflow has three primary stages: preprocessing, regression, and optimization. Figure 1 shows the workflow diagram of the system, illustrating the various components and their relationships. In the preprocessing stage, the Shapiro-Wilk test was applied to the 21 observations to check normality. The result of this test confirmed the normal distribution of the input data. In the model selection step, we examined several types of fitting equations, including exponential, logarithmic, polynomial, and power functions. The polynomial fit gave the best results with regard to the $R^2$ values. This procedure is common in previous biological studies [30][31][32].
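For reference, the $R^2$ criterion used to compare candidate fits is conventionally defined as shown below. The notation is an assumption chosen to match the RMSE definition given later in this section: $f_j$ is a measured value, $\hat{f}_j$ the corresponding fitted value, $\bar{f}$ the mean of the measured values, and $m$ the number of observations.

```latex
R^2 = 1 - \frac{\sum_{j=1}^{m} \left( f_j - \hat{f}_j \right)^2}
               {\sum_{j=1}^{m} \left( f_j - \bar{f} \right)^2},
\qquad
\bar{f} = \frac{1}{m} \sum_{j=1}^{m} f_j
```

A value closer to 1 means the fitted curve accounts for more of the variation in the measured feature, which is the sense in which one family of fitting equations can be ranked above another.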

Polynomial Regression Analysis
Polynomial regression is a specific form of regression model that explains the variation of one variable based on another variable. In this regression, the relationship between the independent variable (tumor size = $x$) and the dependent variable (impedance features = $f(x)$) is curvilinear. The general representation of the model is shown as [33]:

$$f(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n + \varepsilon \qquad (1)$$

where $n$ is the degree of the polynomial function, $a_0$ is the intercept, $a_i$ ($i = 1, \ldots, n$) are the coefficients of the polynomial terms, and $\varepsilon$ is the residual error, which reflects the distance of the data from the regression curve.
The degree of the model can be determined by examining the relationship in a scatter plot and testing different polynomial degrees until the best fit is achieved. It is worth mentioning that polynomial regression is sensitive to outliers. Preprocessing is therefore essential to ensure that the data used in the analysis are accurate, reliable, and properly prepared for modeling. The most important step of regression analysis is model validation. The model is validated by evaluating its performance using metrics such as the root mean squared error (RMSE) [33]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m} \sum_{j=1}^{m} \left( \hat{f}_j - f_j \right)^2} \qquad (2)$$

where $\hat{f}_j$ and $f_j$ represent the estimated and actual values, respectively, and $m$ denotes the number of data points. All of the calculations for regression analysis were performed using IBM SPSS software, version 20.0, IBM Corp., Armonk, NY, USA.
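The fitting and validation steps described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the SPSS procedure used in the study: the function names (`fit_polynomial`, `evaluate`, `rmse`) are hypothetical, the coefficients are obtained from the least-squares normal equations, and the data are synthetic values generated from a known quadratic rather than patient measurements.

```python
import math

def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [a_0, ..., a_n] via the normal equations."""
    n = degree + 1
    # Normal equations (V^T V) a = V^T y, where V[j][k] = xs[j] ** k.
    A = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - known) / A[r][r]
    return coeffs

def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def rmse(coeffs, xs, ys):
    """Root mean squared error between fitted and actual values (Equation (2))."""
    m = len(xs)
    squared = sum((evaluate(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared / m)

# Synthetic, noise-free data (an assumption for illustration only).
sizes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
feature = [0.5 * x ** 2 - 2.0 * x + 3.0 for x in sizes]

# Try increasing degrees and compare the RMSE of each fit.
for degree in (1, 2, 3):
    coeffs = fit_polynomial(sizes, feature, degree)
    print(degree, round(rmse(coeffs, sizes, feature), 4))
```

Because the synthetic data are exactly quadratic, the RMSE drops to essentially zero at degree 2 and does not improve meaningfully at degree 3, which mirrors the idea of raising the degree only until the fit stops improving. With only 21 observations, a higher degree that lowers the RMSE slightly may simply be fitting noise.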

Nonlinear Constrained Optimization
NCO is a mathematical technique used to find the optimal values of a set of decision variables subject to a set of nonlinear constraints. Nonlinear constrained optimization can be mathematically formulated as [34]:

$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{subject to the constraints } c(x) \qquad (3)$$

where $x = (x_1, x_2, \ldots, x_n)$ is the vector of decision variables, which here is the tumor size, and $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function. The objective function represents one of the six estimated impedance features that should be optimized. $c(x)$ represents a vector of constraints that $x$ must satisfy, with $c: \mathbb{R}^n \to \mathbb{R}^m$. $n$ and $m$ are the number of decision variables and the number of constraints, respectively.
The present study used the generalized reduced gradient (GRG) method to solve the NCO problem. GRG is commonly used for problems with continuous variables and smooth nonlinear functions [35]. This method can be applied to problems that are more general than Equation (3). An appropriate form for this problem is:

$$\min_{x} f(x) \quad \text{subject to} \quad l_c \le c(x) \le u_c, \quad l_x \le x \le u_x \qquad (4)$$

where $l_x$ and $u_x$ are the lower and upper bounds of the decision variable (tumor size), and $l_c$ and $u_c$ are the lower and upper bounds of the optimized objective functions. The minimization problem can be converted to a maximization problem by multiplying the objective function by −1. The basic idea of the GRG algorithm is to iteratively solve a series of linear programming problems that approximate the original nonlinear problem while updating the values of the decision variables in a way that reduces the objective function and satisfies the constraints.
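When the objective is a fitted polynomial of a single variable and the only constraints are the bounds on tumor size, the bounded minimum and maximum can also be found directly, which is a convenient cross-check on any solver output. The sketch below is not GRG: it is a swapped-in closed-form approach that inspects the interval endpoints and the roots of the derivative, and it is limited to degree 3 or lower. The function names and the example coefficients are hypothetical.

```python
import math

def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def stationary_points(coeffs):
    """Real roots of f'(x) for a polynomial of degree 3 or lower."""
    if len(coeffs) > 4:
        raise ValueError("this sketch supports polynomials up to degree 3")
    # f'(x) = c0 + c1 x + c2 x^2, where c_k = (k + 1) * a_(k+1).
    deriv = [(k + 1) * a for k, a in enumerate(coeffs[1:])]
    deriv += [0.0] * (3 - len(deriv))
    c0, c1, c2 = deriv
    if c2 == 0.0:
        return [] if c1 == 0.0 else [-c0 / c1]
    disc = c1 * c1 - 4.0 * c2 * c0
    if disc < 0.0:
        return []
    root = math.sqrt(disc)
    return [(-c1 - root) / (2.0 * c2), (-c1 + root) / (2.0 * c2)]

def bounded_extrema(coeffs, lower, upper):
    """Return (minimum, maximum) of the polynomial on [lower, upper]."""
    candidates = [lower, upper]
    candidates += [x for x in stationary_points(coeffs) if lower < x < upper]
    values = [evaluate(coeffs, x) for x in candidates]
    return min(values), max(values)

# Hypothetical fitted feature f(x) = 3 - 2x + 0.5x^2 on two size intervals.
coeffs = [3.0, -2.0, 0.5]
print(bounded_extrema(coeffs, 1.0, 2.5))
print(bounded_extrema(coeffs, 2.5, 3.5))
```

The maximum returned here equals the negated minimum of the negated polynomial, which is the conversion by −1 mentioned above. For problems with more decision variables or genuinely nonlinear constraints this shortcut no longer applies, and an iterative solver is needed.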
In the GRG algorithm, convergence testing is critical to ensure that the algorithm has found the optimal solution [35]. Common convergence tests used in the GRG algorithm include the objective function value, constraint satisfaction, reduced gradient norm, step size, and change in decision variables. The convergence test on the objective functions (six impedance features) is conducted with a precision of 0.0001.
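To make the convergence test on the objective value concrete, the following sketch uses projected gradient descent on one bounded variable. It is a deliberately simplified stand-in and does not implement GRG; it only shares the pattern of stepping in a descent direction, keeping the variable inside its bounds, and stopping once the objective changes by less than a tolerance. The function names, step size, and example polynomial are assumptions; the tolerance of 0.0001 is taken from the text.

```python
def evaluate(coeffs, x):
    """Evaluate a_0 + a_1 x + ... + a_n x^n with Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def derivative(coeffs, x):
    """Evaluate f'(x) = a_1 + 2 a_2 x + ... + n a_n x^(n-1)."""
    result = 0.0
    for k in range(len(coeffs) - 1, 0, -1):
        result = result * x + k * coeffs[k]
    return result

def minimize_bounded(coeffs, lower, upper, x0, step=0.1, tol=1e-4, max_iter=10_000):
    """Projected gradient descent; returns (x, f(x), iterations used)."""
    x = min(max(x0, lower), upper)
    fx = evaluate(coeffs, x)
    for iteration in range(1, max_iter + 1):
        # Gradient step followed by projection back onto [lower, upper].
        candidate = min(max(x - step * derivative(coeffs, x), lower), upper)
        f_candidate = evaluate(coeffs, candidate)
        if f_candidate > fx:
            step *= 0.5  # The step overshot: shrink it and try again.
            continue
        # Convergence test on the change in the objective value.
        converged = fx - f_candidate < tol
        x, fx = candidate, f_candidate
        if converged:
            return x, fx, iteration
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical fitted feature f(x) = 3 - 2x + 0.5x^2 on the interval [1, 2.5].
x_min, f_min, iterations = minimize_bounded([3.0, -2.0, 0.5], 1.0, 2.5, x0=1.0)
print(x_min, f_min, iterations)
```

Two caveats apply. A stopping rule based on the change in the objective does not guarantee that the returned value is within the same tolerance of the true optimum, only that progress has become slow. The method is also local, so for a non-convex objective the result can depend on the starting point `x0`.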

Results and Discussion
Many parameters are involved in the diagnosis of breast cancer. Tumor size is one of the morphometric parameters available from medical images. Many software packages, such as Mimics, 3D Slicer, ImageJ, OsiriX, and MIPAV, can measure tumor size easily. There are also other important indicators that oncologists need for decision-making about breast cancer. For example, the impedance distance between spectral ends (DA) and the area normalized by DA (A/DA) are two important features for classifying non-fatty cancer tissues [36]. However, these features can only be measured using electrical impedance spectroscopy. We aimed to calculate six impedance features that can evaluate the capacitive characteristics of breast cancer tissues [37]. These six impedance features are clinically important for the diagnosis of breast cancer in the early stages [38]. They include the phase angle at 500 kHz (PA500), DA, A/DA, the maximum of the spectrum (Max IP), the distance between impedivity (ohm) at zero frequency and the real part of the maximum frequency point (DR), and the length of the spectral curve (P). Some studies have raised evidence about the risks of electrical impedance spectroscopy for patients' health [39]. Hence, the present study used polynomial regression analysis to estimate the values of these six impedance features non-invasively, without electrical impedance spectroscopy. We estimated the features from tumor size, which physicians can measure with common imaging methods. Regression analysis is a powerful statistical tool for estimating the parameters needed for decision-making. The results of the regression analysis are shown in Figure 2. The relationship between each feature and tumor size was fitted to the data reported for our patients (see Table 1).
Table 2 lists the estimated equations for PA500, DA, A/DA, Max IP, DR, and P. It should be noted that the R-values for all estimations are greater than 0.51.
There are many classifications for breast cancer, such as cancer stage (0 to IV) and type of tumor (benign or malignant). Estrela et al. suggested a threshold level to classify impedance ratio (IO), one of the other impedance features [36]. One important indicator is derived from a function defined on the proportion of metastases as a function of time after treatment [40]. This indicator is a common quantitative factor that assigns breast cancer to one of eight classes. Koscielny et al. revealed a correlation between clinical volume and the percentage of metastases diagnosed at the time of initial diagnosis or during later stages of the disease [40]. Additionally, their results showed a shorter median delay between initial treatment and the appearance of metastases for larger tumors. Their findings suggested eight classes: 1 ≤ size ≤ 2.5 (class 1), 2.5 < size ≤ 3.5 (class 2), 3.5 < size ≤ 4.5 (class 3), 4.5 < size ≤ 5.5 (class 4), 5.5 < size ≤ 6.5 (class 5), 6.5 < size ≤ 7.5 (class 6), 7.5 < size ≤ 8.5 (class 7), and size > 8.5 (class 8). However, the corresponding classifications based on the six impedance features (PA500, DA, A/DA, Max IP, DR, and P) are not available, so oncologists cannot assign a class from these features. This study used the NCO method to define the corresponding eight classes for the impedance features based on the results of Koscielny et al. [40]. Table 3 gives the threshold levels of PA500, DA, A/DA, Max IP, DR, and P for classifying breast cancer into the eight classes. The minimum and maximum optimization results are taken as the lower and upper bounds of the threshold level for each class. Previous studies offer no comprehensive quantitative guideline for clinical applications of the impedance features.
For example, there is no quantitative guideline to evaluate the non-fatty level of breast cancer using the DA and A/DA features. The results of Table 3 can classify the non-fatty level of breast cancer quantitatively using the defined threshold levels for the DA and A/DA features. This approach can also be extended to all impedance features.
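The eight size intervals quoted from Koscielny et al. translate directly into a lookup, sketched below. The function and constant names are hypothetical, and the behavior for sizes below 1 (raising an error) is an assumption, since the listed classes do not cover that range.

```python
from bisect import bisect_left

# Upper size limits of classes 1 to 7 as listed in the text; class 8 is open-ended.
UPPER_LIMITS = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5]

def size_class(size):
    """Return the class (1 to 8) whose size interval contains `size`."""
    if size < 1.0:
        raise ValueError("sizes below 1 fall outside the listed classes")
    # bisect_left finds the first limit that is >= size, so a size equal to
    # a limit stays in the lower class, matching the '<=' in each interval.
    return bisect_left(UPPER_LIMITS, size) + 1

for size in (1.0, 2.5, 2.6, 8.5, 9.0):
    print(size, size_class(size))
```

The same interval boundaries serve as the lower and upper bounds of tumor size when the minimum and maximum of each fitted feature are computed per class.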
Breast cancer is a complex disease, and its diagnosis depends on various features. The present study focused on impedance features that are commonly used to screen for and diagnose breast cancer. However, for a comprehensive diagnosis of breast cancer, future studies should expand our findings by considering the interactions among clinical, cancer morphometric, and molecular biotechnology features. Big data is currently recognized as improving the accuracy and precision of predicted results by increasing the volume, velocity, and variety of data [41]. One potential limitation of the present study is the possibility of biases introduced by the small size of the dataset. To mitigate this, future studies may consider using a larger dataset to improve the accuracy of the estimated equations presented in Table 2. Another limitation of the present study is the challenge of interpreting the results generated by the polynomial regression models. To address this, future studies could explore alternative regression models, such as logistic regression. Additionally, to enhance the quality of estimations and classifications, future studies may consider utilizing feature selection algorithms to identify the most important features in the dataset. By doing so, the predictive power of the models can be improved, and the insights gained may be more informative and actionable. Taken together, this study establishes a basis for the non-invasive estimation of impedance features. Future studies can build on this insight by applying machine learning algorithms to a larger database to further enhance estimation and classification accuracy.

Conclusions
The present study utilized a database of women with carcinoma breast cancer to estimate six impedance features for breast cancer diagnosis. Polynomial regression analysis was used to estimate these features. Additionally, NCO analysis was performed to compute the threshold values of each impedance feature, which were utilized to establish eight classifications of breast cancer for every feature. These classifications serve as effective tools for decision-making in breast cancer diagnosis. This finding may provide oncologists with valuable data to help estimate and classify the effective features for breast cancer diagnosis.
Future studies can leverage larger databases and machine learning algorithms to improve estimation and classification accuracy.