Explainable Ensemble-Based Machine Learning Models for Detecting the Presence of Cirrhosis in Hepatitis C Patients

Hepatitis C is a liver infection caused by a virus, which results in mild to severe inflammation of the liver. Over many years, hepatitis C gradually damages the liver, often leading to permanent scarring, known as cirrhosis. Patients sometimes have moderate or no symptoms of liver illness for decades before developing cirrhosis. Cirrhosis typically worsens to the point of liver failure. Patients with cirrhosis may also experience brain and nervous system damage, as well as gastrointestinal hemorrhage. Treatment for cirrhosis focuses on preventing further progression of the disease. Detecting cirrhosis earlier is therefore crucial for avoiding complications. Machine learning (ML) has been shown to be effective at providing precise and accurate information for use in diagnosing several diseases. Despite this, no studies have so far used ML to detect cirrhosis in patients with hepatitis C. This study obtained a dataset consisting of 28 attributes of 2038 Egyptian patients from the ML Repository of the University of California at Irvine. Four ML algorithms were trained on the dataset to diagnose cirrhosis in hepatitis C patients: a Random Forest, a Gradient Boosting Machine, an Extreme Gradient Boosting, and an Extra Trees model. The Extra Trees model outperformed the other models, achieving an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an area under the receiver operating characteristic curve of 96% using only 16 of the 28 features.


Introduction
Hepatitis C is a blood-borne infection caused by the hepatitis C virus (HCV) that mostly damages the liver. HCV infections are a significant cause of liver disease worldwide, including cirrhosis and hepatocellular carcinoma, making HCV an important global public health concern. According to the World Health Organization, 58 million individuals worldwide are infected with hepatitis C at its extreme stages, representing 3% of the global population, and 1.5 million new cases occur annually. In 2019, almost 290,000 infected patients died from cirrhosis or liver cancer [1]. A 2018 meta-analysis estimated that HCV antibody prevalence in Egypt was 11.9%, making it the country with the highest HCV prevalence worldwide [2]. HCV is therefore considered a leading public health problem worldwide, which must be addressed with solid program interventions.
Currently, hepatitis C can be cured with direct-acting antivirals (DAAs). Nevertheless, many patients still have chronic hepatitis C infection and are at risk of its associated complications, including cirrhosis and liver cancer. Furthermore, obstacles still stand in the way of widespread access to these medications, particularly in low- and middle-income nations where the hepatitis C epidemic is most severe. According

• Compare the performance of ensemble learners in diagnosing cirrhosis in hepatitis C patients;
• Apply SFS to minimize the number of features required to form the diagnosis;
• Utilize XAI techniques to explain the outcomes of the best-performing model;
• Utilize XAI techniques to identify the most significant attributes for diagnosing cirrhosis in hepatitis C patients.

Literature Review
Mostafa et al. [16] used supervised ML algorithms, including an artificial neural network (ANN), a support vector machine (SVM), and an RF, for early diagnosis of hepatitis C. The classifiers were trained using an HCV dataset gathered from the UCI ML Repository [17]. The authors found that RF was the best-performing model, achieving an accuracy of 98.14%. Despite the promising results, the model could not be generalized to replace expert knowledge in determining diagnostic paths because the data include numerous missing values.
Similarly, Oladimeji et al. [18] proposed ML models for automatically classifying hepatitis C using the same UCI dataset. The authors used several classifiers, including decision tree (DT), RF, k-nearest neighbors (KNN), logistic regression (LR), and naive Bayes (NB). After evaluating all five algorithms, the results indicated that RF outperformed the other models with an area under the precision-recall curve of 1.00, an F-measure of 0.99, a Matthews correlation coefficient of 0.99, a receiver operating characteristic area under the curve (ROC-AUC) of 0.99, and an accuracy of 98.97%.
Likewise, Safdari et al. [19] used several classification algorithms to categorize individuals with suspected HCV. Six classification algorithms were used, including SVM, Gaussian naive Bayes (GNB), RF, DT, LR, and KNN, and they were trained using the same UCI dataset. After evaluating the six models according to various measures, the authors found that the RF classifier surpassed the others with an accuracy of 97.29%.
Kaunang [20] also attempted to predict HCV using ML approaches on the UCI dataset. The five categories in the original dataset were reduced to two: the blood donor and suspect blood donor categories were combined into a non-hepatitis category, while the hepatitis, fibrosis, and cirrhosis categories were combined into a hepatitis category. The ML algorithms used were KNN, SVM, RF, ANN, NB, and LR. The LR approach surpassed the other algorithms with an accuracy of 97.9%. However, this study required additional analysis because of a data imbalance between the two classes.
Similarly, Li et al. [21] developed an AI-driven model that has the potential to diagnose HCV and detect the disease at an early stage for potential future treatments. Using the UCI dataset, the researchers applied a two-stage cascade strategy that combined the RF and LR algorithms. The artificial bee colony algorithm was used to establish the ideal threshold needed for filtering and partitioning. The approach was able to predict the probability of HCV incidence across multiple classes, achieving an accuracy of 96.19%, a precision of 96.94%, a recall of 96.19%, and an F1-score of 95.92%.
Ghazal et al. [22] presented an effective and efficient method for assisting healthcare professionals in the early detection of HCV using ML algorithms. A Gaussian SVM model was trained using the Egyptian cohort from the UCI repository [23]. The dataset contained 1385 patient entries, each with 29 distinct attributes. The model achieved an accuracy of 97.9%.
Butt et al. [24] proposed an Intelligent Hepatitis C Stage Diagnosis System that uses an ANN to predict the stage of hepatitis C in a patient using the dataset in [23]. Using 70% of the dataset during training and 30% during validation, the proposed system achieved a precision of 98.89% and 94.44%, respectively.
Mamdouh et al. [25] aimed to detect HCV among healthcare staff in Egypt. Two experiments were conducted, one with feature selection and the other without. The features were chosen using SFS. Then, four algorithms, namely, RF, NB, KNN, and LR, were trained in each experiment. The dataset used for this study was developed at Menoufia University based on records obtained from the National Liver Institute. The dataset included 12 different attributes of 859 participants. It was found that using only four features, RF reached the highest accuracy of 94.88%. However, the dataset was limited to Egyptian patients working in risky environments. In addition, the size of the dataset and the features included were not enough to generalize the model to newly infected patients.
Barakat et al. [26] aimed to build an intelligent diagnostic system using ML to predict and assess fibrosis in children affected with chronic hepatitis C. They used a clinical dataset collected from 166 Egyptian children with this condition. The authors used the RF algorithm to predict the type of fibrosis (no fibrosis, mild, or advanced). The system achieved an accuracy of 87.5% and an AUC-ROC of 90.3%. The prediction of mild fibrosis attained an accuracy of 66% and an AUC of 71%. For advanced fibrosis, it achieved an accuracy of 80% and an AUC of 89.4%. However, the dataset had a limited size and suffered from imbalance.
Similarly, Tsvetkov et al. [27] aimed to develop and test an ML model that detects liver fibrosis in individuals with chronic hepatitis C using private data collected from routine clinical examinations. The authors examined data on 1240 patients with chronic hepatitis C, of whom 686 were male and 554 were female. Data from 689 patients were used to develop and test the ML model for determining the liver fibrosis stage, and only 9 of the 28 features were considered. The model attained an accuracy of 80.56%, a sensitivity of 66.67%, and a specificity of 94.44%. Although the study did not employ abnormal or unbalanced samples, did not reject data at random, and tested accuracy with two separate test samples, the ML model still needed external validation.
ML algorithms have had a significant impact in aiding healthcare providers to detect HCV at earlier stages [16]. Detecting individuals with early cirrhosis is critical for preventing severe complications. There have been a few studies that attempted to distinguish the cirrhosis stage from the fibrosis stage. However, there is room for improving the results obtained by their models and reducing their computational complexity using feature selection techniques. Additionally, the outputs of these ML models are not easily understood by medical professionals. Accordingly, in this study, XAI techniques were used to ensure that specialists can understand the model's decisions.

Materials and Methods
Computation 2023, 11, 104
In this study, the Python programming language (version 3.9.12) was used. Several pre-processing steps were performed before building the models. First, the target was binarized so that the models predict only whether an HCV patient has cirrhosis: the cirrhosis class was converted to 1 and all other classes were converted to 0. Next, two outliers were identified using a box plot and the interquartile range (IQR) method, which compares each value against a lower bound derived from the first quartile and an upper bound derived from the third quartile. Values more than 1.5 times the IQR below the first quartile or above the third quartile were considered outliers and were removed to improve subsequent analyses and modeling accuracy. Thereafter, the data were normalized using the min-max scaler from the scikit-learn library (version 1.1.1). Converting the target class introduced a class imbalance; therefore, random oversampling was applied using the imblearn library (version 0.9.1). Subsequently, stratified 10-fold cross-validation was used to evaluate four ML algorithms, namely, RF, GBM, XGBoost, and ET. GridSearchCV from the scikit-learn library was used to tune the algorithms' hyperparameters, and SFS from the mlxtend library (version 0.21.0) was used for feature selection. After building the models, their performance was evaluated and compared using four metrics: accuracy, precision, recall, and AUC-ROC. Lastly, XAI techniques were used to explain the outputs of the best-performing model. Figure 1 illustrates the framework of the study.
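The pre-processing steps described above (binarization, IQR-based outlier removal, and min-max normalization) can be sketched in plain Python. This is only an illustration under stated assumptions: the toy values and the helper names `binarize_stage`, `iqr_bounds`, and `min_max_scale` are hypothetical, and the study itself relied on scikit-learn rather than hand-written code.

```python
from statistics import quantiles

def binarize_stage(stage):
    """Map stage 4 (cirrhosis) to 1 and stages 1-3 to 0."""
    return 1 if stage == 4 else 0

def iqr_bounds(values, k=1.5):
    """Return the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical toy column and stage labels (not taken from the dataset).
feature = [22, 24, 25, 26, 27, 28, 29, 30, 31, 90]
stages = [1, 2, 4, 3, 4, 1, 2, 4, 3, 4]

lower, upper = iqr_bounds(feature)
# Keep only rows whose feature value lies inside the fences.
kept = [(v, s) for v, s in zip(feature, stages) if lower <= v <= upper]
scaled = min_max_scale([v for v, _ in kept])
labels = [binarize_stage(s) for _, s in kept]
```

In this toy column the value 90 falls above the upper fence and is dropped, after which the remaining values are rescaled to the unit interval.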

Dataset Description and Analysis
The present study is based on the HCV dataset from the UCI ML repository [23]. This dataset includes 29 features of 1385 Egyptian patients, including the target class, who had HCV therapy for approximately 18 months. There are four identifiable stages of hepatitis C virus (HCV) included in the dataset: portal fibrosis without septa, portal fibrosis with a small number of septa, portal fibrosis with many septa, and cirrhosis. Figure 2 shows the sample distribution for each category. More details about the dataset are present in [23,28].
Tables 1 and 2 outline the statistical analysis of the numerical and categorical attributes. The tables show that the dataset has a nearly equal distribution of cases for each categorical feature, which may support the model's generalizability when using those features. Moreover, the statistical analysis of the numerical features, using the IQR method, indicates some outliers.
To detect the presence of cirrhosis in hepatitis C patients, patients with stages 1, 2, and 3 were considered negative (portal fibrosis), whereas patients with stage 4 were considered positive (cirrhosis). The random oversampling technique was used to balance the data. Table 3 shows the data before and after outlier removal and random oversampling.
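Random oversampling, as used above to balance the classes, duplicates randomly chosen minority-class rows until every class is as frequent as the majority class. The following is a minimal sketch with hypothetical toy data; the function name `random_oversample` is an assumption for illustration, and the study used the imblearn library rather than this code.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes have equal counts."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))  # sampled with replacement
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical imbalanced toy data: four negatives, two positives.
rows = [[0.1], [0.2], [0.3], [0.4], [0.9], [0.8]]
labels = [0, 0, 0, 0, 1, 1]
new_rows, new_labels = random_oversample(rows, labels)
```

Because the added rows are exact copies of existing ones, no new information is created; only the class proportions change.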

Description of the Utilized Machine Learning Techniques
Ensemble algorithms involve training multiple models and combining their results. Bagging classifiers combine several independent predictors using weighted averages or majority votes. In contrast, boosting classifiers are iterative ensemble methods that modify an observation's weight depending on the most recent classification: if an observation was misclassified, its weight is increased so that subsequent learners focus on it. In this study, two bagging (ET and RF) and two boosting (XGBoost and GBM) techniques were used.

Random Forest
The RF classifier was first introduced by Leo Breiman and Adele Cutler [29]. It is a supervised ML algorithm used in classification and regression problems. An RF consists of an ensemble of many distinct decision trees running in parallel as a committee [30]. Combining such models improves the performance of the RF classifier, making it more effective than models that operate individually [29]. In classification problems, each decision tree selects a class as an output. The final outcome returned by the RF classifier is the class that receives the most votes among all trees' outputs. The majority voting formula is:
$$C(x) = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_n(x)\}$$
where $C(x)$ represents the predicted class and $h_1(x), h_2(x), \ldots, h_n(x)$ are the $n$ classification models applied to the data sample $x$ [31].
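The majority vote described above amounts to taking the most frequent label among the trees' outputs. A minimal sketch, with hypothetical tree outputs:

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common class label among the individual predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of five trees for one patient (1 = cirrhosis).
tree_outputs = [1, 0, 1, 1, 0]
final_class = majority_vote(tree_outputs)
```

In the case of a tie, this sketch returns the label that was encountered first; real implementations may break ties differently.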

Gradient Boosting Machine
Leo Breiman first introduced GBM in 1998 by framing adaptive boosting as gradient descent with a specific loss function. A year later, Jerome Friedman developed GBMs, which generalize boosting algorithms for regression and classification problems [32]. A common framework for GBM typically involves three fundamental components: a loss function to be optimized, a weak learner that generates predictions, and an additive model that integrates base learners to decrease the loss function and create a prediction model that is both robust and dependable. Boosting techniques differ from standard ML algorithms because the optimization is carried out in function space. An optimal function $F(x)$ is reached after $M$ iterations [33]:
$$F(x) = \sum_{i=1}^{M} f_i(x)$$
where $f_i(x)$ $(i = 1, \ldots, M)$ represents a function increment, which is calculated using
$$f_i(x) = -\rho_i \, g_i(x)$$
where $\rho_i$ is the step size, chosen to minimize the loss function, and $g_i$ is the gradient of the loss at the $i$th iteration, so that each increment moves along the negative gradient.
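The additive, stage-wise construction described above can be illustrated with a very small gradient boosting loop. This sketch makes several assumptions that are not taken from the study: it uses the squared-error loss (for which the negative gradient is simply the residual), one-split regression stumps as weak learners, and a fixed learning rate in place of a line-searched step size.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares."""
    best = None
    for t in sorted(set(xs))[:-1]:  # exclude the maximum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Build an additive model: initial constant plus scaled weak learners."""
    f0 = sum(ys) / len(ys)
    learners = []

    def predict(x):
        return f0 + sum(learning_rate * h(x) for h in learners)

    for _ in range(n_rounds):
        # For squared error, the negative gradient equals the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learners.append(fit_stump(xs, residuals))
    return predict

# Hypothetical one-dimensional toy data.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
```

Each round fits a new weak learner to what the current model still gets wrong, so the training error shrinks as increments are added.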

Extreme Gradient Boosting
XGBoost is an ML classifier that combines gradient boosting and ensembling methods, and it is built upon decision trees as its base learners [34]. XGBoost was initially developed in 2016 by Tianqi Chen and Carlos Guestrin as part of a research project at the University of Washington. The boosting strategy employed by XGBoost involves aggregating multiple models to create a group of predictors that work together to enhance the accuracy of predictions, regardless of whether the problem being addressed is classification or regression [34]. The prediction generated by XGBoost is the sum of the scores predicted by the individual decision trees [35]:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $K$ is the number of trees, $f_k(x_i)$ is the score of the $k$th tree, and $F$ is the function space that includes all gradient-boosted trees. XGBoost tackles the issue of overfitting, which can be a considerable concern for ensemble models, by including additional regularization in its objective function. This regularization term penalizes the complexity of the model, enhancing its ability to generalize and decreasing the possibility of overfitting [35]. The objective is given by:
$$\mathcal{L} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
The equation involves the loss function $l(y_i, \hat{y}_i)$, which quantifies the difference between the target value $y_i$ and the predicted value $\hat{y}_i$, and the regularization term $\Omega(f_k)$, which evaluates the complexity of the model.
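To make the regularized objective concrete, the sketch below sums the tree scores to obtain a prediction and adds a complexity penalty to the loss. Two assumptions are made for illustration: the loss is the squared error, and the penalty takes the form gamma * T + 0.5 * lambda * sum(w^2) over a tree's T leaf weights w, which is the form given in the XGBoost paper but is not spelled out in the text above. The two stumps and all numbers are hypothetical.

```python
def predict(trees, x):
    """Sum the scores f_k(x) of all K trees."""
    return sum(tree(x) for tree in trees)

def omega(leaf_weights, gamma=1.0, lam=1.0):
    """Complexity penalty for one tree: gamma * T + 0.5 * lam * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)

def objective(trees, leaves_per_tree, xs, ys, gamma=1.0, lam=1.0):
    """Training loss (squared error, an assumption) plus the penalty of every tree."""
    loss = sum((y - predict(trees, x)) ** 2 for x, y in zip(xs, ys))
    penalty = sum(omega(w, gamma, lam) for w in leaves_per_tree)
    return loss + penalty

# Two hypothetical stumps and their leaf weights.
trees = [
    lambda x: 0.5 if x > 3 else -0.5,
    lambda x: 0.25 if x > 3 else -0.25,
]
leaves = [[-0.5, 0.5], [-0.25, 0.25]]
xs = [1, 5]
ys = [-1.0, 1.0]
value = objective(trees, leaves, xs, ys)
```

Adding a tree lowers the loss term only if its contribution outweighs the extra penalty it incurs, which is how the regularization discourages overly complex ensembles.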

Extra Trees
The extra trees (ET) algorithm operates by picking a subset of features at random and then using them to train a decision tree. After that, the tree is pruned to include only the most valuable features for making predictions. ET is similar to RF in that it makes a final prediction about which class or category a data point belongs to by using a collection of decision trees. ET differs from RF because it uses the entire original sample rather than bootstrap sampling (sampling with replacement) as RF does. Another distinction is how the nodes are split: ET chooses random splits, whereas RF always chooses the best possible split. ET and RF are both designed to improve the final output [36]. Decision trees, RF, and ET also differ in performance: the variance is high in decision trees, medium in RF, and low in ET [37].
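The difference in how nodes are split can be illustrated on a single feature. In this sketch, which uses Gini impurity and hypothetical toy data as assumptions, the RF-style function scans candidate thresholds and keeps the best one, whereas the ET-style function simply draws a threshold at random within the feature's range.

```python
import random

def gini(labels):
    """Gini impurity of a list of binary labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(xs, ys, threshold):
    """Weighted Gini impurity of the two children produced by a threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(xs, ys):
    """RF-style: scan candidate thresholds and keep the lowest-impurity one."""
    candidates = sorted(set(xs))[:-1]
    return min(candidates, key=lambda t: split_impurity(xs, ys, t))

def random_split(xs, rng):
    """ET-style: draw a threshold uniformly between the feature's min and max."""
    return rng.uniform(min(xs), max(xs))

# Hypothetical toy feature and binary labels.
xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 1, 1, 1]
rf_threshold = best_split(xs, ys)
et_threshold = random_split(xs, random.Random(0))
```

A single random threshold is usually worse than the optimal one, but averaging many such randomized trees tends to reduce the variance of the ensemble.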

Performance Measures
Four measures assessed classification performance: accuracy, precision, recall, and AUC-ROC. In addition, to assess the performance of the models, a confusion matrix was formed for each model, which evaluates their true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). TP represents the number of correctly classified patients with cirrhosis-HCV, while FP represents the number of patients incorrectly classified as cirrhosis-HCV. FN is the number of patients incorrectly classified as non-cirrhosis-HCV, and TN is the number of correctly classified non-cirrhosis-HCV patients.
Accuracy is the ratio of correctly classified observations to the total number of observations:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the ratio of correctly classified positive observations to the total number of observations classified as positive:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall is the ratio of correctly predicted positive observations to the total number of actual positive observations:
$$\text{Recall} = \frac{TP}{TP + FN}$$
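These three measures follow directly from the confusion-matrix counts. The sketch below uses hypothetical counts that are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical counts, chosen only to illustrate the arithmetic.
m = classification_metrics(tp=90, fp=10, fn=30, tn=70)
```

Here accuracy is 160/200, precision is 90/100, and recall is 90/120, which shows how a model can be precise while still missing a sizeable share of positive cases.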

Optimization Strategy
The hyperparameters of an algorithm must be tuned in order to generate models that solve the problem as well as possible. Grid search with stratified 10-fold cross-validation was used in this study for this purpose. Grid search defines a search space by specifying the hyperparameters and their ranges of potential values. After the hyperparameter grid is defined, the GridSearchCV technique generates every possible combination of the values to identify the optimal set of hyperparameters. To assess each hyperparameter combination, 10-fold cross-validation is conducted to evaluate the performance of the model. Table 4 highlights the optimal hyperparameters for the models using the datasets before and after oversampling.
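The procedure described above, exhaustive enumeration of hyperparameter combinations scored by stratified k-fold cross-validation, can be sketched without any library. Everything here is a simplified stand-in: the helper names, the one-parameter threshold "model", and the toy data are hypothetical, and the sketch uses 5 folds on 10 samples, whereas the study used scikit-learn's GridSearchCV with 10 folds.

```python
from itertools import product

def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def grid_search(param_grid, fit, score, X, y, k=5):
    """Try every parameter combination and return the best mean validation score."""
    names = sorted(param_grid)
    folds = stratified_folds(y, k)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(y)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            fold_scores.append(
                score(model, [X[i] for i in held_out], [y[i] for i in held_out])
            )
        mean = sum(fold_scores) / k
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score

# Hypothetical "model": classify as positive when the feature exceeds a threshold.
def fit(X, y, threshold):
    return threshold

def score(threshold, X, y):
    return sum((x > threshold) == bool(label) for x, label in zip(X, y)) / len(y)

X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.15, 0.85]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
best_params, best_score = grid_search({"threshold": [0.25, 0.5, 0.75]}, fit, score, X, y, k=5)
```

Stratification keeps the class proportions of every fold close to those of the full data, which matters when one class is much rarer than the other.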

Results and Discussion
This section presents the proposed models' outcomes after applying the GridSearchCV method to the models, using both the original and sampled data. Table 5 assesses the performance of the constructed models through stratified 10-fold cross-validation. The results demonstrate significant differences in the accuracy, precision, and recall before and after applying the random oversampling algorithm. It is important to consider the increase in precision and recall rates, which reflect the number of FPs and FNs, since cirrhosis is a chronic disease that must be detected early to prevent complications. Table 5 shows that the boosting models failed to predict any positive cases without oversampling, attaining a recall and precision rate of 0%. ET produced the best results before random oversampling, with an accuracy of 74.22%, a precision of 45%, and a recall of 1.6%. It also attained the highest accuracy of 96.82% after random oversampling. Although random oversampling is known to increase the possibility of overfitting, it considerably improved the performance of all models. This method may be useful for ML algorithms that are affected by skewed distributions and where the model fit may be influenced by several duplicate samples for a particular class. This may include algorithms that iteratively learn coefficients and that seek good splits of data, such as decision trees. In the subsequent experiments, the models trained on the oversampled data were used.

Feature Selection
Feature selection is critical to developing efficient models since it eliminates irrelevant features that might negatively influence their performance. The SFS technique from the mlxtend library was used to reduce the number of features [38]. SFS starts from an empty feature subset and, in every forward step, adds the feature that most improves a chosen criterion function; the model is then trained on the selected subset. The most suitable feature subset identified by the SFS algorithm is presented in Table 6. As can be seen in this table, the SFS algorithm did not affect the performance of the ensemble boosting models. However, there was a slight reduction in features for the RF model and a significant reduction for the ET model. The accuracy of both classifiers was improved by 0.1% by SFS, despite a considerable difference in the number of removed attributes.
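The forward-selection idea can be sketched as a greedy loop. In this illustration the criterion function is a hypothetical additive score over made-up feature names; in the study the criterion would be a cross-validated model score computed by the mlxtend implementation.

```python
def sequential_forward_selection(features, score, k):
    """Greedily add the feature that most improves the criterion until k are chosen."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature gains; the names and numbers are illustrative only.
gain = {"a": 0.30, "b": 0.25, "c": 0.10, "d": 0.01}

def toy_score(subset):
    """Stand-in criterion: sum of the hypothetical gains of the chosen features."""
    return sum(gain[f] for f in subset)

chosen = sequential_forward_selection(list(gain), toy_score, k=2)
```

Because each step only looks one feature ahead, the procedure is much cheaper than evaluating every subset, at the cost of possibly missing combinations that are only useful together.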

Further Discussion of the Results
After SFS was applied to all proposed models, ET surpassed all other algorithms, achieving 96.92% accuracy, as shown in Table 7, followed by RF with 96.58% accuracy. XGBoost achieved the lowest accuracy of 90.99%; however, it achieved the highest recall of 94.81%. ET obtained the highest precision of 99.81%. To further evaluate the results in terms of TP, FP, FN, and TN counts, confusion matrices were constructed. These are displayed in Figure 3. According to the findings illustrated in Figure 3, XGBoost displayed the lowest count of FNs, followed by ET. On the other hand, ET demonstrated the lowest number of FPs, with RF ranking second. RF and GBM obtained the highest number of FNs. Meanwhile, XGBoost had the highest FP count of 131. FNs have a significant impact on healthcare, but minimizing the number of FPs is also essential to avoid unnecessary interventions. Misdiagnosis of cirrhosis wastes medicine and time, and it damages patients' mental health. Consequently, to determine the best-performing model, the tradeoff between FPs and FNs should be considered. Overall, ET surpassed the other models in detecting cirrhosis in HCV patients.
To assess the ability of the models to differentiate between patients with and without cirrhosis, an AUC-ROC curve was constructed for each model using stratified 10-fold cross-validation. The results, presented in Figure 4, indicate how effectively the proposed models can distinguish between the classes to generate correct predictions. All models obtained an AUC-ROC of 96%.
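The AUC-ROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it from that pairwise definition; the scores and labels are hypothetical and unrelated to the study's results.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities and true labels.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
auc = roc_auc(scores, labels)
```

In this toy example five of the six positive-negative pairs are ordered correctly, so the AUC is 5/6.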

Explainable Artificial Intelligence
ML has gained widespread popularity and has been applied to numerous domains and use cases. However, specific measures need to be implemented to ensure that society accepts and trusts ML-powered systems. To build this trust, it is necessary to visualize and explain how ML models make their decisions. XAI can be used to ensure that an algorithm's decision-making process and the data used for training are easily understood by users. This study uses two XAI techniques, SHAP and LIME.

Shapley Additive Explanations
In 2017, Lundberg and Lee introduced the SHAP algorithm, which represents the explanation as an additive feature attribution method, that is, a linear model, to calculate the contribution of each attribute to a prediction. Specifically, SHAP uses Shapley values to explain the prediction of an individual instance. The SHAP explanation approach employs coalitional game theory to calculate the Shapley values. In this method, the feature values of a data instance are treated as individual players participating in a coalition, and the Shapley values distribute the prediction fairly across the features. The explanation is calculated using:
$$g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$$
This equation involves the explanatory model, $g$, and the coalition vector, $z' \in \{0, 1\}^M$, where $M$ represents the maximum coalition size. The feature attribution for a given feature, $j$, is represented by the Shapley value $\phi_j \in \mathbb{R}$. The Shapley values of the ET model are demonstrated in Figure 5, where positive contributions are displayed on the left side and negative contributions are displayed on the right side [39].
Based on the observations made in Figure 5, it is evident that RNA 4, BMI, RNA 12, and AST 1 are the features with the greatest importance. Conversely, the features with the lowest importance are jaundice and diarrhea. It can also be observed that high RNA 4 values have low negative contributions, while the opposite is true for patients with high RNA 12 values. On the other hand, high BMI and AST 1 values have high positive contributions.
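For a model with very few features, Shapley values can be computed exactly by averaging each feature's marginal contribution over all coalitions of the other features. The sketch below does this for a hypothetical two-feature model; the model, the instance, and the all-zero baseline are assumptions for illustration, and practical SHAP implementations rely on approximations or model-specific algorithms rather than full enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function over feature indices."""
    players = range(n_features)
    phi = [0.0] * n_features
    for j in players:
        others = [p for p in players if p != j]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            )
            for subset in combinations(others, size):
                with_j = value(set(subset) | {j})
                without_j = value(set(subset))
                phi[j] += weight * (with_j - without_j)
    return phi

# Hypothetical model f(x) = 2*x0 + 3*x1 + x0*x1, explained at x = (1, 1).
# Features outside the coalition are set to a baseline of 0.
def value(coalition):
    x0 = 1.0 if 0 in coalition else 0.0
    x1 = 1.0 if 1 in coalition else 0.0
    return 2 * x0 + 3 * x1 + x0 * x1

phi = shapley_values(value, 2)
```

The attributions sum to the difference between the prediction for the full instance and the baseline prediction, and the interaction term is split evenly between the two features.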

Local Interpretable Model-Agnostic Explanations
LIME is a commonly used algorithm that makes machine learning models interpretable by creating a local explanation for a single prediction. LIME's explanation is based on a simpler interpretable model, such as a linear classifier. In this technique, random perturbation is used to simulate data around an instance, and specific selection techniques are used to determine the importance of certain features. The popularity of LIME and similar local algorithms can be attributed to their ease of use. Still, the generated explanations are unstable because of the random perturbation and feature selection approaches, which can produce many explanations for the same prediction [40]. Figure 6 illustrates the results of LIME for a positive and a negative prediction. Figure 6a explains a positive prediction made by the ET model, in which the positive probability prediction was 77%. It is indicated that BMI, RNA Base, ALT 48, RNA 4, and ALT 36 contributed to the positive prediction.
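The core idea described above, perturbing an instance and fitting a proximity-weighted linear surrogate, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the LIME library: it uses Gaussian perturbations, an exponential proximity kernel, and ordinary weighted least squares, and it omits LIME's feature selection and discretization steps. The black-box function and all parameter values are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_explain(black_box, instance, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturbed sample
        dist2 = sum((zi - vi) ** 2 for zi, vi in zip(z, instance))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        targets.append(black_box(z))
        weights.append(math.exp(-dist2 / kernel_width ** 2))  # closer samples weigh more
    # Weighted normal equations: (X^T W X) beta = X^T W y
    xtwx = [
        [sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(d + 1)]
        for i in range(d + 1)
    ]
    xtwy = [
        sum(w * r[i] * t for r, t, w in zip(rows, targets, weights))
        for i in range(d + 1)
    ]
    beta = solve(xtwx, xtwy)
    return beta[1:]  # per-feature local coefficients, intercept dropped

# Hypothetical black box that happens to be linear, so the surrogate can recover it.
def black_box(x):
    return 3.0 * x[0] - 2.0 * x[1] + 1.0

coefs = lime_explain(black_box, [0.5, 0.2])
```

Changing the seed changes the perturbed samples; for a nonlinear black box this can change the fitted coefficients, which is one source of the instability noted above.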

Conclusions and Recommendations
Cirrhosis, caused by extensive liver fibrosis or scarring, is frequently discovered after decompensation when its associated consequences have occurred. The performance of current non-invasive testing for the early detection of advanced liver cirrhosis is poor, with many categories being uncertain. Healthcare professionals can detect the presence of cirrhosis and chronic liver diseases using invasive tests, including liver biopsy. However, ML algorithms can be applied to analyze clinical data to detect the presence of cirrhosis to assist healthcare providers. This study aimed to use ML algorithms to identify cirrhosis in HCV patients. Four algorithms were trained using the Egyptian HCV patient's dataset from UCI, namely, RF, GBM, XGBoost, and ET. The ET classier outperformed the other algorithms using only 16 out of 29 features, with an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an AUC-ROC of 96%. Although XGBoost achieved a higher recall value, ET had the highest accuracy value. This would result in less clinical testing, possibly contributing to cost savings.
In addition, the use of XAI was implemented in order to guarantee that healthcare experts could comprehend how the algorithm makes decisions and the information utilized to train it. The results of SHAP revealed that the features with the highest importance were RNA 4, BMI, RNA 12, and AST 1. On the other hand, LIME indicated that BMI, RNA Base, ALT 48, RNA 4, and ALT36 contributed to the positive predictions of the model. The results of this study were promising, but more data on patients with cirrhosis are needed to train the model on real data rather than synthetic data. Furthermore, other stages of HCV could be investigated in the future to develop a generalized model that can predict HCV progression earlier.    Figure 6b explains a negative prediction made by the ET model, in which the negative probability prediction was 83%. It can be observed that all features except RNA Base, RNA 12, and RNA 4 contributed to the negative prediction.
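The perturbation-and-surrogate procedure behind LIME explanations such as those in Figure 6 can be sketched as follows. This is a simplified illustration and not the `lime` package's implementation: it uses Gaussian perturbations, an exponential proximity kernel, and a weighted linear fit, and it omits feature selection and discretization. The function `black_box`, the instance, the sample count, and the kernel width are all assumptions chosen for the example.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x

def lime_like_explanation(predict, instance, n_samples=500, scale=1.0,
                          kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    d = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        offset = [rng.gauss(0.0, scale) for _ in range(d)]  # random perturbation
        sample = [v + o for v, o in zip(instance, offset)]
        distance_sq = sum(o * o for o in offset)
        rows.append([1.0] + offset)                  # intercept + centred features
        targets.append(predict(sample))              # query the black-box model
        weights.append(math.exp(-distance_sq / kernel_width ** 2))
    # Weighted least squares via the normal equations (X^T W X) beta = X^T W y.
    size = d + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(size)] for i in range(size)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(size)]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[1:]                                  # per-feature local weights

# Hypothetical nonlinear model standing in for a trained classifier (assumption).
def black_box(x):
    z = 1.5 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[1]
    return 1.0 / (1.0 + math.exp(-z))

instance = [0.2, -0.1]
weights_a = lime_like_explanation(black_box, instance, seed=0)
weights_b = lime_like_explanation(black_box, instance, seed=1)
print("seed 0:", [round(w, 3) for w in weights_a])
print("seed 1:", [round(w, 3) for w in weights_b])
```

Running the sketch with two different seeds gives local weights that agree in sign but not in exact value, which is a small-scale view of the instability noted in [40].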

Conclusions and Recommendations
Cirrhosis, caused by extensive liver fibrosis or scarring, is frequently discovered only after decompensation, when its associated complications have already occurred. Current non-invasive tests perform poorly in the early detection of advanced liver cirrhosis, with many results falling into uncertain categories. Healthcare professionals can detect the presence of cirrhosis and chronic liver diseases using invasive tests, including liver biopsy. However, ML algorithms can be applied to clinical data to detect the presence of cirrhosis and thereby assist healthcare providers. This study aimed to use ML algorithms to identify cirrhosis in HCV patients. Four algorithms, namely RF, GBM, XGBoost, and ET, were trained on the Egyptian HCV patients dataset from UCI. The ET classifier outperformed the other algorithms using only 16 out of 28 features, with an accuracy of 96.92%, a recall of 94.00%, a precision of 99.81%, and an AUC-ROC of 96%. Although XGBoost achieved a higher recall value, ET had the highest accuracy value. Relying on fewer features would result in less clinical testing, possibly contributing to cost savings.
In addition, XAI techniques were applied so that healthcare experts could comprehend how the algorithm makes decisions and the information utilized to train it. The results of SHAP revealed that the features with the highest importance were RNA 4, BMI, RNA 12, and AST 1. LIME, on the other hand, indicated that BMI, RNA Base, ALT 48, RNA 4, and ALT 36 contributed to the positive predictions of the model. The results of this study were promising, but more data on patients with cirrhosis are needed to train the model on real data rather than synthetic data. Furthermore, other stages of HCV could be investigated in the future to develop a generalized model that can predict HCV progression earlier.