Accurate diagnosis of liver diseases through the application of deep convolutional neural network on biopsy images

Accurate detection of non-alcoholic fatty liver disease (NAFLD) through biopsies is challenging. Manual detection of the disease is not only prone to human error but is also time-consuming. Using artificial intelligence and deep learning, we have successfully addressed the issues of manual detection of liver diseases with a high degree of precision. This article uses various neural network-based techniques to assess non-alcoholic fatty liver disease. In this investigation, more than five thousand biopsy images were employed alongside the latest versions of the algorithms. To detect prominent characteristics in the liver from a collection of biopsy images, we employed the YOLOv3, Faster R-CNN, YOLOv4, YOLOv5, YOLOv6, YOLOv7, YOLOv8, and SSD models. A highlight of this paper is the comparison of state-of-the-art instance segmentation models, including Mask R-CNN, U-Net, YOLOv5 Instance Segmentation, YOLOv7 Instance Segmentation, and YOLOv8 Instance Segmentation. The severity of NAFLD and non-alcoholic steatohepatitis was examined for hepatocyte ballooning, steatosis, lobular and periportal inflammation, and fibrosis. Metrics used to evaluate the algorithms' effectiveness include accuracy, precision, specificity, and recall. Improved metrics are achieved by optimizing the hyperparameters of the associated models. Additionally, the liver is scored in order to analyse the information gleaned from biopsy images. Statistical analyses are performed to establish the statistical significance of the scores for different zones.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is a worldwide issue that affects more than 10% of people globally [1]. The incidence of NAFLD exceeds 30% in industrialized nations, where obesity and its accompanying illnesses, diabetes and metabolic disorders, are prominent. In both adults and children with fatty liver disease, liver biopsy examination remains an essential tool in clinical practice and scientific research. The ever-increasing complexity and number of complaints related to liver diseases pose a new set of challenges for detecting such diseases efficiently. NAFLD is characterized by an accumulation of excess fat in the liver that is not induced by alcohol. It is marked by steatosis, which is one of the most frequent liver abnormalities in obese people. A fatty liver is one in which fat constitutes more than 5% to 10% of the liver [2]. A liver biopsy is the benchmark procedure for quantifying hepatic steatosis. However, a biopsy is an invasive procedure that carries a risk of serious complications, including pain, internal bleeding, infection, and organ damage. The purpose of a liver biopsy is to provide crucial information for patient treatment, clinical trials, and continuing research on specific liver illnesses [3]. NAFLD typically causes no symptoms; NAFLD, alcoholic fatty liver disease (AFLD), non-alcoholic steatohepatitis (NASH), and acute fatty liver of pregnancy (AFLP) are distinct forms of fatty liver disease [3,4]. Among these, NAFLD encompasses a broad clinical spectrum, spanning from bland steatosis to NASH, that might advance to liver cirrhosis and hepatocellular carcinoma (HCC). If NAFLD advances to cirrhosis, fluid retention, internal bleeding, and loss of healthy liver function may occur.
Excessive alcohol intake prevents the liver from properly metabolizing fat, which results in AFLD. NASH develops when an abundance of liver fat is accompanied by hepatic inflammation, which can ultimately result in cirrhosis and liver failure. Hepatitis B and C are two liver disorders caused by viral infections [5]. There are two types of NAFLD detection methods: invasive and non-invasive. NAFLD is frequently detected using invasive techniques such as biopsy, as well as non-invasive techniques such as imaging based on ionizing radiation (computed tomography, CT), magnetic resonance imaging (MRI), ultrasonography (USG), and liver enzyme tests. CT scans are used to find abdominal bleeding or injury; this painless, non-invasive method of identifying internal damage may help save patients' lives [6].
Because of the large number of samples, current procedures for the pathological diagnosis of hepatic tissue are neither efficient, cost-effective, nor fast, which demands new technologies or methods for the detection of liver diseases. Efficient diagnostic techniques for liver disease based on artificial intelligence (AI) are becoming indispensable. Recently, AI and deep learning have found tremendous applications in medical imaging. Sethunath et al. [7] applied a supervised machine learning algorithm to detect different regions in mouse liver biopsy images. Owjimehr et al. [8] proposed wavelet packet transforms to identify liver disease in ultrasound images. Additionally, computer-aided diagnosis (CAD) systems and deep learning algorithms have been used to diagnose NAFLD patients from ultrasound images [9][10][11]. Automated segmentation of the carotid intima-media thickness in ultrasound images was performed using a fast fuzzy c-means clustering technique [12]. Tsiplakidou et al. [3] introduced a thresholding approach in which the fatty regions of images are detected and assessed according to the eccentricity of each region, whereas Liquori et al. [13] recognized fat regions based on color homogeneity and circular shape.
Several powerful machine learning methods have been applied to identify hepatic steatosis.
In addition to the aforementioned studies, Tang et al. [22] used Faster R-CNN and DeepLab for automatic liver segmentation. Guo et al. [23] employed a deep learning method for hepatic steatosis segmentation that predicts steatosis using bounding boxes and classification probabilities. AI has long been a key component of disease detection from medical images [24,25]. Using Mask R-CNN, Podder et al. [26] identified COVID-19 from chest X-ray images with high accuracy and specificity. Applications of the You Only Look Once (YOLO) algorithm include skin lesion segmentation [27], identification of blood cells in human blood smears [28,29], liver detection [30], and cholelithiasis and gallstone classification in CT images [30]. However, the application of AI and deep learning to NAFLD has not yet been robustly explored; we have therefore applied various AI and deep learning techniques to biopsy images of the liver. This research presents a methodology for accurately diagnosing hepatic steatosis that makes use of SSD (single shot multibox detector) [31], Faster R-CNN [32], and YOLO [33]. These techniques are compared to identify the most effective and practical option. The proposed networks are not specific to liver biopsy images; they can also be applied to other kinds of microscopic slide images.
Semantic segmentation is a deep learning technique that labels or categorizes each pixel in an image. It is also referred to as image segmentation: the grouping of regions of an image that correspond to the same object class [34]. Semantic segmentation is used in medical imaging and diagnostics, self-driving cars, and facial recognition. Instance segmentation is a more sophisticated technique that finds individual instances of objects and identifies their boundaries within an image: each object of interest that occurs in an image is both recognized and separated from the others. Instance segmentation is crucial for autonomous vehicles, medical imaging, disease detection from microfluidic devices [35], and satellite imagery. Instance segmentation is supported by U-Net [36], Mask R-CNN [37], and numerous members of the YOLO family.
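To make the distinction concrete, the following is a minimal pure-Python sketch: a semantic mask gives every steatosis pixel the same label, whereas an instance map gives each separate region its own ID. Here an instance is assumed to be a 4-connected component of the mask; this is a simplified stand-in for illustration only, not how the learned instance heads of Mask R-CNN or YOLO work, and `label_instances` is a hypothetical helper name.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected region of 1s in a binary semantic mask its own ID.

    mask: list of rows of 0/1 values for a single class.
    Returns (labels, count): 0 is background, 1..count are instances.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1  # an unlabelled foreground pixel starts a new instance
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill over 4-neighbours
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Two separate fat regions share one semantic class but are two instances.
semantic = [
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
instances, n = label_instances(semantic)
print(n)          # 2
print(instances)  # [[1, 1, 0, 0], [1, 0, 0, 2], [0, 0, 2, 2]]
```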
Faster R-CNN is a two-stage deep convolutional network that is trained end-to-end and achieves high accuracy. The network can predict the positions of numerous objects quickly and precisely. Each predictor specializes in predicting particular object sizes, aspect ratios, or categories, which improves the overall recall. In addition to performing object detection, Faster R-CNN uses a region proposal network to produce object proposals.
YOLO uses a single neural network to perform detection across the entire image. The network divides the image into regions and predicts bounding boxes and probabilities for every region [38]. For each object, YOLO assigns responsibility to the predictor whose prediction has the highest IoU (Intersection over Union) with the ground truth. With SSD, multiple objects in an image can be recognized in a single shot, as opposed to RPN-based systems such as the R-CNN series, which require two shots: one for generating region proposals and one for recognizing the object in each proposal. SSD uses multi-scale features and default boxes for higher efficiency. Detecting objects thus involves predicting their class and location within a given area.
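The IoU criterion used to pick the responsible predictor can be sketched as follows. This is a minimal illustration assuming axis-aligned boxes given as (x1, y1, x2, y2) pixel corners; the function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # 0 if boxes do not overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_predictor(predictions, ground_truth):
    """Index of the predicted box with the highest IoU with the ground truth."""
    return max(range(len(predictions)),
               key=lambda k: iou(predictions[k], ground_truth))


gt = (10, 10, 50, 50)
preds = [(0, 0, 30, 30), (12, 12, 48, 52)]
print(round(iou(preds[0], gt), 3))       # 0.19
print(responsible_predictor(preds, gt))  # 1
```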

Materials and methods
An overview of the proposed methods and the biopsy dataset required for fatty liver disease detection is described below.

Training data preparation
The dataset contains 21,435 images of liver biopsies. It is an open-source dataset available from the Open Science Framework (https://osf.io/p48rd/) [39]. The images were captured on a Zeiss AxioScan Z1 scanner (Carl Zeiss, Jena, Germany) equipped with a high-definition color camera and a 20× objective for bright-field microscopy, at a pixel resolution of 0.22 µm/px [40]. The captured images were 897 × 897 px in BigTIFF format. The liver sections were obtained from mice of various ages. Each sample was a 3 µm thin axial section from the center of the liver lobe. The dataset includes both healthy and NAFLD/NASH livers. Additionally, Masson's trichrome staining was performed on the slides.
From the dataset, 5,348 biopsy images were chosen for training, testing, and validating the Faster R-CNN, SSD, and YOLO algorithms: 4,000 images were used for training, 800 for testing, and 548 for validation. The training set contained four classes with 1,000 images each: ballooning, steatosis, inflammation, and fibrosis. Tests were conducted on a single graphics processing unit (GPU) (NVIDIA GeForce RTX 3080 Ti, 16 GB RAM). The images were annotated by skilled pathologists [41]. Data preprocessing included resizing the images to 299 × 299 px for evaluation. The dataset was augmented to increase its size and improve training effectiveness; the augmentation techniques considered were rotation, flipping, resizing, and Gaussian blurring.
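The split and two of the augmentations can be sketched in plain Python. This is an illustrative sketch only: the file names are hypothetical, the split is a seeded random shuffle (the text does not say how images were assigned, and this version does not enforce the 1,000-per-class balance), and only horizontal flipping and 90° rotation are shown, since resizing and Gaussian blurring would normally use an image library.

```python
import random


def split_dataset(items, n_train=4000, n_test=800, seed=0):
    """Shuffle, then split into train/test/validation (validation = remainder)."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]
    return train, test, val


def hflip(img):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in img]


def rot90(img):
    """Rotate an image (list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


image_ids = [f"biopsy_{i:05d}.tif" for i in range(5348)]  # hypothetical names
train, test, val = split_dataset(image_ids)
print(len(train), len(test), len(val))  # 4000 800 548
```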

Faster R-CNN
Faster R-CNN is an object recognition algorithm that predicts object locations using a region proposal network (RPN). Its predecessor, Fast R-CNN, applies region of interest (ROI) pooling to region proposals produced by the selective search method, which is time-consuming; Faster R-CNN instead uses an RPN that directly produces region proposals, replacing the selective search used by Fast R-CNN. VGG-16 was used as the backbone to extract image features. The ROI pooling layer generates feature maps of uniform size [42]. The original method was evaluated on the PASCAL VOC 2007 dataset.
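The key property of ROI pooling, a fixed-size output regardless of the region's size, is illustrated below with a simplified max-pooling sketch. It assumes integer ROI coordinates already expressed in feature-map cells and bins whose edges are rounded outward; production implementations such as torchvision's `roi_pool` also handle image-to-feature-map scaling and use their own rounding rules.

```python
import math


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool region roi=(x1, y1, x2, y2) of a 2-D feature map into out_h x out_w.

    Coordinates are integer feature-map cells, x2/y2 exclusive. Bin starts are
    floored and bin ends ceiled so every bin is non-empty.
    """
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        ys = y1 + math.floor(i * h / out_h)
        ye = y1 + math.ceil((i + 1) * h / out_h)
        row = []
        for j in range(out_w):
            xs = x1 + math.floor(j * w / out_w)
            xe = x1 + math.ceil((j + 1) * w / out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fm = [[4 * y + x for x in range(4)] for y in range(4)]  # toy 4x4 feature map
print(roi_max_pool(fm, (0, 0, 4, 4)))  # [[5, 7], [13, 15]]  (4x4 region)
print(roi_max_pool(fm, (1, 1, 4, 4)))  # [[10, 11], [14, 15]] (3x3 region)
```

Both regions, though of different sizes, yield a 2 × 2 output, which is what allows the following fully connected layers to accept proposals of any shape.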
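The ROI pooling layer receives proposals (bounding boxes) of various shapes and sizes and extracts a fixed-size feature map for each proposal. These feature maps are passed to a fully connected layer with a softmax activation function and to a linear regression layer. Finally, the network classifies fatty liver cells and predicts the bounding boxes of the detected cells. The classification and bounding box regression losses are combined in the following multi-task loss function:

$$L(\{p_i\}, \{t_i\}) = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^{*}) + \lambda \frac{1}{N_{reg}} \sum_i p_i^{*} L_{reg}(t_i, t_i^{*})$$

where $i$ indexes the anchors, $p_i$ is the predicted probability that anchor $i$ contains an object, $p_i^{*}$ is 1 for a positive anchor and 0 otherwise, $t_i$ and $t_i^{*}$ are the predicted and ground-truth box parameters, $L_{cls}$ is the log loss, $L_{reg}$ is the smooth L1 loss, $N_{cls}$ and $N_{reg}$ are normalization terms, and $\lambda$ balances the two terms.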
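As a sketch of how such a multi-task loss combines its two parts, the following computes a binary log loss over anchors plus a smooth L1 box loss counted only for positive anchors. The choices λ = 1 and N_reg = N_cls are simplifying assumptions for illustration, not the settings used in this study or in the original Faster R-CNN paper.

```python
import math


def smooth_l1(x):
    """Smooth L1 (Huber-like) loss used for box regression in the R-CNN family."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multitask_loss(p, p_star, t, t_star, lam=1.0, n_reg=None):
    """Classification log loss plus lam * box loss counted only for positive anchors.

    p:      predicted objectness probabilities, one per anchor
    p_star: 0/1 anchor labels
    t, t_star: predicted / target box offsets, one 4-tuple per anchor
    """
    eps = 1e-12  # guards log(0)
    n_cls = len(p)
    n_reg = n_reg or n_cls  # assumption: same normaliser for both terms
    cls = -sum(ps * math.log(pi + eps) + (1 - ps) * math.log(1 - pi + eps)
               for pi, ps in zip(p, p_star)) / n_cls
    reg = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, tsi))
              for ps, ti, tsi in zip(p_star, t, t_star)) / n_reg
    return cls + lam * reg


loss = multitask_loss(
    p=[0.9, 0.2], p_star=[1, 0],
    t=[(0.1, 0.0, 0.0, 0.0), (0.5, 0.5, 0.5, 0.5)],
    t_star=[(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)],
)
print(round(loss, 4))
```

The negative anchor contributes only to the classification term, because its regression loss is multiplied by $p_i^{*} = 0$.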

SSD
SSD is another useful technique for identifying steatosis in the liver. SSD is based on a feed-forward convolutional neural network (one whose nodes do not form a loop) that produces a fixed-size collection of bounding boxes and scores for the presence of liver tissue classes in those boxes. A non-max suppression step then yields the final detections. Unlike a conventional CNN detector that works on a single feature map, SSD detects objects separately on multi-scale feature maps.
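The non-max suppression step can be sketched as a greedy loop: keep the highest-scoring box, discard remaining boxes that overlap it too much, and repeat. The IoU threshold of 0.5 and score threshold of 0.3 below are illustrative defaults, not values confirmed for the SSD configuration used here.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.3):
    """Greedy NMS; returns indices of the kept boxes, highest score first."""
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        best = candidates.pop(0)  # current highest-scoring box
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.1]
print(non_max_suppression(boxes, scores))  # [0, 2]
```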
The SSD architecture consists of the base network, the additional feature extraction layers, and the prediction layers. The base network consists of the initial layers of a standard image classification network; here, the feature maps are derived using VGG-16.
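At the end of the base network, the fully connected layers are replaced by convolutional layers. SSD generates default (anchor) boxes and predicts their categories and offsets from feature maps at multiple scales, as shown in Figure 2. The loss function combines the classification (confidence) loss and the bounding box regression (localization) loss [43]:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

The classification loss is calculated using the following equation:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right), \qquad \hat{c}_{i}^{p} = \frac{\exp(c_{i}^{p})}{\sum_{p} \exp(c_{i}^{p})}$$

where $x_{ij}^{p} \in \{0, 1\}$ indicates whether the $i$th default box is matched to the $j$th ground-truth box of category $p$, $c$ are the class confidences, $\alpha$ weights the localization term, and $N$ represents the number of matched default boxes.

The regression loss is shown by the following equation:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)$$

where $l$ denotes the predicted box offsets and $\hat{g}$ the ground-truth box $g$ encoded relative to the default box $d$: $\hat{g}_{j}^{cx} = (g_{j}^{cx} - d_{i}^{cx})/d_{i}^{w}$, $\hat{g}_{j}^{cy} = (g_{j}^{cy} - d_{i}^{cy})/d_{i}^{h}$, $\hat{g}_{j}^{w} = \log(g_{j}^{w}/d_{i}^{w})$, and $\hat{g}_{j}^{h} = \log(g_{j}^{h}/d_{i}^{h})$.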
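The multi-scale default boxes can be sketched with the scale rule from the original SSD paper, $s_k = s_{min} + (s_{max} - s_{min})(k - 1)/(m - 1)$, with widths $s_k\sqrt{a}$ and heights $s_k/\sqrt{a}$ for aspect ratio $a$. The values m = 6, s_min = 0.2, s_max = 0.9 and aspect ratios {1, 2, 1/2} are the paper's defaults, assumed here rather than taken from this study; the extra square box per location used in the original SSD is omitted for brevity.

```python
import math


def ssd_scales(m=6, s_min=0.2, s_max=0.9):
    """Default-box scale s_k for each of m feature maps (k = 1..m)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Default boxes (cx, cy, w, h), normalised to [0, 1], for one square feature map."""
    boxes = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit at the middle of each feature-map cell.
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            for a in aspect_ratios:
                boxes.append((cx, cy, scale * math.sqrt(a), scale / math.sqrt(a)))
    return boxes


print([round(s, 2) for s in ssd_scales()])  # [0.2, 0.34, 0.48, 0.62, 0.76, 0.9]
print(len(default_boxes(2, 0.2)))           # 12 boxes on a 2x2 map
```

Small scales on fine feature maps suit small fat droplets, while larger scales on coarse maps suit larger structures.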

YOLO
You Only Look Once (YOLO) [44] is a relatively recent regression-based strategy. YOLO predicts classes and bounding boxes for the entire image in a single run of the network. Its most common application is real-time object detection. YOLOv5, YOLOv7, and YOLOv8 also support instance segmentation, with a fairly high mAP (mean average precision) for each model.
A new era in object detection and segmentation began with the introduction of anchor boxes in YOLOv2 in 2017, which added several enhancements on top of the original YOLO model. Its successor, YOLOv3, released in 2018, generated predictions at three different scales. Later YOLO models focused on advances such as feature aggregation in YOLOv4 and architectural improvements with a PyTorch implementation in YOLOv5. Other notable algorithms in this family with performance enhancements include PP-YOLO, Scaled-YOLOv4, and PP-YOLOv2. Among its architectural modifications, YOLOv6 includes a decoupled head, which has been shown to increase performance. YOLOv7 uses shorter gradient paths in back-propagation, thereby increasing the efficiency of the algorithm. YOLOv8 is currently among the most reliable algorithms in computer vision and also includes a tracking component. The image is first scaled to 224 × 224 pixels and then divided into a 7 × 7 grid of cells, each of which is responsible for estimating bounding boxes. Non-max suppression removes bounding boxes that overlap heavily with a higher-scoring box, as well as boxes with a low likelihood of containing an object. Anchor boxes allow the YOLO algorithm to identify several objects centered in a single grid cell [39,45]. The method employs a single neural network for detection. The architecture of YOLO is illustrated in Figure 3.
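Which grid cell is responsible for an object can be sketched directly from the 224 × 224 input and 7 × 7 grid described above: the cell containing the object's centre is responsible, and the centre is encoded as an offset within that cell. The function name and example coordinates are hypothetical.

```python
def responsible_cell(cx, cy, img_w=224, img_h=224, S=7):
    """Grid cell (row, col) containing the box centre, plus the centre's offset
    within that cell as a fraction of the cell size."""
    gx, gy = cx * S / img_w, cy * S / img_h  # centre in grid-cell units
    col, row = min(int(gx), S - 1), min(int(gy), S - 1)  # clamp the far edge
    return row, col, gx - col, gy - row


# Each cell covers 224 / 7 = 32 px; a centre at (100, 50) px falls in row 1, col 3.
print(responsible_cell(100, 50))  # (1, 3, 0.125, 0.5625)
```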
The original model was evaluated on the PASCAL VOC detection dataset. Its network design includes 24 convolutional layers followed by two fully connected layers for prediction. We have only one class for identifying hepatic steatosis. For each anchor box, the bounding box parameters and the prediction probability are determined. With five anchor boxes, the network predicts for each anchor the probability pc that an object is present in the grid cell, the object's center coordinates (x, y) relative to the top-left corner of the cell, and the width and height of the enclosing rectangle. The bounding boxes represent the newly identified fatty hepatic tissues. Consequently, if $N_f$ is the number of filters in the final convolution layer, $N_a$ is the number of anchor boxes, and $N_c$ is the number of classes, these quantities are related by the following equation:

$$N_f = N_a \times (N_c + 5)$$

The YOLO algorithm used to diagnose fatty liver automatically applies an acceptance threshold, whose value is computed from the mean absolute difference between the predictions and the ground truth at different threshold levels. The number of final prediction boxes on an image gives the count of fatty liver tissues in the output, and the box parameters define the boundaries surrounding each detected tissue.
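The filter-count relation and the box-counting step can be checked with a short sketch. The "+ 5" covers the five per-anchor values (pc, x, y, w, h); the example class and anchor counts and the 0.5 threshold are illustrative only.

```python
def yolo_head_filters(num_classes, num_anchors):
    """Filters in the final convolution of an anchor-based YOLO head: each anchor
    predicts pc, x, y, w, h (5 values) plus one score per class."""
    return num_anchors * (num_classes + 5)


def count_detections(confidences, threshold):
    """Number of final boxes whose confidence reaches the threshold."""
    return sum(1 for c in confidences if c >= threshold)


print(yolo_head_filters(num_classes=1, num_anchors=5))  # 30
print(yolo_head_filters(num_classes=4, num_anchors=3))  # 27
print(count_detections([0.95, 0.40, 0.82, 0.10], threshold=0.5))  # 2
```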
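The total loss combines the localization, confidence, and classification losses. The total loss function of the YOLO algorithm, over an $S \times S$ grid with $B$ boxes per cell, is given by the following:

$$\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[\left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2\right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned}$$

where $\mathbb{1}_{i}^{obj} = 1$ if an object appears in cell $i$ and 0 otherwise; $\mathbb{1}_{ij}^{obj} = 1$ if the $j$th bounding box in cell $i$ is responsible for detecting the fatty liver tissue and 0 otherwise [46], and $\mathbb{1}_{ij}^{noobj}$ is its complement; $p_i(c)$ is the conditional class probability of class $c$ in cell $i$ [47]; $\lambda_{coord}$ weights the bounding box coordinate losses; $\hat{C}_i$ is the confidence score of box $j$ in cell $i$; and $\lambda_{noobj}$ reduces the loss for boxes that detect background. For this research, $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$ were used.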
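A minimal pure-Python sketch of this sum-squared loss is given below, using λ_coord = 5 and λ_noobj = 0.5 as stated. Two simplifications are assumptions: the confidence target of the responsible box is taken as 1 (the original YOLOv1 uses the predicted box's IoU with the ground truth), and the dictionary layout of each cell is a hypothetical format chosen for readability.

```python
import math


def yolo_loss(cells, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared YOLO (v1-style) loss.

    Each cell is a dict with:
      'pred_boxes': list of B tuples (x, y, w, h, conf) predicted by the network
      'true_box':   (x, y, w, h) of the object centred in the cell, or None
      'resp':       index of the responsible predictor (only read if an object exists)
      'pred_cls', 'true_cls': predicted / one-hot class probabilities (ditto)
    """
    loss = 0.0
    for cell in cells:
        has_obj = cell["true_box"] is not None
        for j, (x, y, w, h, conf) in enumerate(cell["pred_boxes"]):
            if has_obj and j == cell["resp"]:
                tx, ty, tw, th = cell["true_box"]
                loss += lambda_coord * ((x - tx) ** 2 + (y - ty) ** 2)
                loss += lambda_coord * ((math.sqrt(w) - math.sqrt(tw)) ** 2
                                        + (math.sqrt(h) - math.sqrt(th)) ** 2)
                loss += (conf - 1.0) ** 2  # simplified confidence target of 1
            else:
                loss += lambda_noobj * conf ** 2  # non-responsible box: target 0
        if has_obj:  # class probabilities are penalised only where an object exists
            loss += sum((p - q) ** 2
                        for p, q in zip(cell["pred_cls"], cell["true_cls"]))
    return loss


cells = [
    {"pred_boxes": [(0.5, 0.5, 0.25, 0.25, 0.8), (0.4, 0.4, 0.16, 0.16, 0.3)],
     "true_box": (0.5, 0.5, 0.36, 0.36), "resp": 0,
     "pred_cls": [0.7, 0.3], "true_cls": [1.0, 0.0]},
    {"pred_boxes": [(0.3, 0.3, 0.1, 0.1, 0.2), (0.6, 0.6, 0.2, 0.2, 0.0)],
     "true_box": None},
]
print(round(yolo_loss(cells), 3))  # 0.385
```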

Instance segmentation
In this paper, the instance segmentation algorithms explored for diagnosing liver conditions are Mask R-CNN, U-Net, YOLOv5 Instance Segmentation, YOLOv7 Instance Segmentation, and YOLOv8 Instance Segmentation. The YOLO family of frameworks has shown consistent improvement not only in object detection but also in instance segmentation. Within a couple of years, YOLO models have gained significant acclaim, not only from the computer vision community but also from the medical science community, because of their high detection accuracy and fast video processing. A key advantage of YOLO models is their small size, which allows them to be deployed on resource-constrained edge devices while still providing fast inference.
The aforementioned algorithms work on both images and videos. Therefore, they may also find applications in AI-aided medical imaging, including X-ray, ultrasound, CT, MRI, positron emission tomography (PET), and single photon emission computed tomography (SPECT), as well as in video applications such as surgical endoscopy and capsule endoscopy [48]. The Mask R-CNN architecture adds a branch for predicting segmentation masks on top of the Faster R-CNN layers; thus, masks are generated for the ROIs along with bounding boxes. It consists of several layers, including convolutional, pooling, and fully connected layers. U-Net is one of the most popular algorithms for image segmentation; as the network grows deeper, its learned features become increasingly abstract. U-Net consists of several down-sampling and up-sampling steps.

Scoring of liver
A grade is a global measure of hepatocellular injury and inflammatory response, features that are potentially reversible. The stage is an evaluation of the location of fibrosis [49] and of architectural alteration, which are less readily reversible. The grade describes the quantity of injury [50], whereas the stage does not: the stage only provides information on the parenchymal location of collagen and matrix deposition, as well as modifications to the vascular and architectural system. Compared with staging, morphological measurement of fibrosis in hepatic disorders requires a specific strategy that yields detailed but different information. There are three grades in liver scoring: mild, moderate, and severe [51,52]. However, grading scores cannot be applied to stages, because staging includes the location of fibrosis and architectural modifications, when present, such as in cirrhosis.
Scoring can be performed after hematoxylin and eosin or Masson's trichrome staining for the evaluation of multiple biopsies from patients in clinical trials. The histological features observed in the human liver are steatosis, lobular inflammation, ballooning, periportal inflammation, and fibrosis. The NAFLD Activity Score (NAS) is the unweighted sum of scores assigned independently to each lesion. The NAS ranges from 0 to 8 and comprises steatosis (0-3), lobular inflammation (0-3), and hepatocyte ballooning (0-2) [53]. In the NAFLD scoring framework, ballooning injury and steatosis mainly contribute to inflammation and fibrosis [39]. Further morphological features include acidophil bodies, Mallory-Denk bodies, and the zonal location of steatosis. The staging of fibrosis generally progresses from none to portal or periportal fibrosis, which might advance linearly to bridging fibrosis and then cirrhosis. In NASH and NAFLD, fibrosis progresses from none to perisinusoidal, then advances to periportal or bridging fibrosis, and bridging fibrosis might lead to cirrhosis of the liver [54]. At present, the NAS is the most popular grading scale.
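Because the NAS is an unweighted sum of three bounded component scores, it can be computed with range checking in a few lines. The function below is a hypothetical helper, shown only to make the 0-8 range explicit.

```python
def nas_score(steatosis, lobular_inflammation, ballooning):
    """Unweighted NAFLD Activity Score: steatosis (0-3) + lobular inflammation
    (0-3) + hepatocyte ballooning (0-2), giving a total of 0-8."""
    components = {
        "steatosis": (steatosis, 3),
        "lobular_inflammation": (lobular_inflammation, 3),
        "ballooning": (ballooning, 2),
    }
    for name, (value, max_value) in components.items():
        if not isinstance(value, int) or not 0 <= value <= max_value:
            raise ValueError(f"{name} must be an integer in 0..{max_value}, got {value!r}")
    return steatosis + lobular_inflammation + ballooning


print(nas_score(2, 1, 1))  # 4
print(nas_score(3, 3, 2))  # 8, the maximum
```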
The distinct scoring frameworks are the Brunt system, the NASH-CRN system, and the SAF system. Brunt et al. [50] divided the necroinflammatory activity of NASH into grades 1, 2, and 3 [55]. They also proposed a rating system based on fibrosis severity and location: stage 1, zone 3 perisinusoidal fibrosis; stage 2, stage 1 plus portal fibrosis; stage 3, stage 2 plus bridging fibrosis; and stage 4, cirrhosis. Stage 1 is divided into the subcategories 1A, 1B, and 1C, which correspond to mild zone 3 perisinusoidal, moderate zone 3 perisinusoidal, and portal/periportal-only fibrosis, respectively [56]. Severely obese patients and pediatric patients sporadically manifest fibrosis [57]. The NASH Clinical Research Network (NASH-CRN) formulated the NAS for clinical research. The primary objective of the NAS is to assess histological changes in the patient's liver over time. Recent work has used a criterion value of the NAS, particularly NAS ≥ 5, as a surrogate for the histological diagnosis of NASH. The SAF system scores steatosis, activity (the sum of the hepatocyte ballooning and lobular inflammation scores), and fibrosis; a fibrosis score of ≥ 3 indicates either bridging fibrosis or cirrhosis.

Statistical analysis
Continuous variables are expressed as mean and standard deviation, and categorical data as numbers with percentages. A paired sample t-test is used to compare normally distributed continuous variables. A one-way analysis of variance (UNIANOVA) was performed for the NAFLD diagnosis of different classes [58]. Weighted kappa scores can be of two types: inter-rater and intra-rater. The intraclass correlation coefficient was obtained from a variance-components model. Associations between the histological characteristics used in the diagnosis of steatohepatitis can be assessed using the chi-square test. The chi-square test is a test of independence; it indicates whether there is an association but not its strength, so we measured the effect size using Cramér's V. Fisher's exact test and the chi-square test with Yates' continuity correction were performed on the data. The p-values for 2 × 2 tables can be obtained from the Mantel-Haenszel χ2 test. IBM SPSS Version 27 and GraphPad Prism were used for statistical evaluation [55].
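The relationship between the chi-square statistic and Cramér's V can be sketched as below: V = sqrt(χ² / (n(k − 1))), where n is the total count and k the smaller table dimension. This sketch computes the uncorrected Pearson statistic (no Yates correction) and no p-value; the counts are hypothetical. In SciPy, `scipy.stats.chi2_contingency` provides the statistic and p-value.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table (uncorrected)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Effect size for the chi-square test: sqrt(chi2 / (n * (k - 1)))."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi_square_statistic(table) / (n * (k - 1)))


counts = [[10, 20], [30, 40]]  # hypothetical score-by-score counts
print(round(chi_square_statistic(counts), 4))  # 0.7937
print(round(cramers_v(counts), 4))             # 0.0891, a small effect
```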
In statistics, the intraclass correlation coefficient summarizes data when numerical measurements are made on units that are organized into groups; it describes how strongly units in the same group resemble each other. Inter-rater reliability measures the degree of agreement or consistency between raters. It can also be estimated with Cohen's kappa, which evaluates inter-rater reliability for nominal or ordinal variables. We want to determine the inter-rater reliability between these two classes. The Kruskal-Wallis test is the non-parametric counterpart of one-way ANOVA. One-way ANOVA requires a continuous dependent variable, independent observations, no notable outliers, homogeneity of variance, and an independent variable with two or more categorical groups; additionally, the dependent variable must be approximately normally distributed at each level of the independent variable. The assumptions of the Kruskal-Wallis test are slightly different: the data points must be independent of one another, each sample should contain at least five data points, participants must be chosen at random from the population, and the sample sizes should be roughly equal. Neither normality of the distribution nor equality of variances is required.
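Because the Kruskal-Wallis test works on ranks rather than raw values, it needs no normality assumption. A minimal sketch of the H statistic is shown below, with tied values given their average rank and the standard tie correction applied; it omits the chi-square p-value (available from `scipy.stats.kruskal`), and the example groups are hypothetical.

```python
def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H = 12/(N(N+1)) * sum(R_g^2 / n_g) - 3(N+1), tie-corrected."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = rank_with_ties(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n)) if ties else h


print(round(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]), 3))  # 7.2
```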
For the Wilcoxon W, if the asymptotic significance value is 0.05 or less, there is a significant difference between the two scores. The Z-score indicates a value's relationship to the mean of a group of values, measured in standard deviations from the mean; a Z-score of zero means that the value equals the mean. Multivariate associations with the identification of steatohepatitis were evaluated using multinomial logistic regression models, which produce odds ratios with 95% confidence intervals; these regression models were also used to calculate the p-values. After a Bonferroni correction for multiple comparisons, a p-value of 0.025 was used as the significance threshold.
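The three quantities in this paragraph can be sketched with the standard library. Assumptions: the Z-score uses the sample standard deviation; the odds ratio and its 95% interval come from a regression coefficient β and its standard error as exp(β ± 1.96·SE); and the 0.025 threshold is read as 0.05 divided by two comparisons, which is our inference since the number of comparisons is not stated.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Distance of each value from the mean, in sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def bonferroni_alpha(alpha=0.05, comparisons=2):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / comparisons


print(z_scores([2, 4, 6]))     # [-1.0, 0.0, 1.0]
print(bonferroni_alpha())      # 0.025
print(odds_ratio_ci(0.0, 0.5)) # OR of 1 with a CI spanning 1: no association
```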

Results
Although liver biopsy is the gold standard for detecting liver steatosis, it is insufficient for determining the disease's prevalence in a given population [59]. Steatosis refers to the accumulation of triglyceride droplets within the cytoplasm of hepatocytes, and it must be present for any form of NAFLD to exist. The fat typically shows a macrovesicular steatosis pattern, in which a large fat vacuole usually fills the cytoplasm of each cell and pushes the nucleus to one side. Some regions have medium-sized fat droplets, while others have very small ones. This study uses metrics commonly employed for object detection tasks. The model's output consists of the four coordinates (x, y, w, h) of each identified rectangular bounding box, which are evaluated using the following outcomes. True Positive (TP): a correct detection, in which the detected box corresponds to a fatty liver tissue ground truth. False Positive (FP): an incorrect detection, in which the detected box lies outside the fatty liver tissue ground truth. True Negative (TN): no detection in images in which fatty liver cells are not present. False Negative (FN): no detection in images in which fatty liver cells are present.
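One way to turn these definitions into counts is greedy one-to-one matching of detections to ground-truth boxes by IoU. The sketch below assumes (x1, y1, x2, y2) boxes, detections already sorted by descending confidence, and an IoU threshold of 0.5; the text does not state the matching rule or threshold actually used, so these are illustrative choices.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter = (max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
             * max(0.0, min(a[3], b[3]) - max(a[1], b[1])))
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred_boxes, gt_boxes, iou_threshold=0.5):
    """Greedy one-to-one matching for one image; returns (TP, FP, FN)."""
    unmatched = list(range(len(gt_boxes)))
    tp = 0
    for p in pred_boxes:  # assumed sorted by descending confidence
        best, best_iou = None, iou_threshold
        for g in unmatched:
            v = iou(p, gt_boxes[g])
            if v >= best_iou:
                best, best_iou = g, v
        if best is not None:  # each ground-truth box can be matched only once
            tp += 1
            unmatched.remove(best)
    return tp, len(pred_boxes) - tp, len(unmatched)


def is_true_negative(pred_boxes, gt_boxes):
    """Image-level TN: no fatty tissue present and nothing detected."""
    return not pred_boxes and not gt_boxes


gt = [(10, 10, 50, 50)]
preds = [(12, 12, 48, 52), (100, 100, 120, 120)]
print(match_detections(preds, gt))  # (1, 1, 0)
```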
To evaluate performance, these outcomes are used to calculate the following metrics [24]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}, \qquad F_{\beta} = \frac{(1 + \beta^2)\,\text{Precision} \cdot \text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}}, \quad \beta = 1 \text{ (F1) or } 2 \text{ (F2)}$$

The diagnostic performance of the various algorithms for fatty liver disease is illustrated in Table 1. Each metric should be as close to 1 as possible. After 8,000 iterations, the average losses are 0.383155 and 1.779745 for YOLOv3 and YOLOv4, respectively. For 100 epochs, the total losses for the YOLOv5, YOLOv6, YOLOv7, and YOLOv8 algorithms are 0.0247318, 2.3778, 0.03914, and 0.023472, respectively. The mAP value for YOLOv8 is 99.1%, which is higher than that of all the other algorithms tested. The IoU value obtained for YOLOv8 is 0.999, indicating that the predicted bounding boxes are very close to the ground truth bounding boxes. The TP, TN, FP, and FN for the YOLOv8 algorithm are 99.375%, 0.5%, 0.125%, and 0%, respectively. The accuracy, specificity, precision, recall, F1-score, and F2-score are 99.875%, 80%, 99.874%, 100%, 99.937%, and 99.975%, respectively. The FP rate is considerably low when testing the YOLOv8 model. A threshold value of 30% is used when testing every algorithm, and a batch size of 16 is used for training all the models. The performance comparison of the instance segmentation models is illustrated in Table 2.
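The metric formulas can be checked against the reported YOLOv8 figures with a short sketch. When TP = 99.375%, TN = 0.5%, FP = 0.125%, and FN = 0% are plugged in, it reproduces the reported accuracy (99.875%), specificity (80%), precision (99.874%), recall (100%), F1 (99.937%), and F2 (99.975%) to three decimals; the function name is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (inputs may be counts or percentages)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)

    def f_beta(beta):
        b2 = beta ** 2  # beta > 1 weights recall more heavily than precision
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "f1": f_beta(1), "f2": f_beta(2)}


m = detection_metrics(tp=99.375, tn=0.5, fp=0.125, fn=0.0)
print({k: round(v * 100, 3) for k, v in m.items()})
```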
The analysis demonstrates that the proposed method can measure hepatic steatosis and correctly identify fat. The proposed approach has also been optimized for faster processing. Table 3 presents the performance evaluation of the various methods and a comparison of mAP and IoU. Figure 4 displays the results of Faster R-CNN, SSD, YOLOv3, and YOLOv4 for the various classes. These models can be deployed effectively and efficiently on a server, mobile device, or edge device. According to the instance segmentation results, the accuracy (AC) and specificity (SP) of Mask R-CNN are 97% and 81.905%, respectively, which are better than those of all other methods. The hallmark of ballooning in the liver is an enlarged hepatocyte with rarefied cytoplasm. Its detection with hematoxylin and eosin staining is challenging; therefore, our deep learning methodology is useful for proper diagnosis. Fibrosis in NASH can be either perisinusoidal or pericellular. The infiltration of mixed inflammatory cells that characterizes lobular inflammation in NASH is often modest, whereas periportal inflammation is rare in NASH and occurs mainly in other hepatic disorders such as hepatitis C and autoimmune hepatitis [56]. In NAFLD patients, liver steatosis manifests as either small or large fat droplets.
According to the correlation values from the paired sample t-test, a patient with low ballooning scores is very likely to have low inflammation scores, and vice versa. Likewise, a patient with low inflammation scores is very likely to have low steatosis scores, and vice versa. However, a patient with a low ballooning score is very likely to have a high steatosis score, and vice versa. Similarly, a patient with a low ballooning score is very likely to have a high fibrosis score, and vice versa. UNIANOVA is performed to check whether there is a difference among the classes, and the post hoc Bonferroni test is performed to determine the degree of difference between the classes. When evaluating the association of ballooning with inflammation, ballooning with steatosis, ballooning with fibrosis, inflammation with steatosis, inflammation with fibrosis, and steatosis with fibrosis, the Pearson chi-square values are 71.498, 22.295, 51.678, 78.567, 131.217, and 131.217, respectively. All of them are statistically significant; therefore, we conclude that there is a significant association between the classes. The Cramér's V values indicate a small to moderate effect of each class on the other. The intraclass correlation coefficient is 0.667 when the scores for all four classes are taken into consideration, indicating that 66.7% of the consistency is noted among the statistically significant scores. For a single measurement, the intraclass correlation coefficient is 0.333, which is also statistically significant.
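The two ICC values are mutually consistent under the Spearman-Brown relation, which gives the reliability of the average of k ratings from the single-rating ICC: ICC_avg = k · ICC_single / (1 + (k − 1) · ICC_single). Taking k = 4 (the four classes) and a single-measure ICC of about 1/3 (reported as 0.333) yields about 0.667. Treating the four class scores as the k ratings is our assumption about how the averaged ICC was formed.

```python
def spearman_brown(single_icc, k):
    """Reliability of the mean of k ratings, given the single-rating ICC."""
    return k * single_icc / (1 + (k - 1) * single_icc)


print(round(spearman_brown(1 / 3, k=4), 3))  # 0.667
```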
The highest correlation is observed between the inflammation and fibrosis scores (0.516), and the lowest between the ballooning and steatosis scores (0.003). Across the four classes, the inter-rater reliability is moderate. When comparing the scores of ballooning with inflammation, inflammation with steatosis, inflammation with fibrosis, and steatosis with fibrosis, the Cohen's kappa values are 0.205, 0.187, 0.191, and 0.347, respectively, which are statistically significant; the remaining pairs are not statistically significant.
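Cohen's kappa corrects the observed agreement p_o for the agreement p_e expected by chance from each rater's marginal frequencies: κ = (p_o − p_e) / (1 − p_e). The sketch below is the unweighted form with hypothetical scores; a weighted kappa for ordinal scores would additionally penalize larger disagreements more, and the formula is undefined when p_e = 1.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)  # chance agreement
              for c in categories)
    return (p_o - p_e) / (1 - p_e)


scores_a = [0, 0, 1, 1, 2, 2]  # hypothetical scores from one source
scores_b = [0, 1, 1, 1, 2, 0]  # hypothetical scores from another
print(round(cohens_kappa(scores_a, scores_b), 3))  # 0.5
```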
The Kruskal-Wallis H values for ballooning, fibrosis, inflammation, and steatosis are 71.739, 86.237, 121.546, and 76.626, respectively. The Mann-Whitney U test was performed, and the values are statistically significant for all classes. The Wilcoxon W is 21 for the ballooning score, 29 for the inflammation score, 35 for the steatosis score, and 33 for the fibrosis score. The Z-score is -3.028 for the ballooning score, which is statistically significant, and -0.202, -0.213, and -0.601 for the inflammation, steatosis, and fibrosis scores, respectively, which are not statistically significant.
From the multinomial logistic regression model, we conclude that when steatosis is 5% to 33%, inflammation is observed with no foci, < 2 foci, or 2 to 4 foci per 200× field. When steatosis is greater than 33% up to 66%, inflammation with < 2 foci or 2 to 4 foci per 200× field is observed. If steatosis is 5% to 33%, no ballooning is observed; however, if steatosis is greater than 33% up to 66%, a few ballooning cells occur. When there are numerous ballooning cells, < 2 inflammatory foci per 200× field are observed [60]. On the other hand, when there are few ballooning cells, we observe no fibrosis, perisinusoidal or periportal fibrosis, or perisinusoidal and periportal fibrosis. Additionally, when there are many ballooning cells, we observe perisinusoidal or periportal fibrosis, perisinusoidal and periportal fibrosis, and bridging fibrosis. From the analysis, we observed that when there is no inflammation. When there is no fibrosis and when steatosis is < 5%, we observe perisinusoidal and periportal fibrosis.
To detect fat in the liver automatically from B-mode ultrasound image sequences, Byra et al. [2] used an Inception-ResNet-v2 deep convolutional neural network pretrained on the ImageNet dataset, achieving an accuracy of 90.9% and a specificity of 94.1%. By incorporating transfer learning into the DenseNet model, Yang et al. [46] developed a deep learning-based technique to grade liver steatosis and then confirmed its efficiency by grading actual cases of liver steatosis; the model has an accuracy of about 88.5% and a specificity of about 80%. A customized CNN deep learning model developed by Arjmand et al. achieved an accuracy and specificity of 95% and 98.3%, respectively [61,62].
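The accuracy and specificity figures quoted above, together with the precision and recall used elsewhere in this work, all derive from the four counts of a binary confusion matrix. A minimal sketch, assuming counts are already available for one class treated as "positive"; the function name and the example counts are hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall (sensitivity), specificity and F1 from binary counts."""

    def ratio(num, den):
        # guard against empty denominators (e.g. no positive predictions)
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for one class (e.g. steatosis vs. not steatosis)
m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
print({k: round(v, 3) for k, v in m.items()})
```

Because specificity depends on true negatives while precision does not, a model can report high specificity alongside modest precision when negatives dominate, which is worth keeping in mind when comparing the studies above.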
A CNN to assess liver scores was developed by Heinemann et al. [39], with an accuracy of 90.63%. Ugail et al. [63] demonstrated a deep learning algorithm that classifies livers as suitable for transplantation, achieving a high accuracy of 99.63%. Gaber et al. [64] combined a voting-based classifier with machine learning algorithms to build a computer-aided diagnosis method that classifies hepatic tissue as either fatty or normal from attributes extracted from ultrasound images, with an accuracy of 95.71% and a specificity of 94.44%. A random forest model by Wu et al. [65] showed an accuracy and specificity of 86.48% and 85.89%, respectively. In this work, the largest YOLOv8 variant, YOLOv8-x, is used for the pretrained weights. State-of-the-art algorithms such as YOLOv5, YOLOv6, YOLOv7, and YOLOv8 were used to obtain the outputs shown in Figure 5. Fine-tuning the hyperparameters associated with an architecture can yield better performance. Architectures such as Faster R-CNN, YOLOv8, and Mask R-CNN are fine-tuned over several hyperparameters, and the results of this tuning are shown in Figure 6. The confidence threshold is set to 80%. YOLOv8 also performs better in terms of Average Precision on the MS COCO dataset. The hyperparameters are fine-tuned with PyTorch, and the losses are observed to decrease (see Figure 6). The batch size, learning rate, number of epochs, anchor boxes, and IoU threshold can be adjusted for each specific application. In this work, the hyperparameters are tuned for improved accuracy by setting the initial learning rate to 0.01, the SGD momentum to 0.937, and the number of warmup epochs to 3.
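The initial learning rate (0.01), SGD momentum (0.937), and 3 warmup epochs interact through a warmup schedule. The sketch below is a simplified illustration, not the training code used in this work: it warms the learning rate up linearly from 0 and the momentum up from an assumed 0.8, then decays the learning rate linearly to `lr0 * lrf`. The 100 total epochs come from the Discussion; `lrf = 0.01` and `warmup_momentum = 0.8` are assumed values. The parameter names mirror keys in the YOLOv5 hyperparameter files, but YOLOv5 itself warms up per iteration rather than per fractional epoch.

```python
def lr_and_momentum(epoch, lr0=0.01, lrf=0.01, epochs=100,
                    warmup_epochs=3.0, momentum=0.937, warmup_momentum=0.8):
    """Learning rate and SGD momentum at a (possibly fractional) epoch.

    Linear decay from lr0 to lr0 * lrf over `epochs`, with a linear warmup
    of the learning rate (from 0) and of the momentum (from warmup_momentum)
    during the first `warmup_epochs` epochs. Simplified, assumed schedule.
    """
    # linear decay factor: 1.0 at epoch 0, lrf at the final epoch
    decay = (1.0 - epoch / epochs) * (1.0 - lrf) + lrf
    target_lr = lr0 * decay
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return target_lr * frac, warmup_momentum + frac * (momentum - warmup_momentum)
    return target_lr, momentum


for e in (0, 1.5, 3, 50, 100):
    lr, mom = lr_and_momentum(e)
    print(f"epoch {e:>5}: lr = {lr:.6f}, momentum = {mom:.4f}")
```

Warmup keeps the first updates small while batch-normalization statistics and the randomly initialized heads settle, after which the full learning rate and momentum apply.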
The IoU (intersection over union) training threshold is set to 0.2 and the anchor-multiple threshold is set to 4 for better mAP. These results are shown in Table 3. Fine-tuning the hyperparameters yields higher accuracy and shorter training time. The performance of the instance segmentation algorithms is illustrated in Figures 7 and 8. Table 4 compares various efficient AI-based models for detecting liver diseases. The overall steatosis percentage can also be evaluated from the biopsy images according to the physician's requirements. To determine the percentage of steatosis on a slide, the area covered by the marks (the predicted masks) is obtained and compared with the overall area of the image:

$$\text{Steatosis}\ (\%) = \frac{\text{area covered by the steatosis masks}}{\text{total area of the image}} \times 100\%$$

As an example, the area covered by the masks of the YOLOv8 Instance Segmentation technique is determined and illustrated in Figure 9. It is also of interest to compare the ground truth with the predicted images for the four classes; this comparison is shown in Figure 10. Some regions are difficult to detect, which compromises a few inference results; these cases are illustrated in Figure 11. The scoring of the liver according to histology is illustrated in Figure 12. The scoring system definitions, scores, and total number of detections are given in Table 5. Table 6 lists the Wilcoxon W, Kruskal-Wallis H, and Z-score for the different liver conditions. The confusion matrix for the ballooning, steatosis, inflammation, fibrosis, and background classes of the YOLOv5 algorithm is shown in Figure 13. The graphs in Figure 14 show F1-score vs. confidence, precision vs. confidence, precision vs. recall, and recall vs. confidence for the YOLOv5 algorithm.
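The steatosis percentage described above can be sketched as the fraction of image pixels covered by the predicted steatosis masks. This is one plausible reading, assuming binary per-instance masks of the same size as the image and counting overlapping instances only once; the exact aggregation used in the study ("the average of the marks") is not specified, and the function and helper names here are hypothetical.

```python
def mask_coverage_percent(masks, height, width):
    """Percentage of an image's pixels covered by the union of binary masks.

    `masks` is a list of height x width grids (lists of lists) holding 0/1 or
    bool values, e.g. the per-instance steatosis masks predicted for one image.
    Overlapping instances are counted once.
    """
    covered = [[False] * width for _ in range(height)]
    for mask in masks:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    covered[y][x] = True
    covered_pixels = sum(sum(row) for row in covered)
    return 100.0 * covered_pixels / (height * width)


def box_mask(h, w, y0, y1, x0, x1):
    """Hypothetical helper: an h x w mask that is 1 inside [y0, y1) x [x0, x1)."""
    return [[1 if y0 <= y < y1 and x0 <= x < x1 else 0 for x in range(w)]
            for y in range(h)]


# Two overlapping 2 x 2 instances on a 4 x 4 image: 7 of 16 pixels covered
masks = [box_mask(4, 4, 0, 2, 0, 2), box_mask(4, 4, 1, 3, 1, 3)]
print(mask_coverage_percent(masks, 4, 4))  # 43.75
```

Taking the union rather than summing instance areas avoids counting the same fat vacuole twice when two predicted instances overlap.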

Discussion
A liver biopsy is commonly performed on NAFLD patients to confirm or rule out the diagnosis, identify any associated liver diseases, and determine the degree of liver damage, if any, for treatment and prognosis. The biopsy images used in this work are high-resolution (2× magnification) images acquired with a microscope in a pathology laboratory to achieve a higher degree of accuracy. The key benefit of the proposed method over previously reported ones is that the processing time remains the same even though higher-resolution biopsy images are used. Although obtaining high-resolution images such as those used in this article takes time and requires advanced equipment [67], processing them with the proposed method takes little time and requires modest computing power. Another benefit of the proposed method is that the entire process is automated, without manual involvement.
Recently, there have been significant advancements in the field of computer vision, which is used in a variety of practical applications, including disease diagnosis and therapy. Our models are designed to provide a more user-friendly technique for liver disease diagnosis; because a large number of samples is used, the loss of efficiency that usually results from limited data is reduced. To evaluate which technique is more accurate and efficient, we compared the networks under consideration. Compared with the other algorithms, YOLOv5, YOLOv6, YOLOv7, and YOLOv8 train and test faster and outperform them in terms of mAP and IoU (note that our algorithms are fine-tuned with the associated hyperparameters). The YOLO implementations used here are based on the PyTorch framework. Using a large dataset of liver biopsy images, the full training and testing process is conducted on a single GPU for 100 epochs, and the results are found to be robust.
Deep learning frameworks have been the fastest-growing approach to biomedical image analysis. The baseline histological criteria for NASH diagnosis, listed most recently in the recommendations of the American Association for the Study of Liver Diseases (AASLD), are steatosis, lobular inflammation, and ballooning [51,68]. Our proposed methodology makes it simple to identify hepatic steatosis from liver biopsy images, and the overall loss for each method is found to be low. An accurate diagnosis of steatosis is crucial for understanding the pathophysiology of the condition and evaluating the effectiveness of therapeutic treatment. Depending on how challenging a case is, a radiologist may need considerable time to study a patient's image, whereas the deep learning model requires only a few seconds. In the future, clinical routines may combine deep learning algorithms and CAD technologies [64].
Faster R-CNN removes the region-proposal bottleneck by using a learned Region Proposal Network (RPN), which makes the region proposals more robust and improves the overall accuracy of object detection. SSD eliminates the proposal-generation stage and uses a single deep neural network; its performance is quite reliable because it places default boxes with different aspect ratios at every feature-map position. YOLO, on the other hand, predicts bounding boxes and class probabilities simultaneously. The mAP values and accuracy of the YOLO algorithms are found to be higher than those of the other state-of-the-art algorithms. Given that the processing time is reduced and the images are easily obtained, the proposed technique is simple enough to incorporate into routine clinical practice. The algorithms used in this study could also be applied in other investigations to pinpoint other gastrointestinal problems.
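SSD's default boxes become training targets only after they are matched to ground-truth boxes by IoU. The sketch below follows the matching strategy described in the original SSD paper: each ground truth claims the default box it overlaps most, and any default box whose IoU with some ground truth exceeds 0.5 is also matched. Boxes are assumed to be `(x1, y1, x2, y2)` tuples; the function names and example boxes are hypothetical, and this is not the implementation used in this work.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_default_boxes(defaults, truths, threshold=0.5):
    """SSD-style matching; returns, per default box, a ground-truth index or -1."""
    matches = [-1] * len(defaults)
    if not defaults or not truths:
        return matches
    # (1) each default box takes its best ground truth if IoU > threshold
    for i, d in enumerate(defaults):
        overlaps = [iou(d, g) for g in truths]
        best_j = max(range(len(truths)), key=lambda j: overlaps[j])
        if overlaps[best_j] > threshold:
            matches[i] = best_j
    # (2) every ground truth also claims its best-overlapping default box,
    #     so no object is left without at least one positive default box
    for j, g in enumerate(truths):
        overlaps = [iou(d, g) for d in defaults]
        best_i = max(range(len(defaults)), key=lambda i: overlaps[i])
        if overlaps[best_i] > 0:
            matches[best_i] = j
    return matches


defaults = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 6, 6)]
truths = [(0, 0, 2, 2.2), (4, 4, 8, 8)]
print(match_default_boxes(defaults, truths))  # [0, -1, 1]
```

The second ground truth overlaps no default box above 0.5, yet step (2) still assigns it the third default box, which is why SSD performs both steps.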

Figure 1. Architecture of Faster R-CNN for fatty liver tissue detection.
The notation in this figure denotes the ground-truth boxes, the predicted boxes, the matching of the i-th bounding box to the j-th ground-truth box, and the offsets of the bounding box b.

Figure 2. Architecture of SSD for fatty liver tissue detection.

Figure 3. Architecture of YOLO for fatty liver tissue detection.

Figure 4. Illustration of classes (ballooning, fibrosis, inflammation, and steatosis) for the Faster R-CNN, SSD, YOLOv3, and YOLOv4 algorithms. Because the labels generated by the program are not clearly readable at this size, the panels are marked A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 5. Illustration of classes (ballooning, fibrosis, inflammation, and steatosis) for the YOLOv5, YOLOv6, YOLOv7, and YOLOv8 algorithms. Because the labels generated by the program are not clearly readable at this size, the panels are marked A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 6. Fine-tuning of hyperparameters gives better prediction accuracy for the classes A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 7. Illustration of classes (ballooning, fibrosis, inflammation, and steatosis) for instance segmentation using Mask R-CNN and U-Net. Because the labels generated by the program are not clearly readable at this size, the panels are marked A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 8. Illustration of classes (ballooning, fibrosis, inflammation, and steatosis) for instance segmentation using YOLOv5, YOLOv7, and YOLOv8. Because the labels generated by the program are not clearly readable at this size, the panels are marked A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 9. Predicted area for the classes A = ballooning, B = fibrosis, C = inflammation, and D = steatosis.

Figure 10. Comparison of ground truth with predicted images for the four classes.

Figure 11. Regions that are difficult for YOLOv8 to predict: (A) inflammation predicted with high steatosis, (B) fibrosis on the corner of the tissue on the slide, (C) out-of-focus region, (D) steatosis covered by high fibrosis, (E) corner of the tissue sample, (F) undetected inflammation region, (G) steatosis with signs of inflammation, (H) fibrosis region covering the steatosis.

Figure 12. Scoring of the liver according to histology.

Figure 14. Graphs showing (A) F1-score vs. confidence, (B) precision vs. confidence, (C) precision vs. recall, and (D) recall vs. confidence for the YOLOv8 algorithm.

Table 1. Comparative image recognition performance of algorithms for fatty liver detection.

Table 2. Comparative image recognition performance of algorithms for fatty liver segmentation.

Table 3. Study of state-of-the-art algorithms on the biopsy dataset.

Table 4. Comparative performance of our work with recent work.

Table 5. Scoring system and corresponding detections for the YOLOv8 model.

Table 6. Comparative performance of various algorithms for fatty liver detection.