Deep Learning e Based Image Analysis of Liver Steatosis in Mouse Models

The incidence of nonalcoholic fatty liver disease is a continuously growing health problem worldwide, along with obesity. Therefore, novel methods to both ef ﬁ ciently study the manifestation of nonalcoholic fatty liver disease and analyze drug ef ﬁ cacy in preclinical models are needed. In the present study

The incidence of nonalcoholic fatty liver disease is a continuously growing health problem worldwide, along with obesity. Therefore, novel methods to both efficiently study the manifestation of nonalcoholic fatty liver disease and analyze drug efficacy in preclinical models are needed. In the present study, we developed a deep neural networkebased model to quantify microvesicular and macrovesicular steatosis in the liver on hematoxylin-eosinestained whole slide images, using the cloud-based platform, Aiforia Create. The training data included a total of 101 whole slide images from dietary interventions of wild-type mice and from two genetically modified mouse models with steatosis. The algorithm was trained for the following: to detect liver parenchyma, to exclude the blood vessels and any artefacts generated during tissue processing and image acquisition, to recognize and differentiate the areas of microvesicular and macrovesicular steatosis, and to quantify the recognized tissue area. The results of the image analysis replicated well the evaluation by expert pathologists and correlated well with the liver fat content measured by EchoMRI ex vivo, and the correlation with total liver triglycerides was notable. In conclusion, the developed deep learningebased model is a novel tool for studying liver steatosis in mouse models on paraffin sections and, thus, can facilitate reliable quantification of the amount of steatosis in large preclinical study cohorts. (Am J Pathol 2023, -: 1e9; https://doi.org/10.1016/j.ajpath.2023.04.014) Nonalcoholic fatty liver disease (NAFLD) is the hepatic manifestation of metabolic Q5 syndrome. 1 The prevalence of NAFLD (eg, nonalcoholic steatosis, steatohepatitis, and liver fibrosis) is increasing worldwide alongside lifestylerelated diseases, such as obesity and diabetes, leading to increased liver-related morbidity and mortality. 1 The hallmarks of NAFLD are microvesicular and macrovesicular lipid droplets accumulating inside the hepatocytes. 2 Microvesicular steatosis is characterized by the presence of small fat droplets with preserved cellular architecture, whereas in macrovesicular steatosis, large droplets that displace the nucleus are formed. 3,4 Microvesicular steatosis is associated with higher grades of steatosis, 5 and when present it may indicate increased disease severity or drug-induced hepatotoxity. 6 The type and amount of liver steatosis is a central quantifiable phenotype in experimental mouse models of NAFLD, frequently utilized in studies aimed at understanding the pathogenesis of disease or exploring new therapeutic options. Traditionally, the analysis of histopathology, such as the evaluation of liver steatosis, has been based on pathologist's visual evaluation, as computational analysis of images has only fairly recently become an option in histopathology. 7 During the past 5 years, deep learning 1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63 (DL)ebased methods have become popular in such image analysis, because of the development of graphics processing units and more powerful memory resources. 8 In histology especially, where the amount of digitized data is high, higher computer power is needed. 9 DL is a machine learning method that can computationally learn image features for digitized image analysis. DL is based on artificial neural networks that are composed of layers of nodes. In image analysis, convolutional neural networks are the most commonly used DL method. In convolutional neural networks, the convolution operation is used at least on one of the layers of the deep neural network. 10 Histology is the reference standard for staging NAFLD severity, 11 and a widely used histologic scoring for liver steatosis is called the Kleiner score, where the histopathologic features of NAFLD, and its severe form, nonalcoholic steatohepatitis, are assessed. 12 Many novel deep learning models are based on detecting and quantifying these histologic features, which have traditionally been evaluated by pathologists. One approach is to use hematoxylin-eosin (HE)estained liver tissue sections for generating deep learning models to detect lipid droplets, 13,14 inflammation, ballooning, and collagen fibers. 15 Prior work in developing deep learning models to detect fat accumulation in the liver has concentrated on segmenting and quantifying macrovesicular lipid droplets. 13,14 However, there has been less attention on developing methods to quantify microvesicular steatosis, although a few automatic methods have been published. 16 The task of quantifying microvesicular steatosis poses a challenge for automated image analysis, because the size and shape of lipid droplets vary and the discrimination of microvesicular steatosis from general ballooning is challenging. 17 However, quantifying both microvesicular and macrovesicular steatosis is essential to better define the stage of the steatosis in the experimental models.
In this study, we developed a deep neural networkebased model in Aiforia Create that recognizes and differentiates microvesicular and macrovesicular steatosis in the HEstained whole slide images (WSIs) of mouse liver tissue sections. The DL model measures the tissue areas in the section covered by these features and provides a reliable tool to obtain a comprehensive analysis of the severity of liver steatosis in an experimental mouse model of NAFLD. We validated the image analysis result with two experienced human pathologists Q6 , as well as two independent methods of lipid content measurement. Animals were housed in individually ventilated cages (Techniplast, Buguggiate, Italy) with approximately 70 air changes per hour. Constant temperature (21 C AE 3 C) and humidity (55% AE 15%) were maintained together with a consistent 12-hour light-dark cycle, with a light change at 7 AM and 7 PM. Autoclaved aspen chips were used as bedding (Tapvei Ltd, Harjumaa, Estonia). The chow and water were available ad libitum. The genetically modified mice and their littermates were individually identified with earmarks and housed with littermates, one to six mice per cage. The genotyping protocols applied have been documented in previous publications. 18,19 For training the DL model, the authors used mouse models, which had been previously characterized to present with liver steatosis. The training data included HE-stained WSIs of formalin-fixed, paraffin-embedded mouse liver samples from 9-montheold mice deficient in hydroxysteroid (17b) dehydrogenase 13 18 or deficient in hydroxysteroid (17b) dehydrogenase 12, 19 and their wild-type littermates in C57Bl6; 129S5 hybrid (50/50) and C57BL/ 6NCrl genetic background, respectively. In addition, the training data included HE-stained WSIs of mouse liver samples obtained from C57Bl wild-type mice kept on a Western diet (D12079B; Research Diets Inc., New Brunswick, NJ) for 14 weeks. Validation data included HEstained WSIs of mouse liver samples from wild-type mice with C57Bl6; 129 hybrid genetic background and mice with same background kept in high-fat diet (HFD) conditions (D17010103; Research Diets Inc.) 20 for 12 weeks.

Liver Tissue Composition, Triglyceride Content, and Histologic Analysis
Mice were euthanized using CO 2 asphyxiation followed by cervical dislocation. Liver tissue was dissected, and the amount of total fat, free water, and lean mass was measured from the whole liver with an EchoMRI-700 device (Echo Medical Systems, Houston, TX).
For the histology, the medial lobe of the liver was collected, and tissue samples were fixed in 10% formalin for 24 to 48 hours, dehydrated, and embedded in paraffin. Sections (4 mm thick) were cut and stained with HE following standard procedures.

Imaging and Software
To generate the WSIs, the HE-stained microscope slides were scanned using a Pannoramic 250 and 1000 Flash digital slide scanners (3DHISTECH Ltd, Budapest, Hungary) with Â20 magnification with a resolution of 0.24 mm/pixel. The image quality was ensured by the automatic scanner initialization and visual inspection after scanning (ie, verifying that the images are in focus and no artefacts are present). The WSIs were converted from an MRXL format to an ECW format and uploaded into Aiforia Create, an image management and analysis cloud platform version 5.2 (Aiforia Technologies, Helsinki, Finland), which aids the development of a machine learning models with deep convolutional neural networks and supervised learning. After the conversion, image quality was verified to remain the same with visual verification.

Generating a Model to Quantify Hepatic Steatosis in Histologic Sections by Supervised Learning
Hepatic steatosis was quantified from HE-stained images with the DL model generated in Aiforia Create. The model was first trained to detect liver parenchyma, including normal hepatocytes, macrovesicular hepatocytes, and microvesicular hepatocytes. The algorithm excluded background, tissue artefacts, blood vessels, and other features that do not belong to the liver parenchyma. The second algorithm was trained to recognize microvesicular and macrovesicular steatosis in the liver parenchyma, and eventually these two algorithms were combined to generate the final model (Supplemental Table S1). The development of both algorithms followed a similar workflow. The model development started with labeling the training data, including 77 WSIs of HE-stained mouse liver samples from wild-type mice and mice deficient in hydroxysteroid (17b) dehydrogenase 13 and deficient in hydroxysteroid (17b) dehydrogenase 12, as well as the 24 WSIs of HE-stained mouse liver samples from C57Bl mice with a Western diet. The annotated training set (n Z 101) was not used for validation. The representative areas of both morphologic features were selected visually, and the training areas and feature labels were drawn with the pentool in Aiforia by one person. The annotated training areas were selected to represent typical features of microvesicular steatosis and macrovesicular steatosis, annotation areas varied in size and color to offer a variety of examples for training, and annotations were drawn with versatile magnification views. The tissue was considered to contain macrovesicular steatosis if the single lipid droplets were bigger than the average size of nucleus and macrovesicular Q8 steatosis if the lipid droplets were smaller than average size of the nucleus. During the training of the model, unclear regions were compared with Oil Red Oestained sections from the same liver sample to confirm the presence of fat. Annotation areas varied in size between 25 and 250 mm 2 , and all together 846 areas were annotated (Supplemental Table S1). If one training area included parenchyma, microvesicular steatosis, and macrovesicular steatosis, borders were drawn and different areas were labeled only to one class. The DL model was trained on the basis of these annotations, and the resulting model was evaluated visually. On the basis of the evaluation, parameters were optimized, annotations were edited, and new annotations were generated, followed by a new model version. Overfitting was minimized by training the model with a large and diverse data set, by using data augmentation, and by having separate testing data for validation. The training was performed through an iterative process, and the final DL model was a result of model version 11, trained with 2000 iterations (Supplemental Table S2).

Validation of the DL Model Quantifying Hepatic Steatosis in Histologic Sections
To evaluate the performance of DL model, the annotations that formed the ground truth were compared with the analysis results of the artificial intelligence model both statistically in Aiforia and visually.
Eventually, the performance of the artificial intelligence model was validated using 20 HE-stained WSIs, which were not used for training, and the results obtained with the model were compared with the classification provided by two experienced pathologists Q9 . The pathologists provided an evaluation of the histologic features of steatosis by providing the percentage of hepatocytes having features of microsteatosis and macrosteatosis. In addition, a third experienced liver pathologist Q10 performed pixel-level validation in Aiforia Create.

Statistical Analysis
Statistical analyses were performed using GraphPad Prism software version 8.4.2 (GraphPad Software, La Jolla, CA). The Shapiro-Wilk test was used for testing the normality of the data. Differences between the two groups were evaluated using the two-tailed t-test and U-test based on if the data were normally distributed. Correlation was tested by using a Pearson correlation coefficient (r). P < 0.05 was considered significant.

½F1
). As expected, HE staining of liver tissue showed lipid droplets in the steatotic livers of the HFD-fed mice ( Figure 1B), and the model accurately detected and quantified the area of tissue containing lipid droplets, typically including both microvesicular and macrovesicular steatosis ( Figure 1F).

The Model Quantifies Microvesicular and Macrovesicular Steatosis
The verification tool in the Aiforia Create was utilized to detect false-positive and false-negative areas of microvesicular and macrovesicular steatosis tissue and their sum (ie, total error per training area). The results showed that the model learned the ground truth with high precision (>98%) and sensitivity (>99%) ( Table 1   ½T1 ). In addition, a third experienced liver pathologist Q12 performed a pixel-level validation in Aiforia Create. For this, a separate expert drew 61 representative validation areas in 39 WSIs, and the validator classified the tissue area into parenchyma, microvesicular steatosis, or macrovesicular steatosis. When compared with this evaluation by a pathologist, the sensitivity of the model was 97% and precision was 89% with total area error of 8.39%, showing a good performance by the model (Supplemental Table S3).

The Performance of the Artificial Intelligence Model Replicates the Independent Evaluation by Pathologists
Part of the validation data (including 20 HE-stained WSIs of mouse liver samples from wild-type mice, half of which were kept on an HFD for 12 weeks) was further used to compare the performance of the artificial intelligence model with the staging provided by two experienced human pathologists   Figure 2E) and 0.914 (P < 0.001) in microvesicular steatosis ( Figure 2F).

The Image Analysis Results Performed by the DL Model Correlate with Other Measurements of Fat Content in the Mouse Liver
The data obtained with the validated DL model showed a marked increase in both macrovesicular and microvesicular steatosis in mice on an HFD compared with the chow diet ( Figure 3, A and B   ½F3 ). Furthermore, a similar increase in fat content was observed in a whole liver analysis using EchoMRI ex vivo ( Figure 3D) and by measuring the triglyceride concentration in tissue homogenate ( Figure 3C) in corresponding liver samples. Furthermore, the calculated Pearson correlation coefficient between EchoMRI and the DL model was 0.92 ( Figure 3F), and between triglyceride measurement and image analysis, the correlation coefficient was 0.69 ( Figure 3E). A correlation matrix for all analyses performed is presented in Figure 4 ½F4 .

Discussion
In this study, an automatic image analysis method based on deep learning was developed to quantify microvesicular and macrovesicular steatosis in WSIs of HE-stained sections of mouse liver, and the performance of the model was evaluated using specimens with various degrees of steatosis. The DL model detected the parenchyma with high confidence and quantified the tissue areas, including microvesicular and macrovesicular steatosis, as evidenced by the high correlation with the results obtained by experienced pathologists. Further reference data for the method validation were obtained from liver fat content measurements performed with an EchoMRI and from the measurement of the triglyceride concentration in liver homogenates. All the analyses indicated a reliable and robust performance by the model.
The advantage of the developed DL model is that, in addition to identification of macrovesicular steatosis, the

Image Analysis of Mouse Liver Steatosis
The American Journal of Pathologyajp.amjpathol.org 7 far as we know, no such DL-based automatic analysis methods are currently available. However, the present model enables visualizing with pseudocolors the spatial appearance of the areas defined as including microsteatosis and macrosteatosis. The model was developed with a software package including a licensing fee. The software allows users to train their own algorithms independently of the software provider. Also, the data, parameters, and ground truth used can be shared with other investigators for the reproduction of the DL model and replication of the results. When using supervised deep learning for image analysis, annotating the data requires histologic knowledge and the result is, to some extent, operator dependent. Therefore, interoperator and quantitative biochemical validation are of great importance. In addition, in histology-based techniques, only crosssections can be analyzed, and comparison to methods applied to the whole biopsy is important for validation. One approach is to compare the performance of the histologic DL model with well-known biomarkers of the disease. 26 In our study, the image analysis result reliably reflects the concentration of triglycerides measured from the tissue homogenates and the whole liver fat content analyzed by an EchoMRI ex vivo. The EchoMRI analysis used in this study is a novel method for such comparison, and the correlation between the EchoMRI and the image analysis by the DL model was good (Pearson r Z 0.92). The concentration of triglycerides measured in liver homogenates differed slightly from other measurements, which is likely due to the technical challenges in extracting lipids from the liver tissue.
The results of the proposed DL model were in good concordance with the pathologists' evaluation. However, the discrimination of microvesicular steatosis from general ballooning and degeneration is still challenging, and it is notable that quantification of low level of microvesicular steatosis differed between the pathology evaluation and that performed by the DL model. The pathologists did not consider significant microsteatosis to be present in several cases where the DL model identified 0% to 30% of the areas to include signs of microvesicular steatosis. Interestingly, the triglyceride and EchoMRI measurement showed a good correlation with the DL model also in samples with a low level of lipid accumulation, indicating that the biomarkers used and the DL model developed are sensitive markers for detecting low levels of fat accumulation. The high sensitivity of the method developed can be considered a beneficial feature in analyzing, for example, effects of pharmaceutical and genetic interventions in preclinical animal models.
In addition to the diet-induced NAFLD models, various genetically modified mouse models present features of NAFLD. 18,19,27 Thus, we considered it is important to train a model with both genetically modified mice presenting with steatosis on a chow diet, as well as using HFD intervention models. We expect that this approach would increase the suitability of this DL model for various data sets. Furthermore, the method allows us to study large sample cohorts with reduced hands-on time and increase the reproducibility of the data. In conclusion, in the present study, we have developed a DL image analysis model to quantify microvesicular and macrovesicular steatosis in mouse liver, applicable for studying NAFLD phenotype in models with dietary interventions as well as in genetically modified mouse models and a combination of the two. ' evaluation differentiate microvesicular (micro) and macrovesicular (macro) steatosis, but triglyceride and EchoMRI measurements assess the whole fat content. A: In the case of macrovesicular steatosis, both pathologists' evaluation and the EchoMRI measurements correlate well with the deep learning (DL) model, but the correlation with the total liver triglycerides is less strong. B: In relation to the presence of microvesicular steatosis, the pathologists' evaluation and both the EchoMRI and the triglyceride measurements show correlation with the DL model; however, the correlation between the EchoMRI and triglycerides is less strong.