Modeling Based on Ensemble Learning Methods for Detection of Diagnostic Biomarkers from LncRNA Data in Rats Treated with Cis-Platinum-Induced Hepatotoxicity

Background: The first aim of this study is to perform bioinformatic analysis of lncRNAs obtained from liver tissue samples from rats treated with cisplatin hepatotoxicity and without pathology. Another aim is to identify possible biomarkers for the diagnosis/early diagnosis of hepatotoxicity by modeling the data obtained from bioinformatics analysis with ensemble learning methods. Methods: In the study, 20 female Sprague-Dawley rats were divided into a control group and a hepatotoxicity group. Liver samples were taken from rats, and transcriptomic and histopathological analyses were performed. The dataset achieved from the transcriptomic analysis was modeled with ensemble learning methods (stacking, bagging, and boosting). Modeling results were evaluated with accuracy (Acc), balanced accuracy (B-Acc), sensitivity (Se), specificity (Sp), positive predictive value (Ppv), negative predictive value (Npv), and F1 score performance metrics. As a result of the modeling, lncRNAs that could be biomarkers were evaluated with variable importance values. Results: According to histopathological and immunohistochemical analyses, a significant increase was observed in the sinusoidal dilatation and Hsp60 immunoreactivity values in the hepatotoxicity group compared to the control group (p < 0.0001). According to the results of the bioinformatics analysis, 589 lncRNAs showed different expressions in the groups. The stacking model had the best classification performance among the applied ensemble learning models. The Acc, B-Acc, Se, Sp, Ppv, Npv, and F1-score values obtained from this model were 90%, 90%, 80%, 100%, 100%, 83.3%, and 88.9%, respectively. lncRNAs with id rna-XR_005492522.1, rna-XR_005492536.1, and rna-XR_005505831.1 with the highest three values according to the variable importance obtained as a result of stacking modeling can be used as predictive biomarker candidates for hepatotoxicity. Conclusions: Among the ensemble algorithms, the stacking technique yielded higher performance results as compared to the bagging and boosting methods on the transcriptomic data. More comprehensive studies can support the possible biomarkers determined due to the research and the decisive results for the diagnosis of drug-induced hepatotoxicity.


Introduction
The liver is involved in the realization of many chemical reactions, the production of bile, the metabolism of carbohydrates, lipids and proteins, and detoxification [1]. The more accurate and trustworthy model (meta classifier) than a base classifier (model) can achieve [20]. This study aimed to determine possible biomarkers for the diagnosis/early diagnosis of hepatotoxicity by modeling the transcriptomic data of liver tissues from rats treated with cisplatin hepatotoxicity and without pathology with ensemble learning methods.

Dataset
A total of 20 female Sprague-Dawley rats (age: 3 months; weight: 250 ± 20 g) were taken fromİnönü University Experimental Animal Production and Research Center to detect possible biomarkers of drug-induced hepatotoxicity and to classify hepatotoxicity at the clinical level.

•
Control group (C): The group in which cisplatin vehicle solvent was given intraperitoneally on the first day of the experiment. • Hepatotoxicity group (CK): The group in which 7 mg/kg cisplatin was given intraperitoneally on the first day of the experiment.
On the 4th day of the experiment, liver tissue samples from rats treated with ketamine (225 mg/kg i.p.) and xylazine (24 mg/kg i.p.) were collected for transcriptomic and histopathological analyses under high-dose anesthesia.

Histopathological Analysis
Tissues taken at the end of the experimental period were fixed in 10% formaldehyde. At the end of the fixation process, the tissues, which were washed with tap water, were subjected to dehydration and polishing processes, and embedded in paraffin. Sections of 4 µm thickness were taken from the paraffin blocks obtained afterward. The sections subjected to deparaffinization and rehydration procedures were stained using hematoxylineosin (H-E) for general evaluation. Stained preparations were examined with a Leica DFC-280 research microscope using the Leica Q Win Image Analysis System (Leica Micros Imaging Solutions Ltd., Cambridge, UK).

Immunohistochemical Analysis
Sections subjected to deparaffinization and rehydration processes for immunohistochemical analyses were boiled in 0.01 M citrate (pH 6.0) using a pressure cooker for 15-20 min. To block the endogenous peroxidase enzyme activity on the sections to be examined, 3% hydrogen peroxide was applied for 12 min. Sections washed with Phosphate Buffered Saline (PBS) were subjected to protein blocking for 5 min. Afterward, the sections whose protein blocking process was completed were incubated with primary antibody (Hsp60) for 60 min at 37 • C. Biotin-based secondary antibodies were applied to the tissues washed with PBS for 10 min at 37 • C. At the end of this process, the sections were incubated with streptavadin peroxidase for 10 min at 37 • C. Then, after the chromogen application process was applied to the sections, they were stained with hematoxylin and closed with a water-based sealer. Immunohistochemical staining, the extent of immunoreactivity (0: no staining, 1: 1-2%, 2: 26-50%, 3: 51-75%, 4: 76-100%), and severity (0: no, +1: mild, +2: mod-erate, +3: severe) scored semi-quantitatively. The total staining score (H score) and the prevalence x severity was obtained by calculating [22].
Analyses were performed with a Leica DFC-280 research microscope using the Leica Q Win Image Analysis System (Leica Micros Imaging Solutions Ltd., Cambridge, UK).

NGS Library Preparation and Sequencing for lncRNA Sequences
The lncRNA sequences were prepared using the "TruSeq Stranded Total RNA Library Prep Kit" from Illumina. Compatible with a wide range of samples, including low-quality DNA/RNA and formalin-fixed, paraffin-embedded (FFPE) samples, TruSeq Stranded Total RNA couples all the benefits of TruSeq RNA technology with Ribo-Zero ribosomal RNA reduction chemistry. The product enables analysis of coding and multiple forms of non-coding RNA with precise measurement of strand orientation, uniform coverage, and high-confidence discovery of features such as alternative transcripts, gene fusions, and allele-specific expression.
The sequences were prepared as follows: • Ribosomal RNAs (rRNAs) were eliminated from total RNA, and the remaining RNAs were purified and fragmented. • rRNA elimination was validated with the Bioanalyzer. • Then, RNA fragments were reverse transcribed (first strand cDNA synthesis) using random hexamer sequences [23]. • Afterward, the RNA template was eliminated, and the second strand of cDNA (blunt ds cDNA) was synthesized [23]. • A single 'A' nucleotide was added to their 3' ends to prevent the blunt ds cDNA fragments from binding together during the adapter ligation reaction. • Indexing adapters were then added to hybridize the ds cDNA fragments to the flow cell surface. • DNA fragments have been enriched.

•
The libraries of the samples were normalized and pooled. • Samples of 50M readings were made with the Illumina NovaSeq 6000 platform as a paired-end (PE) 2 × 150 base [24].

Data Analysis and Modeling Tasks
Whether the variables showed normal distribution or not was examined with the Shapiro-Wilk test. The data were presented as median (minimum-maximum) or mean±standard deviation (SD). The Mann-Whitney U test was used to compare non-normally distributed data, and the independent sample t-test was used to compare normally distributed data. The p-value < 0.05 was considered statistically significant. IBM SPSS Statistics 25.0 software was used in the analysis. The TMM (Trimmed mean of M values) normalization method was employed for the relevant data. In bioinformatic analysis, False Discovery Rate (FDR) was utilized for the assessments.
The elastic net variable selection method was used as the variable selection method within the scope of the study. R programming language-based RStudio was used in the implementation and variable selection stages of the bagging, boosting, and stacking ensemble learning models planned to be used in the study. This study used Acc, B-Acc, Se, Sp, Ppv, Npv, and F1-score metrics in the model performance evaluation. In addition, the graphics used in the visualizations were made using Excel software and R programming language.

Histopathological Results
The related liver damage was examined for hepatocyte degeneration and sinusoidal dilatation. In the sections where the H-E staining method was applied, the liver in the control group had a normal histological appearance, except for mild changes ( Figure 1A). In the Hepatotoxicity group, hepatocyte degeneration was similar to the control group, while a significant increase in sinusoidal dilatation was observed (p < 0.0001) ( Figure 1B). Histopathological evaluation results are given in Table 1. values) normalization method was employed for the relevant data. In bioinformatic analysis, False Discovery Rate (FDR) was utilized for the assessments. The elastic net variable selection method was used as the variable selection method within the scope of the study. R programming language-based RStudio was used in the implementation and variable selection stages of the bagging, boosting, and stacking ensemble learning models planned to be used in the study. This study used Acc, B-Acc, Se, Sp, Ppv, Npv, and F1-score metrics in the model performance evaluation. In addition, the graphics used in the visualizations were made using Excel software and R programming language.

Histopathological Results
The related liver damage was examined for hepatocyte degeneration and sinusoidal dilatation. In the sections where the H-E staining method was applied, the liver in the control group had a normal histological appearance, except for mild changes ( Figure 1A). In the Hepatotoxicity group, hepatocyte degeneration was similar to the control group, while a significant increase in sinusoidal dilatation was observed (p < 0.0001) ( Figure 1B). Histopathological evaluation results are given in Table 1.

Immunohistochemical Results
Hsp60 immunoreactivity was distinguished by brownish staining in hepatocyte cytoplasm. Accordingly, it was observed that Hsp60 immunoreactivity was mild in the sections of the control group ( Figure 2A). It was determined that cisplatin administration increased Hsp60 immunoreactivity in hepatocytes, and this increase was statistically significant when compared to the control group ( Figure 2B) (p < 0.0001). The immunoreactivity evaluation results of the groups are given in Table 1.

Immunohistochemical Results
Hsp60 immunoreactivity was distinguished by brownish staining in hepatocyte cytoplasm. Accordingly, it was observed that Hsp60 immunoreactivity was mild in the sections of the control group ( Figure 2A). It was determined that cisplatin administration increased Hsp60 immunoreactivity in hepatocytes, and this increase was statistically significant when compared to the control group ( Figure 2B) (p < 0.0001). The immunoreactivity evaluation results of the groups are given in Table 1.

Figure 2.
Hsp60 immunoreactivity in hepatocytes in the control (A) and hepatotoxicity (B) groups is shown with a star. Notably, the prevalence and severity of immunoreactivity were significantly higher in the hepatotoxicity group than in the control group. Hsp60 immunostaining; ×200.
Descriptive statistics for the rats used in the experiment are given in Table 2. Descriptive statistics by the groups (control-hepatotoxicity) are given in Table 3.

Transcriptomic Analysis Results
The isolated RNA samples were examined with Agilent 2100 bioanalyzer system for quality control. The data whose quality control was completed were sequenced, and the quality control of the raw read data (fastqc) obtained as a result of sequencing was performed using FASTQC* software, version 0.11.8.
Quality-controlled raw read sequences were mapped to the reference genome using STAR* aligner software. The reference genome for mapping was taken from Rattus norvegicus (assembly mRatBN7.2) (GCF_015227675.2_mRatBN7.2_genomic.fna) and annotation track GCF_015227675.2_mRatBN7.2_genomic.gff_Lnc_rna. The quantification (count table) of reads mapped to the reference genome was conducted using the HTSeq* tool.

Differential Expression Results
The dataset used in the study contains 16,386 expressions. According to the results of the bioinformatics analysis, 589 (FDR < 0.05) lncRNAs showed different expressions in the groups. Of these, 450 showed up-expression (logFC > 1), and 139 showed down-expression (logFC < −1). Data set for bioinformatic analyses is presented in Supplementary Materials.
Based on the principal components (PCO) analysis, the distribution of the samples was found to be compatible in terms of lncRNA expression levels in the hepatotoxicity Hsp60 immunoreactivity in hepatocytes in the control (A) and hepatotoxicity (B) groups is shown with a star. Notably, the prevalence and severity of immunoreactivity were significantly higher in the hepatotoxicity group than in the control group. Hsp60 immunostaining; ×200.
Descriptive statistics for the rats used in the experiment are given in Table 2. Descriptive statistics by the groups (control-hepatotoxicity) are given in Table 3.

Transcriptomic Analysis Results
The isolated RNA samples were examined with Agilent 2100 bioanalyzer system for quality control. The data whose quality control was completed were sequenced, and the quality control of the raw read data (fastqc) obtained as a result of sequencing was performed using FASTQC* software, version 0.11.8.
Quality-controlled raw read sequences were mapped to the reference genome using STAR* aligner software. The reference genome for mapping was taken from Rattus norvegicus (assembly mRatBN7.2) (GCF_015227675.2_mRatBN7.2_genomic.fna) and annotation track GCF_015227675.2_mRatBN7.2_genomic.gff_Lnc_rna. The quantification (count table) of reads mapped to the reference genome was conducted using the HTSeq* tool.

Differential Expression Results
The dataset used in the study contains 16,386 expressions. According to the results of the bioinformatics analysis, 589 (FDR < 0.05) lncRNAs showed different expressions in the groups. Of these, 450 showed up-expression (logFC > 1), and 139 showed down-expression (logFC < −1). Data set for bioinformatic analyses is presented in Supplementary Materials.
Based on the principal components (PCO) analysis, the distribution of the samples was found to be compatible in terms of lncRNA expression levels in the hepatotoxicity group (C) vs. control group (CK) comparison. Controls and application samples were collected in the same group. In this case, control and treatment samples presented similar expression level changes for similar lncRNAs. The visual figure for PCO analysis is given in Figure 3.
Diagnostics 2023, 13, x FOR PEER REVIEW 7 of 15 group (C) vs. control group (CK) comparison. Controls and application samples were collected in the same group. In this case, control and treatment samples presented similar expression level changes for similar lncRNAs. The visual figure for PCO analysis is given in Figure 3. The heatmap representation of the fifty most expressed lncRNAs in the two group comparisons is given in Figure 4. The heatmap representation of the fifty most expressed lncRNAs in the two group comparisons is given in Figure 4.  In the C vs. CK comparison, it is observed that the application samples exhibited a different expression profile compared to the control, shown in red for the 50 lncRNAs that showed the most variation and in green for the suppressed expression level. The C-4 sample showed a different profile than the application samples.
The volcano plot used to visualize differentially expressed genes is given in Figure 5. According to Figure 5, lncRNAs in red are up-regulated, and those in blue are down-regulated lncRNAs. The lncRNAs in black are the lncRNAs that are not expressed differently for the two groups. In the C vs. CK comparison, it is observed that the application samples exhibited a different expression profile compared to the control, shown in red for the 50 lncRNAs that showed the most variation and in green for the suppressed expression level. The C-4 sample showed a different profile than the application samples.
The volcano plot used to visualize differentially expressed genes is given in Figure 5. According to Figure 5, lncRNAs in red are up-regulated, and those in blue are downregulated lncRNAs. The lncRNAs in black are the lncRNAs that are not expressed differently for the two groups.

Biostatistics Analysis and Modeling Results
The data of 16,386 lncRNAs in the data set were obtained by the TMM (Trimmed mean of M values) normalization method. From these data, 17 lncRNAs were chosen by the elastic net variable selection method from 589 lncRNAs with different regulations (up and down) between the two groups. The data analysis results of these chosen expressions are given in Table 4, which includes the selected expressions and descriptions of the data set, descriptors of the chosen expressions for the target variable under study, statistical significance, and the log fold change (LogFC) per gene for the target variable.

Biostatistics Analysis and Modeling Results
The data of 16,386 lncRNAs in the data set were obtained by the TMM (Trimmed mean of M values) normalization method. From these data, 17 lncRNAs were chosen by the elastic net variable selection method from 589 lncRNAs with different regulations (up and down) between the two groups. The data analysis results of these chosen expressions are given in Table 4, which includes the selected expressions and descriptions of the data set, descriptors of the chosen expressions for the target variable under study, statistical significance, and the log fold change (LogFC) per gene for the target variable. According to the statistical analysis results in Table 4, significant differences were detected between the groups in all lncRNA expressions (p < 0.05).
The results of performance metrics obtained as a result of bagging, boosting, and stacking models, which are ensemble learning models using selected lncRNAs, are given in Table 5. According to the classification performance of the bagging, boosting, and stacking models, the bagging model's Acc was 85%, B-Acc was 85%, Se was 70%, Sp was 100%, PPV was 100%, NPV was 76.92%, and F1-score was 82.4%. Acc of 85%, B-Acc of 85%, Se of 70%, Sp of 100%, PPV of 100%, NPV of 76.92%, and F1-score of 82.4% were obtained from the boosting model. Acc as 90%, B-Acc as 90%, Se as 80%, Sp as 100%, PPV as 100%, NPV as 83.3%, and F1-score as 88.9% were calculated from the stacking model.
The graph of the performance metrics for the stacking model, which gives the best result from the ensemble models used, is given in Figure 6.

Discussion
Liver toxicities based on drugs are known to be one of the leading causes of liver damage [2]. Drug-induced hepatotoxicities lead to various clinical manifestations, such as acute liver failure, cirrhosis, and liver cancer, which are non-specific changes [25]. The liver is an organ exposed to drug toxicities due to its functions and is personally affected [26,27]. Hepatotoxicity due to the amount of drug dose used is responsible for almost 50% of all acute liver failure cases, especially in the United States, England, and some Western countries [27]. It is essential to explain the mechanism of action of emerging liver toxicities and to develop treatment methods accordingly in terms of the mortality risk of patients [28]. Considering these factors, there is a need for non-known biomarkers that

Discussion
Liver toxicities based on drugs are known to be one of the leading causes of liver damage [2]. Drug-induced hepatotoxicities lead to various clinical manifestations, such as acute liver failure, cirrhosis, and liver cancer, which are non-specific changes [25]. The liver is an organ exposed to drug toxicities due to its functions and is personally affected [26,27]. Hepatotoxicity due to the amount of drug dose used is responsible for almost 50% of all acute liver failure cases, especially in the United States, England, and some Western countries [27]. It is essential to explain the mechanism of action of emerging liver toxicities and to develop treatment methods accordingly in terms of the mortality risk of patients [28]. Considering these factors, there is a need for non-known biomarkers that

Discussion
Liver toxicities based on drugs are known to be one of the leading causes of liver damage [2]. Drug-induced hepatotoxicities lead to various clinical manifestations, such as acute liver failure, cirrhosis, and liver cancer, which are non-specific changes [25]. The liver is an organ exposed to drug toxicities due to its functions and is personally affected [26,27]. Hepatotoxicity due to the amount of drug dose used is responsible for almost 50% of all acute liver failure cases, especially in the United States, England, and some Western countries [27]. It is essential to explain the mechanism of action of emerging liver toxicities and to develop treatment methods accordingly in terms of the mortality risk of patients [28].
Considering these factors, there is a need for non-known biomarkers that can explain drug-induced hepatotoxicity, which is a fundamental research direction and can reveal the damage to patients or predict whether the injury will develop. LncRNAs are molecules that interact with DNA, mRNA, protein, and miRNA structures to regulate and contribute to gene expression at epigenetic, transcriptional, post-transcriptional, and translational functional levels [29]. It has been reported that lncRNAs are involved in many regulatory mechanisms in the case of transcription and subsequent gene expression and perform primary functions for quite different biological processes [30]. Due to these mechanisms of action, studies with these RNAs have gained importance [31].
In the present study, transcriptomic and histopathological analyses were performed with liver samples taken from rats treated with cisplatin-induced hepatotoxicity and from rats in the control group.
As a result of histopathological analysis, liver damage was examined regarding hepatocyte degeneration and sinusoidal dilatation. In the sections where the H-E staining method was applied, the liver had a normal histological appearance except for mild changes in the control group. In contrast, hepatocyte degeneration was similar to the control group in the hepatotoxicity group, while a significant increase in sinusoidal dilatation was observed. In a previous study, a difference was observed in sinusoidal dilatation in the CIS group after HE staining was performed in the CIS platinum group [32]. Another study reported that the group in which CIS platinum was applied after HE staining showed massive hepatotoxicity compared to the control group. In addition, liver hepatocytes have been shown to show pycnotic nuclei with irregular nuclear membranes, while their cytoplasm contains vesicular rough endoplasmic reticulum and vestigial mitochondria with undifferentiated cisterns [33].
Heat shock proteins (HSPs) are a set of evolutionarily conserved molecules found in almost all living organisms [34,35]. HSP60 is a chaperone found in all mammalian cells and tissues, including the liver. This HSP performs many physiological functions not limited to its canonical cellular location in mitochondria [36,37]. It promotes mitochondrial protein folding and aids in the proteolytic degradation of denatured or abnormally folded proteins in an ATP-dependent manner [38]. HSP involves many physiological events but can be pathogenic in various conditions, including cancer and neurodegenerative diseases [39,40]. Variations in expression levels of HSP60 have been associated with multiple diseases and cancers, including hepatocellular carcinoma (HCC). Recent reports highlight the role and significance of HSP60 in human cancer development and management, whereby its targeting has produced promising therapeutic results [41,42]. It is known that cells rapidly produce several proteins known as heat shock proteins (HSP) and other different proteins in response to oxidative stress, which is observed as one of the essential mechanisms in liver damage caused by cisplatin [43,44]. According to immunohistochemical analysis, in the current study, Hsp60 immunoreactivity was distinguished by brownish staining in hepatocyte cytoplasm. Hsp60 immunoreactivity was mild in sections belonging to the control group. It was determined that cisplatin administration increased Hsp60 immunoreactivity in hepatocytes, and this increase was statistically significant compared to the control group.
This study used transcriptomic data obtained from liver tissues of rats treated with hepatotoxicity and control group rats for the relevant analysis. There are 16,386 lncRNA expressions in the acquired data set. According to the findings of the bioinformatics analysis, lncRNA with the id rna-XR_001840627.2 had a very high gene expression in the hepatotoxicity group compared to the control group. Similarly, rna-XR_005499199.1, rna-XR_005496955.1, rna-XR_005497501.1, rna-XR_001835923.2, rna-XR_005506070.1, rna-XR_005495875.1, rna-XR_005497501.1, rna-XR_005506070.1, rna-XR_005494195.1, and rna-XR_005503371.1, lncRNAs with id had very high gene expressions in the hepatotoxicity group compared to the control group. In an experimental study, lncRNAs were examined in MC-LR-induced hepatotoxicity, and the expression levels of three lncRNAs were found to be significantly increased in all treatment groups [45]. In another study, lncRNAs are known to play essential roles in chemical-induced adverse effects and liver disease as well as HCC. The change in the expression profile of hepatic lncRNAs was investigated in a mouse model exposed to 1,2-DCE [46]. lncRNAs with id rna-XR_005493896.1, rna-XR_005488982.1, rna-XR_005486345.1, rna-XR_005504727.1, rna-XR_005493990.1, rna-XR_005488390.1, rna-XR_005501078.1, rna-rna5490 -XR_005502677.1, and rna-XR_001837692.2 had low gene expression in the hepatotoxicity group compared to the control group.
According to the results of the biostatistics analysis performed with 17 lncRNAs selected by the elastic net variable selection method used in the study, a significant difference was found between the two groups for all lncRNAs, and the calculated OR values also support that these lncRNAs may be discriminative RNAs for the two groups. When the performance metrics obtained from the ensemble learning models using the selected 17 lncRNAs as input variables and taking the hepatotoxicity status as the target variable were examined, it was found that the stacking technique produced higher results than the other two methods. Performance metrics results show that the stacking method, one of the proposed models, can correctly classify hepatotoxicity when the high-performance metric values obtained in the classification of the two groups are considered. The rna-XR_005492522.1 (LOC120096269), rna-XR_005492536.1 (LOC120096276), and rna-XR_005505831.1 (LOC102557053) lncRNAs, which have the highest three significance values according to the significance of the variables obtained as a result of the stacking modeling, are for hepatotoxicity as a result of extensive studies and can be used as predictive biomarker candidates.
The present study may have some limitations. The results of the current study may illustrate changes in the lncRNA-based regulatory network associated with cisplatin treatment; however, it may have guiding potential for biomarker discovery and personalization of the treatment. From this aspect, it may be a limitation that this research is performed on experimental animals and forms a basis/background for further experimental research. In order to achieve the objectives examined in this study, experimental procedures could be performed on certain experimental animals due to restrictive factors. Therefore, more extensive experimental/clinical studies are necessary for the following periods.

Conclusions
Among the ensemble algorithms, the stacking technique yielded higher performance results as compared to the bagging and boosting methods on the transcriptomic data. For the potential biomarkers discovered from the current study to be used in the case of druginduced hepatotoxicity, the achieved results need to be supported by different extensive studies. Once the accuracy of the obtained biomarker candidates is determined, possible treatment and diagnosis options can be personalized, and potential diagnostic procedures can be performed easily, quickly, and effectively after being confirmed in clinical trials.