Unsupervised machine learning to analyze the statin use strategy for ischaemic stroke patients with elevated transaminase

Background and purpose: Statins can elevate hepatic transaminase in ischemic stroke patients. More evidence is needed on whether stopping statins or adjusting the statin dose is better for these patients, and no evidence shows which approach suits which patients. Methods: We collected ischemic stroke patients whose hepatic transaminase became elevated while they were taking statins. The outcomes were the recurrent stroke rate, the transaminase value after statins were stopped or adjusted, mortality, and favorable functional outcome (FFO). We compared outcome events between the stopped group and the adjustment group. We then grouped all patients by unsupervised machine learning and analyzed the data characteristics of the different groups. Results: Patients who stopped statins had a higher stroke recurrence rate and a higher rate of FFO (mRS 0–2), together with a lower mean transaminase value and lower mortality. Among the groups produced by unsupervised machine learning, the km2 group had the lowest stroke recurrence (p = 0.046), the lowest mortality (p = 0.049), and the highest FFO (p = 0.023). Patients in the km2 group were younger (p < 0.001), more often male (p < 0.001), had lower National Institutes of Health Stroke Scale (NIHSS) scores (p < 0.001), and had slightly higher blood pressure values (p = 0.002). Adding the unsupervised machine learning group improved the models' performance. Conclusion: For ischemic stroke patients with elevated hepatic transaminase, stopping statins temporarily was the better treatment strategy. Patients who were younger, male, with a lower NIHSS score at admission and a slightly higher blood lipid value at admission could have a better prognosis.


Introduction
For ischemic stroke patients, statins can prevent the occurrence and recurrence of stroke [1,2], and statin use is related to better functional outcomes [3,4]. However, some studies have shown that adverse effects such as muscle problems and elevated hepatic transaminase arise when patients take statins [5,6].
Whether statins should be continued in ischemic stroke patients with elevated hepatic transaminase is controversial [7]. Asian patients in particular are more prone to elevated hepatic transaminase when they use statins [8], and they usually take a low dose of statins [9]. These conditions limit the room for adjusting the statin dose in Asian patients. For Asian patients whose hepatic transaminase becomes elevated while taking statins, statins are therefore usually stopped.
However, there is no evidence as to which method is better for Asian patients.
Therefore, our study compared whether stopping statins or adjusting the dose or type of statins was better for Asian patients. Conventional clinical statistics can compare the difference between the two methods. However, we wanted not only to compare the two methods but also to know which method suits which subset of patients, a task that is almost impossible for conventional statistics [10]. This problem can be addressed by cluster analysis, an unsupervised machine learning method [11]. Unsupervised machine learning methods can classify patients by differences in their characteristics [11], so that different classes of patients can be given different therapeutic schedules for a better prognosis.
Using traditional statistical methods, our study explored whether stopping statins or adjusting the dose or type of statins was more suitable for Asian ischemic stroke patients with elevated hepatic transaminase. We then used unsupervised machine learning methods to examine which method was better for which patients. Finally, we investigated the relationship between the unsupervised machine learning classes and prognosis.

Patients
This was a prospective cohort study. We consecutively recruited acute ischemic stroke patients who were taking statins and developed elevated hepatic transaminase after onset. All patients were from the Neurology Department and Rehabilitation Department of the Affiliated Hospital of Youjiang Medical University for Nationalities. The patients were enrolled from June 1, 2018, to May 30, 2022, and followed up until August 30, 2022. All patients underwent neuroimaging examinations and met the WHO stroke diagnostic criteria.
When a patient's transaminase became elevated, patients who stopped statins were assigned to the stopped group, and those whose statin dose or type was adjusted were assigned to the adjustment group. The attending doctor decided whether to stop or adjust statins according to each patient's condition. Statins were stopped for 15-30 days, depending on the patient's situation. In the adjustment group, the statin dose was 5 mg rosuvastatin, 5 mg simvastatin, or 10 mg atorvastatin, or the statin was switched to another of the three statins (rosuvastatin, simvastatin, or atorvastatin). Allowing for a 10% loss to follow-up, we initially recruited more than 130 patients based on a previous article [12].
The inclusion criteria were as follows: (1) aged 18 years or older; (2) received statins and other conventional therapy after admission; (3) elevated transaminase occurring after onset.
The exclusion criteria were as follows: (1) a recent history of elevated transaminase before onset; (2) use of other lipid-lowering drugs such as fenofibrate; (3) intracerebral haemorrhage, subarachnoid haemorrhage, or severe systemic disease; (4) withdrawal from the study or inability to provide outcome events.
The study was performed in accordance with the Declaration of Helsinki and the ethical standards of the institutional and national research committees. The study was approved by the Ethics Committee of the Affiliated Hospital of Youjiang Medical University for Nationalities (KY-2017-02).

Data collection and outcomes
We collected data on demographic characteristics and medical history from patients or their relatives using structured questionnaires. In addition, we collected clinical characteristics and laboratory data from electronic clinical records. We first recorded the transaminase value at admission, and then the transaminase value when patients who had taken statins developed elevated transaminase within one month. The TOAST (Trial of Org 10172 in Acute Stroke Treatment) criteria were adopted for the stroke subtypes. Hemorrhage events included intracerebral haemorrhage (ICH) and gastrointestinal haemorrhage.
The primary outcome was the recurrent stroke rate within 90 days after onset. The secondary outcomes were the transaminase value 15 days after statins were stopped or adjusted, mortality within 90 days after onset, and favorable functional outcome at 90 days after onset. The favorable functional outcome (FFO) was defined as mRS (modified Rankin Scale) ≤ 2 at 90 days after admission. The unfavorable functional outcome (UFO) was defined as mRS > 2 at 90 days after admission. Clinical scale scores and outcome events were determined by experienced neurologists blinded to the patients' group.

Statistical analysis
We performed the statistical analysis with SPSS 23.0 for Windows. For baseline data, we analyzed normally distributed continuous variables with a t-test for two groups and ANOVA for multiple groups. We used a non-parametric test when continuous variables were not normally distributed. For categorical and ranked data, we used a chi-square test.
We used the chi-square test to compare recurrent stroke rates, mortality, and FFO between groups. We compared the transaminase value 15 days after statins were stopped or adjusted with a t-test. To screen factors for the machine learning models, we selected factors (P ≤ 0.1, or factors with clinical significance) by univariate logistic regression analysis or univariate linear regression analysis.
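The two group comparisons described above can be sketched in Python as follows. This is only an illustration: the study itself used SPSS, the counts and values below are hypothetical and are not the study data, and the choice of the Pearson chi-square without continuity correction is an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table (NOT the study data): rows are two groups,
# columns are [event, no event].
table = np.array([[12, 48],
                  [9, 31]])

# Pearson chi-square test of independence (no continuity correction; an assumption).
chi2, p_rate, dof, expected = stats.chi2_contingency(table, correction=False)

# Hypothetical transaminase values 15 days after the change of therapy.
rng = np.random.default_rng(0)
stopped = rng.normal(44, 20, size=60)
adjusted = rng.normal(45, 20, size=40)

# Two-sample t-test on the group means.
t_stat, p_mean = stats.ttest_ind(stopped, adjusted)

print(f"chi-square p = {p_rate:.3f}, t-test p = {p_mean:.3f}")
```

The same pattern extends to the other binary outcomes by replacing the contingency table.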

Classify data by unsupervised machine learning
We performed unsupervised machine learning with Python 3.8.0. To classify patients by their characteristics, we analyzed all baseline data by cluster analysis. First, we standardized all data (StandardScaler class, sklearn library). Then, we used principal component analysis (PCA) to reduce the data to two dimensions (PCA class, sklearn library). We then drew a scatter plot of the two-dimensional data to show the characteristics of the patients.
We first classified the data with K-means, a type of cluster analysis model (KMeans class, sklearn library). To choose the best number of groups for K-means, we drew a line chart of the Silhouette score against the number of groups, which suggested the better number of groups. We then drew a scatter plot for that number of groups to further evaluate whether it was reasonable. Finally, we grouped the data using the better number of groups.
We also classified the data with Hierarchical Clustering, another type of cluster analysis model (AgglomerativeClustering class, sklearn library). First, we drew a heatmap of the data characteristics. Next, we chose the best number of groups for Hierarchical Clustering from the heatmap. We then grouped the data using that number of groups.
We compared outcome events across the groups obtained above. The more outcome events that differed significantly between groups, the better the grouping. We then further analyzed how the factors differed across the selected grouping. Finally, we made comparisons across multiple groups.
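A minimal sketch of the standardization and PCA steps, assuming a purely numeric baseline matrix. The matrix `X` below is randomly generated for illustration (134 rows to match the cohort size, six hypothetical features) and does not reproduce the study variables.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical baseline matrix: 134 patients x 6 numeric features
# with different scales (NOT the study data).
X = rng.normal(size=(134, 6)) * [14, 5, 20, 12, 1, 10] + [67, 8, 150, 85, 1, 45]

# Standardize each feature to zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Project the standardized data onto the first two principal components.
pca = PCA(n_components=2, random_state=0)
X_2d = pca.fit_transform(X_std)

print(X_2d.shape, pca.explained_variance_ratio_)
```

The two columns of `X_2d` can then be passed to any plotting library to draw the scatter plot.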
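The selection of the number of K-means groups by Silhouette score might look like the sketch below. The data are synthetic blobs with hypothetical centers, and the candidate range of 2 to 6 groups and the `n_init` and `random_state` settings are assumptions, not values reported in the study.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic two-dimensional data with three well-separated blobs (NOT the study data).
X, _ = make_blobs(
    n_samples=134,
    centers=[[-6, 0], [0, 6], [6, 0]],
    cluster_std=1.0,
    random_state=0,
)

# Silhouette score for each candidate number of groups.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    scores[k] = silhouette_score(X, km.fit_predict(X))

# Keep the number of groups with the highest Silhouette score.
best_k = max(scores, key=scores.get)
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Plotting `scores` against `k` gives the line chart, and coloring the points by `labels` gives the scatter plot used to check that the grouping looks reasonable.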
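A minimal sketch of the Hierarchical Clustering step with `AgglomerativeClustering`. The data are synthetic, and Ward linkage and the two-group setting are assumptions for illustration; the study does not report the linkage it used.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data with two separated blobs (NOT the study data).
X, _ = make_blobs(
    n_samples=134,
    centers=[[-5, 0], [5, 0]],
    cluster_std=1.0,
    random_state=0,
)
X_std = StandardScaler().fit_transform(X)

# Agglomerative (bottom-up) clustering cut at two groups; Ward linkage is an assumption.
hc = AgglomerativeClustering(n_clusters=2, linkage="ward")
hc_labels = hc.fit_predict(X_std)

print({int(g): int((hc_labels == g).sum()) for g in set(hc_labels.tolist())})
```

A clustered heatmap of `X_std` (for example with a plotting library of choice) can be inspected beforehand to decide how many groups to request.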

Analyzed data by supervised machine learning
We first compared the characteristics of the groups mentioned above, using a t-test or chi-square test according to the data type.
We then trained machine learning models, including logistic regression, support vector machine (SVM), and decision tree models, on the baseline data. After standardizing the baseline data, we used three-fourths of the data to train the models and one-fourth to test them. First, we compared model performance through precision and the area under the curve (AUC) plot, and chose the better model for recurrent stroke, mortality, and FFO. We then trained the better models on all data and compared model performance between the baseline-data models and the all-data models. We also evaluated the importance of the different factors in the models.
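The model comparison described above can be sketched as follows. The data are synthetic, and the hyperparameters, the stratified split, and fitting the scaler inside a pipeline (so that it only sees the training portion) are assumptions made for this illustration rather than details reported in the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-outcome data (NOT the study data).
X, y = make_classification(n_samples=134, n_features=6, n_informative=3, random_state=0)

# Three-fourths of the data for training, one-fourth for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Fit each model and record precision and AUC on the held-out quarter.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "auc": roc_auc_score(y_test, y_score),
    }

for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

With a cohort of this size the held-out quarter is small, so repeated splits or cross-validation would give more stable estimates than a single split.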

Patients
The study initially recruited 161 participants. Five patients withdrew from the study, seven were excluded according to the exclusion criteria, and 15 were lost to follow-up. The final data analysis included 134 eligible patients. The stopped group consisted of 95 patients, and the adjustment group consisted of 39 patients (the patient selection flowchart is shown in Supplementary Fig. 1).
The mean age of all patients was 66.92 ± 14.154 years. The patients included 78 males (58.2%) and 56 females (41.8%). The stopped group had a higher mean international normalized ratio (INR) at admission (1.06) than the adjustment group (0.98). The other baseline data did not differ significantly between the two groups (Table 1).

Outcomes
The stopped group had a higher rate of recurrent stroke (20.5%) than the adjustment group (13.7%) (P = 0.043). The mean transaminase value 15 days after statins were stopped or adjusted was lower in the stopped group (43.99) than in the adjustment group (45.18) (P = 0.804). The stopped group had lower mortality (13.7%) than the adjustment group (20.5%) (P = 0.021). The stopped group had a higher rate of FFO (54.7%) than the adjustment group (46.2%) (P = 0.366).
In the univariate logistic regression analysis, we found no significant relationship between the risk factors and stroke recurrence or mortality. When we analyzed the risk factors for UFO, we found that older age (OR=1.050, p = 0.001), hemorrhage events during admission (OR=4.127, p = 0.009), higher NIHSS scores at admission (OR=2.256, p < 0.001), and cardioembolic stroke (OR=2.229, p = 0.023) were related to UFO at 90 days after admission. A higher value of glutamic-pyruvic transaminase (ALT) at admission (OR=0.960, p = 0.008) and a higher ALT value at the time statins were stopped or adjusted (OR=0.960, p = 0.008) were negatively correlated with UFO at 90 days after admission. We included these six factors in the supervised machine learning models for FFO.
In the univariate linear regression analysis of risk factors for transaminase 15 days after statins were stopped or adjusted, we found that older age (p = 0.002), female sex (p = 0.008), double antiplatelet therapy (p = 0.016), history of smoking (p < 0.001), history of drinking (p < 0.001), history of diabetes (p = 0.015), history of coronary heart disease (CHD) (p = 0.015), history of taking anticoagulant drugs (p = 0.015), the ALT value at admission (p < 0.001), and the ALT value at the time statins were stopped or adjusted (p < 0.001) were related to a higher transaminase value 15 days after statins were stopped or adjusted. These factors were all included in the multifactor linear regression model (R² = 0.798).

Grouped by unsupervised machine learning
The two-dimensional scatter plot showed the distribution of the data (Fig. 1-1). The line chart of the Silhouette score showed that three groups was the better choice for the k-means method (Fig. 1-2). The scatter plot of the three k-means groups showed that the groups were clearly separated (Fig. 1-3). Therefore, the data were grouped into the km1, km2, and km3 groups by k-means. The heatmap showed that two groups was better for the Hierarchical Clustering method (Fig. 1-4). Therefore, the data were grouped into the hc1 and hc2 groups by Hierarchical Clustering. When outcome events were analyzed by the unsupervised machine learning groups, they did not differ significantly between the hc1 and hc2 groups. Among the three k-means groups, the km3 group had the highest rate of stroke recurrence and the km2 group had the lowest (p = 0.046). The km1 group had the highest mortality and the km2 group had the lowest (p = 0.049). The km2 group had the highest rate of FFO and the km1 group had the lowest (p = 0.023). These differences are shown in Fig. 2. The transaminase value 15 days after statins were stopped or adjusted was 40.77 ± 18.926 in the km1 group, 46.34 ± 26.531 in the km2 group, and 48.21 ± 43.409 in the km3 group; the difference was not statistically significant (p = 0.404). Therefore, we further analyzed the data by the km1, km2, and km3 groups.

Analyzed data by the km1 group, km2 group, and km3 group
As shown in Table 2, the following factors differed significantly among the km1, km2, and km3 groups: age, gender, rate of stopped statins, admission NIHSS score, systolic blood pressure at admission, diastolic blood pressure at admission, endovascular treatment, double antiplatelet therapy, cardioembolic stroke, history of smoking, history of drinking, history of stroke, history of hypertension, history of diabetes mellitus, history of coronary heart disease, history of antiplatelet drug use, history of statin use, history of antihypertensive use, history of hypoglycaemic drug use, and the admission values of INR, creatinine, triglyceride, total cholesterol, and low-density lipoprotein cholesterol (LDL-C).
The logistic regression models performed better than the SVM and decision tree models. In the logistic regression models for stroke recurrence, the AUC and precision were better when we added the k-means group factor (Fig. 3-1). In the logistic regression models for mortality, the AUC did not change and precision was better when we added the k-means group factor (Fig. 3-2). In the logistic regression models for FFO, the AUC and precision were better when we added the k-means group factor (Fig. 3-3). The R² (0.799) did not improve in the multifactor linear regression model when we added the k-means group factor to analyze transaminase 15 days after statins were stopped or adjusted.
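One way to compare a logistic regression model with and without the k-means group factor is sketched below. The data are synthetic, and the one-hot encoding of the cluster label and fitting K-means on the training portion only are assumptions made for this illustration; the study does not describe how the group factor was encoded.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic binary-outcome data (NOT the study data).
X, y = make_classification(n_samples=134, n_features=6, n_informative=3, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Standardize using statistics from the training portion.
scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

# Derive a three-group k-means label and append it as one-hot columns.
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X_tr_s)
onehot = np.eye(3)
X_tr_aug = np.hstack([X_tr_s, onehot[km.labels_]])
X_te_aug = np.hstack([X_te_s, onehot[km.predict(X_te_s)]])


def fit_and_score_auc(train_X, test_X):
    """Fit logistic regression on the training set and return the test AUC."""
    model = LogisticRegression(max_iter=1000).fit(train_X, y_tr)
    return roc_auc_score(y_te, model.predict_proba(test_X)[:, 1])


auc_base = fit_and_score_auc(X_tr_s, X_te_s)
auc_with_cluster = fit_and_score_auc(X_tr_aug, X_te_aug)

print(f"AUC without group factor: {auc_base:.3f}, with group factor: {auc_with_cluster:.3f}")
```

On synthetic data the two AUC values need not differ in any particular direction; the sketch only shows how such a comparison can be set up.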

Discussion
We found that the stopped group had a higher rate of recurrent stroke and a higher rate of FFO, a lower mean transaminase value 15 days after statins were stopped or adjusted, and lower mortality than the adjustment group. However, whether statins were stopped or the dose or type of statins was adjusted was not related to outcome events. The km2 group had the lowest recurrent stroke rate, the lowest mortality, and the highest FFO rate. The km1 group had the highest mortality and the lowest FFO rate. The km3 group had the highest stroke recurrence. Many baseline data differed significantly among the km1, km2, and km3 groups. In the logistic regression models for stroke recurrence, mortality, and FFO, model performance improved when we added the k-means group factor. However, the machine learning clusters may not have a medical meaning. Therefore, we should be cautious about the results.
Fig. 2. Rates of outcome events in the km1 group, km2 group, and km3 group. P = 0.046 for stroke recurrence, P = 0.049 for mortality, and P = 0.023 for FFO; all differences were significant.
Our results suggested that stopping statins temporarily led to a better prognosis than adjusting the dose or type of statins for Asian ischemic stroke patients with elevated hepatic transaminase. It was reasonable that the stopped group had a lower mean transaminase value 15 days after statins were stopped or adjusted, because many studies have suggested that taking statins is related to elevated hepatic transaminase [5,6]. Our results also indicated that the stopped group had a better functional outcome than the adjustment group. These results were not consistent with other studies [5,6], which showed that continuing statins could be better for ischemic stroke patients with elevated hepatic transaminase [5,6]. The contradiction could stem from two reasons. First, statins were stopped for only 15-30 days in our study, and such a short interruption might have only a minor effect on the prognosis of stroke patients. Second, Asian patients are more prone to elevated hepatic transaminase and usually take a lower statin dose in clinical practice [8,9]. The room for adjusting statins was therefore limited for Asian patients, and so was the effect of the adjustment. Therefore, stopping statins temporarily might be a suitable method for Asian patients. However, the statin strategy was not related to outcome events, so our results need to be confirmed by further studies. At the least, stopping statins temporarily was an option for Asian ischemic stroke patients with elevated hepatic transaminase. The km2 group had the best prognosis, the km1 group had the worst prognosis, and the km3 group had the most recurrent strokes. The better performance of the models that included the k-means group suggested that the k-means group was related to the prognosis of ischemic stroke patients.
The patients in the group with the best prognosis (km2 group) were the youngest, had the lowest NIHSS score at admission, the fewest histories of disease and drug use, and the highest rate of large atherosclerotic stroke. These factors were in accordance with other studies [13,14]. The km2 group had the highest rates of smoking and drinking history, which went along with its highest proportion of males. Some studies have also suggested that male ischemic stroke patients have a better prognosis [9,15]. The km2 group had higher blood pressure at admission, indicating that higher blood pressure at admission could be related to a better prognosis. This could be because slightly higher blood pressure brings better blood perfusion to the brain. In addition, these patients with higher blood pressure at admission might not have taken antihypertensive drugs because they had no history of hypertension. The higher blood lipid values in the km2 group led to its higher rate of patients who stopped statins. Some studies have suggested that higher blood lipid values are related to lower ischemic stroke mortality [16]. Whether slightly higher blood pressure and blood lipid values are related to a better prognosis for ischemic stroke patients needs further study.
The characteristics of the group with the worse prognosis (km1 group) were almost the opposite of those of the km2 group. Therefore, it was reasonable that the km1 group had a worse prognosis. The highest rate of cardioembolic patients in the km1 group suggested that cardioembolic patients had a worse prognosis than large atherosclerotic stroke patients [9]. The km1 group did not have the highest rate of stroke recurrence among the three groups because it had the highest proportion of patients in the adjustment group, and the adjustment group had a lower rate of stroke recurrence than the stopped group because these patients continued taking statins [5]. Of the three groups, the km1 group had the lowest transaminase value 15 days after statins were stopped or adjusted, but the difference was not statistically significant. The results showed that the elevated transaminase did not affect the prognosis of patients, which was consistent with other studies [5,6].
The outcome of the km3 group was close to that of the km1 group. The rate of stopped statins did not differ significantly between the km2 and km3 groups. Therefore, other factors had a significant effect on prognosis. The km3 group had the highest rate of medical history, a higher age, and the highest creatinine value at admission. These all suggested that km3 group patients had a worse underlying condition than the other two groups. The NIHSS score at admission in the km3 group was close to that of the km2 group. Therefore, older age and a history of illness could have a more significant effect than neurological impairment at admission for ischemic stroke patients. The lowest rate of drinking history in the km3 group suggested that drinking might be related to a better prognosis for ischemic stroke patients. This result was also in accord with another study [17].
Our study had several limitations. First, the adjustment group was small because Asian patients usually take low-dose statins, although this reflects real-world practice in Asia. Second, the univariate logistic regression models were not significant for recurrent stroke and mortality. Therefore, the difference between the stopped and adjustment groups must be interpreted cautiously. Finally, the number of patients was small for machine learning, although the machine learning methods further supported the significance of the results.
In conclusion, for Asian ischemic stroke patients whose hepatic transaminase became elevated while taking statins, stopping statins temporarily was a feasible choice and the better one. Among patients who stopped statins temporarily, those who were younger, male, with a lower NIHSS score at admission, the fewest histories of disease, slightly higher blood pressure, and somewhat higher blood lipids at admission could have a better prognosis.

Ethics approval and consent to participate
The study was performed in accordance with the Declaration of Helsinki and the ethical standards of the institutional and national research committees. The study was approved by the Ethics Committee of the Affiliated Hospital of Youjiang Medical University for Nationalities (KY-2017-02).

Funding
This study was supported by the Guangxi Zhuang Autonomous Region Health and Family Planning Commission (Z-B20221499), Liuzhou Scientific Research Technological Development Programs (2022CAC0118), and Specific Research Project of Guangxi for Research Bases and Talents (AD23026241).

Declaration of Competing Interest
Chaohua Cui, Yuchuan Li, Shaohui Liu, Ping Wang, and Zhonghua Huang declared that they have no potential conflicts of interest that might be relevant to the contents of this manuscript.