Development and Validation of a Personalized Model With Transfer Learning for Acute Kidney Injury Risk Estimation Using Electronic Health Records

This diagnostic study assesses the approaches to estimating acute kidney injury risk in hospitalized patients and proposes a novel model that uses machine learning.

For each inpatient encounter, we extracted 1892 structured EHR variables including demographics, vital signs, medications, medical history, admission diagnoses, and laboratory tests that represent comorbidities correlated with AKI 1 (eTable 1). We did not include SCr/eGFR as predictors because they determine the outcome. Medications were normalized to RxNorm ingredients. Admission diagnoses were represented using All Patients Refined Diagnosis Related Group (APR-DRG). Medical history was captured as major diagnoses in the Clinical Classifications Software (CCS, https://www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp) for ICD-9-CM mapping.
The prediction point was 1 day prior to onset for AKI patients and 1 day prior to the last SCr record for non-AKI patients. The most recent vitals and lab values recorded before the prediction point were used. Vitals were categorized using standard ranges, and missing values were treated as a unique category. Labs were categorized as "unknown", "present-and-normal", or "present-and-abnormal". Medication exposure was a binary variable, "true" for medications taken within 7 days before the prediction point. Medical history was represented as presence/absence of a major diagnosis before the prediction point. Demographics and admission diagnoses were also binary variables.
To achieve transfer learning between the global and personalized logistic regression models, we proposed a method inspired by the widely used Finetune method for deep learning. Specifically, logistic regression is regarded as a special case of a neural network, i.e., a single-layer neural network (Supplementary Text 3). Thus, the transfer learning method is analogous to Finetune for logistic regression: the sample value of each feature is multiplied by the regression coefficient of that feature in the global logistic regression model:
$$x' = x \odot \beta$$
Here $x$ refers to the feature vector of the original sample, $x'$ refers to the feature vector after transfer learning, $\beta$ is the coefficient vector of features in the global logistic regression model, i.e., the knowledge to be transferred, and $\odot$ denotes element-wise multiplication. The proposed transfer learning provides a warm start for model training and tunes the learning speed and regularization loss for each feature based on its importance in the source domain. We explain these mechanisms in detail and present experimental results in Supplementary Text 3.
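The feature transformation described above can be illustrated with a minimal NumPy sketch. The function name `transfer_features`, the toy feature matrix, and the coefficient values are hypothetical and only serve to show the element-wise scaling and the warm-start property; they are not taken from the study.

```python
import numpy as np

def transfer_features(X, global_coef):
    """Scale each feature column by its global-model coefficient (x' = x * beta)."""
    X = np.asarray(X, dtype=float)
    beta = np.asarray(global_coef, dtype=float)
    return X * beta  # broadcasting multiplies column m by beta[m]

# Toy binary features (3 samples, 4 features); values are illustrative only.
X = np.array([[1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 0]], dtype=float)
# Hypothetical coefficients of a global logistic regression model.
beta_global = np.array([0.8, -0.3, 1.2, 0.05])

X_transferred = transfer_features(X, beta_global)

# Warm start: the row sums of x' equal the global linear score (without intercept).
global_score = X @ beta_global
```

In practice the transformed matrix would be passed to the personalized logistic regression in place of the raw features; a coefficient fitted on `X_transferred` corresponds to that coefficient multiplied by `beta_global` on the original feature scale.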
Personalized modeling: The personalized logistic regression model for a patient is trained with the k similar samples selected by k-NN in similar sample matching. As samples more similar to the target patient carry more valuable information for risk estimation, sample weighting is applied when calculating the log-loss function. The weight of a training sample depends on its distance from the target sample, i.e.:
$$\mathrm{Loss}_i = -\sum_{j=1}^{k} w_{i,j}\left[y_j \log h_i(x_j) + (1 - y_j)\log\bigl(1 - h_i(x_j)\bigr)\right]$$
$$w_{i,j} = \frac{d_{i,\min} + \epsilon}{d_{i,j} + \epsilon}$$
Here k is the number of similar samples selected by k-NN, and $w_{i,j}$ refers to the weight of similar sample j when modeling the personalized logistic regression for target patient i. $h_i(x_j)$ is the predicted probability of the personalized logistic regression model for similar sample j, $y_j$ is the observed outcome of sample j, and $d_{i,j}$ is the distance between target sample i and similar sample j.
$d_{i,\min}$ refers to the distance between the target sample i and its most similar sample. $\epsilon$ is a very small value to prevent the numerator or denominator from being 0.
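As a rough illustration of the distance-based sample weighting, the sketch below assumes that the weight is the ratio of the smallest distance to each sample's distance, with a small constant added to both numerator and denominator. That functional form, the function names, and the default `eps=0.01` are assumptions made for this example.

```python
import numpy as np

def similarity_weights(distances, eps=0.01):
    """Assumed weight form: (d_min + eps) / (d_j + eps); closest sample gets weight 1."""
    d = np.asarray(distances, dtype=float)
    return (d.min() + eps) / (d + eps)

def weighted_log_loss(y, p, w):
    """Log-loss in which each similar sample contributes in proportion to its weight."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)  # avoid log(0)
    w = np.asarray(w, dtype=float)
    return float(-np.sum(w * (y * np.log(p) + (1 - y) * np.log(1 - p))))

# Toy example: three similar samples at increasing distance from the target.
w = similarity_weights([0.5, 1.0, 2.0])
loss = weighted_log_loss(y=[1, 0, 1], p=[0.9, 0.2, 0.6], w=w)
```

With scikit-learn, weights of this kind can be supplied through the `sample_weight` argument of `LogisticRegression.fit`.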
Similarity measure optimization: In each iteration of the training process, after personalized models for each randomly selected target patient were built, we evaluated the performance of the personalized models based on the current similarity measure:
$$\mathrm{Error}_i = \bigl(h_i(x_i) - y_i\bigr)^2 \quad (6)$$
As personalized models are built on similar samples calculated using the current similarity measure, we assume that Error is generated by the mismatch of similar samples for the target patient. To identify predictors of the mismatch, we calculated the average difference between each target sample and its similar samples for each feature, i.e.:
$$D_i = \bigl[\operatorname{avg}(x_{i,1} - x_{i\text{-similar},1}),\ \operatorname{avg}(x_{i,2} - x_{i\text{-similar},2}),\ \ldots,\ \operatorname{avg}(x_{i,m} - x_{i\text{-similar},m})\bigr] \quad (7)$$
Here $D_i$ records the average distance between the target patient i and all its similar samples i-similar for each feature, and $\operatorname{avg}(x_{i,m} - x_{i\text{-similar},m})$ refers to the average distance for the mth feature. For the interpretability of the measure and to limit the complexity of optimization, we assume that the estimated Error for a target patient is linearly related to the average distance between the target and its similar samples, i.e.
$$\mathrm{Error}_i = f(D_i) = W * D_i \quad (8)$$
Here $W$ represents the coefficients of $D_i$, which reflect how much estimation error is generated when the average distance on each feature changes. These coefficients also reflect the importance of features in sample matching. Therefore, the final similarity measure is determined once $W$ is determined. Different from existing metric learning methods for classification tasks 5, after each iteration we need to update the similarity measure by rematching similar samples and rebuilding personalized models to reevaluate the current similarity measure, which leads to high complexity of similarity measure optimization. Thus, an efficient gradient method was used to optimize the similarity measure. The optimization target is:
$$\min_{W}\ \sum_i \mathrm{Error}_i + c\,\lVert W \rVert_2^2$$
where $\sum_i \mathrm{Error}_i$ is the sum of the errors of the personalized models over all target samples, $\lVert W \rVert_2^2$ is the regularization term, and c is the regularization strength.
Specifically, in the (n+1)th iteration of training, we randomly select N samples from the training samples as target samples, and then match similar samples and build personalized models for each target using the current similarity measure. After that, based on each target's estimation Error and its average distance to its similar samples, we update the similarity measure using batch gradient descent. Considering the nature of the similarity measure, the weight of each feature in the similarity measure should fulfill $w_m \geq 0$. Therefore, after each iteration of similarity measure optimization, if the weight of a feature m is negative, i.e., $w_m < 0$, we regard this as the result of overfitting, and at the end of each iteration such negative weights are set to 0.
Setting of PMTL: Logistic regression was performed using Python version 3.7.4 and the scikit-learn package version 0.19.2 with default hyperparameters. In the process of sample weighting for the weighted logistic regression, minimum was set to 0.01. According to results on the validation set, the hyperparameters tuned for the gradient approach to similarity measure learning were learning rate 0.01, batch size 1000, regularization strength 0.05, and 50 iterations. According to the results shown in eTable 4 & 5, the initial weights of features in the similarity measure are based on their coefficients in the global logistic regression model, i.e., the absolute value of each feature's coefficient divided by the sum of the absolute values of all coefficients. Considering the size of the total sample and the major types of AKI mechanisms, we evaluated the size of similar samples matched by k-NN as 20%, 10%, and 5% of the training sample, and PMTL trained with 10% of the training sample performed best in the validation set.
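The similarity measure workflow described above (initial weights from the global model, weighted matching, a batch gradient step, and clipping of negative weights) can be sketched as follows. This is only an illustration under stated assumptions: the weighted absolute-difference distance, the use of mean absolute differences for the per-feature distances, and the squared-residual objective inside `update_weights` are choices made for this example, and all function names are hypothetical. The defaults for `lr` and `c` mirror the learning rate and regularization strength reported above.

```python
import numpy as np

def initial_weights(global_coef):
    """Initial feature weights: |coefficient| divided by the sum of |coefficients|."""
    a = np.abs(np.asarray(global_coef, dtype=float))
    return a / a.sum()

def k_nearest(X, target, w, k):
    """Indices and distances of the k samples closest to the target.
    Assumption: distance is the weighted sum of absolute feature differences."""
    d = np.abs(X - target) @ w
    idx = np.argsort(d)[:k]
    return idx, d[idx]

def mean_feature_distance(X_similar, target):
    """Per-feature average distance between a target and its similar samples."""
    return np.abs(X_similar - target).mean(axis=0)

def update_weights(w, D, errors, lr=0.01, c=0.05):
    """One batch gradient step followed by clipping negative weights to 0.
    Assumed objective: mean squared residual between Error_i and W.D_i,
    plus an L2 penalty c * ||W||^2."""
    resid = D @ w - errors
    grad = 2 * D.T @ resid / len(errors) + 2 * c * w
    return np.maximum(w - lr * grad, 0.0)

# Toy run with random binary features and hypothetical global coefficients.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 4)).astype(float)
w0 = initial_weights([2.0, -1.0, 1.0, 0.0])
idx, dist = k_nearest(X, X[0], w0, k=10)
D = np.vstack([mean_feature_distance(X[idx], X[0])])  # one target sample
w1 = update_weights(w0, D, errors=np.array([0.25]))
```

Each call to `update_weights` would be followed by rematching similar samples and refitting the personalized models, which is the costly part of the loop.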

eAppendix 3. Mechanism of Transfer Learning
As mentioned in the Materials and Methods section, the proposed transfer learning (see the figure below) can provide a warm start for modeling, increase learning speed, and tune the regularization loss for each feature based on its importance in the source domain. Here, we explain these mechanisms in detail.

[Figure: Finetune in Deep Neural Network. Panel label: Source Domain. Legend: parameters learned in source domain and used in transfer learning; parameters learned in target domain.]
Transfer learning provides a warm start for modeling: The proposed transfer learning approach provides a warm start for modeling because data in the target domain are multiplied by the coefficients from the source domain (eFigure 2). If the coefficients were not updated on the target domain, the sum of the feature values after transfer learning would equal the prediction score (the linear predictor, up to the intercept) generated by the model in the source domain.
Transfer learning tunes regularization strengths of features: The mechanism for tuning regularization loss can be understood through the following case. Suppose a logistic regression model is:
$$\operatorname{logit}(y) = \theta x \quad (12)$$
Here $\theta$ is the coefficient of predictor $x$ when predicting $y$. Then we multiply $x$ by $\beta$, i.e. $x' = \beta x$. To keep the two sides of the equation equal, the equation should be modified to:
$$\operatorname{logit}(y) = \frac{1}{\beta}\,\theta\, x'$$
That means the new coefficient is $\theta' = \frac{1}{\beta}\theta$. In the case of multiple variable regression, it is:
$$\theta'_m = \frac{1}{\beta_m}\theta_m, \quad m = 1, 2, \ldots$$
The optimization objective of logistic regression is to minimize:
$$L(\theta) + \lambda R(\theta)$$
Here $L(\theta)$ measures the prediction performance of the model on the training set, and $R(\theta)$ measures the complexity of the coefficients. L1 regularization is $\sum_m |\theta_m|$, and L2 regularization (used in this study) is:
$$\sum_m \theta_m^2 \quad (16)$$
And $\lambda$ tunes the weight of $R(\theta)$ in the optimization. Higher complexity of coefficients means more predictors are considered by a model. Ideally, the model can then adapt to more complex situations and training performance will increase; but in many cases it causes overfitting: many ineffective factors are considered, and model performance decreases significantly on the test set. Regularization loss is a common approach to avoid overfitting by penalizing model complexity.
Returning to our transfer learning, suppose the vector $\theta = (\theta_1, \theta_2, \ldots)$ contains the optimized coefficients for the model in the target domain and $\beta = (\beta_1, \beta_2, \ldots)$ contains the coefficients of the model in the source domain. Then, to keep the optimized coefficients in the target domain, the regularization loss becomes $\sum_m \left|\frac{1}{\beta_m}\theta_m\right|$ for L1 regularization, and the L2 regularization used in this study becomes:
$$\sum_m \left(\frac{\theta_m}{\beta_m}\right)^2$$
We can observe that if a predictor has a higher absolute coefficient in the source-domain model, its regularization loss will be smaller when modeling in the target domain. That means the model penalizes the coefficient complexity of this predictor less.
Transfer learning tunes learning speeds of features: To explain how transfer learning tunes the learning speed for each feature, we take gradient optimization, a common and classic coefficient learning approach for logistic regression, as an example. In each iteration, the change of the coefficient of factor $m$ based on training on $n$ samples is:
$$\Delta\theta_m = \alpha * \sum_{i=1}^{n} (y_i - p_i)\, x_{i,m} - \lambda * \theta_m$$
Here, $\alpha * \sum_{i=1}^{n} (y_i - p_i)\, x_{i,m}$ optimizes the coefficients based on prediction performance according to the gradient of the log-loss, $y_i$ is the true classification (i.e. with or without AKI) of sample $i$, $p_i$ is the prediction probability generated by logistic regression, and $\alpha$ is the learning rate for all factors. $\lambda * \theta_m$ is the gradient of the regularization loss mentioned above. After transfer learning, i.e. $x'_{i,m} = \beta_m x_{i,m}$, the change becomes:
$$\Delta\theta'_m = \alpha * \sum_{i=1}^{n} (y_i - p_i)\, \beta_m x_{i,m} - \lambda * \theta'_m$$
We can observe that if a predictor has a higher absolute coefficient in the source-domain model, its learning speed will be higher in optimization, more gradient will be assigned to this factor, and its final coefficient is probably higher.
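The two mechanisms above can be checked numerically. The following sketch uses random toy data and hypothetical coefficient vectors (`beta` for the source domain, `theta` for the target domain) to confirm that, after multiplying features by `beta`, (a) the same predictions are obtained with coefficients `theta / beta`, so the L2 penalty becomes the sum of `(theta / beta) ** 2`, and (b) the log-loss gradient for each feature is scaled by `beta`. The gradient is written for the loss, so its sign is opposite to the ascent form used in the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logloss_gradient(theta, X, y):
    """Gradient of the unregularized log-loss with respect to the coefficients."""
    return X.T @ (sigmoid(X @ theta) - y)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3)).astype(float)
y = rng.integers(0, 2, size=200).astype(float)

beta = np.array([2.0, 0.5, -1.0])    # hypothetical source-domain coefficients (non-zero)
theta = np.array([0.4, -0.2, 0.7])   # hypothetical target-domain coefficients

X_t = X * beta            # features after transfer learning
theta_t = theta / beta    # coefficients that reproduce the same linear predictor

same_scores = np.allclose(X @ theta, X_t @ theta_t)

# L2 penalty before and after transfer: large |beta_m| shrinks the penalty on feature m.
l2_original = np.sum(theta ** 2)
l2_transferred = np.sum(theta_t ** 2)

# Gradient after transfer equals beta times the original gradient, feature by feature.
grad = logloss_gradient(theta, X, y)
grad_t = logloss_gradient(theta_t, X_t, y)
gradient_scaled_by_beta = np.allclose(grad_t, beta * grad)
```

The first feature (`beta = 2.0`) therefore receives twice the gradient and a quarter of the L2 penalty it would receive without the transformation, whereas the second (`beta = 0.5`) receives half the gradient and four times the penalty.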

Effect of transfer learning on overfitting:
To show the effect of transfer learning on overfitting, we tuned the regularization strength parameter to estimate the performance of subgroup models for the top-20 high-risk subgroups under different model complexity (see the figure below; coefficients in models with transfer learning are calculated by multiplying coefficients in the global model by coefficients in the models fitted to data after transfer learning). We observed two phenomena. First, compared with the curves of models without transfer learning, the peaks of the curves of models with transfer learning moved toward the upper right. In other words, both the optimal model complexity and the best performance at that complexity were higher when transfer learning was used. That means that, after overfitting is mitigated by transfer learning, models can improve their performance by increasing their complexity. Second, when model complexity exceeds the optimal complexity (i.e. overfitting), models with transfer learning always perform better at the same complexity. The advantage of models with transfer learning increases with higher model complexity.
The above results show that, by mitigating overfitting with transfer learning, the model can perform better and become more robust to the model complexity parameter. Considering that parameter tuning is always time consuming, this advantage is very important for personalized modeling.
However, the Delong test cannot be used to compare PMTL with previous models because we do not have access to the raw data from the published studies. Nevertheless, following the basic concept of the Delong test, we can estimate the p value with a two-sided Z-test when the AUROC variance of the two models is known. Thus, we estimated the AUROC variance of PMTL based on the Delong test, while the AUROCs and variances of previous models were based on the reported AUROC and its 95% CI in the literature; covariances of model performances were not considered in this case, i.e.:
$$Z = \frac{\mathrm{AUROC}_1 - \mathrm{AUROC}_2}{\sqrt{\mathrm{Var}(\mathrm{AUROC}_1) + \mathrm{Var}(\mathrm{AUROC}_2)}}$$
The AUPRC comparisons were based on the Z-test (two-sided). The variance of AUPRC and the covariance of AUPRC between models were calculated by resampling the test data (with replacement) and recalculating the model performance 200 times. The final Z-score between two models was calculated as:
$$Z = \frac{\mathrm{AUPRC}_1 - \mathrm{AUPRC}_2}{\sqrt{\mathrm{Var}(\mathrm{AUPRC}_1) + \mathrm{Var}(\mathrm{AUPRC}_2) - 2\,\mathrm{Cov}(\mathrm{AUPRC}_1, \mathrm{AUPRC}_2)}}$$
To justify the choice of the Z-test, we used a normality test (null hypothesis: a sample comes from a normal distribution; based on the Python function scipy.stats.normaltest) to test the variation of AUPRC under resampling (with replacement) of the test data. We found that the null hypothesis cannot be rejected in most cases (see Table below; each experiment was repeated 10 times).
... as an independent study. The study-level effect size of each target variable was calculated based on its coefficients from the personalized models of patients who had the variable recorded. There are two rationales behind this. First, the coefficient of a factor and its changes are meaningful only when the factor is recorded for the target patient. Second, due to serious data imbalance in medical data, the coefficient of a factor in a personalized model may "unexpectedly" be 0 simply because the factor is missing in the similar samples. The remaining variables of target patients were treated as study-level covariates because similar patients are matched to the target patients based on those factors.
We did not use the average value of the variables in similar patients as covariates because many factors may occur in similar samples but not in the target sample; thus, many false positives may result from analysis based on averaging across samples. Furthermore, averaging across samples would increase multicollinearity among factors.
We observed that coefficients of diseases in meta-regression are often insignificant when drug information is considered because of the small sample size of admission diagnoses and collinearity between diseases and drugs. Thus, we implemented two strategies for meta-regression analyses. The first strategy was to examine potential interactions between target predictors and diseases. Thus, we excluded the 1271 medication variables, and performed meta-regression on the remaining 621 variables of demographics, vital signs, lab test, admission diagnosis, and medical history. The second strategy was to examine the potential interactions of target predictors with drugs or conditions related to drugs, and all features were considered.
A literature review showed that not all significant interactions are known or have been studied in existing research. Moreover, different personalized models are not completely independent because they may share a subset of similar patients. So, we used subgroup analysis to verify the interactions found by meta-regression. Specifically, we divided patients into subgroups by controlling the moderator found by meta-regression and compared the effect of the target predictor between patients exposed to the moderator and the remaining patients, according to its coefficient in a logistic regression model (subgroup model in eTable 17 & 18) or the odds ratio (OR) calculated directly from the raw data (subgroup analysis in eTable 17 & 18).
A major challenge in subgroup analysis of interactions between diseases and target predictors (eTable 17) is the limited number of samples. So, we aggregated similar significant admission diagnoses into large subgroups. To find more significant admission diagnoses and increase the sample size, the significance threshold was set to p<0.01 for single variable meta-regression analysis and p<0.05 for multiple variable analysis. Although many potential interactions were found, we primarily verified the large subgroups that contained many significant results or showed a high effect (measured by estimates) in meta-regression. In cases where we were not sure to which large subgroup an admission diagnosis belongs, we performed analyses for multiple potential subgroups. In several cases where many similar admission diagnoses were significant in meta-regression but the result was not significant in the subgroup model (probably due to limited sample size), we included insignificant admission diagnoses that are similar to the significant ones to increase the sample size.
To verify interactions between drugs and target predictors, controlling for the effects of diseases is necessary (eTable 18). If a drug was frequently used in patients with specific admission diagnoses, we divided patients into subgroups by controlling the admission diagnoses. In meta-regression with medication information, the significance threshold was set to p<0.01 for both single and multiple variable analysis. Although many potential interactions were found, many potential subgroups need to be controlled, so we mainly present the top-5 results that show the highest effect in meta-regression (measured by estimates) or interesting interactions.

EFFECT OF AGE IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of age: The top-5 admission diagnoses correlated with effect improvement of age are cardiac surgeries (eTable 17). This result was also significant in the subgroup model and subgroup analysis. Meta-regression also showed other admission diagnoses for cardiovascular conditions to be related to effect improvement of age, but this effect was supported by neither the subgroup model nor the subgroup analysis. This may be because the similar samples used for training personalized models for these patients contained many patients who had cardiac surgeries.
Previous research has shown that infection-induced AKI is more common in older adults 12,13. However, we found that the coefficient of age decreased in patients with infection. In the meta-regression without medication, 11 admission diagnoses of infection were significant and 4 of the 11 were in the top-10. Results were also verified by the subgroup model and subgroup analysis, even when we only considered patients with "Septicemia & Disseminated Infections".
"Bone Marrow Transplant" is the number one admission diagnosis related to decreasing coefficient of age. Two other types of admissions for major hematological disease were also significant. They were also significant in the subgroup model and subgroup analysis. Moreover, antibiotics were significantly related to decreasing effect of age. Although not all results were significant in the subgroup model, the directions of the effect change were consistent in different subgroups for different antibiotics. Furthermore, use of glucose was also significantly related to decreasing effect of age. It may indicate that patients have no serious diabetes, a common risk factor in older adults.

EFFECT OF SERUM CALCIUM IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of serum calcium: It is known that patients with abnormal serum calcium may present with various clinical signs and symptoms, including cardiovascular manifestations. In meta-regression, the top-6 admission diagnoses related to effect improvement of serum calcium are cardiac surgeries. This is also significant in both the subgroup model and subgroup analysis. The effect of serum calcium also increased in admissions with mechanical ventilation according to the results of all experiments, and a recent study showed that lower serum ionized calcium is associated with higher risk of acute respiratory failure 15. Burn is also related to increased effect of serum calcium. Hypocalcemia commonly complicates burns, and its severity is related to the severity of the burn 16,17. Serious burn is also related to higher risk of infection, dehydration and hypoxia. Additionally, 7 major surgeries were significant in meta-regression, but among the verification experiments the result was significant only in subgroup analysis.
"Cardiac Catheterization for Ischemic Heart Disease" was found to be significantly related to decreasing effect of serum calcium in all experiments. By comparison, "PCI w/o AMI" showed a significant relation to effect improvement of serum calcium in meta-regression. Existing research shows that the outcome of acute myocardial infarction is better in patients with higher serum calcium 18,19. The incidence rate of abnormal serum calcium was higher in admissions for liver diseases, orthopedic surgeries, and alimentary tract diseases. That may be because these conditions are related to absorption, decomposition, metabolism, and loss of calcium. However, the ORs of abnormal serum calcium decreased significantly in these subgroups, which means abnormal serum calcium may not increase the AKI risk. In addition, infection is a cause of hypocalcemia, but our experiments showed that abnormal serum calcium does not increase AKI risk in patients with infection. These results are supported by existing research 20,21.
Medication related to coefficient change of serum calcium: Some cardiovascular medications showed a significant relation to effect improvement of serum calcium. When we assessed the result in major cardiac surgery, an interesting difference between aminocaproic acid and tranexamic acid was found again. Aminocaproic acid was significantly related to a decrease in the effect of serum calcium, while tranexamic acid was related to effect improvement of serum calcium. However, the frequency of abnormal serum calcium is much higher in cardiac surgery patients exposed to aminocaproic acid. Other cardiovascular medications including prochlorperazine, protamine sulfate and atropine were also significant.
Aldesleukin is the most significant medication related to decreasing OR of serum calcium. Among patients who were exposed to aldesleukin and had normal serum calcium, the AKI incidence rate is 93% (107/115), while the AKI incidence rate is only 32.2% (47/146) in patients who were exposed to aldesleukin and had abnormal serum calcium. Hypoalbuminemia is one of the common side effects of aldesleukin. However, only ionized calcium, not albumin-bound calcium (protein-bound calcium), is physiologically active. Thus, we hypothesized that this phenomenon may be caused by two factors: 1) if a calcium supplement is used to address abnormal serum calcium while taking aldesleukin, it can lead to excessive ionized calcium; 2) if hypoalbuminemia occurs while serum calcium is normal, it may mean that ionized calcium is elevated. However, neither hypothesis was supported by subgroup analysis. First, in patients with normal serum calcium who used a calcium supplement, the AKI incidence rate is 88% (43/49); among patients with normal serum calcium who did not use a calcium supplement, the AKI incidence rate is 97% (64/66); in patients with abnormal serum calcium who used a calcium supplement, the AKI incidence rate is 25% (15/61); in patients with abnormal serum calcium who did not use a calcium supplement, the AKI incidence rate is 38% (32/85). Second, in patients with normal serum calcium and normal albumin, the AKI incidence rate is 98% (63/64); in patients with normal serum calcium and abnormal albumin, the AKI incidence rate is 86% (44/51); only 6 patients had abnormal serum calcium and normal albumin, and 4 of them had AKI; in patients with abnormal serum calcium and abnormal albumin, the AKI incidence rate is 31% (43/140). Therefore, the interaction between serum calcium and aldesleukin still needs further study.
Oxycodone and fondaparinux were also found to be related to decreasing OR of serum calcium, and their effects were verified in the subgroup of patients who underwent joint replacement.

AMINOCAPROIC ACID VS TRANEXAMIC ACID
In the above analyses on age and serum calcium, we observed different effects between two amino acid antifibrinolytics: aminocaproic acid and tranexamic acid. Here, we study how the influence of aminocaproic acid and tranexamic acid on AKI incidence differs. Previous research has compared the two drugs 22, mostly in cardiac surgery, and no significant difference in AKI incidence rate was found in most cases. However, in our data, in "Cardiac Valve Procedures", AKI incidence rates in those who used aminocaproic acid and tranexamic acid were 22.4% and 19.8% (p<0.025); in "Coronary Bypass", AKI incidence rates were 23% and 19.1% (p<0.0005). After adjusting the sample weights of patients who used aminocaproic acid based on the admission distribution of patients who used tranexamic acid, AKI incidence rates in patients who used aminocaproic acid in the above two cardiac surgeries were 21.8% and 23.1%. This result indicates higher all-stage AKI risk in patients who used aminocaproic acid. After excluding patients admitted for major cardiac surgery and orthopedic surgery (1100 joint replacement patients were exposed to tranexamic acid but only 1 to aminocaproic acid), AKI risk was still higher in the remaining patients who used aminocaproic acid (18.2% or 81/446 vs 9% or 29/321). Furthermore, when we adjusted the sample weights of patients using aminocaproic acid based on the admission distribution of patients using tranexamic acid (if no patient used aminocaproic acid in a subgroup, that subgroup was excluded from the calculation of AKI risk), patients using aminocaproic acid still had higher AKI risk than those using tranexamic acid. When patients with major cardiac surgery were not excluded, the difference was smaller but more significant (21.5%, se=0.96% vs 18.3%, se=1.63%, p≈0.05); when they were excluded, the difference was larger but less significant (18%, se=2% vs 13.5%, se=3.2%, p≈0.12).
The significance is probably influenced by the small sample size. Drug combination also cannot explain the difference because only 4 patients used both drugs. The difference between this study and existing research may be due to the following reasons: (1) the primary outcome of this study is all-stage AKI, whereas it is stage 2 or higher or RRT (renal replacement therapy) in most existing research; (2) much existing research compared the two drugs in RCTs (randomized controlled trials), whereas this is a retrospective study, so further analysis is warranted. Additionally, the different interaction effects of these two drugs with age and serum calcium have not been studied in previous research and require further research.
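The admission-distribution adjustment used in this comparison resembles direct standardization, and a minimal sketch is given below. The function names and the toy subgroup counts are hypothetical. The helper `z_from_se` is one simple way to compare two rates that come with standard errors, assuming independent estimates; the text does not state which test or tail convention produced the reported p values, so the helper returns both one-sided and two-sided values.

```python
import math
import numpy as np

def standardized_rate(events, totals, ref_totals):
    """Event rate of one group reweighted to the subgroup distribution of a reference group.
    Subgroups with no patients in the group being adjusted are excluded."""
    events = np.asarray(events, dtype=float)
    totals = np.asarray(totals, dtype=float)
    ref_totals = np.asarray(ref_totals, dtype=float)
    keep = totals > 0
    rates = events[keep] / totals[keep]
    weights = ref_totals[keep] / ref_totals[keep].sum()
    return float(rates @ weights)

def z_from_se(est_a, se_a, est_b, se_b):
    """Z statistic and p values for the difference of two independent estimates."""
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))   # upper-tail probability
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_one_sided, p_two_sided

# Toy counts per admission subgroup for the group being adjusted (illustrative only).
adjusted = standardized_rate(events=[10, 5, 0],
                             totals=[50, 50, 0],
                             ref_totals=[20, 80, 30])

# Comparison using the adjusted rates and standard errors reported in the text (in %).
z, p_one, p_two = z_from_se(21.5, 0.96, 18.3, 1.63)
```

In the toy call, the third subgroup has no patients in the adjusted group and is dropped, so the reference weights are renormalized over the first two subgroups only.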

EFFECT OF BLOOD GLUCOSE IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of blood glucose: Many surgeries were observed to be related to effect improvement of blood glucose in personalized models. Liver transplant ranked first according to meta-regression, and "Major Pancreas, Liver & Shunt Procedures" was also significant. Their results were close to the significance threshold in the subgroup model and subgroup analysis; thus, caution should be taken and further investigation is needed. In total, 9 admission reasons for gastrointestinal surgery were significant in meta-regression and 4 of them were in the top-10; their results were also significant in both the subgroup model and subgroup analysis. Joint replacement was also significantly related to effect improvement of blood glucose in all experiments. Coefficients of blood glucose decreased significantly in non-surgical cardiovascular diagnoses, according to both meta-regression and the subgroup model.
Impact of different insulin on coefficient of blood glucose: Different types of insulin show different interaction effects with blood glucose. In meta-regression, insulin,aspart, human/rdna is the top-1 medication related to effect improvement of blood glucose (estimate=0.03), the effect of insulin regular,human buffered was much weaker (estimate=0.014), and the effect of insulin, isophane was negative (estimate=-0.011). Results were supported by the subgroup model: abnormal blood glucose is more dangerous in cardiac surgery patients exposed to insulin,aspart, human/rdna, and the coefficient of abnormal blood glucose decreases significantly in patients who used insulin, isophane in many situations. These results indicate that blood glucose control strategies in different situations may need further attention.
Other medication related to coefficient change of blood glucose: Use of glucose, fentanyl, lactate, and benzoic acid was related to effect improvement of blood glucose. Among them, glucose can directly influence blood glucose; the use of fentanyl may indicate postoperative analgesia or severe disease. Benzoic acid belongs to salicylates, which are risky for patients with renal insufficiency. Further, previous research has shown that benzoic acid and its derivatives may influence glucose metabolism [23][24][25]. Recent studies also show that combined lactate and glucose levels are related to renal dysfunction and mortality 26,27.
Paracetamol (includes acetaminophen) shows an interesting interaction with blood glucose. In general, paracetamol is significantly related to a decrease in the coefficient of blood glucose, and existing research shows that acetaminophen may influence glucose sensing 28,29. However, paracetamol was significantly related to effect improvement of blood glucose in joint replacement patients. We did not find research that studied this interaction.

EFFECT OF BMI IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of BMI: Respiratory conditions were the most important factors related to increased effect of BMI according to meta-regression: in total, 11 admission diagnoses were significant and 5 of them were in the top-10. Existing research has summarized the complex interaction between obesity and respiratory diseases 30. However, our result was not verified by the subgroup model and subgroup analysis, so further study is required to verify the interaction. Leukemia showed a significant relation to higher effect of BMI, and the result was more significant when patients were overweight. Existing meta-analyses have also shown obesity/overweight to be significantly related to the incidence of leukemia and the outcome of leukemia patients [31][32][33][34].
In meta-regression, the top-4 admission diagnoses related to a decrease in the effect of BMI were "Uterine & Adnexa Procedures", and two other types of "Uterine & Adnexa Procedures" were also in the top-10. Existing research reported the relationship between obesity and complications after gynecological laparoscopic surgery as not significant 35,36. Our subgroup model showed that the decrease may be due to overweight patients, but its significance is close to the threshold. We suggest that further studies are needed.
Medication related to coefficient change of BMI: All of our experiments showed that tazobactam can increase the effect of obesity in infection. That may be because obese patients with infection required higher doses of tazobactam and faced a higher risk of nephrotoxicity [37][38][39][40]. Insulin regular,human buffered was also related to effect improvement of BMI in cardiac surgery. The combination of obesity and hyperglycemia is a well-known risk factor for cardiac surgery. Rifaximin was significantly related to a decrease in the coefficient of obesity. Several studies have examined the effect of rifaximin on liver disease, weight and the gut microbiome, but none of them can directly explain this interaction.

EFFECT OF PULSE IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of pulse: In meta-regression, liver diseases were the top-2 admission diagnoses related to effect improvement of pulse, and another 2 liver diseases were also in the top-10. The result is supported by both the subgroup model and subgroup analysis when pulse is >100. Generally, increased pulse is a common symptom in severe liver disease, and cardiovascular dysfunction often occurs as the disease progresses 41. Cerebrovascular condition is another important factor correlated with increasing effect of pulse according to meta-regression (3 of the top-10). However, results from the 3 significant admission diagnoses were not supported by the subgroup model and subgroup analysis. Given that personalized models were built using similar samples from other patients with cerebrovascular conditions, we further combined samples from other cerebrovascular-related admissions, and found that the OR of pulse >100 bpm increased significantly. We found two existing studies on the relationship between heart rate and outcome of stroke, but their conclusions were inconsistent 42,43, while other studies showed that heart rate variability is significantly related to mortality of patients with head injury 44,45. In cardiac surgery, orthopedic surgery and infection, the relation between pulse and AKI is weak.

EFFECT OF VANCOMYCIN IN HETEROGENEOUS PATIENTS
Diseases related to coefficient change of vancomycin:
The coefficients of vancomycin significantly increased in gastrointestinal surgery, orthopedic surgery (excluding joint replacement), and infection according to all experiments. Skin graft was also a significant moderator in meta-regression, and its significance in the subgroup model (p=0.056) was close to the threshold. These increases mainly reflect the danger of infection in these subgroups. In addition, in gastrointestinal surgery, significant systemic absorption may occur when intestinal mucosal integrity is compromised, which may increase the risk of nephrotoxicity. In admissions for cardiac procedure, cardiac device, and joint replacement, the coefficients of vancomycin significantly decreased. In these subgroups, vancomycin was used in 44% of patients, which suggests that it was possibly used to prevent infection.

Medication related to coefficient change of vancomycin: The use of tazobactam showed the strongest correlation with an increasing coefficient of vancomycin. A recent meta-analysis also showed an interaction between the two drugs 46.

The meaning of inter-class score difference is introduced in eAppendix 6. "Cumulative % of inter-class score diff." (y-axis) is calculated as the current inter-class score difference of a model divided by the final inter-class score difference of the same model.
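The cumulative percentage on the y-axis can be illustrated with a minimal sketch. It assumes that the inter-class score differences of one model are available as an ordered sequence whose last element is the final value; the function name and the sample numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def cumulative_percent(score_diffs: Sequence[float]) -> List[float]:
    """Express each inter-class score difference as a percentage of the
    final difference of the same model (assumed to be the last element)."""
    if not score_diffs:
        raise ValueError("score_diffs must not be empty")
    final = score_diffs[-1]
    if final == 0:
        raise ValueError("final inter-class score difference must be non-zero")
    return [100.0 * d / final for d in score_diffs]


# Hypothetical sequence of differences for a single model.
percentages = cumulative_percent([1.0, 2.0, 4.0])
print(percentages)  # [25.0, 50.0, 100.0]
```

By construction the last point of every curve is 100%, so curves from different models can be compared on the same axis.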

eTable 13. Details of 55 AKI Prediction Studies for Specific Patient Subgroups That Can Be Identified in Our Data
To compare PMTL with models reported in the existing literature, we identified 136 AKI prediction studies published before 2021 in Web of Science and PubMed using keywords related to "AKI", "prediction", and "machine learning"; 104 of the 136 used all-stage AKI as the target. We could not confirm the subgroups in 49 of these 104 studies because our data lacked the features needed to identify them. The remaining 55 papers are summarized in this table. Studies presented in Table 1 of the main text are highlighted in bold (at least one study per subgroup was selected, with preference for studies with a race distribution and AKI definition similar to ours, a large sample size, and independent validation). PCI, AMI, CABG, TKA, and GI stand for percutaneous coronary intervention, acute myocardial infarction, coronary artery bypass grafting, total knee arthroplasty, and gastrointestinal, respectively. Bs, CV, DV, EV, EVR, and IV stand for bootstrap validation, cross-validation, derivation validation, external validation, external validation research (verifying the performance of models presented in other studies), and internal validation, respectively. For studies that examined multiple modeling approaches, we report only the AUROC of logistic regression and of the best model.

eTable 16. Significant Interactions Between 6 Important Predictors and Disease in Meta-Regression and Their Verification
OR change is calculated as the OR of the target predictor when the moderator is present divided by the OR of the target predictor in the remaining patients, minus 100%. In subgroup analysis, when the target factor is age, the exposed group contains patients aged >65 and the unexposed group contains patients aged <45; when the target factor is BMI or pulse, the unexposed group contains patients with BMI 18.5-25 or pulse 50-80, respectively; when the target factor is a lab test, the exposed group contains patients with an abnormal result and the unexposed group contains patients with a normal result. Moderators significantly (p≤0.05) verified by the subgroup model, the subgroup analysis, and both are marked with #, †, and *, respectively. Significant interactions, and interactions that we suggest warrant attention, are shown in bold.

Direction of effect change in meta is based on the result of meta-regression in general patients. OR change is calculated as the OR of the target predictor when the moderator is present in the controlled population divided by the OR of the target predictor in the remaining patients of the controlled population, minus 100%. In subgroup analysis, when the target factor is age, the exposed group contains patients aged >65 and the unexposed group contains patients aged <45; when the target factor is BMI or pulse, the unexposed group contains patients with BMI 18.5-25 or pulse 50-80, respectively; when the target factor is a lab test, the exposed group contains patients with an abnormal result and the unexposed group contains patients with a normal result. Moderators significantly (p≤0.05) verified by the subgroup model, the subgroup analysis, and both are marked with #, †, and *, respectively. Significant interactions, and interactions that we suggest warrant attention, are shown in bold.
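The OR change defined in these table notes can be worked through with a minimal sketch. It assumes that each odds ratio comes from a plain 2x2 table of exposed/unexposed patients with and without AKI; the function names and all counts are hypothetical illustrations, not values or code from the study, and no adjustment for covariates is attempted.

```python
def odds_ratio(exposed_aki: int, exposed_no_aki: int,
               unexposed_aki: int, unexposed_no_aki: int) -> float:
    """Unadjusted odds ratio from a 2x2 table of counts."""
    if min(exposed_aki, exposed_no_aki, unexposed_aki, unexposed_no_aki) <= 0:
        raise ValueError("all cell counts must be positive")
    return (exposed_aki * unexposed_no_aki) / (exposed_no_aki * unexposed_aki)


def or_change_percent(or_with_moderator: float, or_remaining: float) -> float:
    """OR with the moderator present divided by the OR in the remaining
    patients, minus 100%, returned in percentage points."""
    if or_with_moderator <= 0 or or_remaining <= 0:
        raise ValueError("odds ratios must be positive")
    return (or_with_moderator / or_remaining - 1.0) * 100.0


# Hypothetical counts: patients with the moderator vs. the remaining patients.
or_mod = odds_ratio(30, 70, 10, 90)    # moderator present
or_rest = odds_ratio(20, 80, 10, 90)   # remaining patients
change = or_change_percent(or_mod, or_rest)
print(f"OR change: {change:+.1f}%")
```

A positive value means the target predictor has a stronger association with AKI when the moderator is present; a negative value means a weaker one.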