Machine Learning Prediction Model to Predict Length of Stay of Patients Undergoing Hip or Knee Arthroplasties: Results from a High-Volume Single-Center Multivariate Analysis

Background: The growing number of arthroplasty procedures calls for innovative strategies to reduce inpatients’ hospital length of stay (LOS). This study aims to develop a machine learning prediction model that may aid in predicting LOS after hip or knee arthroplasty. Methods: All clinical notes of patients who underwent elective primary or revision arthroplasty from 1 January 2019 to 31 December 2019 were collected. Hospitalization was classified as “short LOS” if it was less than or equal to 6 days and “long LOS” if it was greater than 7 days. Clinical data from pre-operative laboratory analyses, vital parameters, and demographic characteristics of patients were screened. The final data were used to train a logistic regression model to predict short or long LOS. Results: The final dataset was composed of 1517 patients (795 “long LOS”, 722 “short LOS”, p = 0.3196) with a total of 1541 hospital admissions (729 “long LOS”, 812 “short LOS”, p < 0.001). The complete model achieved an area under the ROC curve (AUC) of 0.7899. Conclusions: In day-to-day clinical practice, machine learning may help determine which patients are suitable for a shorter LOS and which are expected to need a longer LOS, for whom a cautious approach could be recommended.


Introduction
Total hip and knee arthroplasty (THA and TKA) procedures are growing in number worldwide each year, with proven improvements in patients' quality of life [1]. In the USA, as the population progressively ages, the demand for these procedures is expected to grow by 174% for primary THAs and 673% for primary TKAs by 2030. The Italian Arthroplasty Register reported 29,681 THA procedures (94.7% were primary THA and 84.6% were elective procedures) and 19,402 TKA procedures (94.6% were primary TKA) during 2020 [2]. The number of THA and TKA procedures has increased on average by 4.2% each year since 2001 [3]. The rising number of hip and knee arthroplasties has driven the development of advanced and less invasive surgical techniques and improvements in the perioperative course, aimed at achieving the shortest average length of stay (LOS) and a quicker resumption of daily activities while maintaining a low complication rate. Accordingly, interest in "fast-track" postoperative protocols has grown over the last several years [4,5]. Frassanito et al. highlighted the impact of implementing an enhanced recovery after surgery (ERAS) program for hip and knee replacement, which allowed early discharge and a quick return to independence in daily activities [6]. Despite the enormous increase in the number of procedures performed, reimbursement for THA and TKA has been falling in recent years and has not kept pace with inflation worldwide [7]. The reduction in reimbursement is partly justified by the relatively lower complexity of the younger patients undergoing arthroplasty. On the other hand, while the number of patients is increasing, a parallel increase in procedure-related complications cannot be accepted. Therefore, a public health strategy aimed at reducing economic, social, and health costs is mandatory.
Machine learning is increasingly applied in medicine and in orthopedics, as it represents a natural extension of traditional statistical approaches [8]. Clinical decision support tools that use machine learning algorithms such as random forests, artificial neural networks, or support vector machines have proven useful in medical research [9,10]. They have the potential to forecast the episode of care by predicting payment or LOS for any given patient after THA and TKA before the elective procedure is performed [11]. Navarro et al. showed that LOS and cost could be predicted before TKA by a machine learning model using the New York State administrative database [12]. Random forest (RF), an ensemble tree-based machine learning algorithm, was used to predict LOS after shoulder arthroplasty [13]. Bayesian algorithms that use conditional probabilities have been used to predict LOS and costs after TKA. Etzel et al. used six different machine learning classification algorithms to predict long LOS in patients undergoing anterior and posterior lumbar fusion [14]. In recent years, only a few projects have investigated how to facilitate ERAS protocols in the orthopedic field [13,15]; however, machine learning algorithms could be used routinely in clinical practice by integrating computerized models into electronic health record systems, where they can serve as point-of-care decision support tools for surgeons. Although a few studies have already investigated machine learning algorithms for predicting LOS in patients who received THA and TKA, they were national studies and all used large administrative datasets. To the best of our knowledge, all of these studies investigated only patients who underwent primary THA and TKA [12,16–19].
Previous studies have shown that a small amount of recent and accurate data is more effective than larger amounts of older data [19]. Therefore, further independent single-center cohort studies are required to confirm these findings.
The purpose of this study was to develop and validate a machine learning-based prediction tool using pre-operative, patient-specific objective criteria, to perform a multivariable analysis predicting LOS after primary and revision THA or TKA, and to identify factors correlated with an extended LOS in a high-volume single center. Our hypothesis was that the tool could reliably distinguish patients with a predicted "short LOS" (LOS less than or equal to 6, i.e., up to the 5th postoperative day) from those with a predicted "long LOS" (LOS greater than 7, i.e., beyond the 6th postoperative day), thus supporting health management strategies for patients undergoing arthroplasty.

Materials and Methods
The study was conducted in accordance with the Declaration of Helsinki and good clinical practice guidelines. The study protocol for the development of this registry was approved by the Ethics Committee of Humanitas Research Hospital IRCCS (protocol code 83/23) in July 2023.

Dataset
Patient-specific data recorded in medical records from 2015 to 2019 at Humanitas Research Hospital were used. Textual data from 1 January 2015 to 31 December 2018 were gathered from all clinical notes regarding medical history, comorbidities, disabilities, reason for admission, and lower-limb physical examination. These data were used as a training set to develop and train an embedding model. Second, clinical and textual data from 1 January 2019 to 31 December 2019, including pre-operative laboratory analyses, vital parameters, demographics, and morphological characteristics of the selected cohort, were screened and used to develop and train a logistic regression model predicting LOS (Figure 1). The two sources of data were then merged into a single dataset.
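As an illustration of the temporal split described above, the sketch below routes notes by date: 2015 to 2018 notes go to the corpus for the embedding model, and 2019 notes go to the classification cohort. The `Note` record and the `split_by_period` function are hypothetical; the field names and the decision to ignore out-of-range dates are assumptions, not details from the study.

```python
from dataclasses import dataclass
from datetime import date

# Period boundaries taken from the text; everything else is illustrative.
LM_START, LM_END = date(2015, 1, 1), date(2018, 12, 31)
COHORT_START, COHORT_END = date(2019, 1, 1), date(2019, 12, 31)


@dataclass
class Note:
    """Hypothetical clinical-note record (field names are assumptions)."""
    patient_id: str
    written_on: date
    text: str


def split_by_period(notes):
    """Return (language-model corpus, classification cohort).

    Notes from 2015-2018 go to the corpus used to train the text-embedding
    model; notes from 2019 go to the LOS-classification cohort; any other
    date is ignored.
    """
    lm_corpus, cohort = [], []
    for note in notes:
        if LM_START <= note.written_on <= LM_END:
            lm_corpus.append(note)
        elif COHORT_START <= note.written_on <= COHORT_END:
            cohort.append(note)
    return lm_corpus, cohort


notes = [
    Note("p1", date(2016, 3, 2), "medical history ..."),
    Note("p2", date(2019, 5, 10), "reason for admission ..."),
    Note("p3", date(2020, 1, 15), "outside both periods"),
]
lm_corpus, cohort = split_by_period(notes)
print([n.patient_id for n in lm_corpus], [n.patient_id for n in cohort])  # ['p1'] ['p2']
```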

Data Extraction
The first data extraction step consisted of querying the hospital Data Warehouse (DWH). Oracle SQL™ was used to gather the relevant data of patients admitted to the orthopedics department. A pre-processing pipeline was then implemented to clean the text data of unwanted or unnecessary characters, returning a cleaned corpus ready to be processed. The pre-processing phase normalized characters to ASCII format and removed all HTML special characters.
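The text specifies only ASCII normalization and the removal of HTML special characters. A minimal sketch of such a cleaning step, using only the Python standard library, might look as follows. The function name `clean_note`, the regular expressions, and the choice to decode HTML entities before folding accented characters to ASCII (rather than deleting the entities outright) are our assumptions, not the authors' code.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")  # HTML tags such as <br> or <p>
_SPACE_RE = re.compile(r"\s+")    # runs of whitespace


def clean_note(raw: str) -> str:
    """Normalize a clinical note to plain ASCII text (illustrative steps).

    1. replace HTML tags with spaces,
    2. decode HTML entities (e.g. "&egrave;" -> "è", "&nbsp;" -> no-break space),
    3. decompose characters (NFKD) and drop non-ASCII marks ("è" -> "e"),
    4. collapse whitespace.
    """
    text = _TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    return _SPACE_RE.sub(" ", text).strip()


print(clean_note("Paziente &egrave; stabile<br>dolore&nbsp;&nbsp;anca dx"))
# Paziente e stabile dolore anca dx
```

Removing tags before decoding entities means an escaped tag such as `&lt;br&gt;` survives as literal text; whether that is desirable depends on how the source system stores notes.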

Data Selection and Inclusion Criteria
The study included patients undergoing elective primary and revision THA or TKA performed by senior surgeons experienced in joint replacement surgery from 1 January 2019 to 31 December 2019 at Humanitas Research Hospital, Italy. Patients were identified from hospital clinical records using International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) procedure codes (81.51 for THA; 00.70, 00.71, 00.72, 00.73 for revision THA; 81.54 for TKA; 80.06, 81.55, 00.80, 00.81, 00.82, 00.83, 00.84 for revision TKA). Eligible patients were all patients older than 18 years who underwent elective primary or revision THA or TKA in our orthopedic department. Exclusion criteria were malunion or nonunion sequelae, trauma surgery, non-elective procedures, and malignancy, in which LOS could be prolonged while awaiting histological or culture results before discharge. Patients who did not have at least 70% of the required predictive features recorded were also excluded. Since the management of postoperative hospitalization and of admissions to the rehabilitation unit varied significantly from 2015 to 2019, and since these variations were exogenous, all patients who underwent primary or revision THA or TKA in the orthopedic department before 2019 were further excluded. LOS corresponded to the number of inpatient days during admission: it included the day of admission, which always corresponds to the day before surgery, the day of surgery, and the subsequent postoperative days. The total LOS of each patient was transformed into a categorical feature according to the following rule: LOS 0 corresponded to the day of hospital admission, LOS 1 to the day of surgery, and LOS n (n ≥ 2) to the (n − 1)th postoperative day (e.g., LOS 2 to the 1st postoperative day, LOS 6 to the 5th, and LOS 7 to the 6th). Patients were labelled as "short LOS" if they had LOS less than or equal to 6 (up to the 5th postoperative day) and "long LOS" if they had LOS greater than 7 (beyond the 6th postoperative day). Patients with LOS equal to 7 (6th postoperative day) were excluded. Finally, predictive features were selected in three steps. First, all variables with a proportion of missing values greater than or equal to 25% were eliminated. Second, the relevant clinical features (laboratory tests and vital parameters) were selected. Third, statistical tests were used to look for significant differences in the distribution of each predictive feature between the "long LOS" and "short LOS" classes, and features with a p-value lower than 0.05 were retained.
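The labeling rule and the first feature-selection step can be expressed compactly. This is a minimal sketch: `los_label`, `drop_sparse_features`, and the one-dictionary-per-patient layout are hypothetical, and missing values are assumed to be stored as `None`. The 25% threshold is applied as "greater than or equal to", as stated in the text.

```python
from typing import Optional


def los_label(los: int) -> Optional[str]:
    """Map the LOS index to the study's binary label.

    LOS 0 = admission day, LOS 1 = day of surgery, LOS k (k >= 2) = the
    (k - 1)th postoperative day. LOS <= 6 -> "short", LOS 7 -> excluded
    (returned as None), LOS > 7 -> "long".
    """
    if los < 0:
        raise ValueError("LOS cannot be negative")
    if los <= 6:
        return "short"
    if los == 7:
        return None
    return "long"


def drop_sparse_features(rows, threshold=0.25):
    """Keep feature names whose share of missing (None) values is < threshold."""
    kept = []
    for name in rows[0]:
        missing = sum(1 for row in rows if row.get(name) is None)
        if missing / len(rows) < threshold:
            kept.append(name)
    return kept


print([los_label(k) for k in (4, 6, 7, 9)])  # ['short', 'short', None, 'long']

# Hypothetical patients: ferritin is 50% missing, iron exactly 25% missing.
rows = [
    {"age": 70, "ferritin": None, "inr": 1.0, "iron": None},
    {"age": 65, "ferritin": 120, "inr": 1.2, "iron": 80},
    {"age": 81, "ferritin": None, "inr": 1.1, "iron": 95},
    {"age": 59, "ferritin": 90, "inr": 1.0, "iron": 70},
]
print(drop_sparse_features(rows))  # ['age', 'inr']
```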

Methods
A total of 22 independent variables were collected for each patient and used for modeling. The patient-related characteristics included age, gender, BMI, marital status (whether the patient lived alone or with family), height, weight, absolute eosinophil count, alanine aminotransferase, anisocytosis index, aspartate aminotransferase, creatinine, erythrocytes, ferritin, hematocrit, hemoglobin, INR, iron, RBC hemoglobin concentration, bilirubin, joint involved, and type of arthroplasty performed (primary or revision). Since the outcome of the study was binary, the problem could be treated as a standard binary classification task and solved with standard supervised learning techniques. Because the data came from mixed sources, some feature engineering was required. In particular, textual data had to be transformed into numerical vectors (embeddings) to be used in machine learning models. For this reason, a custom neural network architecture was built that transforms text data while preserving its local and global structure. This step brought all data to a common structure, so that the different sources could be joined into a single dataset. After the embedding procedure, the final data were used to train a logistic regression model to predict whether a patient was more likely to have a long or short LOS following primary or revision THA or TKA. All procedures described above were performed using Python 3.9. In particular, the following libraries have been used:
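To make the merging step concrete, here is a sketch of how one admission's structured values and text embeddings could be flattened into a single row with a fixed column order. The function `build_feature_row`, the field names, and the toy two- and one-dimensional vectors are illustrative assumptions; in the study, the text vectors had 48, 58, and 16 dimensions after PCA.

```python
def build_feature_row(structured, embeddings, embedding_order):
    """Concatenate structured features and text embeddings into one flat row.

    structured      -- dict of lab/demographic values (None = missing)
    embeddings      -- dict mapping a text field to its embedding (list of floats)
    embedding_order -- fixed order of text fields, so every row has the same layout
    Returns (column_names, values).
    """
    names = list(structured)
    values = [structured[name] for name in names]
    for field in embedding_order:
        vector = embeddings[field]
        names += [f"{field}_{i}" for i in range(len(vector))]
        values += list(vector)
    return names, values


# Hypothetical admission with toy-sized embeddings.
names, values = build_feature_row(
    {"age": 72, "bmi": 27.4, "revision": 0},
    {"anamnesis": [0.1, -0.3], "reason_for_admission": [0.7]},
    ("anamnesis", "reason_for_admission"),
)
print(names)
print(values)  # [72, 27.4, 0, 0.1, -0.3, 0.7]
```

Fixing `embedding_order` explicitly keeps the column layout identical across admissions, which a downstream classifier requires.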

Text Pre-Processing
No pre-trained deep learning architecture able to extract relevant information from clinical texts was available. Most pre-trained language models based on deep learning were trained on generic corpora [27] and could not be used on domain-specific texts like those considered here, because this would likely result in a poor latent representation. Therefore, a custom neural network architecture was developed to encode data from anamnesis, previous surgeries, and reason for admission into 300-dimensional numerical vectors. Architecturally, the network was designed as a transformer autoencoder, with both the encoder and the decoder composed of three attention layers with three heads each (Figure 2) [28].
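To clarify what a "three-headed attention layer" computes, the sketch below implements a single multi-head self-attention step in NumPy for 300-dimensional token vectors split across three heads (100 dimensions each). It is a simplified illustration, not the authors' network: it omits residual connections, layer normalization, feed-forward blocks, positional encodings, and masking, and the random weights stand in for trained parameters.

```python
import numpy as np


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention split across n_heads heads.

    x: (seq_len, d_model); each weight matrix: (d_model, d_model).
    d_model must be divisible by n_heads (300 / 3 = 100 per head here).
    """
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(m):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    heads = weights @ v                                  # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 300, 3, 12
x = rng.normal(size=(seq_len, d_model))  # 12 hypothetical token vectors
w_q, w_k, w_v, w_o = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4)
)
out, attn = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape, attn.shape)  # (12, 300) (3, 12, 12)
```

Each head attends over the whole sentence in its own 100-dimensional subspace; the three head outputs are concatenated back to 300 dimensions and mixed by `w_o`.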
The model was trained with the Adaptive Moment Estimation (ADAM) algorithm [29] on all clinical notes regarding anamnesis, comorbidities, disabilities, reasons for admission, and lower-limb clinical examinations written from 2015 to 2018, with training and validation sets of 36,489 and 9153 clinical sentences, respectively, minimizing a binary cross-entropy loss function. At the end of the process, the encoded texts were reduced with principal component analysis to the dimensionality needed to explain 90% of the variance, resulting in 48-dimensional vectors for anamnesis, 58-dimensional vectors for previous surgeries, and 16-dimensional vectors for reason for admission.
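The dimensionality-reduction rule (keep the smallest number of principal components that explains at least 90% of the variance) can be sketched with NumPy's SVD. The data below are a synthetic, hypothetical stand-in for the 300-dimensional encoder outputs, and the function name is ours.

```python
import numpy as np


def n_components_for_variance(x, target=0.90):
    """Smallest number of principal components explaining >= target of the variance."""
    centered = x - x.mean(axis=0)
    # Squared singular values of the centered data are proportional to PC variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    ratio = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(ratio)
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(cumulative)), cumulative


rng = np.random.default_rng(42)
# Synthetic 300-dimensional "embeddings" with rank-20 structure plus small noise.
latent = rng.normal(size=(1000, 20))
embeddings = latent @ rng.normal(size=(20, 300)) + 0.05 * rng.normal(size=(1000, 300))
k, cumulative = n_components_for_variance(embeddings)
print(k, round(float(cumulative[k - 1]), 3))
```

scikit-learn's `PCA(n_components=0.90)` applies the same criterion when given a fraction between 0 and 1; the study does not state which implementation was used.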

Classification
The data sources for the analysis were heterogeneous: 79.2% of features came from textual data, while 20.8% came from laboratory tests and morphological and demographic features. To understand the impact of each source, three models were defined using three different feature sets: the first used only laboratory and demographic features (structured data), the second used only text-derived features (unstructured data), and the third used both structured and unstructured data. Model performances were then compared using standard classification scores and the AUC. Architecturally, all models were three-step pipelines with a z-score standardizer as the first step, an iterative imputer based on chained equations [30] to impute missing values as the second step, and a logistic regression classifier as the last step. Hyperparameters for all models were chosen with a randomized search algorithm, and training and testing followed a hold-out strategy in which data were randomly split into 70% for training and 30% for testing.
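The pipeline described above (z-score standardization, chained-equations imputation, logistic regression, randomized hyperparameter search, 70/30 hold-out) maps naturally onto scikit-learn. The sketch below is an assumption-laden illustration on synthetic data: the search space (only the inverse regularization strength `C`), the number of search iterations, the cross-validation folds, and the stratified split are our choices, not details reported by the study.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 1541 rows x 20 features, about 10% of values missing.
X, y = make_classification(n_samples=1541, n_features=20, n_informative=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

# 70/30 hold-out split (stratified on the label; stratification is our choice).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                                # z-scores; NaNs ignored in fit, kept in output
    ("impute", IterativeImputer(max_iter=10, random_state=0)),  # chained-equations style imputation
    ("clf", LogisticRegression(max_iter=1000)),
])

search = RandomizedSearchCV(
    pipeline,
    param_distributions={"clf__C": loguniform(1e-3, 1e2)},  # hypothetical search space
    n_iter=5,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

Placing the standardizer before the imputer, as in the text, works because `StandardScaler` disregards NaNs when fitting and passes them through, leaving them for `IterativeImputer` to fill.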

Statistical Analysis
The statistical analysis focused mainly on understanding the impact of the selected covariates on the outcome distribution. Since the text-embedding vectors were built using deep learning, their interpretability was lost in the process, so inference could only be performed on laboratory tests, demographic data, and morphological features. For the univariate analysis, the data were divided according to LOS class as described above, and a Mann-Whitney U test [31] or a t-test [32] was used for continuous variables, depending on the result of the Shapiro-Wilk normality test [33]. For categorical features, a two-proportion Z-test (equivalent to a chi-squared test on a 2 × 2 table [34]) was used to assess significant differences in feature distributions. A multivariate analysis was performed using logistic regression [35] to compute risk factors (odds ratios) and their confidence intervals. The significance of each odds ratio was assessed with the p-value of the corresponding Wald statistic. All p-values below 0.05 were considered statistically significant.
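The test-selection logic of the univariate analysis can be sketched with SciPy plus a hand-written two-proportion z-test. The function names, the choice to run Shapiro-Wilk on each group separately, and the synthetic data are assumptions for illustration.

```python
import math

import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Shapiro-Wilk on both groups, then a t-test if neither rejects
    normality, otherwise a two-sided Mann-Whitney U test."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        return "t-test", float(stats.ttest_ind(a, b).pvalue)
    return "mann-whitney", float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)


def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-sided two-proportion z-test.

    x1/n1 and x2/n2 are the shares of a category in the two LOS groups;
    z**2 equals the chi-squared statistic of the 2 x 2 table without
    continuity correction.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
    return z, p_value


rng = np.random.default_rng(1)
group_a = rng.normal(12.9, 1.4, size=150)  # synthetic continuous values
group_b = rng.normal(13.6, 1.3, size=150)
print(compare_continuous(group_a, group_b))
print(two_proportion_ztest(30, 100, 45, 100))  # z ~ -2.19, p ~ 0.028
```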

Classification
All features used in the model were available at pre-admission. These included gender, age, BMI, height, weight, body part, marital status, and revision flag (indicating whether the surgery was a revision or a primary arthroplasty). In addition, all laboratory analyses described in Table 1 were included. The complete model, which included all feature sources, performed best, with an area under the ROC curve (AUC) of 0.7899, followed by the text model (AUC: 0.7228) and the labs and demographics model (AUC: 0.7198). Beyond the AUC, the complete model also outperformed the other two on all selected classification scores (Tables 2 and 3).
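Because the AUC is easily mistaken for an accuracy, it may help to recall what it measures: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. Under this reading, and assuming "long LOS" is the positive class, an AUC of 0.7899 means the complete model ranks a random long-LOS admission above a random short-LOS admission about 79% of the time; it does not mean that 79% of patients were classified correctly. The function below is a minimal, quadratic-time sketch of this definition, not the implementation used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) is scored
    higher than a random negative (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```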

Multivariate Analysis
Multivariate logistic regression results are reported in Table 4 and Figure 3. The model was fitted using the BFGS algorithm without any regularization term, so that the estimated logistic regression coefficients are not shrunk toward zero by a penalty [36].
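A minimal sketch of an unpenalized logistic regression fitted with BFGS, reporting odds ratios, 95% Wald confidence intervals, and Wald p-values, using NumPy and SciPy. The function name, the synthetic data, and the normal-approximation multiplier of 1.96 are our assumptions; the study does not state which library it used (statsmodels' `Logit(...).fit(method="bfgs")` is one off-the-shelf alternative).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def fit_logit_bfgs(X, y):
    """Unpenalized logistic regression by BFGS.

    X must include a leading column of ones for the intercept.
    Returns odds ratios, 95% Wald confidence intervals, and Wald p-values.
    """
    def neg_log_lik(beta):
        eta = X @ beta
        return np.sum(np.logaddexp(0.0, eta) - y * eta)  # stable log(1 + e^eta) - y*eta

    def grad(beta):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        return X.T @ (p - y)

    beta = minimize(neg_log_lik, np.zeros(X.shape[1]), jac=grad, method="BFGS").x
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information X' W X
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se                              # Wald statistics
    p_values = 2 * norm.sf(np.abs(z))
    ci = np.exp(np.column_stack([beta - 1.96 * se, beta + 1.96 * se]))
    return np.exp(beta), ci, p_values


rng = np.random.default_rng(7)
n = 1500
x1 = rng.normal(size=n)          # e.g. a standardized lab value (synthetic)
x2 = rng.integers(0, 2, size=n)  # e.g. a binary flag (synthetic)
X = np.column_stack([np.ones(n), x1, x2])
true_beta = np.array([-0.2, 0.8, 0.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)
odds_ratios, ci, p_values = fit_logit_bfgs(X, y)
print(np.round(odds_ratios, 2), np.round(p_values, 4))
```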

Discussion
The main finding of our study was the ability of the machine learning algorithm to predict LOS in patients undergoing elective primary or revision THA or TKA. Using data available at pre-admission, the tool identified candidates for "short LOS" (LOS less than or equal to 6 days) and "long LOS" (LOS greater than 7 days) with good discrimination (AUC 0.7899). Thus, patients suitable for ERAS protocols could be identified at pre-admission. From a methodological point of view, the most interesting result of the study was the comparison between different data sources, which made it possible to assess the information added by each subset of features and by their combination (Tables 2 and 3). As noted above, the complete model, which relies on information from clinical texts, laboratory tests, demographics, and morphological features, clearly dominates. The two control models are also worth examining: Tables 2 and 3 and Figure 4 show similar results for them, especially in terms of AUC, and their ROC curves intersect at approximately (0.8, 0.42), which makes them harder to rank. Overall, these results show that much clinical information can be extracted from texts, reflecting physicians' experience and the quality of the physical examination performed. On the other hand, the significant boost in all classification scores obtained by adding laboratory, demographic, and morphological features indicates that documents written by clinicians do not capture all the information needed for a correct classification. From a clinical perspective, the evaluation of comorbidities is very important given the type of patients undergoing arthroplasty, especially if an ERAS protocol is advocated. When Moldovan investigated bone cement implantation syndrome (BCIS), a sporadic and potentially lethal complication after THA, in our high-volume single center for prosthetic surgery, no BCIS occurred and no significant difference in LOS was recorded between cemented and uncemented hip arthroplasty; therefore, stem cementation was not considered a relevant feature [37].
Figure 4. ROC curves; consistent with Tables 2 and 3, the ROC curve for the complete model dominates the others.
Previous studies used machine learning to predict LOS after TKA, THA, and total shoulder arthroplasty (TSA), with c-statistics of 0.78, 0.87, and 0.77, respectively. The present model could potentially be integrated into the electronic medical record to provide a personalized assessment of a patient's likely need for a longer or shorter hospital stay after total joint arthroplasty, hospital readmission, or reintervention [12,13,17]. In 2021, Podmore et al. [38] included 640,832 patients who underwent primary hip or knee arthroplasty between April 2009 and March 2016 in a study evaluating the impact of 11 comorbidities on safety outcomes (including LOS and 30-day readmission rate) of hip and knee arthroplasty. The present model included all the comorbidities. Their study highlighted the clinical and socioeconomic impact of the examined comorbidities. However, they concluded that the increased risk is small compared with the large improvements in functional outcomes, even in patients with multiple comorbidities. Thus, a prediction based on the pre-admission evaluation of comorbidities and laboratory tests could help to individualize each patient's path from admission to complete recovery. In 2017, Zhu et al. [6] performed a large meta-analysis of the RCTs and CCTs available in the literature on ERAS protocols in arthroplasty surgery. They concluded that ERAS significantly reduces LOS and the incidence of complications in patients undergoing THA or TKA. One of the most interesting aspects that emerged from their study was the need to prioritize improving perioperative patient management over surgical technique. In this scenario, the correct selection of patients eligible for an ERAS protocol is crucial to enhance clinical outcomes. Furthermore, ERAS protocols have shown positive effects in early rehabilitation: in 2017, Masaracchio et al. [39] summarized the beneficial effects of early rehabilitation protocols. Early rehabilitation reduced LOS and the socioeconomic cost of the procedure. Despite these benefits, early mobilization could lead to complications such as falls in patients with a certain risk profile (e.g., cardiovascular or neurological disease). To avoid such complications, a quantitative, individualized risk assessment through artificial intelligence could be beneficial. From an economic perspective, reimbursement for THA and TKA has dropped dramatically over the last 20 years, especially when inflation is considered [7]. As stated by Padegimas et al. in 2016 [40], the amount of reimbursement is closely linked with patient volume, patient satisfaction, a healthier patient population, and government ownership of the hospital. A predictive tool that improves the selection of patients eligible for ERAS protocols could help make arthroplasty surgery more sustainable in a context of limited resources. A similar machine learning approach was evaluated by Anis et al. in 2020 [9]. Their study focused on predicting LOS using a Poisson regression model. One similarity with our approach was the feature selection process: they also focused on laboratory analyses as well as patients' anamnestic details. However, their study was prospective, and features were specifically selected for the task; these included demographics and specific clinical scores from previous examinations. Our study, on the other hand, was retrospective, and the main feature selection process was more general and focused on features routinely collected during daily clinical activity. Since we did not have direct access to all the specific clinical information, we used the transformer architecture to automatically extract proxies of this information from the selected texts. A rigorous comparison of the two studies is not possible because they rely on different modeling strategies. However, our approach may considerably simplify data collection and could therefore be more easily implemented as a routine clinical service, since, as stated above, our features can be easily retrieved from daily hospital practice.

Limitations of the Study and Future Plans
This study had several potential limitations. First, it relied on textual data: information from clinical records had to be heavily pre-processed before it could be used by language models, which made the prediction process more difficult. Second, some interpretability was lost in the embedding process: vectorizing the documents made the relationship between the presence of a token (a word or a sentence) and the selected outcomes difficult to assess, so the resulting features are better suited to prediction than to inference. Third, our language model was trained and validated only on internal data, and external validation is still required as a benchmark. Fourth, all clinical texts were written in Italian, so the language model is suited to a single language only.
Future research will involve collaborating with external hospitals to create a larger and more diverse patient cohort. This will help confirm the reported findings and develop more inclusive models, reducing potential biases that may be inherent in the data collection process. Incorporating external data will also enhance the text embedding model, which, in this instance, was trained only on data from a single facility. Including text data from different institutions could improve the model's understanding, leading to a better representation of patients' clinical state.

Conclusions
This study demonstrated the reliability of an artificial intelligence model in distinguishing fit patients suitable for a shorter LOS, and thus eligible for ERAS protocols, from patients with an expected longer LOS. The promising results suggest the potential utility of integrating computerized algorithms into electronic health record systems, where they could serve as a point-of-care decision support tool to assist surgeons in patient selection. As these decision support tools become part of regular practice, however, they should not replace the clinical judgment of the surgeon, but rather supplement the informed consent process and contribute to shared decision making. Further prospective studies are needed to validate our findings and the feasibility of this technology in clinical practice.

Figure 1. Project setup: how the data were used to create the Language Model and to feed the final Classification Model.
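As a rough illustration of the final classification stage, the sketch below fits a logistic regression by batch gradient descent on feature vectors that concatenate a text embedding with structured laboratory values. Everything here is an assumption for illustration: the synthetic cohort, the 4-dimensional embedding, the two laboratory features, the solver, and the hyperparameters are not taken from the study, which does not report them in this section.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=300):
    """Fit weights and intercept by batch gradient descent on the mean
    binary cross-entropy (log-loss)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one patient: p(long LOS) minus observed label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic cohort (hypothetical): each row concatenates a 4-dim text
# embedding with 2 structured laboratory values; 1 = long LOS, 0 = short LOS.
random.seed(0)
X, y = [], []
for _ in range(200):
    label = random.randint(0, 1)
    center = 1.0 if label else -1.0
    embedding = [random.gauss(center, 1.0) for _ in range(4)]
    labs = [random.gauss(-center, 1.0) for _ in range(2)]
    X.append(embedding + labs)
    y.append(label)

w, b = fit_logistic_regression(X, y)
accuracy = sum((predict_proba(w, b, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Training accuracy on synthetic data says nothing about clinical performance; in practice the model would be assessed on held-out patients, as with the AUC reported in the study.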

Figure 2. The autoencoder was composed of an encoder and a decoder. The encoder compressed the input information, usually unstructured information such as images or texts, into a numeric format, producing the embedding. The decoder took the embedded data as input and tried to reconstruct the data in its original format. During training, the two parts cooperated to compress and reconstruct the input data as accurately as possible. In this project, after training, only the encoder was used during the final classification, to encode the data in numeric format and feed the logistic regression performing the classification.
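The encoder/decoder scheme can be illustrated with a deliberately minimal linear autoencoder in NumPy. This is a swapped-in simplification, not the study's model: the text encoder used in this work is transformer-based, whereas here both parts are single linear maps trained by gradient descent on synthetic numeric data. The names `train_linear_autoencoder`, `W_enc`, and `W_dec`, and all hyperparameters, are hypothetical.

```python
import numpy as np


def train_linear_autoencoder(X, k, lr=0.05, epochs=2000, seed=0):
    """Learn an encoder W_enc (d x k) and a decoder W_dec (k x d) by minimizing
    the reconstruction error ||X @ W_enc @ W_dec - X||^2 / n with gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = 0.1 * rng.standard_normal((d, k))
    W_dec = 0.1 * rng.standard_normal((k, d))
    for _ in range(epochs):
        Z = X @ W_enc                 # encoder: compressed embedding
        R = Z @ W_dec - X             # decoder reconstruction minus original input
        grad_dec = 2.0 / n * Z.T @ R
        grad_enc = 2.0 / n * X.T @ (R @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec


def reconstruction_error(X, W_enc, W_dec):
    return float(np.sum((X @ W_enc @ W_dec - X) ** 2) / X.shape[0])


# Synthetic data with 3 underlying factors spread over 10 observed columns.
rng = np.random.default_rng(1)
n, d, k = 300, 10, 3
X = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) / np.sqrt(d)
X += 0.05 * rng.standard_normal((n, d))

W_enc, W_dec = train_linear_autoencoder(X, k)
embeddings = X @ W_enc  # only the encoder is kept to feed the classifier
print(embeddings.shape, reconstruction_error(X, W_enc, W_dec))
```

After training, only `X @ W_enc` is used downstream, mirroring the caption: the decoder exists solely to provide a reconstruction target during training.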

Figure 3. Multivariate logistic regression odds ratios and CIs. (Note: INR has been removed from the plot due to a scale problem.)
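The plotted quantities follow from the fitted coefficients: for a logistic regression coefficient beta with standard error SE, the odds ratio is exp(beta) and a Wald interval is exp(beta ± z·SE). The sketch below assumes Wald intervals at the 95% level (z ≈ 1.96), since the interval method is not stated here; the function name and coefficient values are hypothetical, not values from the study.

```python
import math


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.959964):
    """Convert a logistic-regression coefficient and its standard error into
    an odds ratio with a Wald confidence interval (95% by default)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a single predictor.
or_, low, high = odds_ratio_with_ci(beta=0.40, se=0.15)
print(f"OR = {or_:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Because the interval is symmetric on the log scale, odds-ratio plots are typically drawn on a logarithmic axis; a predictor with a very large coefficient or standard error can still stretch the axis far beyond the range of the others.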

Figure 4. ROC curves of the compared models. The curves confirm the results shown in Tables 2 and 3: the ROC curve of the complete model dominates the others.
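ROC dominance and AUC are closely linked: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The pure-Python sketch below computes it with the pairwise (Mann-Whitney) formulation; the labels and scores are invented, and treating long LOS as the positive class is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities from two models on the same six patients
# (1 = long LOS, 0 = short LOS).
labels = [1, 1, 1, 0, 0, 0]
complete_model = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
reduced_model = [0.6, 0.5, 0.3, 0.4, 0.5, 0.7]
print(roc_auc(labels, complete_model), roc_auc(labels, reduced_model))
```

If one ROC curve lies on or above another everywhere, its AUC is at least as large, but a higher AUC alone does not imply dominance. The pairwise loop is quadratic in cohort size; for real datasets a rank-based implementation such as scikit-learn's `roc_auc_score` is preferable.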

Author Contributions: Conceptualization, M.L. and G.G.; methodology, M.L.; software, P.M.; validation, M.L., T.T. and V.D.M.; formal analysis, P.M.; investigation, T.T.; resources, G.G.; data curation, P.M.; writing-original draft preparation, V.D.M.; writing-review and editing, M.L. and V.D.M.; visualization, P.M.; supervision, M.L.; project administration, T.T.; funding acquisition, V.S. All authors have read and agreed to the published version of the manuscript.

Funding: The research received funding from the Italian Ministry of Health (5x1000 program).

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki and good clinical practice guidelines. The study protocol for the development of this registry was approved by the Ethics Committee of Humanitas Research Hospital (protocol code 83/23) on 27 July 2023.

Informed Consent Statement: All included patients had provided signed written informed consent to the use of their clinical data for research purposes.

Table 1. Demographic, clinical, and morphological characteristics of the cohort.

Table 2. Model classification scores for the two classes.
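Scores reported "for the two classes" are usually computed one class at a time, treating that class as the positive one. The sketch below assumes the reported scores are precision, recall, and F1 (the table body is not reproduced here); the function name, class labels, and predictions are hypothetical.

```python
def per_class_scores(y_true, y_pred, classes=("short LOS", "long LOS")):
    """Precision, recall and F1 computed separately for each class,
    treating the current class as positive and the other as negative."""
    scores = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical labels and predictions for five admissions.
y_true = ["short LOS", "short LOS", "long LOS", "long LOS", "long LOS"]
y_pred = ["short LOS", "long LOS", "long LOS", "long LOS", "short LOS"]
print(per_class_scores(y_true, y_pred))
```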

Table 4. Multivariate logistic regression results. (Note: missing values correspond to negative variance estimates.)
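Standard errors are the square roots of the diagonal of the estimated coefficient covariance matrix. When a diagonal entry is estimated as negative (for example, because of numerical problems when inverting a near-singular information matrix), no real standard error exists, so the dependent cells cannot be filled. A minimal sketch of that reporting rule, with a hypothetical function name and hypothetical variance values:

```python
import math


def standard_errors(variances):
    """Standard errors from the diagonal of an estimated covariance matrix.
    A negative variance estimate has no real square root, so it is reported
    as missing (None) instead of raising an error."""
    return [math.sqrt(v) if v >= 0 else None for v in variances]


# Hypothetical covariance diagonal; the last entry is negative.
print(standard_errors([0.04, 0.0025, -1e-6]))
```

Any quantity derived from a missing standard error, such as a z statistic, p-value, or confidence interval, is then missing as well.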