Concordance and generalization of an AI algorithm with real-world clinical data in the pre-omicron and omicron era

All viruses, including SARS-CoV-2, the virus responsible for COVID-19, continue to evolve, which can lead to new variants. The objective of this study is to assess the agreement between real-world clinical data and an algorithm that utilizes laboratory markers and age to predict the progression of disease severity in COVID-19 patients during the pre-Omicron and Omicron variant periods. The study evaluated the performance of a deep learning (DL) algorithm in predicting disease severity scores for COVID-19 patients using data from the USA, Spain, and Turkey (Ankara City Hospital (ACH) data set). The algorithm was developed and validated using pre-Omicron era data and was tested on both pre-Omicron and Omicron-era data. The predictions were compared to the actual clinical outcomes using a multidisciplinary approach. The concordance index values for all datasets ranged from 0.71 to 0.81. In the ACH cohort, a negative predictive value (NPV) of 0.78 or higher was observed for severe patients in both the pre-Omicron and Omicron eras, which is consistent with the algorithm's performance in the development cohort.


Introduction
The World Health Organization designated coronavirus disease 2019 (COVID-19) a pandemic in March 2020, after the disease first appeared in China in December 2019 [1]. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been responsible for an estimated 591 million cases worldwide, including over 6 million deaths [1]. SARS-CoV-2 infection can cause symptoms ranging in severity from influenza-like illness to death. About 15 % of cases caused by human coronavirus strains present as the common cold, the mild form of the disease [2]. To better manage the clinical situation, it is helpful to identify specific laboratory indicators that can differentiate between severe and non-severe cases, or between high and low risk of death [3].
Improving the early screening, diagnosis, and prognosis of the disease is a critical step in reducing COVID-19 deaths during this pandemic. Since the WHO declared the COVID-19 outbreak a pandemic, various studies have used Artificial Intelligence (AI) approaches to optimize these procedures in clinical settings in terms of quality, accuracy, and, most importantly, time [4][5][6][7][8]. On Nov 26, 2021, about 23 months after the first reported case of COVID-19, a novel variant of concern (VoC) of SARS-CoV-2, Omicron, was documented [9]. In comparison to earlier VoCs, the Omicron (B.1.1.529) variant displayed a shorter doubling time than Delta, a longer infectiousness period, and greater rates of reinfection [10]. The key concerns about Omicron are whether it is more severe or more contagious than other VoCs. Numerous studies have used machine learning to diagnose and predict the prognosis of different COVID-19 variants [9][11][12][13]. As a result, the question arises whether AI algorithms trained to predict COVID-19 severity using pre-Omicron data remain valid for use during the Omicron era.
The use of AI, which can be defined simply as the imitation of human intelligence, is increasing in laboratory medicine, as in other branches of medicine. With the COVID-19 pandemic, the need for AI technologies to control the expanding burden on healthcare has become even more obvious. AI applications have been used in many areas to control the pandemic, from estimating the epidemiological course of the disease to developing diagnostic tools, and from modeling the virus to designing treatment algorithms. A number of models for the screening, diagnosis, and prognosis of COVID-19 have been created with AI tools developed using clinical laboratory results obtained from patients with SARS-CoV-2 infection [5][6][7]. AI techniques using deep learning (DL) methods have shown great success in medical imaging owing to the advanced feature extraction capability of DL. Beyond medical imaging, numerous studies have used AI techniques to screen, diagnose, and predict the prognosis of COVID-19 using clinical, laboratory, and demographic data [11][12][13][14][15]. Despite these advancements, we have yet to see a practical system, with evidence of generalizability, that could be used universally to aid in the early identification of patients who develop severe clinical outcomes. Alongside these developments, limitations in the use of AI have also emerged in routine clinical practice: deficiencies have been reported in access to appropriate data, validation of developed models, and multidisciplinary approaches [5].
A prognostic DL model was developed and externally validated by Siemens Healthineers (Siemens Healthineers, Erlangen, Germany), the working partner of Ankara City Hospital Medical Biochemistry Laboratory services, using data from hospitals in the US and Spain. The model is based on laboratory tests and predicts the need for care, risk of death, severity, or length of stay in the hospital [16]. The method (i.e., deep profiler) derives a patient fingerprint from various demographic, clinical, and laboratory parameters and uses it to predict severity scores. Deep profiler consists of three main parts (i.e., networks): an encoder network that extracts prominent features represented in a latent space, which is also referred to as the patient fingerprint; a decoder network that reconstructs the input data to ensure data fidelity of the latent feature representation; and a severity classifier network, which is trained to estimate the severity score [17]. The prognosis was predicted using this model based on commonly assessed laboratory characteristics consisting of clinical chemistry, complete blood count (CBC), and coagulation tests, which are requested and performed within the first 72 h of admission. However, the effectiveness of this previously developed algorithm has not been determined by comparing its predictions of severity to the outcomes of actual clinical data collected in Turkey.
The objective of this retrospective cohort study was to assess, with a multidisciplinary approach and under the same laboratory conditions, the consistency between the prognostic information predicted by the previously created and validated DL algorithm and the actual clinical course of COVID-19 patients with varying disease severity before and after the emergence of the Omicron variant. In this way, we aimed to evaluate whether the DL algorithm can be generalized as a reliable tool for prognosis prediction.

Materials and methods
COVID-19 patients who were hospitalized in the Infectious Diseases Clinic with a diagnosis confirmed by detection of SARS-CoV-2 RNA in oro-nasopharyngeal swab samples by polymerase chain reaction (PCR) analysis, the reference standard for COVID-19 diagnosis, were included in the present study. The diagnostic criteria of the World Health Organization's (WHO) interim guidance were used [18]. All consecutive COVID-19 patients who were admitted to Ankara City Hospital from March 15th, 2021 to April 30th, 2022 were enrolled (ACH cohort). Two time frames were used, based on the dates of the WHO declarations of the COVID-19 pandemic and the Omicron variant, and defined by the date of COVID-19 diagnosis: the pre-Omicron era (March 15th, 2021-November 15th, 2021) and the Omicron era (December 15th, 2021-April 30th, 2022). These two time frames were used to split the patients into two groups. The patients' data were obtained from the Infectious Diseases Clinics and Intensive Care Units of Ankara City Hospital, a 4190-bed academic medical center in Ankara, Turkey. Demographic, clinical, and laboratory data were gathered from the electronic medical records of the hospital and laboratory information systems (HIS and LIS) and from case record forms. Only the initial laboratory results obtained within 72 h of admission were recorded. The severity of the included patients was analyzed using the same criteria as the algorithm in order to examine the concordance of the existing algorithm with real-world data.
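The split into the two time frames can be illustrated with a minimal sketch. It assumes the cut-off dates fall in 2021 for the end of the pre-Omicron era and the start of the Omicron era, which is the reading consistent with the stated enrollment window (March 15th, 2021 to April 30th, 2022); the function and constant names are hypothetical and are not taken from the study.

```python
from datetime import date
from typing import Optional

# Assumed era boundaries (hypothetical constants, chosen to fit the
# enrollment window of March 15th, 2021 to April 30th, 2022).
PRE_OMICRON_START = date(2021, 3, 15)
PRE_OMICRON_END = date(2021, 11, 15)
OMICRON_START = date(2021, 12, 15)
OMICRON_END = date(2022, 4, 30)


def assign_era(diagnosis_date: date) -> Optional[str]:
    """Return the study era for a diagnosis date, or None if outside both."""
    if PRE_OMICRON_START <= diagnosis_date <= PRE_OMICRON_END:
        return "pre-Omicron"
    if OMICRON_START <= diagnosis_date <= OMICRON_END:
        return "Omicron"
    # Dates in the gap between the eras, or outside enrollment, are unassigned.
    return None
```

Under these assumptions, a diagnosis made in the gap between the two eras is left unassigned, which mirrors the exclusion of cases that do not belong to either time frame.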
To examine the concordance of the AI algorithm with real-world data, a previously established and externally validated deep learning approach for predicting the severity of COVID-19 patients using a sparse set of laboratory markers was applied [16]. In that study, a deep neural network model was trained to predict clinical outcomes using different cohorts of outcome-matched patient data from four COVID-19 epicenters. A number of experiments were carried out to investigate the contributions of the different data inputs to the assigned surrogate outcomes and to gain a better understanding of the redundancies by minimising the interactions between the input parameters. Finally, a predictive tool that can maintain accuracy with limited input data was developed. The performance of the proposed models was compared with that of other methods in the literature. This predictive algorithm is an AI-based network trained and tested using COVID-19 patient records from three healthcare systems in the US and one in Spain (Development Training Cohort and Testing Cohort). The model provides a severity risk score along with the likelihood of various clinical outcomes, namely ventilator use, end-organ failure, and mortality. The predictor laboratory tests included in the algorithm are standard blood tests, namely creatinine, CRP, D-dimer, eosinophil (%), ferritin, INR, LDH, lymphocyte (%), and Troponin I, all of which were routinely measured for COVID-19 patients at Ankara City Hospital. The prognostic and predictive parameters produced by the algorithm and compared with clinical data were disease severity (0-4), ventilator use, end-organ failure, and risk of mortality within 30 days of hospital admission.

Selection of participants
In our hospital, oro-nasopharyngeal swab samples for RT-PCR were obtained from all suspected patients, in addition to routine blood tests, during the pandemic period (after March 2020). The study physicians at the Infectious Diseases Clinic identified COVID-19 using consistent clinical signs, such as fever and respiratory symptoms, evidence of pneumonia on computerized tomography (CT), and/or positive SARS-CoV-2 PCR results, in accordance with the WHO interim guidance [18]. The study included patients who required intensive care on admission or at any time during their hospital stay, as well as those who did not require ICU care. Patients who died within the first 24 h following hospital admission were excluded from the study. The additional exclusion criteria were not having a positive SARS-CoV-2 PCR result, being pregnant, being under the age of 18, not being a case from the determined time frames, and not having the laboratory results included in the model within 72 h of admission. The data of COVID-19 patients whose laboratory tests were requested and carried out within 72 h of hospital admission were retrospectively examined and analyzed.
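The exclusion criteria listed above can be expressed as a simple filter. The following is a minimal sketch only: the record structure and every field name are hypothetical, and the actual data extraction from the HIS and LIS is not described at this level of detail in the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Admission:
    """Hypothetical per-patient record; field names are illustrative only."""
    pcr_positive: bool
    age_years: int
    pregnant: bool
    in_study_time_frame: bool
    hours_to_death: Optional[float]       # None if the patient did not die
    hours_to_model_labs: Optional[float]  # None if model labs were never run


def is_eligible(a: Admission) -> bool:
    """Apply the exclusion criteria stated in the text, one rule per check."""
    if not a.pcr_positive:
        return False
    if a.age_years < 18 or a.pregnant:
        return False
    if not a.in_study_time_frame:
        return False
    # Deaths within the first 24 h after admission are excluded.
    if a.hours_to_death is not None and a.hours_to_death < 24:
        return False
    # The laboratory results used by the model must exist within 72 h.
    if a.hours_to_model_labs is None or a.hours_to_model_labs > 72:
        return False
    return True
```

Keeping each criterion as a separate check makes it straightforward to count how many patients each rule excludes, should a flow diagram be needed.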

Clinical laboratory analyses
All laboratory parameters were measured on analyzers according to the manufacturer's instructions. All reagents, controls, and calibrators for laboratory parameter measurements were obtained from Siemens Healthcare Diagnostics (Erlangen, Germany). The following devices were employed in accordance with Westgard's quality control rules: the Atellica Solution Immunoassay and Clinical Chemistry Analyzer (Siemens Healthcare Diagnostics, Erlangen, Germany) for serum creatinine, CRP, ferritin, LDH, and Troponin I; the ADVIA 2120 Hematology System (Siemens Healthcare Diagnostics, Erlangen, Germany) for eosinophil (%) and lymphocyte (%); and the Sysmex CS-5100 System (Siemens Healthcare Diagnostics, Erlangen, Germany) for D-dimer and INR. The following methods were used for clinical chemistry and immunoassay tests: the kinetic alkaline picrate method for creatinine, the lactate to pyruvate forward reaction method for LDH, the immunoturbidimetric method for CRP, and chemiluminometric assays for ferritin and Troponin I.

Statistical analysis
Performance evaluation was conducted in a real-world data cohort. For the retrospectively obtained data, descriptive statistics were generated, and the distribution of the data was examined. The log-rank test and Kaplan-Meier survival analysis were used to assess the performance of the DL algorithm for prognostic analysis. In addition, the compatibility between the prognostic data predicted by the DL algorithm and the actual clinical course before and after the Omicron variant was evaluated by creating different models with logistic regression analysis. Sensitivity, specificity, and positive and negative predictive values were determined using ROC analysis at separate cut-off values for each data set. Statistical results were presented quantitatively and visually using SPSS (IBM, version 26.0). Statistical significance was defined as P < 0.05.
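For reference, the four diagnostic measures named above follow directly from the confusion matrix at a given cut-off. The sketch below is illustrative only and is not the SPSS procedure used in the study; the function name and the binary encoding (1 = severe outcome, 0 = otherwise) are assumptions.

```python
from typing import Dict, Optional, Sequence


def diagnostic_metrics(y_true: Sequence[int],
                       y_pred: Sequence[int]) -> Dict[str, Optional[float]]:
    """Sensitivity, specificity, PPV and NPV from binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> Optional[float]:
        # Undefined when the denominator is empty (e.g. no predicted negatives).
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Note that the negative predictive value depends on outcome prevalence as well as on the model, so NPVs are most comparable between cohorts with similar proportions of severe patients.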

Clinical, demographic characteristics and laboratory features
A total of 7028 and 3554 patients, making up the data sets for the development training cohort and the development testing cohort, respectively, were recruited from the USA. The study included 638 hospitalized patients (ACH cohort) with COVID-19 diagnosis according to WHO criteria (13). In addition to clinical symptoms and findings consistent with COVID-19, all patients had PCR confirmation. All SARS-CoV-2 PCR-positive patients were grouped based on the severity scores of the algorithm as Severity 0, Severity 1, Severity 2, Severity 3, and Severity 4. The description of severity levels, criteria, and prevalence in the DL data sets are shown in Table 1, and the distributions of the predictor laboratory tests in the sensitivity analysis of the cohort data set are shown in Table 2. The severity scores were assigned on an ordinal scale from 0 to 4 based on the worst condition of the patient during the hospital stay. The definitions of the various scales were primarily based on the Berlin criteria and Sequential Organ Failure Assessment (SOFA) scores [20,21]. A total of 10 variables, comprising laboratory tests and demographics for patients in the DL data sets, are presented in Table 3. The table shows the statistical significance (p-value) of the difference in mean values between the sub-groups with mortality and those with discharge. A p-value of less than 0.05 indicates the potential of the variable to be an independent predictor of mortality. As shown, all variables except INR were significant independent predictors of mortality in the pre-Omicron group, whereas in the Omicron group, creatinine, INR, LDH, % lymphocytes, and Troponin I were significant independent predictors.

Performance of individual features and model performance in the testing data set
In the ACH cohort, of 638 eligible COVID-19 patients, 366 fell within the pre-Omicron time frame and 272 within the Omicron time frame; these were named the pre-Omicron group and the Omicron group, respectively.
Table 4 summarises feature performance characteristics in the development and ACH cohorts, stratified according to Omicron pandemic status. NPVs of 0.78 or higher were observed for severe patients in both the pre-Omicron and Omicron periods. This is consistent with the algorithm's performance in the development cohort.
Fig. 1 depicts the Kaplan-Meier curves for the development cohort and for the total and subgroups of the ACH cohort data set, which contains only PCR-confirmed COVID-19 patients (no PCR negatives). The concordance index values, which were computed based on the predicted severity scores and ground truth values based on patient outcome [22], were 0.71, 0.80, 0.79, and 0.81 for the development testing cohort data set-pre-Omicron era, ACH data set-pre-Omicron + Omicron era, ACH data set-pre-Omicron era, and ACH data set-Omicron era, respectively. The Kaplan-Meier curves are presented for the predicted cohorts as well as for patient disease severity (Severity 0-4) in all data sets. The Kaplan-Meier curves showed a clear separation between the low and high severity levels for all time frames of COVID-19 (Fig. 1(A-D)).
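To make the concordance index concrete, the sketch below implements one common definition, Harrell's C for right-censored time-to-event data, in which a pair of patients is comparable when the patient with the shorter follow-up experienced the event. This is an assumption for illustration: the exact variant used in [22] is not specified here, and the function and argument names are hypothetical.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      scores: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly by the score.

    times  -- follow-up time per patient
    events -- 1 if the outcome (e.g. death) was observed, 0 if censored
    scores -- predicted risk; a higher score should mean an earlier event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of 0.71 to 0.81 indicate that the predicted severity ordered patients correctly in most comparable pairs.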
Fig. 2 presents the AUC in the ACH cohort data set. AUCs ranged from 0.64 ± 0.01 to 0.81. The analysis showed consistent concordance among the pre-Omicron and Omicron cases.
Fig. 3 depicts the ROC for time to mortality at 3, 7, 15, 22, and 30 days from admission. AUCs ranged from 0.57 to 0.90. The analysis showed consistent concordance among the pre-Omicron and Omicron cases. The accuracy of prediction is higher for patients who are likely to experience higher levels of severity within the first three days and 3-7 days after admission, with AUC values of 0.90 and 0.91, respectively.
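The AUC at a fixed time horizon can be read as the probability that a randomly chosen patient who died within that horizon received a higher score than a randomly chosen patient who did not. The sketch below computes it by direct pairwise comparison. It is a simplification that ignores censoring before the horizon, and the helper names are hypothetical; it is not the ROC procedure used in the study.

```python
from typing import List, Optional, Sequence


def died_within(days_to_death: Sequence[Optional[float]],
                horizon_days: float) -> List[int]:
    """Binary label per patient: 1 if death occurred by the horizon, else 0."""
    return [1 if d is not None and d <= horizon_days else 0
            for d in days_to_death]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc(died_within(days, 7), scores)` would give the 7-day value for a list `days` of days to death (None for survivors) and a matching list of predicted scores.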
With respect to the predictive accuracy of mortality (severity level = 4), the DL algorithm's negative predictive value (NPV) was

Table 2
The features included in the algorithm of the ACH Data Set.

Discussion
To our knowledge, this is the first study to demonstrate that the previously developed and validated DL algorithm utilizing age and nine laboratory indicators accurately predicts severity in patients with COVID-19 in both the pre-Omicron and Omicron periods [16]. Although there are AI studies of COVID-19 patients performed in Turkey, none of them has compared different variant periods [23,24].
During the COVID-19 outbreak, the need for rapid identification and stratification of patients for supportive clinical care became urgent because of the heavy workload at the beginning of the outbreak. So far, many studies on the use of AI in COVID-19 with different approaches have been published. Different models, including the use of demographics, radiological data, symptomatology, and laboratory features, have been reported with different AUCs. Among the variables used in the models, symptomatology might not be obtained with structured, validated questionnaires, and outside of a research environment, it might be challenging to record these symptoms reliably. Furthermore, because they are not structurally recorded in modern medical records, these symptoms cannot be incorporated into automated risk assessments. Structured data collected during the clinical examination are the easiest to integrate and may have the least variability from one institution to another. These include vital signs, demographic data, laboratory results, and radiological images. It is important to note that another model developed from chest computed tomography images had an AUC of 0.994 for the discrimination of COVID-19 from atypical or viral pneumonia [25]. However, due to the increased risk of infection spread from additional visits to radiology suites, national organizations advise against using radiological imaging for the diagnosis of COVID-19. Gülbay et al. from Turkey have demonstrated that a machine learning algorithm based on clinical and DL-segmentation-based radiological criteria, trained with a balanced data set, can successfully predict which COVID-19 patients may need intensive care [24]. Radiological image data might be very useful in models. However, they can be hard to obtain and costly in clinical settings, especially in low-capacity hospitals and in low-income countries. These approaches are also unlikely to be implemented quickly because of the difficulties in conducting optional radiological examinations during this pandemic.
Models that are easy to use and based on a small number of routinely measured laboratory markers could be more practical than the others and thus more widely used. The model used in this study includes nine blood biomarkers and age, which capture a range of underlying biological processes known to be early independent predictors of disease severity. These processes include immune response (i.e., lymphocytes and eosinophils), kidney and liver function (i.e., creatinine and LDH), cardiac function (i.e., Troponin I), inflammation processes (i.e., CRP and ferritin), and coagulation process (i.e., D-dimer and INR) [16]. The model includes standard laboratory tests that are widely available with rapid turnaround time. The main time limitation is related to phlebotomy and sample processing, as the machine learning model itself can be run almost immediately. In addition, all of these parameters were measured routinely throughout the management of COVID-19; thus, we could easily compare the results obtained with the algorithm by leveraging a retrospective cohort. Previous studies have explored the use of laboratory data, along with non-radiological structured clinical data or demographics, in the diagnosis of COVID-19 through machine learning techniques.
On November 11, 2021, a sequenced Omicron case was first reported from Botswana. A few days later, a sequenced Omicron case was also reported from Hong Kong in a traveler from South Africa [26]. Concerns over the potential effects of this novel VoC on clinical presentation have grown since then. Febrile children with COVID-19 have had a much greater incidence of seizures in the Omicron era than in the pre-Omicron era, which has led to an increase in hospitalizations, especially in children younger than five years [27,28]. In adults, the Omicron period was independently associated with a lower risk of inpatient mortality [29]. Consistent with this, we found fewer variables to be predictors of mortality in the Omicron period than in the pre-Omicron period. In the present study, the concordance index values were 0.71, 0.80, 0.79, and 0.81 for the development cohort data set, total testing data set, pre-Omicron testing data set, and Omicron testing data set, respectively. Although outcomes of COVID-19 inpatients evolved throughout the pandemic and were affected by changing virus variants, the DL algorithm remained valid for predicting outcomes for hospitalized patients admitted with different disease severity. The algorithm, using age and frequently measured laboratory parameters, continues to estimate the disease's severity accurately.
COVID-19 outcomes among hospitalized patients may have changed due to new variants, therapies, and vaccine availability. However, the previously developed and externally validated deep learning algorithm is still valid in the era of the Omicron variant, accurately predicting disease severity [16]. Because the DL algorithm uses blood markers, the effects of different therapies may be expressed in these markers, which also reflect the physiological response to vaccination-induced immunity. The model only includes the results of blood tests performed as a routine part of the hospital visit. Furthermore, the fact that these tests came from several hospitals suggests that the model is robust to variations in specimen collection, handling, and instrumentation. This indicates the model's adaptability to institutions using various specimen handling and laboratory processing techniques. To further describe the model's performance and its therapeutic applicability in efficiently managing a focused, particularly immunized, patient population, a prospective validation study is required.
Chieregato et al. recommended a severity prediction model that combines machine learning and deep learning as an adjunct to patient risk assessment in clinical practice [30]. The authors deemed their model more suitable for predicting ICU admission outcomes in a clinical setting than for predicting mortality. In this study, we concluded that the previously developed and validated DL algorithm is usable for both mortality and severity prediction in the pre-Omicron and Omicron periods and can guide clinicians in anticipating disease outcomes [16].
Chi et al. found that a deep learning model was able to accurately predict short-term mortality or the outcome of hospice care on the second day of admission in the general inpatient population [31]. We also showed that, in both the pre-Omicron and Omicron periods, prediction accuracy was higher for potentially more severe patients in the first 3 days and 3-7 days post-admission.
In another study performed by our group, the concordance index values (CI) of the predicted severity levels on the internal testing dataset and external validation dataset were 0.71 and 0.64, respectively [16]. In the present study, concordance index values were 0.80, 0.79, and 0.81 for the ACH data set-pre-Omicron + Omicron era, ACH data set-pre-Omicron era, and ACH data set-Omicron era, respectively. Wang et al. conducted a study in which deep learning models were utilized to predict the progression of COVID-19 patients to critical disease, using baseline CT and clinical data. The models attained a concordance index value of 0.80 [32]. Cheng et al. demonstrated that the inclusion of imaging significantly improves the clinical-only model, increasing the AUC from 0.653 to 0.727 (p = 0.039) and the accuracy from 0.657 to 0.732 [11]. Singh et al. performed external validation of the model using a publicly available Mt. Sinai dataset, achieving an AUC of 0.74 ± 0.01 for predicting mortality [16]. In this study, the prediction of severity 4 (i.e., mortality) had an AUC of 0.81 for the full model using ten parameters, and imaging was not included in the deep learning model. Although we included different variants of COVID-19, our concordance index values and AUCs were similar to those of these studies.
Some of the limitations are as follows. First, only one DL algorithm was evaluated, and this model's performance was not compared with other machine learning models. Second, the dataset comes from a single center. While this provides data homogeneity, it reduces the generalization power of the findings for this algorithm. The study also had to contend with missing values and outliers, in addition to the limited size of the dataset. Another limitation is that we did not take into account vaccination status, therapies, or comorbidities. However, just like the variants, vaccination status and medications could have modulated the markers, so that the model could predict severity without needing information about vaccination status or medications.

Conclusions
Throughout the pandemic, outcomes for COVID-19 inpatients changed due to shifting demographics, novel viral strains, and vaccination. Based on our findings, age and blood test results routinely measured on admission are valid for predicting disease severity in both the pre-Omicron and Omicron eras. A machine learning model that incorporates frequently used and simple-to-use laboratory markers with discriminatory accuracy can be utilized as a clinical decision-support tool to aid physicians in making clinical judgments for patients hospitalized with COVID-19.

Ethical statement
This retrospective study was approved by the ethics board of Ankara City Hospital (No. E1-22-2442).

Fig. 1. Kaplan-Meier statistics of the DL data sets. Kaplan-Meier curves are compared; the shaded regions indicate the 95 % confidence interval. A. Kaplan-Meier statistics of the development cohort testing data set; B. Kaplan-Meier statistics of the ACH cohort data set-pre-Omicron + Omicron era; C. Kaplan-Meier statistics of the ACH cohort data set-pre-Omicron era; D. Kaplan-Meier statistics of the ACH cohort data set-Omicron era.

Fig. 2. The ROC curve of the ACH cohort data set.

Fig. 3. The ROC curve of time to mortality event of the ACH cohort data set.

Table 1
Description, criteria, and prevalence of severity levels in data sets.
Legend: n, number of cases. G. Yilmaz et al.

Table 3
Cohort statistics of laboratory variables included in the algorithm.
Legend: mean (confidence interval low, high) (number of patients for whom the value was recorded).