Added value of chest CT in a machine learning-based prediction model to rule out COVID-19 before inpatient admission: A retrospective university network study

Purpose During the coronavirus disease 2019 (COVID-19) pandemic, hospitals still face the challenge of identifying infected individuals in a timely manner before inpatient admission. An artificial intelligence approach based on an established clinical network may improve prospective pandemic preparedness. Method Supervised machine learning was used to construct diagnostic models to predict COVID-19. A pooled database was retrospectively generated from the data of 4437 participants collected between January 2017 and October 2020 at 12 German centers belonging to the radiological cooperative network of the COVID-19 pandemic (RACOON) consortium. A total of 692 (15.6 %) participants were COVID-19 positive according to the reference standard, the reverse transcription-polymerase chain reaction (RT-PCR) test. The diagnostic models included chest CT features (model R), clinical examination and laboratory test features (model CL), or all three feature categories (model RCL). Performance outcomes included accuracy, sensitivity, specificity, negative and positive predictive value, and area under the receiver operating characteristic curve (AUC). Results Performance of the predictive models improved significantly when chest CT features were added to clinical examination and laboratory test features. Without (model CL) and with inclusion of chest CT (model RCL), sensitivity was 0.82 and 0.89 (p < 0.0001), specificity was 0.84 and 0.89 (p < 0.0001), negative predictive value was 0.96 and 0.97 (p < 0.0001), AUC was 0.92 and 0.95 (p < 0.0001), and the proportion of false negative classifications was 2.6 % and 1.7 % (p < 0.0001), respectively. Conclusions Adding chest CT features to machine learning-based predictive models improves their effectiveness in ruling out COVID-19 before inpatient admission to regular wards.


Introduction
With the emergence of the coronavirus disease 2019 (COVID-19) pandemic, health care facilities face the challenge of identifying patients infected with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in a timely manner. Rapidly ruling out COVID-19 before inpatient admission is still crucial to prevent spread within high-risk transmission settings such as hospitals. Shortly after the pandemic onset, the reverse transcription-polymerase chain reaction (RT-PCR) test became available to identify SARS-CoV-2 in respiratory specimens. Initially, however, diagnostic turnaround time amounted to several days. Although turnaround time has since been reduced to less than two hours with rapid diagnostic tests and to about 24 h with laboratory-based tests, sensitivity still depends on the viral load, and thus these tests do not reliably rule out SARS-CoV-2 infection [1,2]. The COVID-19 pandemic therefore prompted joint interinstitutional research efforts to develop diagnostic models, as part of an infection prevention strategy, that can rapidly and reliably rule out COVID-19 using data from routine clinical examination, laboratory tests, and chest CT. Such models should accelerate and secure the identification of patients with a high probability of COVID-19, independent of RT-PCR test results, and thus support physicians at the emergency department and other points of triage in deciding for or against isolating patients. Classification by such models could be applied both to incidental findings and upon clinical suspicion. An artificial intelligence approach using supervised machine learning on large datasets may become an efficient instrument to improve prospective pandemic preparedness [3][4][5][6][7].
The purpose of this study was to develop a predictive model using supervised machine learning, based on a university network database, to identify COVID-19 in patients before inpatient admission. A further aim was to assess whether adding chest CT features to clinical examination and laboratory test features improves the performance of the diagnostic model. This approach could serve as a template to prepare for future pandemics.

Study design
This study aimed to develop a machine learning-driven predictive model based on a large university network database to rule out COVID-19 among patients at emergency departments before admission to hospitals' regular wards, in order to prevent in-hospital spread of SARS-CoV-2. The retrospective study was designed to show whether adding chest CT features to findings from clinical examination and laboratory tests improves the diagnostic performance of the model.

Study population and sites
A pooled database was retrospectively constructed from patient data acquired at 12 German university hospitals between January 2017 and October 2020 (Supplemental Fig. 1). Participating sites included 4437 consecutive patients aged 18 years or older (60.5 ± 23.2 years) who underwent chest CT for any reason. A total of 692 (15.6 %) participants were SARS-CoV-2 positive according to the RT-PCR test (median proportion of COVID-19-positive participants per center: 13.2 % [IQR: 6.3-30.2 %]). Participants included before March 2020 were classified as COVID-19 negative without RT-PCR testing. Data were anonymized for analysis.
All 12 participating sites were part of the radiological cooperative network of the COVID-19 pandemic (RACOON) consortium. The RACOON consortium was founded by 36 German university hospitals to establish an infrastructure to collect, transfer, and pool radiological data on COVID-19 in order to strengthen preparedness for and responsiveness to pandemics. Data from clinical examination, laboratory tests, and chest CT evaluation were acquired according to each site's routine standard of care and collected using standardized RACOON electronic data capture templates. Investigators reported radiological findings using the mint Lesion™ software (Mint Medical, Heidelberg, Germany).

Model establishment
We constructed three prediction models to calculate each participant's probability of having COVID-19 and to classify participants as COVID-19 positive or negative using a supervised machine learning algorithm. The first model included only features from clinical examination and laboratory tests (model CL), the second included only chest CT findings (model R), and the third included features from clinical examination, laboratory tests, and chest CT evaluation (model RCL). The RT-PCR test served as the reference standard.
For model construction, experienced internal medicine specialists and radiologists of the participating sites manually selected 126 candidate variables from the RACOON template as potential predictors of COVID-19 (19, 19, and 88 input features from clinical examination, laboratory tests, and chest CT, respectively [Supplemental Table 1]). To reduce the risk of overfitting, we first applied variance thresholding with a cut-off of zero, which removes features that take the same value in all participants (scikit-learn machine learning library, version 1.1.2, https://scikit-learn.org/stable/) [8]. Subsequently, we used recursive feature elimination with cross-validation [9] to select the variables to be included in the subsequent multivariable logistic regression analysis. During this process, the model was trained repeatedly while the number of included features was reduced iteratively by removing the least important features in each iteration. To evaluate model performance within an iteration, a stratified k-fold cross-validator with 5 folds was used, i.e., the data were split into 5 equally sized subsets, and the model was trained on 4 subsets and tested on the remaining one. This process was repeated 5 times, with a different subset serving as the test set each time.
The scikit-learn library [8] was then used to run iterative logistic regression one hundred times using randomly generated training and test subsets. In each run, 70 % of the data were randomly assigned to the training dataset and 30 % to the test dataset. The machine learning algorithm analyzed the training datasets to learn which variables are predictive of COVID-19, using the truncated Newton conjugate gradient method to solve the optimization problem [10]. Class weights were set inversely proportional to class frequencies (balanced class weights).
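The feature selection and repeated hold-out steps described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the synthetic data from `make_classification`, the hypothetical helper names `make_classifier`, `select_features` and `repeated_holdout`, the random seeds, `max_iter=1000`, stratified 70/30 splits, and accuracy as the score (RFECV's default when no `scoring` is given) are all assumptions. `VarianceThreshold`, `RFECV`, `StratifiedKFold` and `LogisticRegression(solver="newton-cg", class_weight="balanced")` are the scikit-learn counterparts of the steps named in the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, VarianceThreshold
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split


def make_classifier():
    # Logistic regression solved with the truncated Newton conjugate gradient
    # method; class weights inversely proportional to class frequencies.
    return LogisticRegression(solver="newton-cg", class_weight="balanced", max_iter=1000)


def select_features(X, y, seed=0):
    """Drop zero-variance columns, then run RFECV with stratified 5-fold CV.

    Returns a boolean mask over the original columns of X.
    """
    vt = VarianceThreshold(threshold=0.0)  # removes constant features only
    X_vt = vt.fit_transform(X)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    rfecv = RFECV(make_classifier(), step=1, cv=cv)  # drops one feature per round
    rfecv.fit(X_vt, y)
    mask = np.zeros(X.shape[1], dtype=bool)
    mask[vt.get_support(indices=True)] = rfecv.support_
    return mask


def repeated_holdout(X, y, n_repeats=100, test_size=0.3, seed=0):
    """Refit on n_repeats random 70/30 splits; return test-set accuracy per run."""
    rng = np.random.RandomState(seed)
    scores = []
    for _ in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=rng.randint(2**31 - 1)
        )
        clf = make_classifier().fit(X_tr, y_tr)
        scores.append(clf.score(X_te, y_te))
    return np.array(scores)


# Synthetic stand-in for the RACOON features (assumption): about 16 % positives,
# plus one constant column that variance thresholding should discard.
X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           weights=[0.84], random_state=0)
X = np.hstack([X, np.ones((X.shape[0], 1))])

mask = select_features(X, y)
scores = repeated_holdout(X[:, mask], y)
print(f"{mask.sum()} features kept; mean test accuracy {scores.mean():.2f}")
```

To mirror models R, CL and RCL, the same steps would be run separately on the corresponding column subsets. The sketch selects features once on the full dataset before the repeated splits, following the order described in the text; nesting the selection inside each training split is an alternative that keeps the test subsets entirely unseen during selection.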
Thirteen clinical examination, 10 laboratory test, and 7 chest CT covariables were identified as relevant features of model RCL. Model CL identified 13 clinical examination and 10 laboratory test features, and model R identified 8 chest CT features as relevant for classification. Fig. 1 shows the association of each feature with COVID-19 as determined with each of the models. The algorithm classified a participant as COVID-19 positive if the probability estimated by logistic regression was ≥ 0.5. Classification was run on both the training and the test datasets; the test datasets were used to assess how accurately the algorithm predicted COVID-19 in the held-out 30 % of participants.
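The ≥ 0.5 decision rule can be made explicit with `predict_proba`, as in the sketch below. The synthetic data and the helper name `classify` are hypothetical. Note that scikit-learn's own `LogisticRegression.predict` labels a case positive when the decision function is > 0, i.e., when the probability is strictly greater than 0.5, so the two rules can differ only for probabilities of exactly 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def classify(clf, X, threshold=0.5):
    """Return 1 (COVID-19 positive) where P(positive) >= threshold, else 0."""
    # clf.classes_ is sorted ([0, 1]), so column 1 holds the positive class.
    p_positive = clf.predict_proba(X)[:, 1]
    return (p_positive >= threshold).astype(int)


# Synthetic stand-in data (assumption): roughly 15 % positives.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.85], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

clf = LogisticRegression(solver="newton-cg", class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Classification is run on both the training and the held-out test data.
train_labels = classify(clf, X_train)
test_labels = classify(clf, X_test)
test_accuracy = np.mean(test_labels == y_test)
print(f"test accuracy: {test_accuracy:.2f}")
```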

Statistical analysis
Diagnostic performance of the three models (R, CL, and RCL) was characterized by sensitivity, specificity, accuracy, negative predictive value (NPV), and positive predictive value (PPV). Receiver operating characteristic (ROC) analysis was performed, and areas under the ROC curves (AUC) were compared. Performance of the models was compared with a z-test for paired samples. A p value < 0.05 was considered significant. The association of selected variables with COVID-19 was expressed as odds ratios (OR) with 95 % confidence intervals. Statistical analysis was performed with Python (Python Software Foundation, Beaverton, USA, version 3.10.7). Classification was based on a supervised machine learning algorithm that included iterative multivariable logistic regression on training and test datasets (70 % and 30 % of the data, respectively; n = 4437; 100 repetitions). Performance outcomes are given as means with standard deviations. Model R was run with chest CT features only, model CL with features from clinical examination and laboratory tests, and model RCL with features from chest CT, clinical examination, and laboratory tests.
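The performance measures listed above can be computed from a confusion matrix, as in the minimal sketch below; the helper names and the toy labels are hypothetical. The paper does not specify how the paired z-test was applied, so we assume here that it was computed on the per-repetition differences between two models (for example, AUC of RCL minus AUC of CL across the 100 splits). The odds ratio interval is a standard Wald interval, exp(β ± z·SE), which is likewise an assumption about the method used.

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import confusion_matrix, roc_auc_score


def diagnostic_metrics(y_true, y_pred, y_score):
    """Sensitivity, specificity, accuracy, NPV, PPV and AUC for a binary classifier."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "npv": tn / (tn + fn),
        "ppv": tp / (tp + fp),
        "auc": roc_auc_score(y_true, y_score),
    }


def paired_z_test(a, b):
    """Two-sided z-test on paired per-repetition values (assumed usage)."""
    d = np.asarray(a) - np.asarray(b)
    z = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    return z, 2 * norm.sf(abs(z))


def odds_ratio_ci(beta, se, level=0.95):
    """Odds ratio exp(beta) with a Wald interval exp(beta -/+ z * se)."""
    z = norm.ppf(0.5 + level / 2)
    return np.exp(beta), np.exp(beta - z * se), np.exp(beta + z * se)


# Toy example (hypothetical data): 3 positives, 5 negatives.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.6, 0.35])
y_pred = (y_score >= 0.5).astype(int)
metrics = diagnostic_metrics(y_true, y_pred, y_score)
# sensitivity 2/3, specificity 0.8, accuracy 0.75, NPV 0.8, PPV 2/3, AUC 14/15

z, p = paired_z_test([0.95, 0.94, 0.96, 0.95], [0.92, 0.91, 0.93, 0.93])
or_, lo, hi = odds_ratio_ci(np.log(2.0), 0.1)
```

Because the 100 random splits overlap, per-repetition differences are not independent, and this simple z-test does not correct for that.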

Results
Overall classification accuracy was 0.77, 0.84, and 0.89 with models R, CL, and RCL, respectively. Sensitivity was 0.87, 0.82, and 0.89, and specificity 0.75, 0.84, and 0.89 with models R, CL, and RCL, respectively (results from the test datasets). Model RCL, which added chest CT features to the analysis, was superior to model CL, which included only clinical examination and laboratory test features, regarding accuracy, sensitivity, specificity, NPV, and PPV (p < 0.001 for each of these outcomes) (Fig. 3).
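Predictive values depend on prevalence as well as on sensitivity and specificity. The short calculation below is a sketch that assumes the pooled cohort prevalence (692/4437) applies to each test split; the helper name `predictive_values` is hypothetical. From the mean sensitivities and specificities of models CL and RCL it gives an expected NPV of about 0.96 and 0.98 and a false-negative share of about 2.8 % and 1.7 %, respectively. These are close to, but not identical with, the values reported in the abstract, which are means over 100 test splits.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Expected NPV, PPV and false-negative share of all participants."""
    tp = sensitivity * prevalence              # true positives
    fn = (1 - sensitivity) * prevalence        # missed COVID-19 cases
    tn = specificity * (1 - prevalence)        # true negatives
    fp = (1 - specificity) * (1 - prevalence)  # false alarms
    return {"npv": tn / (tn + fn), "ppv": tp / (tp + fp), "false_negative_share": fn}


prevalence = 692 / 4437  # COVID-19 positives in the pooled cohort, about 15.6 %
model_cl = predictive_values(0.82, 0.84, prevalence)
model_rcl = predictive_values(0.89, 0.89, prevalence)
print(model_cl)
print(model_rcl)
```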

Discussion
Supervised machine learning was applied to construct diagnostic models from a large, pooled university network database to predict COVID-19, whether suspected or found incidentally, and thus assist clinicians' diagnosis before inpatient admission. A model that included clinical examination and laboratory test features identified COVID-19 with satisfactory accuracy. Adding chest CT features improved model performance significantly.
Although RT-PCR tests provide almost 100 % specificity, their sensitivity, particularly for upper respiratory specimens, is not sufficient to rule out the disease [2]. Moreover, results of molecular laboratory-based RT-PCR tests are unlikely to be available within 24 h. Turnaround time of rapid point-of-care antigen tests is much shorter; results should be available within 2 h of sample collection. However, their sensitivity decreases in the absence of symptoms, during the second week after symptom onset, and when there is no suspected epidemiological exposure. In addition, sensitivity varies between brands and probably with mutations that affect the viral nucleoprotein. Overall, sensitivity ranges from 34 % to 91 % in symptomatic and from 29 % to 78 % in asymptomatic individuals. Depending on prevalence, one in two to five true positives will be missed with rapid antigen tests [11]. Moreover, a recent Cochrane review revealed that the absence or presence of individual signs or symptoms has only poor diagnostic accuracy for ruling out COVID-19 [12].
Therefore, the syndromic presentation of COVID-19 as a combination of signs and symptoms is better captured by a model based on a large dataset with a wide range of clinical, laboratory, and radiologic features, as constructed in this study. Artificial intelligence can handle and analyze large datasets and considerably shortens the process of model construction. Machine learning algorithms can produce prediction models that are updated as databases grow [3][5][6][7]. Thus, diagnostic models based on machine learning may serve as an important prerequisite for readiness against surges of COVID-19 and other emerging or re-emerging pathogens. The diagnostic model constructed in this study is intended to rule out COVID-19 in hospitals, a high-risk setting for SARS-CoV-2 transmission. To protect vulnerable individuals at risk of severe disease, true positives must not be missed. Moreover, as care facilities can become amplifiers of infectious disease outbreaks, effective infection prevention and control is paramount (https://www.who.int/publications/i/item/WHO-2019-nCoV-Policy_Brief-IPC-2022.1). This supports the use of a highly sensitive test before inpatient admission [13]. False positives, on the other hand, are less critical because they can be identified with subsequent laboratory-based RT-PCR tests. Nevertheless, to avoid infecting false positives during quarantine, patients classified as positive by the model should be isolated separately until the diagnosis is confirmed, so that false positives can be released from quarantine.
The most accurate model created in this study included features of all three categories (clinical, laboratory, and chest CT) and achieved a sensitivity (0.89) considerably higher than the minimum performance requirement of ≥ 0.80 set by the World Health Organization (WHO) for rapid diagnostic tests. The specificity of the model (0.89) was sufficient for the intended purpose of ruling out COVID-19 but fell below the WHO requirement for rapid diagnostic tests (≥ 0.97) (https://www.who.int/publications/i/item/WHO2019-nCoVAntigen_Detection2021.1).
A recent Cochrane test-accuracy review that included 69 chest CT studies reported a pooled sensitivity of 0.87 (range: 0.45-1.0) and a pooled specificity of 0.78 (range: 0.1-1.0). These findings led the authors to conclude that chest CT is appropriate to rule out COVID-19 but not to differentiate SARS-CoV-2 infection from other respiratory diseases [14]. These results apply only to suspected cases. A French university study of chest CT for rapid triage in multiple emergency departments also found favorable sensitivity (0.90) and specificity (0.88) [15]. However, because of radiation exposure, even in low-dose mode, chest CT must be justified by a clinical indication. It should serve as a diagnostic approach only in patients who require chest CT because of suspected COVID-19 or for other clinical reasons. Of note, even in patients without characteristic symptoms, COVID-19 can manifest as pneumonia, and its signs can be recognized incidentally on chest CT [13,16,17]. Whether lung ultrasonography may serve as an alternative that could be applied more widely, even as a screening tool without radiation exposure, remains to be proven. The review mentioned above found a similar sensitivity and a somewhat lower specificity for suspected cases (0.89 and 0.72, respectively, pooled from 15 ultrasonography studies) [14]. Thus, ultrasonography features may be considered in future predictive models. Radiomics is another possible approach: based on artificial intelligence, imaging features are converted into quantitative data that can subsequently be integrated into predictive models. Although initial findings are promising, radiomics is not yet ready for clinical implementation [18].
A major strength of this study is the large database pooled from multiple university centers across Germany, which included consecutive patients admitted for various diseases. COVID-19-positive participants could be asymptomatic or symptomatic, and there were no restrictions regarding symptom onset. This supports the generalizability of our findings. Nevertheless, this study also has limitations. First, it presents a static snapshot of the learning algorithm. However, the algorithm may be updated continuously as additional data are acquired; diagnostic performance may improve with growing data volume, and the model can adapt to new virus variants over time. Second, we used the RT-PCR test as the reference standard for SARS-CoV-2 infection; however, a positive RT-PCR test result does not necessarily indicate infectiousness. Third, we only included patients who underwent chest CT. Although chest CT was not necessarily performed because of suspected COVID-19 but rather for various diseases, this patient selection gives rise to sample selection bias. Finally, the test datasets originate from subdivisions of the pooled dataset; no external validation was conducted.

Conclusions
Chest CT features improve the performance of diagnostic models that predict COVID-19 before inpatient admission. An artificial intelligence approach to COVID-19 prediction can inform medical decisions in a timely manner, right at the beginning of patients' diagnostic pathways. The approach might serve as an example of how to use large, pooled databases to address future pandemics from their onset.

Ethics approval and consent to participate
The study was approved by the Friedrich-Schiller-University ethics commission (Reg. No. 2021-2128). Data were anonymized for retrospective analysis; thus, the ethics commission waived the requirement for informed consent.

Consent for publication
Not applicable.

Funding
This work was supported by the German Federal Ministry of Education and Research (BMBF) as part of the University Medicine Network (Project RACOON, 01KX2021).

Authors' contributions
FG and UT conceptualized and supervised the study. MK constructed the models and analyzed the data. MI and MK contributed to writing the initial draft with input from all authors. FG, UT, MK, and MI discussed and interpreted outcomes. The participating RACOON consortium provided data infrastructure and data acquisition. All authors reviewed and approved the final manuscript.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to ethical considerations and the multi-center nature of the study.